SI = Source Index
DI = Destination Index
As others have indicated, they have special uses with the string instructions. For real mode programming, the ES segment register must be used with DI, and DS is used with SI by default, as in
movsb es:di, ds:si
SI and DI can also be used as general purpose index registers. For example, the C source code
srcp[srcidx++] = argv[j];
compiles into
8B550C mov edx,[ebp+0C]
8B0C9A mov ecx,[edx+4*ebx]
894CBDAC mov [ebp+4*edi-54],ecx
47 inc edi
where ebp+12 contains argv, ebx is j, and edi has srcidx. Notice that the third instruction multiplies edi by 4 and adds it to ebp offset by -0x54 (the location of srcp); brackets around the address indicate indirection.
Though I can't remember where I first saw it, this confirms most of it, and this (slide 17) the others:
AX = accumulator
DX = double word accumulator
CX = counter
BX = base register
They look like general purpose registers, but there are a number of instructions which (unexpectedly?) use one of them—but which one?—implicitly.
One significant difference between LEA and ADD on x86 CPUs is the execution unit which actually performs the instruction. Modern x86 CPUs are superscalar and have multiple execution units that operate in parallel, with the pipeline feeding them somewhat like round-robin (bar stalls). On CPUs that implement LEA via address generation, LEA is processed by (one of) the unit(s) dealing with addressing (which happens at an early stage in the pipeline), while ADD goes to the ALU(s) (arithmetic/logic units), late in the pipeline. That means such a superscalar x86 CPU can concurrently execute a LEA and an arithmetic/logical instruction.
The fact that LEA goes through the address generation logic instead of the arithmetic units is also the reason why it used to be called "zero-clocks"; it takes no time to execute because address generation has already happened by the time it would be / is executed.
It's not free, since address generation is a step in the execution pipeline, but it's got no execution overhead. And it doesn't occupy a slot in the ALU pipeline(s).
Edit: To clarify, LEA is not free. Even on CPUs that do not implement it via the arithmetic unit it takes time to execute due to instruction decode / dispatch / retire and/or other pipeline stages that all instructions go through. The time taken to do LEA just occurs in a different stage of the pipeline for CPUs that implement it via address generation.
Best Answer
As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:
struct Point { int xcoord; int ycoord; };
Now imagine a statement like:
int y = points[i].ycoord;
where points[] is an array of Point. Assuming the base of the array is already in EBX, and variable i is in EAX, and xcoord and ycoord are each 32 bits (so ycoord is at offset 4 bytes in the struct), this statement can be compiled to:
MOV EDX, [EBX + 8*EAX + 4]
which will land y in EDX. The scale factor of 8 is because each Point is 8 bytes in size. Now consider the same expression used with the "address of" operator &:
&points[i].ycoord
In this case, you don't want the value of ycoord, but its address. That's where LEA (load effective address) comes in. Instead of a MOV, the compiler can generate
LEA ESI, [EBX + 8*EAX + 4]
which will load the address in ESI.