As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:
struct Point
{
int xcoord;
int ycoord;
};
Now imagine a statement like:
int y = points[i].ycoord;
where points[]
is an array of Point
. Assuming the base of the array is already in EBX
, and variable i
is in EAX
, and xcoord
and ycoord
are each 32 bits (so ycoord
is at offset 4 bytes in the struct), this statement can be compiled to:
MOV EDX, [EBX + 8*EAX + 4] ; right side is "effective address"
which will land y
in EDX
. The scale factor of 8 is because each Point
is 8 bytes in size. Now consider the same expression used with the "address of" operator &:
int *p = &points[i].ycoord;
In this case, you don't want the value of ycoord
, but its address. That's where LEA
(load effective address) comes in. Instead of a MOV
, the compiler can generate
LEA ESI, [EBX + 8*EAX + 4]
which will load the address in ESI
.
Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.
As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example: the -fassociative-math
option of gcc which allows gcc to reassociate floating point operations, or even the -ffast-math
option which allows even more aggressive tradeoffs of accuracy against speed.
Best Solution
Just for some clarification. In the early microprocessor days of the 1970's, CPUs had only a small number of registers and a very limited instruction set. Typically, the arithmetic unit could only operate on a single CPU register, often referred to as the "accumulator". The accumulator on the 8 bit 8080 & Z80 processors was called "A". There were 6 other general purpose 8 bit registers: B, C, D, E, H & L. These six registers could be paired up to form 3 16 bit registers: BC, DE & HL. Internally, the accumulator was combined with the Flags register to form the AF 16 bit register.
When Intel developed the 16 bit 8086 family they wanted to be able to port 8080 code, so they kept the same basic register structure:
Because of the need to port 8 bit code they needed to be able to refer to the individual 8 bit parts of AX, BX, CX & DX. These are called AL, AH for the low & high bytes of AX and so on for BL/BH, CL/CH & DL/DH. IX & IY on the Z80 were only ever used as 16 bit pointer registers so there was no need to access the two halves of SI & DI.
When the 80386 was released in the mid 1980s they created "extended" versions of all the registers. So, AX became EAX, BX became EBX etc. There was no need to access to top 16 bits of these new extended registers, so they didn't create an EAXH pseudo register.
AMD applied the same trick when they produced the first 64 bit processors. The 64 bit version of the AX register is called RAX. So, now you have something that looks like this: