If you're using the Windows QueryPerformanceCounter and QueryPerformanceFrequency functions on a system that supports MMX, try inserting an emms instruction after querying the frequency/counter and before the floating-point computation. (femms is AMD's 3DNow! variant of the same instruction; it raises an invalid-opcode fault on processors without 3DNow!, including Intel CPUs.)
__asm emms
I've run into trouble with these functions before: they were doing 64-bit computations using MMX and not clearing the floating-point state afterward.
The same situation can occur if any MMX-based 64-bit arithmetic runs between the floating-point operations.
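A minimal C sketch of clearing the MMX state before floating-point math, assuming GCC, Clang, or 32-bit MSVC. It uses the _mm_empty() intrinsic from <mmintrin.h>, which emits EMMS (the standard MMX instruction for this; FEMMS is AMD's 3DNow! variant), instead of __asm, which 64-bit MSVC does not support. The helper ticks_to_seconds is hypothetical: it takes two counter readings and the frequency as plain integers so the sketch does not depend on windows.h.

```c
#include <stdint.h>

#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
/* EMMS: mark all x87 registers empty again after MMX use. */
#define CLEAR_MMX_STATE() _mm_empty()
#else
/* No MMX on this target, so there is no shared MMX/x87 state to clear. */
#define CLEAR_MMX_STATE() ((void)0)
#endif

/* Hypothetical helper: convert two counter readings (as returned by, e.g.,
   QueryPerformanceCounter) and the counter frequency into seconds. */
double ticks_to_seconds(int64_t start, int64_t end, int64_t freq)
{
    /* Stand-in for 64-bit integer work that a library might do with MMX. */
    int64_t elapsed = end - start;

    /* Clear the MMX state before any x87 floating-point arithmetic. */
    CLEAR_MMX_STATE();

    return (double)elapsed / (double)freq;
}
```

On x86-64, compilers normally do scalar double arithmetic in SSE registers rather than on the x87 stack, so this problem mainly affects 32-bit x87 code.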
As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:
struct Point
{
int xcoord;
int ycoord;
};
Now imagine a statement like:
int y = points[i].ycoord;
where points[] is an array of Point. Assuming the base of the array is already in EBX, variable i is in EAX, and xcoord and ycoord are each 32 bits (so ycoord is at offset 4 bytes in the struct), this statement can be compiled to:
MOV EDX, [EBX + 8*EAX + 4] ; right side is "effective address"
which will land y in EDX. The scale factor of 8 is there because each Point is 8 bytes in size. Now consider the same expression used with the "address of" operator &:
int *p = &points[i].ycoord;
In this case, you don't want the value of ycoord, but its address. That's where LEA (load effective address) comes in. Instead of a MOV, the compiler can generate
LEA ESI, [EBX + 8*EAX + 4]
which will load the address into ESI.
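A small C sketch that performs the same effective-address arithmetic by hand, to show that MOV and LEA compute the identical address and differ only in whether memory is then read. The helper names load_ycoord and address_of_ycoord are hypothetical; sizeof(struct Point) and offsetof(struct Point, ycoord) stand in for the 8 and 4 in the instructions, which is what they evaluate to on typical platforms with a 32-bit int.

```c
#include <stddef.h>

struct Point
{
    int xcoord;
    int ycoord;
};

/* Mirrors MOV EDX, [EBX + 8*EAX + 4]: compute the effective address,
   then read the int stored there. */
int load_ycoord(const struct Point *points, size_t i)
{
    const char *base = (const char *)points;              /* EBX */
    const char *ea = base + sizeof(struct Point) * i      /* + 8*EAX */
                   + offsetof(struct Point, ycoord);      /* + 4 */
    return *(const int *)ea;                              /* memory read */
}

/* Mirrors LEA ESI, [EBX + 8*EAX + 4]: same address computation,
   but the address itself is the result; no memory is read. */
int *address_of_ycoord(struct Point *points, size_t i)
{
    char *base = (char *)points;
    return (int *)(base + sizeof(struct Point) * i
                   + offsetof(struct Point, ycoord));
}
```

Using sizeof and offsetof rather than the literals 8 and 4 keeps the sketch correct even on a platform where the struct layout differs.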
Best Answer
At least three things can go wrong here. The first is the syntax of the assembler. The second is the instruction set architecture. The third is the memory model (16-bit vs 32-bit, segmented vs flat). I suspect that the examples provided target the 16-bit segmented architecture, since the 8087 dates from that era, whereas C++ compilers mainly arrived after the 386 and its 32-bit protected mode.
The 8087 FPU does not support instructions that move data directly between the general-purpose registers (GPRs) and the floating-point stack. The rationale is that floating-point operands are 32, 64 or 80 bits wide, while the GPRs are only 16 bits wide. Instead, one moves data indirectly through memory.
The example fld myRealVar presupposes that a label (with a width) has been provided:
Notice first that these examples assume that the data belongs to a .data segment and that one has initialized the segment with
Only then could the memory location 0x0004 possibly contain the constant 10. I strongly suspect that this memory model isn't available in your C++ compiler's inline assembler. Also, the assembler has to be smart enough to associate each label with its declared width and encode that width in the instruction.
One way to load the integer into the FPU is to use the stack:
Your C++ compiler most likely targets a 32-bit architecture, where esp can be used directly as a base register to address the top of the stack (16-bit code cannot use sp as a base register):
Some inline assemblers may be able to use local variables and automatically substitute the label with the ebp/esp register and the correct offset:
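As a hedged sketch of that last approach, GCC and Clang extended inline assembly on x86 let the compiler substitute a memory operand for a local variable. The helper int_to_double_x87 is hypothetical; it assumes a 32-bit int and uses AT&T syntax, where fildl is the 32-bit form of fild. The "m" constraint asks the compiler to supply a memory address for the value (typically an ebp/esp-relative stack slot), and "=t" tells it the result is left in ST(0). On non-x86 targets the sketch falls back to an ordinary C conversion so it still compiles.

```c
/* Hypothetical helper: load a 32-bit int into the x87 FPU via memory,
   since there is no direct GPR-to-FPU move. */
double int_to_double_x87(int value)
{
    double result;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("fildl %1"        /* push the 32-bit integer at %1 onto the FPU stack */
            : "=t"(result)    /* result is taken from ST(0) by the compiler */
            : "m"(value));    /* compiler substitutes a memory operand for value */
#else
    result = (double)value;   /* non-x86 fallback: plain C conversion */
#endif
    return result;
}
```

Letting the compiler pick the memory operand avoids hard-coding esp or ebp offsets, which would break as soon as the compiler changes its stack frame layout.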