Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.
You can call it from within your code, or from the interpreter, like this:
import cProfile
cProfile.run('foo()')
Even more usefully, you can invoke the cProfile when running a script:
python -m cProfile myscript.py
To make it even easier, I made a little batch file called 'profile.bat':
python -m cProfile %1
So all I have to do is run:
profile euler048.py
And I get this:
1007 function calls in 0.061 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.061 0.061 <string>:1(<module>)
1000 0.051 0.000 0.051 0.000 euler048.py:2(<lambda>)
1 0.005 0.005 0.061 0.061 euler048.py:2(<module>)
1 0.000 0.000 0.061 0.061 {execfile}
1 0.002 0.002 0.053 0.053 {map}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler objects}
1 0.000 0.000 0.000 0.000 {range}
1 0.003 0.003 0.003 0.003 {sum}
EDIT: Updated link to a good video resource from PyCon 2013 titled
Python Profiling
Also via YouTube.
You want the gcc
-specific noinline
attribute.
This function attribute prevents a
function from being considered for
inlining. If the function does not
have side-effects, there are
optimizations other than inlining that
causes function calls to be optimized
away, although the function call is
live. To keep such calls from being
optimized away, put
asm ("");
Use it like this:
void __attribute__ ((noinline)) foo()
{
...
}
Best Solution
If you really want to mess with the register alloation then you can force GCC to allocate local and global variables in certain registers.
You do this with a special variable declaration like this:
Works for other architectures as well, just replace EBX with a target specific register name.
For more info on this I suggest you take a look at the gcc documentation:
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Local-Reg-Vars.html
My suggestion however is not to mess with the register allocation unless you have very good reasons for it. If you allocate some registers yourself the allocator has less registers to work with and you may end up with a code that is worse than the code you started with.
If your function is that performance critical that you get 20% performance differences between compiles it may be a good idea to write that thing in inline-assembler.
EDIT: As strager pointed out the compiler is not forced to use the register for the variable. It's only forced to use the register if the variable is used at all. E.g. if the variable it does not survive an optimization pass it won't be used. Also the register can be used for other variables as well.