I'm benchmarking software that runs 4x faster than my serial version on an Intel i7-2670QM when using all 8 of my 'logical' threads. I would like some community feedback on my interpretation of the benchmark results.

When I use 4 threads on 4 cores I get a speedup of 4x; the entire algorithm is executed in parallel. This seems logical to me, since Amdahl's law predicts it. Windows Task Manager tells me I'm using 50% of the CPU.
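For reference, the expectation above can be checked with a small calculation. This is only a sketch of Amdahl's law; the function name `amdahl_speedup` and its parameters are hypothetical, and it assumes the `n` workers are fully independent processors.

```cpp
#include <cassert>

// Amdahl's law: speedup on n workers when a fraction p of the runtime
// can be parallelised (p = 1.0 means the whole algorithm runs in parallel).
// amdahl_speedup is a hypothetical helper name, not part of the benchmark.
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

With `p = 1.0` the formula gives a speedup of 4 for `n = 4` and of 8 for `n = 8`, so it only matches the measurements if the 8 logical threads are not counted as 8 independent workers.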

However, if I execute the same software on all 8 threads, I once again get a speedup of 4x, not 8x.

If I have understood this correctly: my CPU has 4 cores, each running at 2.2 GHz, but the frequency is divided down to 1.1 GHz per thread when 8 'logical' threads are used, and the same applies to the other components, such as the cache memory? If this is true, then why does Task Manager claim only 50% of my CPU is being used?

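#define NumberOfFiles 8
char startLetter = 'a';
// One iteration per file; OpenMP distributes the iterations across the threads.
#pragma omp parallel for shared(startLetter)
for (int f = 0; f < NumberOfFiles; f++) {
    // ... read file f, then time the std::sort call on its contents ...
}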

I am not including the time spent on disk I/O. I am only interested in the time an STL call (std::sort) takes.
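A minimal sketch of how such a measurement could be isolated is given here. It is not the original benchmark: `time_parallel_sort`, `buffers` and `threads` are hypothetical names, and the sketch assumes each file has already been read into memory as a vector of `int` before timing starts. If the compiler is run without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Sorts every buffer (one per file) and returns the elapsed wall-clock seconds.
// The buffers are assumed to be filled from disk before this call, so no
// disk I/O falls inside the timed region.
double time_parallel_sort(std::vector<std::vector<int>>& buffers, int threads) {
    assert(threads >= 1);
    const int n = static_cast<int>(buffers.size());
    const auto start = std::chrono::steady_clock::now();
    // Each iteration sorts one independent buffer, so no locking is needed.
    #pragma omp parallel for num_threads(threads)
    for (int f = 0; f < n; f++) {
        std::sort(buffers[f].begin(), buffers[f].end());
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it on fresh copies of the same data with `threads` set to 1, 4 and 8 would give the three timings being compared in this question.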

An i7-2670QM processor has 4 cores, but it can run 8 threads in parallel. This means that it has only 4 processing units (cores) but has hardware support for 8 threads. At most four jobs execute on the cores at any moment; if one of them stalls, for example on a memory access, another thread can start executing on the free core very quickly and with very little penalty. The clock frequency is not split between the two threads of a core; they share the core's execution resources. Read more on Hyper-Threading. In reality there are few scenarios where Hyper-Threading gives a large performance gain. More modern processors handle Hyper-Threading better than older ones.

Your benchmark shows that the code is CPU bound, i.e. there were few stalls in the pipeline that would have given Hyper-Threading an advantage. The 50% CPU figure for the 4-thread run is correct, as the 4 cores are working and the 4 extra logical processors are not doing anything. Turn off Hyper-Threading in the BIOS and you will see 100% CPU.
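The percentages can be reproduced with a rough model, assuming Task Manager reports busy logical processors as a share of all logical processors. The helper name `cpu_utilisation_percent` is hypothetical and the model ignores background load.

```cpp
#include <cassert>

// Rough model of the Task Manager figure: busy logical processors as a
// percentage of all logical processors (an assumption, ignoring other load).
double cpu_utilisation_percent(int busy_logical, int total_logical) {
    assert(total_logical > 0 && busy_logical >= 0 && busy_logical <= total_logical);
    return 100.0 * busy_logical / total_logical;
}
```

Four busy threads out of 8 logical processors gives 50%; with Hyper-Threading disabled the same four threads out of 4 logical processors gives 100%.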

