Memory bandwidth test on NVIDIA GPUs

cuda, nvidia

I tried to use the code posted by NVIDIA to run a memory bandwidth test, but I got some surprising results.

The program used is here: https://developer.nvidia.com/content/how-optimize-data-transfers-cuda-cc
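
For context, below is a minimal sketch of the kind of measurement the linked post makes. This is not the exact NVIDIA program; it only times host-to-device copies of a 16 MB buffer, once from pageable malloc memory and once from pinned memory allocated with cudaMallocHost.

// bandwidth_sketch.cu -- minimal host-to-device transfer timing,
// in the spirit of the linked NVIDIA post (not the exact code).
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Time a single host-to-device copy of `bytes` bytes and report GB/s.
static void timeH2D(const char *label, void *h_src, void *d_dst, size_t bytes)
{
    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&stop),  "cudaEventCreate");

    check(cudaEventRecord(start), "cudaEventRecord");
    check(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice), "cudaMemcpy H2D");
    check(cudaEventRecord(stop), "cudaEventRecord");
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");
    printf("%s Host to Device bandwidth (GB/s): %f\n",
           label, bytes * 1e-6 / ms);   // bytes per ms -> GB/s

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}

int main()
{
    const size_t bytes = 16 * 1024 * 1024;   // 16 MB, as in the question

    // Pageable host memory (plain malloc), touched once so pages are resident.
    float *h_pageable = (float *)malloc(bytes);
    if (!h_pageable) { fprintf(stderr, "malloc failed\n"); return EXIT_FAILURE; }
    memset(h_pageable, 0, bytes);

    // Pinned (page-locked) host memory.
    float *h_pinned = NULL;
    check(cudaMallocHost((void **)&h_pinned, bytes), "cudaMallocHost");
    memset(h_pinned, 0, bytes);

    float *d_buf = NULL;
    check(cudaMalloc((void **)&d_buf, bytes), "cudaMalloc");

    timeH2D("Pageable", h_pageable, d_buf, bytes);
    timeH2D("Pinned  ", h_pinned,   d_buf, bytes);

    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    free(h_pageable);
    return 0;
}

On most systems the pinned copy should be at least as fast as the pageable one (as in the desktop numbers below), because a pageable transfer has to go through an intermediate pinned staging buffer first.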

On a desktop (with macOS):

Device: GeForce GT 650M
Transfer size (MB): 16

Pageable transfers
Host to Device bandwidth (GB/s): 4.053219
Device to Host bandwidth (GB/s): 5.707841

Pinned transfers
Host to Device bandwidth (GB/s): 6.346621
Device to Host bandwidth (GB/s): 6.493052

On a Linux server:

Device: Tesla K20c
Transfer size (MB): 16

Pageable transfers
Host to Device bandwidth (GB/s): 1.482011
Device to Host bandwidth (GB/s): 1.621912

Pinned transfers
Host to Device bandwidth (GB/s): 1.480442
Device to Host bandwidth (GB/s): 1.667752

BTW, I do not have root privileges.

I am not sure why it is lower on the Tesla device. Can anyone point out what the reason might be?

Best Solution

It is most likely that the GPU in your server isn't in a 16-lane PCI Express slot. I would expect a PCI-e v2.0 device like the K20c to achieve between 4.5 and 5.5 GB/s peak throughput on a reasonably specified modern server (and probably about 6 GB/s on a desktop system with an integrated PCI-e controller). Your results look like you are hosting the GPU in a 16x slot with only 8 or even 4 active lanes.
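
One way to check how many lanes are actually active (no root needed) is nvidia-smi -q, which should report the PCIe generation and current/maximum link width under "GPU Link Info". The same values can be read programmatically through NVML; a small sketch, assuming libnvidia-ml and the nvml.h header are available on the server (their location and packaging vary by driver/toolkit version):

// pcie_linkinfo.c -- query the GPU's current/maximum PCIe link width via NVML.
// Build hint (paths may differ): gcc pcie_linkinfo.c -lnvidia-ml -o pcie_linkinfo
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        unsigned int curWidth = 0, maxWidth = 0, curGen = 0;
        nvmlDeviceGetCurrPcieLinkWidth(dev, &curWidth);    // lanes active right now
        nvmlDeviceGetMaxPcieLinkWidth(dev, &maxWidth);     // lanes the device/slot supports
        nvmlDeviceGetCurrPcieLinkGeneration(dev, &curGen); // PCIe generation in use
        printf("PCIe link: gen %u, x%u active (x%u max)\n", curGen, curWidth, maxWidth);
    }

    nvmlShutdown();
    return 0;
}

If the active width reported is x8 or x4 rather than x16, that alone would explain bandwidth in the range you measured.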

There can also be other factors at work, such as CPU-IOH affinity, which can increase the number of "hops" between the PCI-e bus hosting the GPU and the processor (and its memory) running the test. But providing further analysis would require more details about the configuration and hardware of the server, which is really beyond the scope of Stack Overflow.
