My current task is to optimise a Monte Carlo Simulation that calculates Capital Adequacy figures by region for a set of Obligors.
It is running about 10x too slow for where it will need to be in production, given the number of daily runs required. Additionally, the granularity of the result figures will need to be improved down to desk level, possibly book level, at some stage. The code I've been given is basically a prototype which is used by business units in a semi-production capacity.
The application is currently single-threaded, so I'll need to make it multithreaded. I may look at System.Threading.ThreadPool or the Microsoft Parallel Extensions library, but I'm constrained to .NET 2 on the server at this bank, so I may have to consider this guy's port: http://www.codeproject.com/KB/cs/aforge_parallel.aspx.
I am trying my best to get them to upgrade to .NET 3.5 SP1 but it's a major exercise in an organisation of this size and might not be possible in my contract time frames.
I've profiled the application using the trial of dotTrace (http://www.jetbrains.com/profiler). What other good profilers exist? Free ones?
A lot of the execution time is spent generating uniform random numbers and then translating them to normally distributed random numbers. They are using a C# Mersenne Twister implementation. I am not sure where they got it, or whether it's the best approach (or the best implementation) for generating the uniform random numbers. I haven't delved into the translation code yet.
Also, what is your experience with the following?

http://www.qlnet.org (C# port of quantlib) or
Any alternatives you know of? I'm a C# developer so would prefer C#, but a wrapper to C++ shouldn't be a problem, should it?
It may even be faster to leverage the C++ implementations. I am thinking some of these libraries will have the fastest method to directly generate normally distributed random numbers, without the translation step. They may also have some other functions that will be helpful in the subsequent calculations.
Also, the computer this runs on is a quad core Opteron 275 with 8 GB of memory, but it is running 32-bit Windows Server 2003 Enterprise. Should I advise them to upgrade to a 64-bit OS? Any links to articles supporting this decision would really be appreciated.
Anyway, any advice and help you may have is really appreciated.
Best Solution
I have found the Mersenne Twister to be quick. The problem may be in the algorithm (Box-Muller) used to transform the uniform distribution to a Gaussian distribution. The standard algorithm looks like:
y1 = sqrt(-2 ln(x1)) cos(2 pi x2)
y2 = sqrt(-2 ln(x1)) sin(2 pi x2)
Where x1 and x2 are uniform random numbers and y1 and y2 are the Gaussian distribution outputs.
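To make the cost concrete, here is a minimal Python sketch of the basic (trigonometric) Box-Muller transform. It is an illustration only, not the code from the application being profiled; the function name `box_muller` and the seed are made up for the example. Python's `random` module happens to use a Mersenne Twister for its uniforms.

```python
import math
import random

def box_muller(x1, x2):
    """Basic Box-Muller: two uniforms (x1 in (0, 1]) -> two independent N(0, 1) values."""
    r = math.sqrt(-2.0 * math.log(x1))  # radius; grows without bound as x1 -> 0
    theta = 2.0 * math.pi * x2          # angle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
# random() returns values in [0, 1); 1.0 - random() shifts that to (0, 1]
# so that log() is never called with 0.
y1, y2 = box_muller(1.0 - rng.random(), rng.random())
```

Each pair of outputs costs one log, one square root, one sine and one cosine, which is where the time goes.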
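The square roots are slow, but the trig is worse, and the transform is numerically unstable when x1 is close to 0, because ln(x1) diverges there. Taygeta's page on the subject gives a faster one, the polar form of Box-Muller, which replaces the trig calls with a rejection loop.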
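A minimal Python sketch of the polar form of Box-Muller (usually credited to Marsaglia) follows. It is a reconstruction of the well-known algorithm rather than a copy of any particular page's listing; the function name `polar_gaussian_pair` and the `uniform` callback parameter are assumptions made for the example.

```python
import math
import random

def polar_gaussian_pair(uniform):
    """Polar Box-Muller: no sin/cos, one log and one sqrt per output pair.

    `uniform` is any zero-argument callable returning floats in [0, 1).
    """
    while True:
        x1 = 2.0 * uniform() - 1.0   # map to [-1, 1)
        x2 = 2.0 * uniform() - 1.0
        w = x1 * x1 + x2 * x2
        if 0.0 < w < 1.0:            # keep only points inside the unit circle
            break                    # (and skip the origin, where log(w) fails)
    w = math.sqrt(-2.0 * math.log(w) / w)
    return x1 * w, x2 * w

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
y1, y2 = polar_gaussian_pair(rng.random)
```

The loop accepts a candidate point with probability pi/4 (about 78.5%), so on average it consumes a little under 1.3 uniform pairs per Gaussian pair. That overhead is typically cheaper than the two trig calls it replaces, but it is worth measuring in the real application.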
If they're not using something like this, you may be able to speed things up quite a bit by avoiding the trig functions or even pregenerating the random numbers.
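Pregenerating could look roughly like the Python sketch below. The helper names (`pregenerate_normals`, `shocks_for_scenario`), the table size and the seed are all hypothetical; how the real simulation consumes its random numbers is not described in the question.

```python
import random

def pregenerate_normals(count, seed):
    """Build a table of N(0, 1) draws up front (hypothetical helper)."""
    rng = random.Random(seed)  # Mersenne Twister under the hood
    return [rng.gauss(0.0, 1.0) for _ in range(count)]

# Generate once, outside the timed simulation loop ...
table = pregenerate_normals(100_000, seed=42)

def shocks_for_scenario(scenario, obligors):
    """... then each scenario reads its own slice instead of transforming uniforms."""
    start = scenario * obligors
    return table[start:start + obligors]
```

The table needs to cover scenarios times obligors draws; reusing the same entries within a run would correlate scenarios that are supposed to be independent, and a large table competes for memory on a 32-bit process.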