C# – Migrate a single threaded app to multi-threaded, parallel execution, monte carlo simulation

Tags: c#, multithreading, parallel-processing, threadpool

I've been tasked with taking an existing single-threaded Monte Carlo simulation and optimising it. It is a C# console app with no database access: it loads data once from a CSV file and writes it out at the end, so it is pretty much just CPU-bound. It also uses only about 50 MB of memory.

I've run it through the JetBrains dotTrace profiler. Of the total execution time, about 30% is spent generating uniform random numbers and 24% translating uniform random numbers to normally distributed random numbers.

The basic algorithm is a whole lot of nested for loops, with random number calls and matrix multiplication at the centre. Each iteration returns a double, which is added to a results list. At checkpoints every 5% of the total iteration count, this list is sorted and tested against some convergence criteria; if they are met, the program breaks out of the loops and writes the results, otherwise it proceeds to the end.

I'd like developers to weigh in on:

  • Should I use new Thread or the ThreadPool?
  • Should I look at the Microsoft Parallel Extensions library?
  • Should I look at AForge.NET Parallel.For (http://code.google.com/p/aforge/), or any other libraries?

Some links to tutorials on the above would be most welcome as I've never written any parallel or multi-threaded code.

  • Best strategies for generating normally distributed random numbers en masse and then consuming them. The app never uses uniform random numbers directly; they are always translated to normally distributed numbers and then consumed.
  • Good, fast libraries (parallel?) for random number generation.
  • Memory considerations as I take this parallel: how much extra will I require?

The current app takes 2 hours for 500,000 iterations. The business needs this to scale to 3,000,000 iterations and be called multiple times a day, so it needs some heavy optimisation.

I would particularly like to hear from people who have used Microsoft Parallel Extensions or AForge.NET Parallel.

This needs to be productionised fairly quickly, so the .NET 4 beta is out, even though I know it has concurrency libraries baked in; we can look at migrating to .NET 4 later down the track once it's released. For the moment the server has .NET 2, and I've submitted for review an upgrade to .NET 3.5 SP1, which my dev box already has.

Thanks

Update

I've just tried the Parallel.For implementation but it comes up with some weird results.
Single threaded:

IRandomGenerator rnd = new MersenneTwister();
IDistribution dist = new DiscreteNormalDistribution(discreteNormalDistributionSize);
List<double> results = new List<double>();

for (int i = 0; i < CHECKPOINTS; i++)
{
    results.AddRange(Oblist.Simulate(rnd, dist, n));
}

To:

Parallel.For(0, CHECKPOINTS, i =>
{
    results.AddRange(Oblist.Simulate(rnd, dist, n));
});

Inside Simulate there are many calls to rnd.nextUniform(). I think I am getting many values that are the same; is this likely to happen because the loop is now parallel?

There may also be issues with the List AddRange call not being thread safe. I see that System.Threading.Collections.BlockingCollection might be worth using, but it only has an Add method and no AddRange, so I'd have to loop over the results and add them in a thread-safe manner. Any insight from someone who has used Parallel.For would be much appreciated. I switched to System.Random for my calls temporarily, as I was getting an exception when calling nextUniform with my Mersenne Twister implementation. Perhaps it wasn't thread safe: a certain array was throwing an index out of bounds….
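Both symptoms are consistent with sharing objects that are not thread safe (the generator and the List) across the parallel loop. A minimal sketch of one way around this follows, assuming the Parallel.For overload with thread-local state as it exists in .NET 4 (System.Threading.Tasks); the CTP signature may differ. ParallelSimulationSketch, Worker, Run, and the stand-in Simulate method are hypothetical names, System.Random replaces the Mersenne Twister purely for illustration, and seeding workers with consecutive integers is an assumption that may not give statistically independent streams.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSimulationSketch
{
    // Hypothetical stand-in for Oblist.Simulate: returns n draws from the given generator.
    static List<double> Simulate(Random rnd, int n)
    {
        var values = new List<double>(n);
        for (int k = 0; k < n; k++)
            values.Add(rnd.NextDouble());
        return values;
    }

    // Per-worker state: a private generator and a private result buffer.
    sealed class Worker
    {
        public readonly Random Rnd;
        public readonly List<double> Buffer = new List<double>();
        public Worker(int seed) { Rnd = new Random(seed); }
    }

    public static List<double> Run(int checkpoints, int n, int baseSeed)
    {
        var results = new List<double>();
        var gate = new object();
        int seedCounter = 0;

        Parallel.For(0, checkpoints,
            // localInit: runs once per worker task; each worker gets a distinct seed.
            () => new Worker(baseSeed + Interlocked.Increment(ref seedCounter)),
            // body: touches only the worker's own state, so no locking is needed here.
            (i, loopState, worker) =>
            {
                worker.Buffer.AddRange(Simulate(worker.Rnd, n));
                return worker;
            },
            // localFinally: merge each worker's buffer into the shared list under a lock.
            worker => { lock (gate) { results.AddRange(worker.Buffer); } });

        return results;
    }
}
```

Calling ParallelSimulationSketch.Run(20, 1000, 12345) would return 20,000 values. Their order, and the values themselves, will generally differ from a single-threaded run, because each worker draws from its own stream.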

Best Answer

First you need to understand why you think that using multiple threads is an optimization - when it is, in fact, not. Using multiple threads will make your workload complete faster only if you have multiple processors, and then at most as many times faster as you have CPUs available (this is called the speed-up). The work is not "optimized" in the traditional sense of the word (i.e. the amount of work isn't reduced - in fact, with multithreading, the total amount of work typically grows because of the threading overhead).

So in designing your application, you have to find pieces of work that can be done in a parallel or overlapping fashion. It may be possible to generate random numbers in parallel (by having multiple RNGs run on different CPUs), but that would also change the results, as you get different random numbers. Another option is to have generation of the random numbers on one CPU, and everything else on different CPUs. This can give you a maximum speedup of about 3 (1/0.30 ≈ 3.3), as the RNG will still run sequentially, and still take 30% of the load.

So if you go for this parallelization, you end up with 3 threads: thread 1 runs the RNG, thread 2 produces normal distribution, and thread 3 does the rest of the simulation.
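The arithmetic behind these limits can be written out, assuming the profile from the question (30% uniform generation, 24% conversion to normals, and therefore 46% for everything else), that these shares do not overlap, and that queueing overhead is negligible. A pipeline runs at the pace of its slowest stage, so:

```latex
S_{\text{RNG-bound}} \le \frac{1}{0.30} \approx 3.3
\qquad
S_{\text{3 threads}} \le \frac{1}{\max(0.30,\ 0.24,\ 0.46)} = \frac{1}{0.46} \approx 2.2
```

The first bound is reachable only if the remaining 70% of the work is itself spread over enough CPUs. With exactly three threads, the simulation stage (46%) becomes the bottleneck.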

For this design, a producer-consumer architecture is most appropriate. Each thread will read its input from a queue, and produce its output into another queue. Each queue should be blocking, so if the RNG thread falls behind, the normalization thread will automatically block until new random numbers are available. For efficiency, I would pass the random numbers across threads in arrays of, say, 100 (or more), to avoid synchronizing on every random number.

For this approach, you don't need any advanced threading. Just use the regular Thread class: no pool, no library. The only thing you need that is (unfortunately) not in the standard library is a blocking Queue class (the Queue class in System.Collections is no good). CodeProject provides a reasonable-looking implementation of one; there are probably others.
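A minimal sketch of the three-thread pipeline described above, using only Thread and Monitor. BlockingQueue<T>, PipelineSketch, the queue capacity of 16, and the batch size are illustrative choices, not the CodeProject class or the original app's code. The Box-Muller transform stands in for the app's own uniform-to-normal conversion, and the "simulation" stage here merely sums the values it receives.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Minimal bounded blocking queue built on Monitor (illustrative only).
public sealed class BlockingQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly int _capacity;
    private bool _closed;

    public BlockingQueue(int capacity) { _capacity = capacity; }

    // Blocks while the queue is full, then adds the item and wakes waiters.
    public void Enqueue(T item)
    {
        lock (_items)
        {
            while (_items.Count >= _capacity) Monitor.Wait(_items);
            _items.Enqueue(item);
            Monitor.PulseAll(_items);
        }
    }

    // Signals that no more items will be added.
    public void Close()
    {
        lock (_items) { _closed = true; Monitor.PulseAll(_items); }
    }

    // Blocks until an item arrives; returns false once closed and drained.
    public bool TryDequeue(out T item)
    {
        lock (_items)
        {
            while (_items.Count == 0 && !_closed) Monitor.Wait(_items);
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.Dequeue();
            Monitor.PulseAll(_items);
            return true;
        }
    }
}

public static class PipelineSketch
{
    const int BatchSize = 100; // even, so Box-Muller can consume uniforms in pairs

    // Runs the three stages on plain threads; returns the sum of all normal draws.
    public static double Run(int batches, int seed, out long consumed)
    {
        var uniforms = new BlockingQueue<double[]>(16);
        var normals = new BlockingQueue<double[]>(16);
        double sum = 0.0;
        long count = 0;

        var rngThread = new Thread(() =>      // stage 1: uniform random numbers
        {
            var rnd = new Random(seed);
            for (int b = 0; b < batches; b++)
            {
                var batch = new double[BatchSize];
                for (int k = 0; k < BatchSize; k++)
                    batch[k] = 1.0 - rnd.NextDouble(); // in (0, 1], safe for Log
                uniforms.Enqueue(batch);
            }
            uniforms.Close();
        });

        var normalThread = new Thread(() =>   // stage 2: uniforms to normals
        {
            double[] u;
            while (uniforms.TryDequeue(out u))
            {
                var z = new double[BatchSize];
                for (int k = 0; k < BatchSize; k += 2)
                {
                    double r = Math.Sqrt(-2.0 * Math.Log(u[k]));
                    double theta = 2.0 * Math.PI * u[k + 1];
                    z[k] = r * Math.Cos(theta);
                    z[k + 1] = r * Math.Sin(theta);
                }
                normals.Enqueue(z);
            }
            normals.Close();
        });

        var simThread = new Thread(() =>      // stage 3: consume the normals
        {
            double[] z;
            while (normals.TryDequeue(out z))
                foreach (double v in z) { sum += v; count++; }
        });

        rngThread.Start(); normalThread.Start(); simThread.Start();
        rngThread.Join(); normalThread.Join(); simThread.Join();
        consumed = count;
        return sum;
    }
}
```

Because the queues are bounded, a fast RNG thread blocks instead of piling up batches, which keeps the extra memory to a few dozen small arrays. Calling PipelineSketch.Run(50, 42, out consumed) would push 50 batches through the pipeline and set consumed to 5000.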
