I am using the following two methods. DoMyWork1 scales well: running three of them on 3 threads takes 6 seconds. DoMyJob does not scale at all: one thread takes 4 seconds, but running 3 threads takes 13 seconds. What am I doing wrong? Do file reads and/or writes need special thread handling beyond the thread pool?
My calling code:
public static void Process(MyDelegate md, int threads)
{
    int threadcount = threads;
    ManualResetEvent[] doneEvents = new ManualResetEvent[threadcount];
    DateTime dtstart = DateTime.Now;
    List<string> myfiles = GetMyFiles(@"c:\");
    for (int i = 0; i < threadcount; i++)
    {
        // One event per work item so the caller can wait for all of them.
        doneEvents[i] = new ManualResetEvent(false);
        MyState ms = new MyState();
        ms.ThreadIndex = i;
        ms.EventDone = doneEvents[i];
        ms.files = myfiles; // every work item gets the same file list
        ThreadPool.QueueUserWorkItem(md.Invoke, ms);
    }
    WaitHandle.WaitAll(doneEvents);
    DateTime dtend = DateTime.Now;
    TimeSpan ts = dtend - dtstart;
    // TotalSeconds matches the "seconds" label; ToString() would print hh:mm:ss.
    Console.WriteLine("All complete in {0} seconds.", ts.TotalSeconds);
    Console.ReadLine();
}
public static void DoMyWork1(Object threadContext)
{
    MyState st = (MyState)threadContext;
    Console.WriteLine("thread {0} started...", st.ThreadIndex);
    Thread.Sleep(5000);
    Console.WriteLine("thread {0} finished...", st.ThreadIndex);
    st.EventDone.Set();
}
private static void DoMyJob(MyState st)
{
    Console.WriteLine("I am in thread {0} started...", st.ThreadIndex);
    string[] mystrings = new string[] { "one", "two", "three" };
    foreach (string s in mystrings)
    {
        foreach (string file in st.files)
        {
            string text;
            // Dispose the reader so the file handle is released promptly.
            using (StreamReader reader = new StreamReader(file))
            {
                text = reader.ReadToEnd();
            }
            // Log the files that do NOT contain the word.
            if (!text.Contains(s))
            {
                AppendToFile(String.Format("{0} word searching in file {1} in thread {2}", s, file, st.ThreadIndex));
            }
        }
    }
    Console.WriteLine("I am in thread {0} ended...", st.ThreadIndex);
}
Best Solution
Threads can improve a program's performance only if the program is starved for CPU resources. That's not the case for your program, which should be readily visible from the Performance tab in Taskmgr.exe. The slow resource here is your hard disk, or the network card. The ReadToEnd() call is glacially slow because it waits for the disk to retrieve the file data. Anything else you do with the file data is easily 3 orders of magnitude faster than that.
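One way to check this claim on your own files is to time the read and the search separately. The following is only a minimal sketch: the class name IoTiming and the method ContainsWord are hypothetical, and File.ReadAllText stands in for the StreamReader.ReadToEnd() call in the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class IoTiming
{
    // Hypothetical helper (not part of the original code): reports how long
    // the disk read took and how long the in-memory search took.
    public static bool ContainsWord(string path, string word,
                                    out TimeSpan readTime, out TimeSpan searchTime)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string text = File.ReadAllText(path);   // I/O: waits on the disk or the file cache
        readTime = sw.Elapsed;

        sw = Stopwatch.StartNew();
        bool found = text.Contains(word);       // CPU only: scans the string in memory
        searchTime = sw.Elapsed;

        return found;
    }
}
```

Calling IoTiming.ContainsWord(file, s, out readTime, out searchTime) inside the loops and summing the two values per thread should show where the time goes. The exact ratio depends on the disk, the file sizes, and whether the files are already cached.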
The threads will just wait in turn for the disk data. In fact, there's a good chance that the threads actually make your program run a lot slower. They cause the disk drive head to jump back and forth between disjoint tracks on the disk, since each thread is working with a different file. The one thing that is really slow is making the head seek to another track, typically around 10 msec for a fast disk. On a gigahertz-class CPU, that is time enough to execute millions of instructions.
You can't make your program run faster unless you get a faster disk. SSDs are nice. Beware of the effects of the file system cache: the second time you run your program it will run very fast, because the file data is retrieved from the cache instead of the disk. This will rarely happen in a production environment.
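To see the cache effect in your own measurements, you could time the same read pass twice in one run. This is a rough sketch under stated assumptions: CacheCheck, TimeReadPass and Report are hypothetical names, and the file list is whatever GetMyFiles returns in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class CacheCheck
{
    // Hypothetical helper: reads every file once and returns the elapsed time.
    public static TimeSpan TimeReadPass(IEnumerable<string> files)
    {
        Stopwatch sw = Stopwatch.StartNew();
        foreach (string file in files)
        {
            File.ReadAllText(file);   // result discarded; only the read cost matters here
        }
        return sw.Elapsed;
    }

    // Runs the same pass twice so the two timings can be compared.
    public static void Report(IEnumerable<string> files)
    {
        TimeSpan first = TimeReadPass(files);
        TimeSpan second = TimeReadPass(files);
        Console.WriteLine("first pass:  {0:F1} ms", first.TotalMilliseconds);
        Console.WriteLine("second pass: {0:F1} ms", second.TotalMilliseconds);
    }
}
```

If the second pass is much faster than the first, the earlier timings were probably dominated by the disk. If both are fast, the files were likely in the cache already, so neither number says much about a cold start.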