It's not really clear whether this is a networked application or not. If it is networked, you can scale the stress test by stealing everyone's desktop over the weekend to run it. That may be the easiest way to scale if you only need a few ad-hoc tests.
However, it does sound like there are some simple improvements to be made. If this is meant to be a long-running stress test, instead of creating a new thread for every request, create a pool of threads to work from (or, even easier, use the thread pool, which will scale automatically). You would then define a test as, say, 2000 users, and spin up 2000 threads that hammer the server. Each thread is essentially a loop: do the test, record the result, repeat.
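A minimal sketch of that loop-per-user pattern, in Java since the question doesn't say which language you're using. `sendRequest` is a hypothetical stand-in for your real request code; everything else is standard `java.util.concurrent`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class StressTest {

    // Hypothetical stand-in for the real request; a real test would make
    // the HTTP/socket call here and return the elapsed time.
    static long sendRequest() {
        long start = System.nanoTime();
        // ... hit the server ...
        return (System.nanoTime() - start) / 1_000_000;
    }

    // One thread per simulated user; each thread loops doing the test.
    static long runTest(int users, int requestsPerUser) throws InterruptedException {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            pool.submit(() -> {
                for (int r = 0; r < requestsPerUser; r++) {
                    sendRequest();              // do the test...
                    completed.incrementAndGet();
                }                               // ...and repeat
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 50 users here just to keep the demo quick; 2000 in a real run.
        System.out.println("completed=" + runTest(50, 10));
    }
}
```

Swapping `Executors.newFixedThreadPool(users)` for a cached or work-stealing pool gives you the "scale automatically" behavior at the cost of less control over concurrency.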
Another thing that isn't clear is whether all your threads are trying to share a single file. One way to make this less of a bottleneck is to keep the information in memory until the program shuts down. Alternatively, spin up a writer thread that is responsible for the file writes, and have all your other threads hand it their information. If IO does get backed up, the writer thread simply holds data in memory until IO is available, and your worker threads can continue to hammer the server in the meantime. Keep in mind that, due to the thread synchronization involved, this may not scale well, so you may want to buffer some entries in each worker thread and only synchronize with the file-writer thread once every 100 requests. I don't think this will be much of an issue, since it doesn't sound like you're tracking anything more than response times.
Edit: Based on comment
I would suggest using a single thread to manage your IO operations in this case. Instead of writing to the file, each worker thread creates an object with whatever the details are and passes it to a queue to be written out. To cut down on lock/unlock overhead, use a queue within each worker thread as well, and only sync every so often; make sure you do lock when you're exchanging the data between threads. Also, I'd keep an eye on memory usage, since this approach lets anything pending build up in memory. If your IO is still blocking, I'd look at either writing less, or tuning or adding a faster hard disk.
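The writer-thread-plus-batching idea above can be sketched like this (again in Java as an assumption about your stack). A `BlockingQueue` feeds the single writer; each worker buffers 100 entries locally before handing them off, so workers only touch the shared queue once per batch. The `StringBuilder` sink stands in for the real file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LoggedWrites {

    static int runTest(int workers, int entriesPerWorker) throws Exception {
        final int BATCH = 100;                     // sync once every 100 entries
        final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
        final List<String> POISON = new ArrayList<>(); // shutdown sentinel
        final StringBuilder sink = new StringBuilder(); // stands in for the file

        // The only thread that ever touches the "file".
        Thread writer = new Thread(() -> {
            try {
                List<String> batch;
                while ((batch = queue.take()) != POISON) {
                    for (String line : batch) sink.append(line).append('\n');
                }
            } catch (InterruptedException ignored) { }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                List<String> local = new ArrayList<>(BATCH); // per-worker buffer
                for (int i = 0; i < entriesPerWorker; i++) {
                    local.add("response-time=" + i);
                    if (local.size() == BATCH) {   // hand off a full batch
                        queue.add(local);
                        local = new ArrayList<>(BATCH);
                    }
                }
                if (!local.isEmpty()) queue.add(local); // flush the remainder
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        queue.add(POISON);                          // tell the writer to stop
        writer.join();
        return sink.toString().split("\n").length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lines=" + runTest(4, 250));
    }
}
```

`LinkedBlockingQueue` is unbounded here, which is exactly the "pending work builds up in memory" risk mentioned above; swapping in a bounded `ArrayBlockingQueue` would make workers block instead once the writer falls behind.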
Spoofing IP addresses is doable on a local area network, but practically impossible over a wide area network (i.e., the internet). Too many routers will intercept the TCP packets and recognize that something is fishy about them.
If you need a raw number of unique IPs, you might be better off using a load-testing tool that supports a large number of IPs. These days the lowest-cost way to do that is the cloud: you can roll your own solution (200 IPs from Amazon cost less than $20/hour) or use a load-testing service such as BrowserMob (of which I am the founder, so I'm clearly biased) :)