In a partially distributed network app I'm working on in C++ on Linux, I have a message-passing abstraction which will send a buffer over the network. The buffer is sent in two steps: first a 4-byte integer containing the size is sent, and then the buffer is sent afterwards. The receiving end then receives in 2 steps as well – one call to read() to get the size, and then a second call to read in the payload. So, this involves 2 system calls to read() and 2 system calls to write().
On the localhost, I setup two test processes. Both processes send and receive messages to each other continuously in a loop. The size of each message was only about 10 bytes. For some reason, the test performed incredibly slow – about 10 messages sent/received per second. And this was on localhost, not even over a network.
If I change the code so that there is only 1 system call to write, i.e. the sending process packs the size at the head of the buffer and then only makes 1 call to write, the whole thing speeds up dramatically – about 10000 messages sent/received per second. This is an incredible difference in speed for only one less system call to write.
Is there some explanation for this?
You might be seeing the effects of the Nagle algorithm, though I'm not sure it is turned on for loopback interfaces.
If you can combine your two writes into a single one, you should always do that. No sense taking the overhead of multiple system calls if you can avoid it.