Python – Why would TCP packets consistently fail to arrive at the server when the client happily sent them

cpython, sockets, tcp, tcpdump

I've got a simple client server set up where it seems like TCP packets I'm sending from the client are not arriving at the server.

Normally everything works fine, but when I spin up 50 threads on the client to hit the server "simultaneously" with the same small data packet (which is only 39 bytes), a random number of times the server does not receive all the bytes. Even stranger is that it is very consistent in how it fails to receive them… only 5 bytes are received.

I'm using tcpdump and tcpflow to capture what is going on at both ends (if you're not familiar with tcpflow, it strips out the massive amount of TCP SYN/ACK/FIN/etc. noise from the TCP stream and just shows you the data sent in either direction).

On the client side, for 50 threads firing off the 39-byte packet, it looks perfect. Specifically, tcpflow (which uses libpcap) shows me 50 identical data transfers:

07 B6 00 01 | 00 1E 00 00 | <etc>

As I understand it, libpcap/tcpdump capture data at a pretty low level (below the TCP stack), so I take this to mean that the data was sent OK, or at least was not stuck in the kernel buffers.

However, when looking at the server side, all is not perfect. A random number of connections are failing, and it is a high percentage. For example, out of the 50 socket connections, 30 will work fine, but for 20 of them I have a protocol failure where the server's socket.recv times out waiting for bytes (the protocol specifies the exact packet length).
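
For context, the server-side read is basically "keep calling recv until the expected number of bytes has arrived, or give up". A rough sketch of that pattern (the helper name and the 5-second timeout here are illustrative, not my actual server code):

def recv_exact(conn, nbytes, timeout=5.0):
    """Read exactly nbytes from conn, or raise socket.timeout if the data stalls."""
    conn.settimeout(timeout)
    chunks = []
    while nbytes > 0:
        chunk = conn.recv(nbytes)    # may return fewer bytes than requested
        if not chunk:                # peer closed the connection early
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

# header = recv_exact(conn, 5)     # this part arrives
# payload = recv_exact(conn, 34)   # this is where the timeout fires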

It is VERY consistent in how it fails. For the 30/20 case, 30 of the sockets perfectly receive the transmitted 39 bytes. The remaining 20 ALL receive this partial data, after which my socket.recv times out:

07 B6 00 01 | 00

Only 5 bytes are arriving for each of the 20 connections, and it seems to be at the kernel level since tcpdump is only showing 5 bytes arriving as well.

How can this happen?

This 5-byte boundary is not a coincidence. It is the first part of a header, and the 34-byte payload comes next, but is not arriving. On the client side it is split like this:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((HOST, PORT))
sock.sendall(HEADER)   # 5 bytes
sock.sendall(PAYLOAD)  # 34 bytes

and both sock.sendall calls complete successfully in every thread, as shown by the tcpflow logging: all 50 runs send 39 bytes "out the door" perfectly.

Any ideas on the root cause of this? What am I missing?

Best Answer

Answering my own question...

The short answer is that, with TCP alone, the client has no way of knowing whether the intended recipient has actually received the bytes sent.

ie: It doesn't matter whether the client "happily" sent the bytes... even with TCP they may never arrive, and you definitely have no knowledge as to when they will reach the intended recipient. Not without building some acknowledgement into the application layer, anyway.
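
To make that concrete, here is a rough sketch of what an application-layer acknowledgement could look like on the client side. The single ACK byte (and its 0x06 value) is purely illustrative; it is not part of my actual protocol:

import socket

sock = socket.create_connection((HOST, PORT), timeout=10)
try:
    sock.sendall(HEADER)    # 5 bytes
    sock.sendall(PAYLOAD)   # 34 bytes
    # Only when the server explicitly replies do we know the 39 bytes arrived.
    ack = sock.recv(1)
    if ack != b"\x06":      # hypothetical "message received" byte
        raise RuntimeError("server did not acknowledge the message")
finally:
    sock.close()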

For my particular case, it turns out that the bytes the client sent DID actually arrive at the server, but took ~ 30s (!!!) to arrive, by which time both client and server application protocol code had timed out.

Views of the client- and server-side logs (for one failed connection) are here:

Those images are Wireshark views of one particular TCP stream from the tcpdump capture files. You can see that there were a whole lot of re-transmissions occurring. What was the root cause driving the need for these re-transmissions? I have absolutely no idea (but would love to know!).

The data arrived at the server in the second-to-last entry (#974), ~30s after it was sent, with a large number of re-transmission attempts in between. If you're curious about server-side entry #793, that is an attempt by my application-layer protocol to send a message back to the client saying "timed out waiting for more data... where is it?".

In addition to the inherent delays, one of the reasons the data was not appearing in the tcpdump logs at the server seems to be my usage of tcpdump itself. In short: make sure to Ctrl-C out of the tcpdump capture before looking at the capture file (the one created with the -w switch), as it seems to make a big difference to what you see in the file. I expect this is a flush/sync'ing issue, but I am guessing. What I do know is that without the Ctrl-C I was definitely missing data.

More detail for future reference...

Although you often read/hear that TCP will:

  1. Guarantee that your packets will arrive (vs UDP, which doesn't)
  2. Guarantee that your packets will arrive in order

it is apparent/obvious that the first is not actually true at all. TCP will do its best to get your bytes to the intended recipient (including retrying for a LONG time), but this is not a guarantee, regardless of the send man page stating, of the return value, that "On success, these calls return the number of characters sent". That statement is not true as written and is highly misleading (see below).

The root of this comes mostly from the way the various socket calls (send in particular) behave and how they interact with the TCP/IP stack of the operating system...

On the sending side of a TCP exchange, the progression is quite simple. First you connect() and then you send().

connect() returning successfully definitely means that you were able to establish a connection to the server, so you at least know that at that moment the server was there and listening (ie: the 3-way TCP opening handshake was successful).
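
In Python that stage looks something like this (the 10-second timeout is just an illustrative value):

import socket

try:
    # create_connection performs the 3-way handshake; if it returns, the
    # server was reachable and listening at that moment.
    sock = socket.create_connection((HOST, PORT), timeout=10)
except OSError as exc:   # covers timeouts, refusals, unreachable hosts, etc.
    print("connect failed:", exc)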

For send, although the documentation for the call indicates that the return value (if positive) is the "number of [bytes] sent", this is just plain wrong. All the return value tells you is the number of bytes that the TCP stack in your underlying OS accepted into its outgoing buffer. After that point, the OS will try its best to deliver those bytes to the recipient you initially made a connection with. But that may never happen, so you cannot count on those bytes being delivered! Somewhat surprisingly, there is also no real way to determine whether this did (or did not!) happen, at least at the TCP socket layer, even though TCP has built-in ACK messages. To verify full receipt of your sent bytes, you need to add some sort of acknowledgement at the application layer. nos has a great answer in another question that talks a bit about this.
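
As an illustration, the following is roughly what sendall does under the hood (a loop over send), and it shows why a successful return only means the bytes were handed to the kernel, not that they were, or ever will be, delivered. This is a simplified sketch, not CPython's actual implementation:

def send_all(sock, data):
    """Keep calling send() until the kernel has accepted every byte."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])   # bytes copied into the OS send buffer
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    # Reaching here only means the kernel buffered all the bytes; it says
    # nothing about whether the peer has received (or ever will receive) them.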

Addendum...

One interesting dilemma I'm left with here is whether or not I need to build some retry capability into my application-layer protocol. Currently it seems like, in the event of a timeout waiting for data at the server, it would be beneficial to close the connection and open a new one for the same request. It seems that way because the low-level TCP retries were not successful, while in the meantime other client-side threads were getting through in good time. This feels horribly wrong, though... you would think that the TCP retries should be sufficient. But they weren't. I need to look into the root cause of the TCP issues to resolve this.
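
If I did go that route, the client side would look something like the sketch below: tear the connection down on timeout and retry the whole request on a fresh socket. The retry count, timeout, and single-byte reply are all illustrative:

import socket

def send_request_with_retry(host, port, header, payload, attempts=3, timeout=10):
    """Hypothetical client-side retry: each attempt uses a brand-new connection."""
    last_error = None
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(header)
                sock.sendall(payload)
                return sock.recv(1)   # wait for the server's (hypothetical) reply
        except OSError as exc:        # timeout, connection reset, refused, ...
            last_error = exc          # give up on this connection and start over
    raise last_error

Of course, blindly re-sending like this only makes sense if the server can tolerate seeing the same request more than once.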
