I am running a file download test on a Linux board (i.e. the board is downloading a file from a server), and in the resulting PCAP captured on the board I'm getting negative time deltas between consecutive DL packets. For example, packet number 1 has a higher TCP sequence number than packet number 2, but the time delta between them is negative. This happens a lot within the PCAP (see example png, problem starting at packet number 4861). Observations from analyzing the PCAP are:
Trying to understand what's going on, my (not at all firm) conclusions are:
Note that the Linux board is running Debian on a 1 GHz dual-core CPU. There is only one interface receiving the packets, so there is no mix-up among multiple interfaces. asked 16 May '17, 03:48 dkomna
Can you share a capture in a publicly accessible spot, e.g. CloudShark?
Done: https://www.cloudshark.org/captures/3c7193d37d09
Looks like it's set to private; can you make it public?
My apologies... Should be fixed now.
How was the capture taken? I'm asking because it's a Linux cooked capture, which often indicates that it was taken on more than one interface at the same time, and that also often leads to measurement errors when it comes to timestamps (or, in other words, negative delta times).
Tcpdump was configured to capture on the ANY interface. However, only one interface is enabled at the time of the test. Nevertheless, negative timestamping as a standalone finding is not my main concern: my main concern is that the packets seem to arrive in-sequence when ordered by time (it would be too much of a coincidence for this to happen with "wrong" timestamps), but TCP is messed up. If I have to describe this naively, I can't understand "how can it be that the network delivers packets to the modem/kernel in-sequence, but these packets reach the TCP layer out-of-order". Of course, I may be completely off with this interpretation...
I don't see TCP being messed up. So far this looks like a capture problem to me.
If I use reordercap to rewrite the pcap with frames ordered by absolute time, everything is fine. No more complaints from the Wireshark TCP expert except for a few duplicate ACKs.
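To illustrate what reordering by absolute time does to a capture like this, here is a minimal Python sketch. It is not how reordercap is implemented; the `Frame` type, the helper names, and the sample timestamps and sequence numbers are all made up for illustration. It flags frames with a negative delta in file order, then sorts by timestamp and checks whether the TCP sequence numbers become monotonic.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    # Hypothetical, simplified view of a captured TCP data segment.
    number: int       # position of the frame in the capture file
    timestamp: float  # capture timestamp in seconds
    seq: int          # TCP sequence number of the segment


def negative_deltas(frames: List[Frame]) -> List[int]:
    """Frame numbers whose timestamp is earlier than the previous frame's."""
    return [cur.number
            for prev, cur in zip(frames, frames[1:])
            if cur.timestamp < prev.timestamp]


def reorder_by_time(frames: List[Frame]) -> List[Frame]:
    """Sort frames by timestamp; sorted() is stable, so ties keep file order."""
    return sorted(frames, key=lambda f: f.timestamp)


def seq_in_order(frames: List[Frame]) -> bool:
    """True if sequence numbers never decrease from one frame to the next."""
    return all(a.seq <= b.seq for a, b in zip(frames, frames[1:]))


# Made-up sample: frame 3 was written after frame 2 but stamped earlier.
capture = [
    Frame(1, 0.000100, 1000),
    Frame(2, 0.000300, 3896),
    Frame(3, 0.000200, 2448),
    Frame(4, 0.000400, 5344),
]

reordered = reorder_by_time(capture)
```

With this sample data, `negative_deltas(capture)` reports frame 3, `seq_in_order(capture)` is false, and `seq_in_order(reordered)` is true, which matches the observation that the segments are in sequence once ordered by time.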
Also, the TCP stack has no problem with packets arriving out-of-order, but I don't think that's what's happening here. What you see are typical side effects of capturing on the client or sender instead of using a dedicated capture PC on a SPAN port or TAP. That's also why your TCP checksums appear to be broken: if they really were, the connection wouldn't have worked at all.
I agree with the comments about the side effects of capturing on the client, as well as about the TCP checksums.
However, TCP is messed up I think: after running reordercap ( reordered file uploaded: https://www.cloudshark.org/captures/5e740f0b4dec ), take a look at packet 4875. This DUP ACK acknowledges (within SACK block) packet 4866, which makes no sense. If everything was ok after the reordercap, this ACK should be acknowledging packet 4861. I believe that, indeed, TCP "saw" packet 4866 before 4861, that's why it replies with a DUP ACK with SACK.
Am I missing something?
Yes, you're right, the packet order was apparently mixed up in transit. The receiver gets a later packet earlier, sending DUP ACKs with increasing SACK edges until the out-of-order segments arrive. You don't see the out-of-order being marked in Wireshark because the mix-up happens after you've captured the packets (so it's further away towards the receiver).
This is nothing unusual, but it can be confusing not to see the packets out-of-order. If you had captured this on the other end, you'd have seen them arrive out-of-order.
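The ACK behavior described above can be reproduced with a toy receiver model. This is only a sketch under simplifying assumptions, not the Linux TCP stack: every segment is ACKed immediately (no delayed ACKs), segments never partially overlap, and SACK blocks are listed in sequence order rather than most-recent-first. The function names and the sequence numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

SackBlock = Tuple[int, int]  # (left edge, right edge), right edge exclusive


def merge_blocks(buffered: Dict[int, int]) -> List[SackBlock]:
    """Merge buffered out-of-order segments into contiguous SACK blocks."""
    blocks: List[SackBlock] = []
    for seq in sorted(buffered):
        end = seq + buffered[seq]
        if blocks and blocks[-1][1] == seq:
            # Segment starts where the previous block ends: extend it.
            blocks[-1] = (blocks[-1][0], end)
        else:
            blocks.append((seq, end))
    return blocks


def receiver_acks(segments: List[Tuple[int, int]],
                  initial_seq: int) -> List[Tuple[int, List[SackBlock]]]:
    """Return (ack number, SACK blocks) sent after each received segment.

    `segments` is a list of (seq, length) in the order TCP sees them.
    """
    next_expected = initial_seq
    buffered: Dict[int, int] = {}  # seq -> length of out-of-order segments
    acks = []
    for seq, length in segments:
        if seq == next_expected:
            next_expected += length
            # The hole is filled: consume any buffered segments that follow.
            while next_expected in buffered:
                next_expected += buffered.pop(next_expected)
        elif seq > next_expected:
            buffered[seq] = length  # arrived early, keep it for later
        # seq < next_expected: old duplicate, nothing new to buffer
        acks.append((next_expected, merge_blocks(buffered)))
    return acks


in_order = receiver_acks([(1000, 100), (1100, 100), (1200, 100)], 1000)
swapped = receiver_acks([(1100, 100), (1000, 100), (1200, 100)], 1000)
```

In the `swapped` case the first ACK still carries ack number 1000 but includes a SACK block for 1100-1200, i.e. it is a duplicate ACK that SACKs the later segment. That is the pattern seen at packet 4875, and it only appears if TCP processes the later segment before the earlier one.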
Thank you Jasper. However, I still do not understand how this happens. This PCAP is captured on the receiver. Does this mean that the packets are received in-sequence by the modem, but delivered upwards to TCP layer out-of-order? How can this be, given the fact that only one interface is involved? Who is responsible internally in the stack for this mix-up? I'm looking for the explanation on what seems to me to be an effect of the way that Linux handles incoming packets. Does dual core have something to do with this?
Right, I forgot to check who is who. Very peculiar... Normally I would expect the packets to arrive out-of-order for the ACK behavior shown. I'm not sure if dual core processing is responsible, but it could be if the incoming packets are distributed across cores and one is working slower than the other.
Looking at the packet timings, everything is very close together, so maybe there is some sort of "race condition" in the TCP stack when this happens. I have to admit I haven't seen something like this before, so thanks for staying persistent when I thought it was just something out of the ordinary :-)
What I would do is try to replicate that behavior with a dedicated capture device, to rule out the local capture setup as the problem and see what's really going on on the wire. It's quite possible you'll see the exact same thing though, which would mean that the cause is somewhere in the TCP stack.
Thanks once again Jasper, I'll try to do what you suggest.