I host a couple of sites that handle many file uploads and downloads. Every now and then I see failures in my log files, for reasons I only wish I knew. My visitors are mostly anonymous to me, but today I managed to capture one failed download with tcpdump, and I'm asking for help understanding it better. The client (67.189.6.55) was a modern version of Firefox, and its downloads list showed the files as "Failed". The capture is available at http://www.cloudshark.org/captures/238c31c13426. What conclusions can be drawn from how the attached stream ends?
In my Tomcat logs this manifests itself as
asked 04 Jan '14, 18:45 llehtinen edited 04 Jan '14, 18:59
2 Answers:
This looks like a client problem. For some reason it sends ACK packets only a couple of times, notably in packet 16, which covers the data up to packet 13. Then it gets strange: in packet 24 it acknowledges packet 15, and it repeats the same ACK in packet 30. After that, a Reset is sent in packet 31. Clearly the client seems to be confused :-)
answered 06 Jan '14, 03:07 Jasper ♦♦
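To make the "packet 24 acknowledges packet 15" reasoning concrete, here is a minimal sketch of how a cumulative ACK number maps back to the data segments it covers. The sequence numbers and lengths below are hypothetical placeholders, not values read from the capture, and `Segment` and `last_frame_covered` are names made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    frame: int   # frame number in the capture
    seq: int     # relative sequence number of the first payload byte
    length: int  # TCP payload length in bytes

    @property
    def next_seq(self) -> int:
        # Sequence number of the byte that follows this segment.
        return self.seq + self.length


def last_frame_covered(segments: List[Segment], ack: int) -> Optional[int]:
    """Return the frame of the last segment fully covered by a cumulative ACK."""
    covered = [s for s in segments if s.next_seq <= ack]
    if not covered:
        return None
    return max(covered, key=lambda s: s.next_seq).frame


# Hypothetical server-to-client data segments (assumed values).
segments = [
    Segment(frame=13, seq=1, length=1460),
    Segment(frame=15, seq=1461, length=1460),
    Segment(frame=17, seq=2921, length=1460),
]

print(last_frame_covered(segments, ack=1461))  # covers frame 13 only
print(last_frame_covered(segments, ack=2921))  # covers up to frame 15
```

Two ACKs carrying the same acknowledgment number, with newer data already in flight, mean the receiver has not advanced past that point, which is the pattern described for packets 24 and 30.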
(Not sure if I am correct, this is just for your reference. Looking forward to the experts' input later.) Your web server was handling the file transfer well until the client reset the TCP connection. I can see nothing wrong on the TCP side, so it must be a purely application-level issue. Live application debugging would be needed to figure out the reason for the TCP reset.
answered 05 Jan '14, 00:44 SteveZhou edited 05 Jan '14, 03:29
I have to overturn my previous conclusion. This is definitely a server-side problem! Apparently this client is just one example of the issue logged on your server, isn't it? It cannot be that every client has a problem. I reviewed the trace again and found that #17's tcp length = 1456 which is bigger than the negotiated value 1460 during handshaking. After that, #24 and #30 only ACKed #15, as Jasper mentioned. Something must have gone wrong on the client after it received the weird-length packet, and this caused the client to send a reset to end the connection. You should take a look at the web server to figure out why it would send a TCP segment of such a weird length. Maybe a NIC driver issue. (06 Jan '14, 08:23) SteveZhou
Jasper, I noticed #15 has the PSH bit set. I took that to mean something like "hey, 67. guy, please take all of those bytes, give me an ACK, and I will give you the next chunks of data". But #16 only ACKs #13 rather than #15. Under that condition, is 148. allowed to send additional bytes? Is that standard behavior? Maybe I misunderstood the PSH bit here.
Could you help me understand the PSH bit more precisely?
I referred to RFC 1122, page 83, which states that the discussion in RFC 793 erroneously implies that a received PSH flag must be passed to the app layer. Passing a received PSH flag to the app layer is now OPTIONAL.
The PSH bit is a notification to process the segment immediately when sending and receiving it, which means that no "waiting for more segments" mechanism is allowed (well, let's say "waiting/buffering SHOULD not happen"). It does not mean that there may not be more data sent afterwards, it only means "process this right away".
A PSH bit also means that the data should be passed towards the application layer on the receiving side immediately (as you quoted), but it may not mean that the TCP stack has to acknowledge it right away. RFCs are one thing, reality is often something else :-)
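As a rough illustration of that distinction, here is a toy receiver model in which PSH triggers immediate delivery to the application while ACKs follow a separate delayed-ACK rule. This is only a sketch under assumptions: the `Receiver` class is hypothetical, the "ACK every second segment" rule is one common policy, the delayed-ACK timer is left out, and no real TCP stack is being reproduced.

```python
class Receiver:
    """Toy model: PSH controls delivery, a counter controls ACKs."""

    def __init__(self, ack_every: int = 2):
        self.buffer = bytearray()  # data not yet handed to the application
        self.delivered = []        # chunks handed to the application
        self.rcv_nxt = 0           # next expected byte (relative)
        self.pending = 0           # segments received since the last ACK
        self.ack_every = ack_every # assumed delayed-ACK threshold
        self.acks_sent = []        # ACK numbers that went out

    def on_segment(self, payload: bytes, psh: bool) -> None:
        self.buffer += payload
        self.rcv_nxt += len(payload)
        self.pending += 1
        if psh:
            # PSH: hand buffered data to the application right away.
            self.delivered.append(bytes(self.buffer))
            self.buffer.clear()
        if self.pending >= self.ack_every:
            # The ACK decision is independent of the PSH flag.
            self.acks_sent.append(self.rcv_nxt)
            self.pending = 0


r = Receiver()
r.on_segment(b"a" * 100, psh=True)   # delivered at once, no ACK yet
r.on_segment(b"b" * 100, psh=False)  # buffered, but this one triggers an ACK
print(len(r.delivered), r.acks_sent)
```

In this model the sender is free to keep transmitting after a PSH segment as long as its window allows; the PSH flag itself does not gate further data on an ACK.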