Hi,

I had to analyze network packets because of an application that crashed quite a lot. Along the way I stumbled upon TCP retransmissions and built a simple setup to start troubleshooting. I connected two laptops running SystemRescueLinux to the same switch (Gigabit switch, Cat6 cables) and opened an SSH connection from one machine to the other, running 'tcpdump -i <netif> -s65535 -w file.pcap' on both client and server.

I analyzed the packets just before the TCP retransmission in both PCAPs, and unless I made a mistake, the server sends an invalid sequence number, which causes the retransmission...

ClientIP: 10.41.1.87
ServerIP: 10.41.1.88

Client side looks like:

Server side looks like:

Client-Side starting with packet no 1890:

So... I'm at a loss here. Is this the network driver? The NIC? The... I don't know? Did I miss something? I suppose it's definitely not the switch in between, at least that's what I'm trying to find out... but what is it then?

Kind regards

EDIT: Ok, as requested, here's the PCAP of the TCP stream:

asked 12 Nov '15, 04:37 esc4rg0t
edited 13 Nov '15, 00:02
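For anyone repeating this kind of analysis: the check described in the question (comparing each segment's sequence number with what the previous segment implies) can be sketched in a few lines of Python. This is only an illustration, not what was run on the captures here. The function names and the sample numbers are made up, and the segments are assumed to come from one direction of one TCP stream, in capture order.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def next_expected_seq(seq, payload_len, syn=False, fin=False):
    """Sequence number that should follow this segment (SYN and FIN each consume one)."""
    return (seq + payload_len + int(syn) + int(fin)) % SEQ_MOD


def check_sequence(segments):
    """Flag segments whose sequence number is not the expected one.

    segments: (frame_no, seq, payload_len) tuples for ONE direction of ONE
    stream, in capture order. Returns (frame_no, kind, expected, seq) tuples.
    Simplification: partially overlapping segments are reported as "old data".
    """
    findings = []
    expected = None
    for frame_no, seq, payload_len in segments:
        if expected is None:
            # First segment seen: nothing to compare against yet.
            expected = next_expected_seq(seq, payload_len)
            continue
        diff = (seq - expected) % SEQ_MOD  # wraparound-safe distance
        if diff == 0:
            expected = next_expected_seq(seq, payload_len)
        elif diff < SEQ_MOD // 2:
            # Sequence number is ahead: something was lost or not captured.
            findings.append((frame_no, "gap", expected, seq))
            expected = next_expected_seq(seq, payload_len)
        else:
            # Sequence number is behind: retransmission or out-of-order arrival.
            findings.append((frame_no, "old data", expected, seq))
    return findings


# Hypothetical numbers, not taken from the captures in this thread.
sample = [(1, 1000, 1448), (2, 2448, 1448), (3, 5344, 1448), (4, 3896, 1448)]
for finding in check_sequence(sample):
    print(finding)
```

Running the same check on both the client-side and the server-side capture shows whether a "gap" exists on the wire or only at one capture point.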
2 Answers:
I don't think the packet size or offloading is the cause, at least not the direct one. The packet with The same situation repeats for the immediately following "jumbo packet" with But there is a significant difference between these two "jumbo packets": the second one, which causes the retransmission, carries the PSH (push) flag, which logically is set only in the last packet of the series.

So I would vote for some issue in buffer handling (or maybe the ssh client?): when the receiving side (the client) gets the push, it should hand the buffer contents to the application immediately. In some cases something goes wrong there and it takes too long, so the stack cannot send the returning ACK in time.

A similar situation can be seen with frames 1651, 1652 at the client side and 1742, 1743 at the server side, and at least one more time (use display filter

answered 13 Nov '15, 06:00 sindy

Can you tell me how to send the last packet without the push flag? What is the best solution? (18 Nov '15, 06:49) srinu_bel

This is not the right place to ask. I'd ask at Stack Overflow, the programmers' Q&A, because it is an application- and/or driver-related question, not a Wireshark or network traffic analysis one. (18 Nov '15, 06:55) sindy

E-mail ID, please? (18 Nov '15, 07:06) srinu_bel
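Since the argument above hinges on which segment carries PSH, here is a small sketch of how the flag can be read from a raw TCP header. The bit positions follow the standard TCP header layout (the flags sit in byte 13). The helper names and the sample header values (ports, window) are made up for illustration.

```python
import struct

# Flag bits in byte 13 of the TCP header, lowest bit first.
TCP_FLAG_BITS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"),
    (0x10, "ACK"), (0x20, "URG"), (0x40, "ECE"), (0x80, "CWR"),
]


def tcp_flags(tcp_header):
    """Return the names of the flags set in a raw TCP header."""
    if len(tcp_header) < 14:
        raise ValueError("need at least 14 bytes of TCP header")
    flag_byte = tcp_header[13]
    return [name for bit, name in TCP_FLAG_BITS if flag_byte & bit]


def make_header(seq, flags):
    """Build a minimal 20-byte TCP header; ports and window are placeholders."""
    # src port, dst port, seq, ack, data offset (5 words), flags, window, checksum, urgent
    return struct.pack("!HHIIBBHHH", 22, 50000, seq, 0, 5 << 4, flags, 65535, 0, 0)


middle_segment = make_header(seq=1000, flags=0x10)  # ACK only
last_segment = make_header(seq=2448, flags=0x18)    # PSH + ACK
print(tcp_flags(middle_segment))
print(tcp_flags(last_segment))
```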
Can you limit the packet size to 1448 bytes in place of 24616 / 22212?

answered 13 Nov '15, 02:28 srinu_bel

Ahm... what exactly do you mean by that? Do I have to make a new trace, or...? I have put the pcap files up for download now (see first post)... (13 Nov '15, 02:32) esc4rg0t

The application server is sending very big TCP segments. Is it possible to limit the TCP segment size to 1448 instead of 24616 / 22212? (13 Nov '15, 02:36) srinu_bel

The application server is sending very big TCP segments. Is it possible to limit the TCP segment size to 1448 instead of 24616 / 22212 from your application, and then take a fresh capture? (13 Nov '15, 02:39) srinu_bel

I will have to check that, as I just default-booted into a Linux live image (SysRescueCD)... (13 Nov '15, 02:44) esc4rg0t

I don't know if it is a good idea to disable TCP offloading at this point. It seems to be the main part of the question, maybe together with SACK or DSACK. A more precise answer would be possible if you shared a capture with us, as Pavel and Jasper mentioned earlier. (13 Nov '15, 02:53) Christian_R

Please see the original post, I already shared the captures. Or here again: http://ianfe.dyndns.org/server_tcp.pcap http://ianfe.dyndns.org/client_tcp.pcap Right, I guess that offloading is active, so the NIC segments everything. I could use a tap, but would need to buy one first ;-) (13 Nov '15, 02:55) esc4rg0t

If you want to use the tap only to confirm that the server's NIC segments the packets autonomously, maybe it would be enough to disable TCP offloading only at the receiving machine? (13 Nov '15, 03:41) sindy

Yes, maybe I will do another dump today with TCP segmentation disabled... if I find the time :-/ But maybe somebody already has an idea based on the current dump. I just thought it through again and again... the NIC DOES TCP offloading, cool. So one of these "split" packets could be lost... but then I would not calculate the exact same, and perfectly right, sequence number in both the client-side and the server-side dump, would I?
Which differs from what the server sent along as a sequence number... (13 Nov '15, 03:48) esc4rg0t

Okay, I disabled the offloading on both client AND server, and there are no retransmissions anymore. Whatever that means... (13 Nov '15, 05:56) esc4rg0t

So if that's the case (what you described above), I would not have to worry about my network infrastructure, and TCP retransmissions from time to time seem to be quite normal... because of things like what's happening here. (13 Nov '15, 06:13) esc4rg0t

Yes, stop worrying about your network and start worrying about the task scheduler of your OS :-D But for me it is more interesting that the TCP offloading affects the behaviour... maybe it does so indirectly, and the application (the ssh client in this case) has problems handling large buffers, not expecting to get 10 pages of text in a single update? Also, this effect, if it exists, might explain your initial issue:
Or have you already found some other reason for that issue, and the analysis of retransmissions was just a spin-off project? (13 Nov '15, 07:35) sindy

1) Yes, you are 100% right, sindy. 2) To overcome the buffer problem, I am trying to send smaller PDUs, which reduces the burden on the TCP stack. Regards (13 Nov '15, 18:29) srinu_bel

Sindy: Well, I came to the conclusion that the application that crashes seems to be crap, yes... But apart from that, I will put more effort into analyzing this behaviour, try other software, other Linux builds and so on. Right now I have an indication that it might be SSH, as I see similar behaviour when connecting to a pfSense box over SSH from the same client... (15 Nov '15, 02:02) esc4rg0t
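To make the sizes in this discussion concrete: with segmentation offload active, a capture taken on the sending host can show one large "segment" that the NIC later cuts into MSS-sized pieces on the wire. The sketch below only does that arithmetic for the sizes quoted above. The function name is made up, the starting sequence number 0 is a placeholder, and 1448 is assumed to be the per-segment payload (what a 1500-byte MTU typically leaves after 20 bytes of IPv4 header, 20 bytes of TCP header and 12 bytes of TCP timestamp option).

```python
def split_into_segments(start_seq, total_len, mss=1448):
    """Return (seq, length) pairs for the on-the-wire segments of one large buffer."""
    segments = []
    offset = 0
    while offset < total_len:
        length = min(mss, total_len - offset)
        # Sequence numbers are 32-bit, so wrap around at 2**32.
        segments.append(((start_seq + offset) % 2 ** 32, length))
        offset += length
    return segments


# The two large sizes mentioned in this thread.
for total in (24616, 22212):
    segs = split_into_segments(0, total)
    print(total, "bytes ->", len(segs), "segments, last one", segs[-1][1], "bytes")
```

With these numbers, 24616 bytes is exactly 17 segments of 1448 bytes, and 22212 bytes is 15 full segments plus a final one of 492 bytes. That is why the sequence numbers computed from the sender-side and receiver-side dumps can agree even though the frame sizes in the two captures look very different.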
Hi,
as the captures are encrypted and come from a lab test, would you mind placing them somewhere in the cloud and posting links to them here, to allow a more comfortable analysis?
BR, Pavel
Or, if paranoid, run them through a sanitization task with TraceWrangler, and tell it to cut payloads after layer 4 :)
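For readers wondering what "cut payloads after layer 4" amounts to, here is a rough sketch of the idea. It is not how TraceWrangler does it, just an illustration: the function name is made up, and it assumes a plain Ethernet II frame without VLAN tag carrying IPv4 and TCP. Anything else is returned unchanged.

```python
ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def strip_tcp_payload(frame):
    """Return the frame truncated right after the TCP header (options included)."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return frame
    if int.from_bytes(frame[12:14], "big") != ETHERTYPE_IPV4:
        return frame
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IHL is in 32-bit words
    if frame[ETH_HEADER_LEN + 9] != IPPROTO_TCP:
        return frame
    tcp_start = ETH_HEADER_LEN + ip_header_len
    if len(frame) < tcp_start + 13:
        return frame
    tcp_header_len = (frame[tcp_start + 12] >> 4) * 4  # data offset, in 32-bit words
    return frame[:tcp_start + tcp_header_len]


# Synthetic frame: zeroed headers with only the fields the function reads filled in.
eth = bytes(12) + b"\x08\x00"
ip = bytes([0x45]) + bytes(8) + bytes([IPPROTO_TCP]) + bytes(10)
tcp = bytes(12) + bytes([0x50]) + bytes(7)
frame = eth + ip + tcp + b"secret payload"
print(len(frame), "->", len(strip_tcp_payload(frame)))
```

Note that this only shortens the bytes. A real sanitization tool also has to deal with the capture file's record lengths, checksums and other encapsulations.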