Hello... We are having a "latency" problem with a particular application, and it raises some questions about Fast Retransmission and the Congestion Window.

Here is a typical scenario: server 'A' is sending a 10 MB file to server 'B'. During the transfer there is a small amount of packet loss (4 packets dropped out of 1700 sent), but the impact is large: it increases the overall file transfer time from ~0.3 seconds to ~1.7 seconds.

My questions have to do with how the sender retransmits the missing packets. It seems to take much longer than necessary. Specifically:

1) 'A' sends about 400 packets before the first packet drop occurs. Up to this point, the ACKs from 'B' keep up with the sequence numbers from 'A'.
2) 'A' continues to send more packets, while 'B' is now sending duplicate ACKs, looking for the dropped packet. While in this state, 3 more packets are dropped.
3) After 'A' sends several more packets, it reacts to the duplicate ACKs with a Fast Retransmission of the 1st dropped packet. At this point things seem to be working as they should, and ~290 ms have elapsed since the start of the file transfer.
4) After the Fast Retransmission, 'B' sends an ACK showing that it has received the first of the dropped packets, but is now looking for the 2nd of the dropped packets.
5) HERE IS WHERE THE SLOWDOWN BEGINS. 'A' stops sending packets at this point, and 'B' stops sending ACKs.
6) AFTER 200 ms, 'A' retransmits the 2nd of the dropped packets. 'B' responds with an ACK showing it has gotten the 2nd dropped packet, and is now looking for the 3rd dropped packet.
7) Once again, 'A' stops sending packets, and 'B' stops sending ACKs.
8) AFTER 400 ms, 'A' retransmits the 3rd of the dropped packets. 'B' responds with an ACK showing it has gotten the 3rd dropped packet, and is now looking for the 4th dropped packet.
9) Once again, 'A' stops sending packets, and 'B' stops sending ACKs.
10) AFTER 800 ms, 'A' retransmits the 4th of the dropped packets. 'B' responds with an ACK showing it has gotten the 4th dropped packet.
11) From this point on, 'A' resumes sending packets without delay, 'B' keeps up by ACKing those new packets, and the transfer completes about 80 ms later.

The delays above (200 ms, 400 ms, 800 ms) add up to 1.4 seconds. Only the first retransmission is a Fast Retransmission; the remaining retransmissions apparently wait for the retransmission timer to expire.

So here come my questions... After the retransmissions, why does 'A' stop sending packets? I'm thinking it is because 'A's Congestion Window has shrunk to zero due to all the duplicate ACKs. Is that plausible? Is there any way to confirm that? And is the "retransmission time backoff" (200 ms, then 400 ms, then 800 ms) further evidence of a zero Congestion Window? If the "stop sending after retransmitting" behavior is due to a zero Congestion Window, are there TCP parameter settings that we should consider changing? Perhaps a larger initial Congestion Window? (These are Red Hat Linux servers.)

Well, I know I have said a mouthful. I hope one or more of you have had the patience to go through all of it, and have some sage advice :-). Thx much! feenyman99

asked 04 Sep '15, 21:53 feenyman99
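The 200/400/800 ms pattern in the question is what a retransmission timer looks like when it doubles on every consecutive expiry. A minimal sketch of RFC 6298-style exponential backoff, assuming a first timeout of 200 ms (commonly cited as the Linux minimum RTO) and a hypothetical cap parameter `max_rto_ms`; one common reason the doubled value is not reset is Karn's algorithm (ACKs for retransmitted segments give no fresh RTT sample), though whether that applies to this capture is an assumption.

```python
# Sketch: RFC 6298-style exponential backoff of the retransmission timeout (RTO).
# Assumptions (not taken from the capture): the first timeout fires at 200 ms,
# each consecutive timeout doubles the RTO, and the RTO is capped at
# max_rto_ms (a hypothetical parameter; RFC 6298 allows any cap >= 60 s).

def backoff_schedule(initial_rto_ms: int, timeouts: int, max_rto_ms: int = 120_000) -> list[int]:
    """Return the wait before each of `timeouts` consecutive retransmission timeouts."""
    schedule = []
    rto = initial_rto_ms
    for _ in range(timeouts):
        schedule.append(min(rto, max_rto_ms))
        rto *= 2  # double the timer after every expiry
    return schedule


waits = backoff_schedule(200, 3)
print(waits)       # [200, 400, 800]
print(sum(waits))  # 1400 -> ~1.4 s of idle time on top of the ~0.3 s baseline
```

Three backed-off timeouts alone account for the jump from ~0.3 s to ~1.7 s described in the question.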
2 Answers:
Maybe you're experiencing buffer bloat. If you have different line speeds from sender to receiver, e.g. going from 10G down to 1G, or from 1G down to 100 Mbit, you may experience packet loss when the switch/router buffers fill up. Retransmissions are sent but have to "get in line" at the end of the full buffers, so they'll take a while to get through to the client.

answered 10 Sep '15, 02:33 Jasper

Sorry for the delay. Lots going on... The sender and receiver are both on the same switch, in fact, and both are on 1 Gbps ports. But thanks for the suggestion; all of this is very helpful! (17 Sep '15, 00:29) feenyman99
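To put "get in line" into numbers: a packet that joins a queue has to wait for everything ahead of it to drain at the egress line rate. A minimal sketch, assuming a single FIFO queue draining at full line rate with no competing traffic; the 1 MB queue size is an illustrative figure, not something measured in this thread.

```python
# Sketch: how long a packet waits behind a full buffer ("getting in line").
# Assumption: the queue drains at the egress link's line rate and nothing
# else competes for it. Buffer sizes below are illustrative, not measured.

def queue_delay_ms(queued_bytes: int, link_bits_per_s: float) -> float:
    """Time (ms) for `queued_bytes` already in the queue to drain ahead of a new packet."""
    return queued_bytes * 8 / link_bits_per_s * 1000


# A retransmission queued behind 1 MB of data:
print(queue_delay_ms(1_000_000, 1e9))    # 1 Gbit/s egress, roughly 8 ms
print(queue_delay_ms(1_000_000, 100e6))  # 100 Mbit/s egress, roughly 80 ms
```

The same amount of queued data costs ten times as much waiting time on a link that is ten times slower, which is why speed mismatches (10G to 1G, 1G to 100 Mbit) are the usual place to look for this effect.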
If you have access to the servers, you can look at live statistics e.g. cwnd, ssthresh ... using:
answered 12 Sep '15, 06:42 Roland

Roland - Great info. This will be very telling. I have requested that this query be executed the next time we test, so we can rule a zero CWND in or out. (17 Sep '15, 00:32) feenyman99

Roland, I am using the ss command and loving it. I have a "while true" loop that runs ss every 100 ms and records the values to a file. I can see cwnd's initial value, then watch it grow, and then see it shrink when a duplicate ACK occurs. Very cool!

But I have a question (of course)... Reading RFC 5681, it seems clear to me that cwnd, as used by TCP, is measured in bytes:

"...At any given time, a TCP MUST NOT send data with a sequence number higher than the sum of the highest acknowledged sequence number and the minimum of cwnd and rwnd."

However, the values returned by the ss command look like they specify the number of segments that can be sent. (It starts at 4, then increases over time to 300+.) Also, the TCP tuning articles I have seen describe setting the initial congestion window as some number of segments (not bytes). Am I correct in inferring that, although TCP calculates it as a number of bytes, we humans describe it as a number of segments when we configure or display it? thx, feenyman99 (22 Sep '15, 11:45) feenyman99

The ss command displays the value of CWND in segments. 1 segment = MSS bytes. The MSS size can vary, so it makes more sense to use segments. (22 Sep '15, 12:22) Roland
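The segments-versus-bytes point can be made concrete by converting what ss reports into the byte-based limit RFC 5681 describes. A minimal sketch, assuming output in the key:value form printed by something like `ss -ti` (TCP sockets, internal info) in recent iproute2 releases; the exact command Roland posted is not shown here, and the sample line, sequence number and rwnd below are illustrative.

```python
# Sketch: pull cwnd/ssthresh/mss out of one line of `ss -ti` output and turn
# the segment-based cwnd into bytes (cwnd_bytes = cwnd * mss).
# Assumption: fields appear as key:value tokens, as in recent iproute2
# releases; the sample line below is illustrative, not from the capture.

def parse_ss_info(line: str) -> dict[str, int]:
    """Return the integer-valued cwnd, ssthresh and mss fields found in `line`."""
    wanted = {"cwnd", "ssthresh", "mss"}
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        # exact key match, so advmss/rcvmss are not mistaken for mss
        if sep and key in wanted and value.isdigit():
            fields[key] = int(value)
    return fields


def send_limit(highest_acked: int, cwnd_bytes: int, rwnd_bytes: int) -> int:
    """RFC 5681: don't send past highest ACKed sequence + min(cwnd, rwnd)."""
    return highest_acked + min(cwnd_bytes, rwnd_bytes)


sample = "cubic wscale:7,7 rto:204 rtt:0.5/0.25 mss:1448 cwnd:10 ssthresh:7"
info = parse_ss_info(sample)
cwnd_bytes = info["cwnd"] * info["mss"]  # 10 segments * 1448 bytes
print(info, cwnd_bytes)
# rwnd of ~1 MB, as reported later in the thread
print(send_limit(1_000_000, cwnd_bytes, 1_000_000))
```

With a receive window near 1 MB, cwnd is the smaller term almost all the time, so it is the value that actually gates the sender.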
Hm, how about sharing a pcap on Cloudshark? It's much better to look at a capture than reading tons of descriptions that may be biased.
If you have privacy concerns use TraceWrangler and replace all IP addresses as well as cutting the payload after the TCP layer.
Jasper,
Your points are very good. But even after anonymizing the pcap with TraceWrangler, my management deemed posting it on CloudShark a security risk. VERY security-sensitive environment.
Can anyone else comment on my description & questions above?
OR...
What other suggestions might there be for me to securely communicate the TCP behavior I'm seeing, in this forum?
I truly value the collective Wireshark.org expertise, and would love some suggestions/thoughts from the TCP SMEs.
Thx, feenyman99
Interesting - can you elaborate on what may still be seen as a security risk? If you cut TCP payloads, replace all IPs and MACs with something else there is nothing specific left unless you're doing really crazy things :-)
Are there no SACKs in your capture?
I don't work with TCP much, but IIRC with SCTP (which had SACKs built in from the beginning), the SACKs coming from 'B' would (eventually) indicate that all 4 packets were missed, so 'A' could Fast Retransmit all 4 packets in a timely manner, without having to wait for the ACK of each retransmitted packet to find out that some later packet was also missed, and so on one packet at a time. I'm not sure whether TCP SACK and FR work the same way...
I'm not sure if your congestion window idea is correct or not; once the missed packet is retransmitted + acked I think the congestion window should open up again (admittedly the cwnd will be smaller thanks to the packet loss but I think it should still be open).
I assume B's advertised window is open during all of this?
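The difference Jeff describes is about what the receiver is able to report. A minimal sketch of a cumulative ACK versus SACK blocks for the same set of received segments; segments are numbered 0..n-1 for simplicity (real TCP uses byte sequence numbers), and the loss pattern is hypothetical, chosen only to have 4 losses as in the thread.

```python
# Sketch: what the receiver can tell the sender with a cumulative ACK alone
# versus with SACK blocks. Segments are numbered 0..n-1 for simplicity
# (real TCP uses byte sequence numbers); the loss pattern is hypothetical.

def cumulative_ack(received: set[int]) -> int:
    """Next segment the receiver expects (first gap), as a plain ACK reports it."""
    nxt = 0
    while nxt in received:
        nxt += 1
    return nxt


def sack_blocks(received: set[int]) -> list[tuple[int, int]]:
    """Contiguous runs [start, end) received above the cumulative ACK point."""
    blocks = []
    seg = cumulative_ack(received)
    top = max(received, default=-1)
    while seg <= top:
        if seg in received:
            start = seg
            while seg in received:
                seg += 1
            blocks.append((start, seg))
        else:
            seg += 1
    return blocks


# 20 segments sent; segments 5, 9, 12 and 16 lost (4 losses, as in the thread).
got = set(range(20)) - {5, 9, 12, 16}
print(cumulative_ack(got))  # only the first hole is visible: 5
print(sack_blocks(got))     # [(6, 9), (10, 12), (13, 16), (17, 20)]
```

The gaps between the SACK blocks expose every hole, whereas the plain ACK only ever reveals the first one. Real TCP fits at most 4 SACK blocks in one ACK (3 when timestamps are in use, per RFC 2018), which is one reason the full picture may only emerge "eventually".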
Jeff,
Thx for the response...
SACK is NOT enabled between A and B. (I'll look into enabling it though, thx.)
"once the missed packet is retransmitted + acked I think the congestion window should open up again"...
Good point! After the Fast Retransmission and its ACK, if the sender sent just a few more packets, he would receive 3 more ACKs (a triple duplicate ACK) for the next missing packet, which would trigger another Fast Retransmission. But the sender just stops sending after the Fast Retransmission, so he only gets 1 ACK, and then he waits for his retransmission timer to expire. I'm at a loss as to why the sender stops sending, if it's not his Congestion Window.
In answer to your other question, during all of this, the receiver's advertised window is HUGE - just under a megabyte.
feenyman99
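The "3 more ACKs" condition above is the sender's duplicate-ACK counter. A minimal sketch of that counter as RFC 5681 describes it (Fast Retransmit on the 3rd duplicate ACK); ACK values are cumulative "next expected" numbers, and the ACK streams are hypothetical.

```python
# Sketch: the sender-side duplicate-ACK counter behind Fast Retransmit
# (RFC 5681: retransmit on the 3rd duplicate ACK). ACK values are cumulative
# "next expected" numbers; the ACK streams below are hypothetical.

DUPACK_THRESHOLD = 3


def fast_retransmit_points(acks: list[int]) -> list[int]:
    """Indices in `acks` at which a Fast Retransmit would be triggered."""
    triggers = []
    last_ack = None
    dupacks = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dupacks += 1
            if dupacks == DUPACK_THRESHOLD:
                triggers.append(i)
        else:
            last_ack, dupacks = ack, 0  # new data ACKed: reset the counter
    return triggers


# Hole at 5: three duplicates of ACK 5 trigger Fast Retransmit.
print(fast_retransmit_points([3, 4, 5, 5, 5, 5]))     # [5]
# After the retransmission, a single ACK for the next hole (9) arrives and
# nothing else follows, so the counter never reaches 3 again.
print(fast_retransmit_points([3, 4, 5, 5, 5, 5, 9]))  # [5]
```

The lone ACK for the next hole is a new cumulative ACK, not a duplicate, so it resets the counter to zero; without further segments arriving at 'B', nothing can push it back up to 3.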
Jasper,
You and I agree that removing the IPs, MACs and payloads eliminates any actual security risk. But the politics/process of getting approval to put the anonymized trace on CloudShark makes it too time-consuming.
thx,
feenyman99
As you said, after the fast retransmission B sends another dup ACK, but A takes 200 ms to retransmit. I think because A only received one dup ACK, it didn't do a fast retransmit. It looks normal, and maybe enabling SACK will solve this.
Feenyman99,
alright, that's something I do understand - those additional non-technical obstacles are hard to get out of the way. No worries; I was just wondering :-)
Cheers, Jasper
A huge receive window may, by the way, again point to buffer bloat. The receiver allows the sender to "overload" the infrastructure with a flood of packets it cannot process fast enough.
It's a common misconception that, when that "overload" happens, the speed will still reach the throughput of the slowest link. It doesn't; in most cases I've seen, it reaches between 10% and 40% of the theoretical throughput of the slowest link. This is caused by the long delays of the retransmissions making it through.
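One way to see why a very large receive window invites queueing is to compare it with the path's bandwidth-delay product (BDP): if the sender actually fills the window, anything beyond the BDP cannot be on the wire and has to wait in a buffer. A minimal sketch, assuming a 1 Gbit/s bottleneck and a 0.5 ms RTT (a plausible same-switch LAN figure, not measured in this thread) together with the ~1 MB window reported above.

```python
# Sketch: compare the receiver's advertised window with the path's
# bandwidth-delay product (BDP). If the sender fills the window, data in
# flight beyond the BDP cannot be on the wire, so it sits in a queue.
# Assumptions: 1 Gbit/s bottleneck and a 0.5 ms RTT (illustrative, not
# measured in this thread); rwnd of ~1 MB as reported above.

def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes needed to keep the bottleneck link busy for one round trip."""
    return link_bits_per_s * rtt_s / 8


def excess_in_flight(window_bytes: float, link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes of a full window that would be queued rather than in transit."""
    return max(0.0, window_bytes - bdp_bytes(link_bits_per_s, rtt_s))


print(bdp_bytes(1e9, 0.0005))                     # ~62,500 bytes
print(excess_in_flight(1_000_000, 1e9, 0.0005))   # ~937,500 bytes would be queued
```

Under these assumptions, a full 1 MB window is roughly 16 times the BDP, so most of it would be waiting in buffers rather than in transit.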
TCP Reno cannot recover from multiple losses in one window without a timeout. I found this link: http://ssfnet.org/Exchange/tcp/test/f14.html (just curious to figure out this behaviour).
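The Reno limitation can be illustrated with a toy model of what the sender does when a partial ACK (one that covers the retransmitted segment but not the whole window) arrives. This is a sketch under stated assumptions, not a TCP implementation: no SACK (as in this thread), data sent while cwnd was inflated is ignored, and the 300-segment window and the hole at segment 40 are hypothetical. It does not claim to reproduce the recovery code of the Red Hat kernel in the question.

```python
# Toy model, not a TCP implementation: the sender's choice when a partial ACK
# arrives during fast recovery. Segments of the window are numbered
# 0..flight_at_loss-1; `acked_up_to` is the next hole the partial ACK reveals.
# Simplification: data sent during recovery is ignored.

def after_partial_ack(variant: str, flight_at_loss: int, acked_up_to: int) -> str:
    """Describe what a Reno or NewReno sender does on a partial ACK."""
    if variant == "newreno":
        # RFC 6582: a partial ACK implies the next hole was lost too; resend it now.
        return f"retransmit segment {acked_up_to} immediately"
    if variant == "reno":
        # Reno leaves fast recovery and deflates cwnd to ssthresh
        # (RFC 5681: ssthresh = max(FlightSize / 2, 2 * SMSS), here in segments).
        ssthresh = max(flight_at_loss // 2, 2)
        still_outstanding = flight_at_loss - acked_up_to
        if still_outstanding >= ssthresh:
            return "no room in cwnd: wait for the retransmission timer"
        return "may send new data; another Fast Retransmit needs 3 more duplicate ACKs"
    raise ValueError(f"unknown variant: {variant!r}")


# Hypothetical numbers: ~300 segments in flight (cwnd reached 300+ in the
# thread) and the second hole at segment 40 of that window.
for variant in ("reno", "newreno"):
    print(variant, "->", after_partial_ack(variant, 300, 40))
```

Under these assumptions the Reno sender has more data outstanding than its deflated cwnd allows, so it sits idle until the timer fires, which is consistent with the pattern in the question. NewReno would instead resend the next hole about one round trip later.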