TCP SRE keeps increasing

Question

Hi there, I'm running Wireshark to try to sort out what is going wrong with the TCP connections to my server. It runs fine for the first while and loads a few small files, but for a big file I think it eventually drops a packet. That should be fine, except the SRE value of the DupAck keeps increasing as the server sends more packets. It finally triggers a fast retransmit, after DupAck #12 and he sends back the initial missed packet, bytes 3073 to 4097, but by then it's too late, as the DupAck already wants SLE 3073 to SRE 15361. So the client just keeps DupAck-ing and eventually the server seems to stop trying; there are a lot of keep-alive and retransmission attempts. Here's a picture of the suspected packet loss point. Any ideas? I've been working on this connection for days and have no idea what's going wrong.

Thanks!

Answer 1

You were losing a segment early (the 3rd segment, tcp.seq==2049 tcp.nxtseq==3073) in a batch of segments coming in from your server.

As your client offered a 64K window size, the server sent 15 segments in a row, 1024 bytes each, immediately after your initial GET request. So seeing the SRE value increase while the SLE value stays at 3073 is normal. The server should eventually retransmit the lost segment (tcp.seq==2049 tcp.nxtseq==3073).
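
To illustrate why the SRE keeps climbing while the SLE stays put, here is a minimal Python sketch of a receiver that loses one segment out of a back-to-back burst. It is a simplification, not a model of any real TCP stack: it assumes relative sequence numbers starting at 1, exactly one lost segment, and a receiver that ACKs every out-of-order segment immediately. The function name `sack_dup_acks` is made up for this example.

```python
def sack_dup_acks(first_seq, seg_len, count, lost_index):
    """Simulate a receiver that loses one segment out of a back-to-back burst.

    Returns one (ack, sle, sre) tuple per segment that arrives after the hole.
    Simplification: the receiver ACKs every out-of-order segment immediately
    and only one segment is lost, so there is a single SACK block.
    """
    ack = first_seq          # next byte the receiver expects in order
    sle = sre = None         # left and right edge of the SACK block
    dup_acks = []
    for i in range(count):
        start = first_seq + i * seg_len
        end = start + seg_len
        if i == lost_index:
            continue         # this segment never arrives
        if start == ack:
            ack = end        # in-order data moves the cumulative ACK forward
        else:
            if sle is None:
                sle = start  # first byte received beyond the hole
            sre = end        # right edge grows with every new segment
            dup_acks.append((ack, sle, sre))
    return dup_acks


# Numbers from the trace: 15 segments of 1024 bytes, the 3rd one (seq 2049) lost.
acks = sack_dup_acks(first_seq=1, seg_len=1024, count=15, lost_index=2)
for n, (ack, sle, sre) in enumerate(acks, start=1):
    print(f"Dup ACK #{n}: ack={ack} SLE={sle} SRE={sre}")
```

With these inputs the sketch produces 12 duplicate ACKs, all acknowledging 2049, with the SLE fixed at 3073 and the SRE growing from 4097 to 15361, which matches the SLE/SRE values quoted in the question.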

If it doesn't do this and, as you indicated, you see keep-alive packets instead, it might not be getting your ACKs at all ...

The IP addresses indicate the server is in your local network; however, the server sending 1024-byte segments indicates that it saw your MSS offer as 1024.

What is the MSS value offered in the client's SYN packet?

If you don't see 1460 then check your MTU size.
If you see 1460 then some box is turning it down in flight.
And this box might not handle SACK very well, so it may be better to turn SACK off in that case.
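
As a quick sanity check on the MSS/MTU relationship behind these questions, here is a small Python sketch. It assumes IPv4 and TCP headers of 20 bytes each with no options; the helper names are made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def mtu_for_mss(mss):
    """MTU that an advertised MSS would correspond to."""
    return mss + IPV4_HEADER + TCP_HEADER


print(mss_for_mtu(1500))  # standard Ethernet MTU
print(mtu_for_mss(1024))  # the MSS the server appears to be using
print(mss_for_mtu(576))   # a reduced MTU
```

Under these assumptions an Ethernet MTU of 1500 gives the usual MSS of 1460, while an MSS of 1024 corresponds to an MTU of 1064, which is not a typical interface setting and is one reason to suspect that something between client and server is rewriting the value.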

Reading through this again, you say "after DupAck #12 and he sends back the initial missed packet, bytes 3073 to 4097,"

This segment was never lost; it arrived at the client. If you still see it retransmitted, the server did not or could not read the SLE/SRE information correctly (or at all), which might be due to
https://support.microsoft.com/en-us/kb/2525390

Regards Matthias

The latest trace shows SACK no longer being used.
Still, there are too many retransmissions, evidently caused by too many bytes flooding the serial line in too short a time, because too many TCP sessions are started, each offering too large a receive window.
-> see nice article about COM_OVERRUNS
Here is the IO Graph showing the number of bytes arriving on all parallel HTTP sessions and the number of bytes in flight in the retransmissions...


The workaround is to slow down the sender by a combination of:

reducing the MTU to 576 bytes
reducing the rwin in the Windows client
limiting the number of concurrent HTTP sessions
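
To get a feel for why the receive window and the session count matter, here is a rough Python estimate of how much data the server may send before it has to wait for ACKs, and how long a serial line would need to drain it. The speed of the serial line is not stated in this thread, so the 115200 baud figure and the 8N1 framing (10 bits on the wire per byte) are pure assumptions, as is the 8192-byte window in the second case; the helper names are made up for this example.

```python
def worst_case_in_flight(sessions, rwin_bytes):
    """Upper bound on unacknowledged bytes if every session fills its window."""
    return sessions * rwin_bytes


def drain_seconds(num_bytes, baud, bits_per_byte=10):
    """Time to push num_bytes over a serial line (10 bits per byte for 8N1)."""
    return num_bytes * bits_per_byte / baud


ASSUMED_BAUD = 115200  # hypothetical line speed, not taken from the trace

# 7 sessions as in the latest trace, each offering a window of about 64K.
burst = worst_case_in_flight(sessions=7, rwin_bytes=65535)
print(burst, "bytes,", round(drain_seconds(burst, ASSUMED_BAUD), 1), "s")

# Hypothetical tuned setup: 2 sessions with an 8192-byte window each.
tuned = worst_case_in_flight(sessions=2, rwin_bytes=8192)
print(tuned, "bytes,", round(drain_seconds(tuned, ASSUMED_BAUD), 1), "s")
```

The absolute numbers depend entirely on the assumed line speed, but the ratio between the two cases shows how much fewer sessions and a smaller window shrink the burst the line has to absorb.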

On Windows, to reduce the window size you can disable auto-tuning, which disables window scaling.

Open an elevated command prompt (with administrator privileges) and run:

netsh interface tcp set global autotuning=disabled

To reduce the rwin to less than 64K you need to change the speed of your adapter to 10M.

In Firefox, multiple requests can be sent before any responses are received.
This is known as pipelining.
In your latest trace there are 7 sessions starting.
You can turn this off by setting network.http.pipelining to false.