I'm having a problem in the company network, where a server is receiving a high number of zero-window packets. I understand that this would be quite normal if the client were too busy to handle the incoming flow. But here's the thing: I captured the traffic at the client's end simultaneously with the server's end, and no zero-window packets were leaving the client. So somewhere on the path the advertised window size is being changed to zero, which keeps stopping the flow. Could some node along the way be changing the window size to ease its own load? Any help/ideas appreciated. asked 02 Feb '12, 23:46 rakki
2 Answers:
The only device that comes to mind right away is a traffic shaper/management system that might decide to change some TCP-layer values. I've seen that in a couple of analysis cases where a Packeteer appliance (now a BlueCoat product) fumbled around with the window size. I say "fumbled" because it did pretty stupid things, like dropping a client's real window size from 64k to 200 bytes (a.k.a. a "silly window"), which caused huge artificial delays. You might want to check whether there is any such device in the communication path, and capture on both sides of it to see if it mangles the packets. Routers and switches usually do not do this, because they only work on layers 2 and 3. answered 03 Feb '12, 00:05 Jasper ♦♦
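To put the 64k-to-200-bytes example in perspective: when the receive window is the bottleneck, a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT. A minimal sketch, assuming a purely illustrative 50 ms RTT (not a figure from this thread):

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput when the receive window is the limit:
    at most one full window of unacknowledged data per round trip."""
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes / rtt_s


if __name__ == "__main__":
    rtt = 0.050  # assumed 50 ms round-trip time, for illustration only
    for window in (65535, 200):
        rate = max_throughput_bytes_per_s(window, rtt)
        print(f"window={window:>6} bytes -> at most {rate:,.0f} bytes/s")
```

Under that assumption the ceiling falls from roughly 1.3 MB/s to about 4 KB/s, which is why a shrunken window shows up as delay rather than as errors.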
In her Troubleshooting Tips and Tricks for TCP/IP Networks presentation at Sharkfest '11, Laura Chappell mentioned the problem of certain interconnecting devices changing or removing altogether certain TCP options, such as window scaling and SACK. Perhaps this is what is happening in your case. answered 05 Feb '12, 07:04 cmaynard ♦♦
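One way to check for stripped options is to compare the SYN as it leaves the client with the SYN as it reaches the server. The sketch below walks the raw TCP option list and reports the Window Scale shift count (option kind 3) and whether SACK-Permitted (kind 4) is present. It assumes you already have the raw TCP header bytes from each capture; `make_syn` is a hypothetical helper that only builds sample headers for illustration.

```python
import struct
from typing import Dict, Optional, Tuple

# TCP option kinds: 0 = End of Option List, 1 = No-Operation,
# 3 = Window Scale (RFC 7323), 4 = SACK-Permitted (RFC 2018)
EOL, NOP, WSCALE, SACK_PERMITTED = 0, 1, 3, 4


def tcp_options(tcp_header: bytes) -> Dict[int, bytes]:
    """Map option kind -> option payload for a raw TCP header."""
    header_len = (tcp_header[12] >> 4) * 4  # data offset field, in bytes
    opts = tcp_header[20:header_len]
    found: Dict[int, bytes] = {}
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == EOL:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(opts):
            break  # truncated option, stop parsing
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            break  # malformed option, stop parsing
        found[kind] = opts[i + 2:i + length]
        i += length
    return found


def syn_capabilities(tcp_header: bytes) -> Tuple[Optional[int], bool]:
    """Return (window scale shift count or None, SACK-Permitted present)."""
    opts = tcp_options(tcp_header)
    wscale = opts.get(WSCALE)
    return (wscale[0] if wscale else None, SACK_PERMITTED in opts)


def make_syn(options: bytes) -> bytes:
    """Hypothetical helper: minimal SYN header with dummy ports/seq/checksum."""
    if len(options) % 4:
        raise ValueError("options must be padded to 32-bit words")
    offset_words = (20 + len(options)) // 4
    return struct.pack("!HHIIBBHHH", 50000, 80, 1000, 0,
                       offset_words << 4, 0x02, 65535, 0, 0) + options
```

For example, a SYN carrying MSS, NOP, Window Scale 7, SACK-Permitted and two NOPs (`bytes([2, 4, 5, 180, 1, 3, 3, 7, 4, 2, 1, 1])`) yields `(7, True)`, while the same SYN with only the MSS option left yields `(None, False)`. If the client-side copy shows the first result and the server-side copy the second, something in between is stripping options.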
Thank you for the quick answer. I'll check on that and start doing captures on the path. I'll let you know if I find the cause of the problem.
Other devices that might be changing the receive window are application-layer proxies (including most load balancers).
Thanks, SYNbit. There actually is a loadbalancer on the network and I was wondering if that could be the problem.
You might want to do two captures simultaneously, recording what goes into and what comes out of the load balancer (if it is a device with two interfaces). If it has only one interface, carrying both the client's request and the same load-balanced request on to the servers, a single capture is enough. Then you need to find the same packets on both sides and compare their TCP headers to see if they were mangled.
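The comparison step could be sketched as follows: pull the sequence number, acknowledgment number, raw window field and option bytes out of each TCP header, pair segments from the two captures by (seq, ack), and flag pairs whose window or options differ. This assumes the device forwards segments without rewriting sequence numbers; a load balancer acting as a full proxy terminates TCP, so its two legs have unrelated sequence numbers and need a different matching key. Duplicate (seq, ack) pairs such as retransmissions simply overwrite each other here. `make_header` is a hypothetical helper for building sample input.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple


class TcpFields(NamedTuple):
    seq: int
    ack: int
    window: int     # raw 16-bit window field, not multiplied by the scale factor
    options: bytes  # raw option bytes, compared byte for byte


def tcp_fields(tcp_header: bytes) -> TcpFields:
    _sport, _dport, seq, ack, off_byte, _flags, window = struct.unpack(
        "!HHIIBBH", tcp_header[:16])
    header_len = (off_byte >> 4) * 4
    return TcpFields(seq, ack, window, tcp_header[20:header_len])


def find_mangled(inside: List[bytes],
                 outside: List[bytes]) -> List[Tuple[TcpFields, TcpFields]]:
    """Pair segments by (seq, ack); report pairs whose window or options differ."""
    outside_by_key: Dict[Tuple[int, int], TcpFields] = {}
    for hdr in outside:
        f = tcp_fields(hdr)
        outside_by_key[(f.seq, f.ack)] = f
    changed = []
    for hdr in inside:
        before = tcp_fields(hdr)
        after = outside_by_key.get((before.seq, before.ack))
        if after is not None and (before.window != after.window
                                  or before.options != after.options):
            changed.append((before, after))
    return changed


def make_header(seq: int, ack: int, window: int, options: bytes = b"") -> bytes:
    """Hypothetical helper: minimal ACK header with dummy ports and checksum."""
    offset_words = (20 + len(options)) // 4
    return struct.pack("!HHIIBBHHH", 40000, 80, seq, ack,
                       offset_words << 4, 0x10, window, 0, 0) + options
```

For instance, `find_mangled([make_header(1, 1, 8192)], [make_header(1, 1, 0)])` returns a single pair whose window fields are 8192 on one side and 0 on the other, which is exactly the kind of rewrite the original question suspects.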
You mean doing a capture on the balancer? Stupid question btw: do these load balancers have their own IP? I.e., if I do a traceroute, do I see them on the path, or are they switch-type components?
That all depends on how they have been configured. They can operate routed, one-armed, or transparent (bridged). Some brands (like F5) let you capture on the box itself.
Welcome to the wonderful world of network analysis :-)
We've had issues with certain "features" of our NetScalers doing this to us. I haven't seen any of our WAN "accelerators" do this yet, and I'm not sure why a shaper would decide to insert zero-window adverts into a healthy stream. Perhaps it's configured with "guards" to prevent oversaturation of a link or service?
Beats me. The rate to the servers isn't anything drastic. I've also noticed that the number of zero-window packets doesn't correlate with the data rate at all.
Well, since an LB can act as an application-layer proxy, it has TCP buffers on both the server side and the client side. If the connection on the server side is fast while the one on the client side is slow, the TCP receive buffers on the server side of the LB can fill up pretty quickly, resulting in zero-window ACKs.
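The buffer effect described above can be illustrated with a toy model: each tick the server sends up to its burst size (never more than the proxy's last advertised window), the proxy ACKs with whatever space remains in its server-side receive buffer, and then the slower client-side leg forwards what it can and the proxy updates its window. The capacity, burst and drain figures are assumptions chosen for illustration, and the model ignores delayed ACKs, buffer auto-tuning and silly-window avoidance.

```python
from typing import List


def proxy_ack_windows(capacity: int, server_burst: int,
                      client_drain: int, ticks: int) -> List[int]:
    """Toy model of a proxy's server-side receive buffer.

    Returns the window the proxy advertises in its ACK after each burst.
    """
    occupancy = 0
    window = capacity  # window the server currently believes it may fill
    acks = []
    for _ in range(ticks):
        received = min(server_burst, window)  # server respects the window
        occupancy += received
        acks.append(capacity - occupancy)     # window carried in this ACK
        occupancy -= min(client_drain, occupancy)  # slow client-side leg
        window = capacity - occupancy         # window update after draining
    return acks
```

With a 64000-byte buffer, 16000-byte bursts from the server and only 4000 bytes per tick forwarded to the client, `proxy_ack_windows(64000, 16000, 4000, 6)` gives 48000, 36000, 24000, 12000 and then zero-window ACKs, while a client-side leg that keeps up (`client_drain=16000`) never produces a zero window. In this model what matters is the mismatch between the two legs, not the absolute data rate.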