This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Troubleshooting “TCP Zero Window” Warnings

Hi Everyone,

I'm currently having a problem troubleshooting a trading application. Here is a simple diagram of the current network setup:

(Gov't Stock Exchange Network Router)X-->(256 kbps Leased Line)<--X(Telco Router)<--(100 Mbps Fast Ethernet link)-->Our Network Devices (5 Switches, 1 Firewall)<-->Trading Server

Our users report that they experience slowness at around 9:30 to 9:45 am. I checked the CPU, memory, response time and link utilization of all our network devices and interfaces, and all of them report normal levels.

Part of the trading process is the communication between the Stock Exchange network and our trading servers, so any slowness on that 256 kbps leased line would surely contribute to the problem. Unfortunately, the Telco does not monitor the telco router, and we're still waiting for permission to add their device to our SolarWinds.

So the closest link I can look at is the 100 Mbps link from our switch to the leased line router on our side.

When the traders are experiencing 3ms to 5ms latency in trading, it shows this:

Transmit: 1500bps - 1900bps

Receive: 2000bps - 2400bps

Bytes Transferred per Minute: 44KB-60KB

Wireshark reports no problems at this time.

Of special note is the period from 9:34 to 9:37 every day, when they experience 10ms - 15ms latency in trading:

Transmit: 1900bps - 2400bps

Receive: 2400bps - 3200bps

Bytes Transferred per Minute: 90KB - 170KB

Wireshark reports TCP Zero Window errors (the trade server sending the zero window alert to the stock exchange server), but they only last for a few milliseconds and only happen two or three times a day.

There was even one incident when our traders were experiencing crazy latencies of 1 to 3 minutes in trading:

Transmit: 4000bps

Receive: 5600bps

Wireshark reports that we were getting TCP Zero Window errors (the trade server sending the zero window alert to the stock exchange server) for the whole trading period of that day. This only happened once, and I still haven't been able to resolve the issue.

The trading server team reports that their CPU, memory and NIC utilization is normal and, of course, everyone is blaming the network guys.

So here are my questions:

  1. When TCP Zero Window happens, what things and devices should I check? The server team reports that the memory and NIC utilization of their trading server is normal.

  2. Is there a way to graph the transmit/receive bps and bytes received in Wireshark? What I currently do is go to Statistics -> Conversations -> IPv4, check "Limit Display to this Filter", use the filter ip.addr eq X.X.X.X and ip.addr eq Y.Y.Y.Y and (frame.time ge "DATE HH:MM:SS.000000000" and frame.time le "DATE HH:MM:SS.999999999"), and then look at the bps and bytes received.

  3. Are there other things I could look at or check?

Thanks a lot for all your help guys! :)

asked 13 Aug '14, 20:59

Sharknado

edited 13 Aug '14, 21:01


2 Answers:

When TCP Zero Window happens, what things and devices should I check?

Zero window usually means: Give me a break. Don't send me any more data, as I cannot handle it anyway.

So, if your trading server is sending a zero window message, the problem is more likely on the server and/or in the trading application. Even if the CPU, memory and NIC values look OK on that server, there could still be a problem if the application is waiting for a resource (network share, database) and is therefore unable to process the data fast enough.
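
To illustrate why a stalled application shows up as a zero window, here is a toy model in Python. It is only a sketch under simplifying assumptions (a fixed-size receive buffer, one step per "tick", hypothetical function and parameter names) and not how any particular TCP stack is implemented. The advertised window is simply the free space left in the buffer.

```python
def simulate_window(buffer_size, arrivals, reads):
    """Return the advertised receive window after each tick.

    buffer_size: receive buffer size in bytes (assumed fixed)
    arrivals:    bytes the sender tries to deliver in each tick
    reads:       bytes the application reads from the socket in each tick
    """
    queued = 0          # bytes waiting in the receive buffer
    windows = []
    for arrived, read in zip(arrivals, reads):
        # The receiver can only accept what still fits in the buffer.
        queued += min(arrived, buffer_size - queued)
        # The application drains whatever it manages to read.
        queued -= min(read, queued)
        # The advertised window is the remaining free space.
        windows.append(buffer_size - queued)
    return windows


if __name__ == "__main__":
    # The application stops reading for two ticks (e.g. waiting on a database).
    print(simulate_window(8192, [4096] * 4, [4096, 0, 0, 4096]))
```

In this model the window drops to zero as soon as the application stops reading for long enough, even though CPU, memory and NIC load would all look unremarkable.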

Is there a way to graph in wireshark the transmit/receive bps and bytes received?

In the GUI

Statistics -> IO graph

or

Statistics -> TCP StreamGraph -> Throughput Graph (or the other graphs in the same menu)

Please read the docs for an explanation of those graphs.
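
If you would rather compute the numbers outside the GUI, the following is a minimal sketch. It assumes you have already exported one (epoch timestamp, frame length) pair per packet, for example with the tshark fields frame.time_epoch and frame.len. The function name and the one-second bucket size are my own choices for illustration.

```python
from collections import defaultdict


def bits_per_second(packets):
    """Sum frame lengths into one-second buckets.

    packets: iterable of (epoch_seconds, frame_length_in_bytes)
    returns: dict mapping whole second -> bits seen in that second
    """
    buckets = defaultdict(int)
    for ts, length in packets:
        buckets[int(ts)] += length * 8   # bytes -> bits
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    sample = [(100.1, 100), (100.9, 150), (101.2, 50)]
    for second, bits in bits_per_second(sample).items():
        print(second, bits)
```

Run it once per direction (filtered by source address) to get separate transmit and receive figures.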

Are there other things I could look at or check?

If you see the zero window messages directly in front of the server (captured on a mirror port of the switch), it's not a network problem. You should then blame it back to the server or application guys ;-)) See my explanation above.

Regards
Kurt

answered 14 Aug '14, 00:00

Kurt Knochner ♦

Hi,

The key here is correlation: to the nearest second, do the slow trades always coincide with the zero window events? The thing that strikes me about the figures you have given is that they could easily be the result of a single TCP retransmission.

If the network round trip time is approx 3ms, a lost trade request packet would be detected after 6ms, and the retransmitted trade request would be responded to after a further 3ms (perhaps a bit more, allowing for compute time), giving a trading latency of about 10ms.
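
The arithmetic above can be written out as a small Python sketch. The detection delay of twice the round trip time is taken from the numbers in this answer, and real retransmission timers may behave differently. The function and parameter names are hypothetical.

```python
def latency_with_one_retransmission(rtt_ms, detect_factor=2.0, compute_ms=0.0):
    """Estimate the request/response latency when the first request is lost.

    rtt_ms:        network round trip time in milliseconds
    detect_factor: how many RTTs pass before the loss is detected
                   (2.0 mirrors the "detected after 6ms" figure above)
    compute_ms:    processing time at the exchange (assumed, default 0)
    """
    detection_delay = detect_factor * rtt_ms   # time until the retransmission
    return detection_delay + rtt_ms + compute_ms


if __name__ == "__main__":
    # 6 ms to detect the loss + 3 ms round trip + 1 ms assumed compute time
    print(latency_with_one_retransmission(3.0, compute_ms=1.0))
```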

Let's set aside the 1 to 3 min issue for the moment. What you need to do is look at the time between a trade request leaving the trading server and the response coming back from the exchange (I'm assuming the complaint here is time to trade and not freshness of prices). Most trading protocols like this are very simple; packet to the exchange with the request and packet back with the response.

Identify the TCP port(s) that the Exchange trading process is using, and then filter the traffic to just analyze traffic to and from those ports.

You could export the Packet List data to a CSV and study the response times in Excel. A simpler way would be to use the TRANSUM plugin (see http://www.tribelabzero.com/resources), which is freely available. TRANSUM will show you the response times from the exchange; just remember to add the Exchange TCP port numbers to the list of Service Ports in the TRANSUM Preferences (see the TRANSUM Manual for details).

Time sync your capture units to the trading server as best you can, and ask the trading app support people for the precise times of slow trades (there is bound to be a time stamped log; be careful of timezone differences). Once you have these times, look at the response times for the exchange requests at those times. If you find a slow one, check whether there were any retransmissions or zero window events around it. If there were none, the latency is between your trace point and the exchange, or in the exchange itself.
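
For the CSV route, here is a rough Python sketch as an alternative to Excel. It assumes the export has "Time" (seconds), "Source" and "Destination" columns, and that the capture was already filtered down to data-bearing packets on the exchange port (for example with tcp.len > 0), so that pure ACKs are not mistaken for responses. The function names and the simple "next packet back is the response" pairing are assumptions for illustration.

```python
import csv


def response_times(rows, server_ip, exchange_ip):
    """Pair each request with the next packet coming back from the exchange.

    rows: iterable of dicts with "Time", "Source" and "Destination" keys
    returns: list of (request_time, response_delay) tuples in seconds
    """
    pending = None      # time of the oldest unanswered request
    results = []
    for row in rows:
        t = float(row["Time"])
        if row["Source"] == server_ip and row["Destination"] == exchange_ip:
            # Keep the first request time so retransmissions count as delay.
            if pending is None:
                pending = t
        elif row["Source"] == exchange_ip and row["Destination"] == server_ip:
            if pending is not None:
                results.append((pending, t - pending))
                pending = None
    return results


def load_rows(path):
    """Read an exported packet list CSV into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Sorting the result by delay puts the slowest exchanges at the top, which gives you the timestamps to compare against the trading application's log.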
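
The correlation step can be sketched in a few lines of Python. It assumes you have two lists of epoch timestamps: the slow trades from the application log, and the zero window packets from the capture (for example those matching the display filter tcp.analysis.zero_window). The function name and the one-second tolerance are hypothetical choices.

```python
def correlate(slow_trade_times, zero_window_times, tolerance_s=1.0):
    """Split slow trades into those near a zero window event and those not.

    slow_trade_times:  epoch seconds of slow trades (from the app log)
    zero_window_times: epoch seconds of zero window packets (from the trace)
    tolerance_s:       how close in time counts as "coinciding"
    returns: (matched, unmatched) lists of slow trade times
    """
    matched, unmatched = [], []
    for t in slow_trade_times:
        if any(abs(t - z) <= tolerance_s for z in zero_window_times):
            matched.append(t)
        else:
            unmatched.append(t)
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = correlate([100.0, 200.0, 300.5], [100.4, 299.8])
    print("near a zero window:", matched)
    print("no zero window nearby:", unmatched)
```

Slow trades that land in the unmatched list are the ones worth checking for retransmissions or for delay beyond your trace point.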

One final point, bear in mind that your trading server may be using TCP Segmentation Offload (or other offload functions) and so what you see at the NIC interface may not be what the trading app is seeing.

Best regards...Paul

answered 14 Aug '14, 15:12

PaulOfford