Why would NAT cause ZeroWindowProbeAck?

Question

I have an embedded device that is running an http server that returns [TCP ZeroWindowProbeAck] packets if and only if the gateway/router that the server lives on has NAT turned on.

More details:

The embedded device is using Z-World Rabbit Web Server and the router is a Digi ConnectPort - the ConnectPort LAN lives on 10.10.6.1.

In the ConnectPort there is a setting called Enable Network Address Translation (NAT). When NAT is turned off, the browser session between my PC and the embedded device is fine and normal. However, when I turn NAT on (in the ConnectPort), the web server response to the PC slows to a crawl.

Using Wireshark, I found that when NAT is turned on, the session between my PC and the embedded web server contains many packets flagged [TCP ZeroWindowProbe] and [TCP ZeroWindowProbeAck]. Here is a copy/paste of just a few of those packets (within a span of a few minutes I see hundreds of these zero window packets).

Note: 10.10.6.100 is my PC and 10.10.6.106 is the web server.

589 100.690826  10.10.6.100 10.10.6.106 TCP [TCP ZeroWindowProbe] 51294 > http [ACK] Seq=1 Ack=1 Win=64240 Len=1
590 100.692413  10.10.6.106 10.10.6.100 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] http > 51294 [ACK] Seq=1 Ack=1 Win=0 Len=0 MSS=1460
591 100.811036  10.10.6.100 10.10.6.106 TCP [TCP ZeroWindowProbe] 51295 > http [ACK] Seq=1 Ack=1 Win=64240 Len=1
592 100.812883  10.10.6.106 10.10.6.100 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] http > 51295 [ACK] Seq=1 Ack=1 Win=0 Len=0 MSS=1460
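To put a number on "hundreds of these", summary lines like the four above can be tallied per sender. This is only a sketch: it assumes the lines were exported with Wireshark's default columns (No., Time, Source, Destination, Protocol, Info), and the names `SUMMARY`, `SAMPLE` and `count_zero_window_events` are made up for illustration.

```python
import re
from collections import Counter

# One-line packet summary: No., Time, Source, Destination, "TCP", Info.
SUMMARY = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<time>\d+\.\d+)\s+(?P<src>\S+)\s+(?P<dst>\S+)"
    r"\s+TCP\s+(?P<info>.*)$"
)

# The four lines quoted in the question.
SAMPLE = [
    "589 100.690826  10.10.6.100 10.10.6.106 TCP [TCP ZeroWindowProbe] 51294 > http [ACK] Seq=1 Ack=1 Win=64240 Len=1",
    "590 100.692413  10.10.6.106 10.10.6.100 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] http > 51294 [ACK] Seq=1 Ack=1 Win=0 Len=0 MSS=1460",
    "591 100.811036  10.10.6.100 10.10.6.106 TCP [TCP ZeroWindowProbe] 51295 > http [ACK] Seq=1 Ack=1 Win=64240 Len=1",
    "592 100.812883  10.10.6.106 10.10.6.100 TCP [TCP ZeroWindowProbeAck] [TCP ZeroWindow] http > 51295 [ACK] Seq=1 Ack=1 Win=0 Len=0 MSS=1460",
]


def count_zero_window_events(lines):
    """Count probe and probe-ack packets, keyed by (source IP, kind)."""
    counts = Counter()
    for line in lines:
        match = SUMMARY.match(line)
        if match is None:
            continue  # not a TCP summary line
        info = match.group("info")
        if "[TCP ZeroWindowProbeAck]" in info:
            counts[(match.group("src"), "probe_ack")] += 1
        elif "[TCP ZeroWindowProbe]" in info:
            counts[(match.group("src"), "probe")] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(count_zero_window_events(SAMPLE).items()):
        print(key, n)
```

In this sample, every probe comes from the PC and every probe ack from the web server, which matches the description above.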

When I turn off NAT in the ConnectPort the zero window probe packets stop completely.

I’m fishing for insight on how the NAT setting in the ConnectPort would cause the web server to send zero window probe acknowledgments to the PC client.

Update (response to Kurt)

The Wireshark capture above was taken from the PC (10.10.6.100)
I am implementing port forwarding to allow remote access to the embedded device at 10.10.6.106. To set up port forwarding, the ConnectPort device requires that NAT is enabled.
I am unable to do a Wireshark capture from the embedded device (server). Or rather, I don’t know how to go about doing a capture for the server since the server is an embedded device (there’s no way to install Wireshark on the server).
I’m unable to get the four captures you mention, as I am away from the devices today.

Here’s more details on the network setup:

The Digi ConnectPort is the Gateway with a LAN IP of 10.10.6.1 and a Subnet of 255.255.255.0
The PC and embedded device are hub-connected to the ConnectPort, with static IPs of 10.10.6.100 and 10.10.6.106 respectively.
The ConnectPort is connected wirelessly to the Internet via a SIM card and has a Public Static IP Address of, let’s just say, 2.2.2.2
The ConnectPort is configured to allow remote access to the embedded device (10.10.6.106) via IP Port Forwarding. The port forwarding is configured as: External Port: 81; Internal Port: 80; IP Address: 10.10.6.106
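The forwarding rule above can be modelled as a small lookup table. This is a toy sketch, not the ConnectPort's actual logic: `Forward`, `translate` and `is_lan_to_lan` are hypothetical names, and 2.2.2.2 is the placeholder public address from the description. It also shows why a LAN-to-LAN connection would not normally pass through the translation step at all.

```python
import ipaddress
from typing import NamedTuple, Optional, Tuple


class Forward(NamedTuple):
    external_port: int
    internal_ip: str
    internal_port: int


LAN = ipaddress.ip_network("10.10.6.0/24")   # subnet 255.255.255.0
PUBLIC_IP = "2.2.2.2"                        # placeholder used in the post
RULES = [Forward(external_port=81, internal_ip="10.10.6.106", internal_port=80)]


def translate(dst_ip: str, dst_port: int) -> Optional[Tuple[str, int]]:
    """Destination after port forwarding, or None if no rule matches."""
    if dst_ip != PUBLIC_IP:
        return (dst_ip, dst_port)  # not addressed to the router's public side
    for rule in RULES:
        if rule.external_port == dst_port:
            return (rule.internal_ip, rule.internal_port)
    return None


def is_lan_to_lan(src_ip: str, dst_ip: str) -> bool:
    """True when both hosts sit in the LAN subnet and can talk directly."""
    return (ipaddress.ip_address(src_ip) in LAN
            and ipaddress.ip_address(dst_ip) in LAN)


if __name__ == "__main__":
    print(translate("2.2.2.2", 81))                     # remote access path
    print(is_lan_to_lan("10.10.6.100", "10.10.6.106"))  # PC to web server
```

In this model a remote browser hitting 2.2.2.2:81 lands on 10.10.6.106:80, while the PC at 10.10.6.100 reaches 10.10.6.106 on the same subnet without consulting the table.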

Update 2

If and only if NAT is enabled do I get the ZeroWindowProbeAck from the server. Strangely enough, the ZeroWindowProbeAck occurs whether or not the NAT is being hit. In other words, the ZWPA messages occur even when I am connected to the server locally via the LAN (e.g., a web browser on 10.10.6.100 connects directly to the HTTP port at 10.10.6.106).

The embedded device, the PC, and the ConnectPort are connected via a simple hub (no smart switch involved).

Update 3

I have uploaded a Wireshark capture to http://www.cloudshark.org/captures/937f5b4667cf (My original capture was greater than 10MB, so I had to remove a few thousand packets. But this truncated version should be enough to show the ZeroWindowProbeAck packets that I am referring to).

Note: I wasn’t able to get a capture with the problematic devices while they were connected to the Digi ConnectPort. However, using the Cradlepoint CTR35 I was able to reproduce the problem.

IP Assignments

Router: 10.10.6.19
T42 Laptop: 10.10.6.111 (computer running Wireshark)
A031: 10.10.6.106 (This is one of the devices that are reporting ZeroWindowProbeAck packets)
A041: 10.10.6.107 (This is another device, with a different firmware version, that is reporting ZeroWindowProbeAck packets)
iMac: 96.229.53.113 (computer making remote connections via Internet to the devices)

Here is the list of events that I did during the Wireshark session:

Connect to A013 (166.156.159.196:8585 / 10.10.6.106) from iMac via Google Chrome
Connect to A041 (166.156.159.196:8586 / 10.10.6.107) from iMac via Google Chrome
At around Packet Number 9496, I logged into A013
At around Packet Number 11700, I logged into A041
At around Packet Number 14452, I close the browser connections on the iMac
I open up Firefox on T42 laptop
From T42 I connect to A013. The connection is successful
At around Packet Number 15500, I login to A013 from T42
At around Packet Number 16172, I connect to A041 from T42. Connection is successful
At around Packet Number 18207, I login to A041 from T42. Login successful
At around Packet Number 19709, I clicked the Inhibitor_1 (Relay 1) link on A031. Successful
At around Packet Number 20906, I clicked on the PowerRelay_1 (Relay 1) link on A041. Successful.
At around Packet Number 22150, I closed the browsers on T42.
At around Packet Number 22164, I open up browser connections to A031 and A041 from iMac
At around Packet Number 28000, I closed the browser connections from the iMac
At around Packet Number 28083, I open browser connections to A031 and A041 from T42
At around 29600, I connect to the Cradlepoint router (at 10.10.6.19) from T42
At around 31550, I remove the Firewall rules that allow remote connections to A013 and A041 via the Cradlepoint
At around 34805, I re-added the Firewall rule to allow remote connections to A013
At around 36008, I closed the browser that was connected to A013 (10.10.6.106)
At around 36683, I opened a browser WAN connection from iMac to A013
At around 41454, I turned off the power to the A013 controller
At around 41801, I turned on the power to the A013 controller
At around 43129, I logged in to A013 from iMac. Successful.
At around 44741, I closed the browser on T42 that was connected to A041
At around 46016, I tried to open a connection from iMac to A041 (via WAN), but it failed. Reason: I forgot to add the Firewall rule to allow WAN access.
Via the Cradlepoint admin interface, I re-added the Firewall rule to allow WAN access to 10.10.6.107 (A041)
At around 47812, I connected from iMac to A041 (via WAN). Successful.
At around 51300, I closed both WAN connections to the A041 and A013 controllers (from iMac).
At around 51516, I made a LAN connection from T42 to A013
At around 52278, I made a LAN connection from T42 to A041
At around 54111, I closed the browser connections to both A041 and A013 from T42.
END WIRESHARK CAPTURE

Here is how the nodes are connected: [image: network setup]

Answer 1

Enabling NAT on the router will not influence the connection between the client and the embedded device, as they are directly connected. So something else must be of influence here.

Since the seq and ack in the packets are both 1, I assume that these are packets following the TCP three-way handshake. This means the server accepts a connection, but says it has no buffer to receive any data. What happens when you do enable the NAT, but don't create the port forwarder? Maybe your embedded device gets swamped with traffic from the internet?
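The "no buffer to receive any data" situation can be illustrated with a toy model of a TCP receive window. This is a sketch under simplifying assumptions, not a real TCP stack and not the Rabbit firmware: `Receiver`, `buffer_size` and `app_read` are hypothetical names, and the 1460-byte buffer is an arbitrary choice.

```python
class Receiver:
    """Toy receive side: the advertised window is the free buffer space."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.buffered = 0  # bytes received but not yet read by the application

    @property
    def window(self) -> int:
        return self.buffer_size - self.buffered

    def receive(self, nbytes: int):
        """Accept at most `window` bytes; return (accepted, advertised window)."""
        accepted = min(nbytes, self.window)
        self.buffered += accepted
        return accepted, self.window

    def app_read(self, nbytes: int) -> None:
        """The application drains data, which reopens the window."""
        self.buffered -= min(nbytes, self.buffered)


if __name__ == "__main__":
    rx = Receiver(buffer_size=1460)
    print(rx.receive(1460))  # buffer now full, window drops to 0
    print(rx.receive(1))     # 1-byte probe is refused, reply still says Win=0
    rx.app_read(500)         # application finally reads some data
    print(rx.receive(1))     # probe byte accepted, window is open again
```

The 1-byte probe with a Win=0 reply corresponds to the ZeroWindowProbe / ZeroWindowProbeAck pairs in the capture. As long as the application on the device never drains its buffer, every probe gets the same answer.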

Answer 2

Answering your update #2.

I don't see how NAT will have an influence here, especially if the NAT 'rule' is not being used.

What I could imagine: As soon as you enable 'incoming' NAT (port forwarding) for the server, the router also (silently) enables outgoing NAT (masquerading, hide NAT, you name it) for the server. As soon as that is enabled, the server might start downloading data from the internet (firmware updates, etc.). While it is busy, it might not have enough resources to answer your internal requests appropriately (full buffer -> zero window).

However, without a full trace (pcap file), it's hard to tell what's going on. Can you post the two pcap files (with/without NAT enabled), captured at the client, somewhere (one-click file hoster, cloudshark.org, etc.)? BEWARE of the privacy issues in doing so!

Can you also try to capture the traffic in front of the server (maybe using a second PC/laptop)?

BTW: Are you sure your 'hub' is a real hub? If so, you should see the traffic from the server to the internet while capturing on the client. Do you? If yes, capturing at the client should be sufficient. Then, please capture the traffic with/without NAT by using the following capture filter:

host 10.10.6.106
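One possible way to script the two captures is to assemble the tshark command lines in Python. This is only a sketch: the interface name `eth0` and the file names are assumptions, and the helper names are hypothetical. The `-i`, `-f`, `-w`, `-r` and `-Y` options and the `tcp.analysis.zero_window_probe_ack` display filter field are standard tshark/Wireshark features.

```python
import shlex


def capture_command(interface: str, host: str, outfile: str) -> list:
    """tshark command that records only traffic to or from `host`."""
    return ["tshark", "-i", interface, "-f", f"host {host}", "-w", outfile]


def probe_ack_report_command(infile: str) -> list:
    """tshark command that lists the ZeroWindowProbeAck packets in a capture."""
    return ["tshark", "-r", infile, "-Y", "tcp.analysis.zero_window_probe_ack"]


if __name__ == "__main__":
    # Assumed interface and file names; adjust to the capturing machine.
    for cmd in (
        capture_command("eth0", "10.10.6.106", "nat_on.pcap"),
        capture_command("eth0", "10.10.6.106", "nat_off.pcap"),
        probe_ack_report_command("nat_on.pcap"),
    ):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run` on a machine where tshark is installed.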

Regards
Kurt

answered 21 Dec '12, 13:39

Kurt Knochner ♦

edited 21 Dec '12, 14:00

I didn't/wouldn't think NAT would have an influence, either.

I wonder whether, when NAT is enabled, the router continually updates its NAT table (even if no request has been made from the outside). If so, could that NAT traffic be causing the 10.10.6.106 server to fill up its buffer?

I'll look into posting the pcap file and the pre-server-traffic capture. Thanks for the offer to look.

@hub - I don't know what you mean by "real". It looks and works like a hub.

(21 Dec '12, 13:59) KTM

@hub - I don't know what you mean by "real". It looks and works like a hub.

well, nowadays switches are sometimes labeled as 'switching' hubs whereas they are really (unmanaged) switches. However, people often just read the hub part ;-) So, it's actually hard to buy a 'real' hub, as they are no longer needed (cheap switch alternatives available) and therefore they are no longer produced in masses.

(21 Dec '12, 14:04) Kurt Knochner ♦

Most devices these days are switches, even though they don't say switch on the box. A switch works much differently from a hub and that has a great impact on what you do and don't see on your capturing system.

Re-reading your update, you state that the embedded system and the PC are "hub-connected". Do you mean they are attached to a "hub" and the "hub" is connected to the ConnectPort? Or do you mean they are both connected to the LAN ports of the ConnectPort?

(21 Dec '12, 14:07) SYN-bit ♦♦

I interpreted his update #2 to my question as a separate hub. However, now I'm no longer sure..

(21 Dec '12, 14:11) Kurt Knochner ♦

@Kurt - Sorry, I didn't think your question through regarding "real" hub. You are correct, I'm using an unmanaged switch (Netgear FS605 (http://goo.gl/6pJ4c) .. on the box it was labeled as a "Switch/Hub").

The PC, the embedded device, and the ConnectPort all connect to the switch/hub. So, in total, there are four devices involved - PC, embedded device, switch/hub, and the ConnectPort.

(22 Dec '12, 08:02) KTM

O.K. then a capture in front of the embedded server might help. But let's start with the client captures with/without NAT. Can you upload those somewhere?

(22 Dec '12, 09:15) Kurt Knochner ♦

One other thing to check is the ARP entry for the embedded server on your PC: does it point to the embedded device or to the ConnectPort? And what source MAC address does the traffic coming back from the embedded device have in Wireshark?

So, with all 4 devices connected and NAT enabled, it is slow. What happens immediately after you disconnect the ConnectPort (after verifying that it was slow while the ConnectPort was connected)?

I agree with @Kurt, traces traces, we want to look at traces :-)

(22 Dec '12, 12:36) SYN-bit ♦♦

The holidays will keep me from gathering more data and doing more tests. Hopefully you gents will still be interested in this topic to start off the new year :). Thanks, again.

(22 Dec '12, 17:22) KTM

Usually the questioners lose interest in their question/project ;-)) Anyway, I'm looking forward to your updates.

(23 Dec '12, 02:18) Kurt Knochner ♦
