This is our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Hi there,

For the last 3 or 4 months I have been facing a really strange problem in a network of around 100 workstations running mixed OSes (Windows 7, Mac OS and Linux). Uploads to some HTTPS websites are very slow from Windows (and only from Windows!) workstations. A good reproducible test is an upload to wetransfer.com, sendspace or similar sites, where I get speeds around 5 KB/s, taking something like 6 hours to upload a 60 MB file. Doing the same test (at the same time) from a Mac or Linux station, I get the full link speed (which is 50 Mb/s) and the upload finishes in less than a minute.

Trying to figure out the issue, I ran Wireshark on the router (a Debian Linux box running iptables), where I noticed lots of TCP retransmissions coming from the Windows host, as you can see in the link below:

CloudShark Link

I know what those retransmissions mean, and that they may be a symptom of some other problem, but I can't understand why this only happens on Windows workstations. The problem is seriously affecting the company's work, since it makes it impossible to send emails with attachments, use TeamViewer for remote access and so on. Just as important: I tested one of the workstations connected directly to the internet link with a public IP address, and the problem simply went away, which suggests the issue is directly related to my Linux firewall.

Thanks in advance for any help on that.

Best,

Danilo

asked 29 Dec '14, 06:48

Mussolini

edited 30 Dec '14, 13:40

packethunter

I can see that the Windows workstation (192.168.8.26) is sending packets whose TCP length exceeds the standard MTU size with the DF bit set, and your router is discarding them and replying with ICMP "fragmentation needed" messages. I am not sure whether these ICMP messages are delivered to the Windows system, because it keeps sending the oversized packets. You could try disabling the jumbo frame option on this system (192.168.8.26) and then test again.

(29 Dec '14, 20:28) kishan pandey

Hi Kishan, thanks for the reply. Actually I don't have jumbo frames set on this machine, it's just set to the default. None of the other Windows workstations have jumbo frames set either, and they behave the same way.

(30 Dec '14, 04:22) Mussolini

Hi Mussolini,

you have quite an interesting trace file. I will give my answer first and then describe how I came to that summary.

Summary

It could be that the network card on your Debian box is reassembling TCP segments (i.e. individual IP packets) into one large segment and then fails to forward these packets. Fixing the problem would require a configuration change on your Linux firewall / router.

Detailed Analysis

Based on the display filter provided in the CloudShark link I assume that the Windows machine is using the IP address 192.168.8.26. Unfortunately we don't have a packet that could be used as a fingerprint.

This system is visible with 2 TCP sessions in the trace file:

  • A session on TCP port 3389 to 192.168.8.1 (probably your gateway / router / firewall)
  • A session on TCP port 443 to 54.231.9.49

As a bonus - and already identified by kishan pandey as a potential indicator for trouble - we get 4 ICMP messages "fragmentation needed" (type 3, code 4).

As kishan already pointed out, the "fragmentation needed" messages are triggered by abnormally large frames. Apply the filter ip.len > 1500 to see that 192.168.9.120 is also triggering these messages.

Background: Packet sizes

Unless jumbo frames are used, the maximum size of an IP packet on Ethernet is 1500 bytes. As 20 bytes are used by the IP header and another 20 bytes by the TCP header, this leaves 1460 bytes for application data. This value is called the "maximum segment size" (MSS).

The packet size might be reduced to accommodate PPPoE, IPsec or other headers. (We ignore TCP or IP options to keep things simple.) Both endpoints of the TCP connection have to know how much data can be stuffed into a packet, so they exchange the MSS during the handshake. You would see the value 1460 within your LAN or 1452 if the remote site is using PPPoE.
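
The arithmetic above can be written down as a minimal Python sketch. It assumes IPv4 and TCP headers without options and an 8-byte PPPoE overhead; the function and constant names are made up for this illustration.

```python
# Sketch: derive the TCP maximum segment size (MSS) from a link MTU.
IP_HEADER = 20       # bytes, IPv4 header without options
TCP_HEADER = 20      # bytes, TCP header without options
PPPOE_OVERHEAD = 8   # bytes, PPPoE header (6) plus PPP protocol field (2)

def mss_for(mtu=1500, encapsulation_overhead=0):
    """Return the payload bytes left in one packet after all headers."""
    return mtu - encapsulation_overhead - IP_HEADER - TCP_HEADER

print(mss_for())                      # plain Ethernet
print(mss_for(1500, PPPOE_OVERHEAD))  # Ethernet link carrying PPPoE
```

The two calls give the 1460 and 1452 bytes mentioned above.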

Jumbo frames in the trace

Unfortunately, the trace does not show the connection start for the two clients. So we don't know the maximum segment size. Still it is a safe bet to assume that the MSS is 1460 or less.

TCP reassembly

It would be great to have a trace file that is recorded not on the router itself but on a separate device (using a SPAN port or a tap). I am pretty sure that such a trace will show that 192.168.8.26 is sending packets with an IP length of 1500 or less. In other words: you do not have jumbo frames in your LAN.

Still, your trace shows jumbo frames. These are most likely generated in your Debian Linux box. This could be the work of TCP Offloading in your network card (in the Windows world it is called "TCP chimney"), or a result of the software used in your Linux box.

Artefacts of the TCP reassembly

The reassembly within your Linux box becomes clear when looking at the IP Identification. Add the IP ID and the IP length as columns (fields ip.id and ip.len). Normally the IP ID is incremented by one with each packet. Here the IP ID increments in a non-linear way: every time the IP length exceeds 1500 bytes, the IP ID increases by more than 1.
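
The same check can be scripted once the two columns are exported. The following Python sketch is only an illustration: the sample values are invented (not taken from the trace), and it assumes the list holds packets of a single sender in capture order and that this sender increments its IP ID by one per packet.

```python
MTU = 1500

def id_steps(packets, mtu=MTU):
    """packets: list of (ip_id, ip_len) tuples from one sender, in order.

    Returns (index, ip_len, step) for every packet longer than the MTU,
    where step is how far the IP ID advanced to the following packet.
    """
    findings = []
    for i in range(len(packets) - 1):
        ip_id, ip_len = packets[i]
        next_id = packets[i + 1][0]
        step = (next_id - ip_id) % 65536  # the IP ID is 16 bits wide and wraps
        if ip_len > mtu:
            findings.append((i, ip_len, step))
    return findings

# Hypothetical sample: two oversized packets hiding 2 and 3 original segments
sample = [(100, 1500), (101, 2960), (103, 1500), (104, 4420), (107, 1500)]
for index, ip_len, step in id_steps(sample):
    print(f"packet {index}: ip.len={ip_len}, IP ID advanced by {step}")
```

A step larger than 1 next to an oversized packet suggests that several original packets were merged before the capture point.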

The jumbo frame now exists within the memory of your firewall / router. When forwarding the packet to the external interface (or maybe, when the NAT process is applied) the IP stack notices that the frame exceeds the maximum packet length for the interface and discards the packet. The source (192.168.8.26) is informed with an "ICMP fragmentation needed".

You don't see the ICMP packet for every single frame, because the kernel throttles the number of ICMP packets.

Why Windows, and not Linux?

I could imagine that the problem also exists with Linux clients. Windows and Linux will probably react in different ways to the ICMP "fragmentation needed" message. As the sender did nothing wrong, the message is confusing at best. If you show us a trace file with both Windows and Linux boxes transmitting simultaneously, we can see the difference.

How to fix this?

Please check the configuration of your Linux box. At some level (either network card or network stack) multiple incoming TCP segments are combined into the jumbo frame. Either use the offloading mechanism of your external network card to transmit the large packet or disable the segment reassembly for incoming data.

Good luck and happy hunting

PS: Just to satisfy my curiosity, I would be interested to know

  • a) what Linux systems look like in the trace
  • b) what software is involved in your Linux box (Kernel, version, possible firewall / proxy ...)
  • c) what parameter fixes the behaviour

answered 30 Dec '14, 05:35

packethunter

edited 30 Dec '14, 12:17

Hi Mr. Hunter!! :) First of all, I would like to thank you for your great explanation about this topic. For sure this improved my tiny knowledge.

Well, making things clear, you were right! Actually, I have the same behaviour on Linux workstations, only on Macs I can do it with no issues. I tried Linux again and got exactly the same slowness. Here is a dump when trying to upload from Linux:

Dump Linux

I could only send this small dump since CloudShark limits the size to 1.5M. Is there any other place to upload pcap files?

So, before I read your great post, I found something that got my attention in this link, item 15.7 where it says "Path MTU Discovery doesn't work as well as it should anymore". Then, I tried applying this iptables rule to set the MSS:

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 128

For the first time things started working! The uploads worked as they should, but I still noticed some unstable behaviour during the process: the progress bar goes to 5 MB and stops; sometimes it continues, sometimes it starts again from zero and then runs to the end. I guess this makes sense, since we set a very small segment size. Considering your explanation, I set the MSS to 1400 and so far it's working fine for both Windows and Linux stations. Below I uploaded a dump (in two parts because of the size) taken after the MSS was set:
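
To give a rough feeling for why an MSS of 128 is so costly compared to 1400, here is a small Python sketch. It assumes 40 bytes of IPv4 + TCP headers per packet (no options) and takes the 60 MB test file as 60,000,000 bytes; both are simplifications for illustration, not values measured in the dumps.

```python
HEADERS = 40               # bytes: 20 IPv4 + 20 TCP, no options (assumption)
FILE_BYTES = 60 * 10**6    # assumption: 60 MB = 60,000,000 bytes

def packets_needed(file_bytes, mss):
    """Number of full-size segments needed to carry the file (rounded up)."""
    return -(-file_bytes // mss)  # integer ceiling division

def header_share(mss):
    """Fraction of every packet that is header rather than payload."""
    return HEADERS / (mss + HEADERS)

for mss in (128, 1400, 1460):
    print(f"MSS {mss}: {packets_needed(FILE_BYTES, mss)} packets, "
          f"{header_share(mss):.1%} header overhead")
```

With the small MSS the router has to handle roughly ten times as many packets for the same file, and close to a quarter of every packet is header.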

Dump with TCPMSS set - Part1

Dump with TCPMSS set - Part2

Regarding your questions:

a) Sent above

b) Debian 7.6 / 1 SMP Debian 3.2.60-1+deb7u1 x86_64 GNU/Linux / Iptables v1.4.14 / Proxy transparent with Squid

c) The iptables rule I just applied.

Thanks again and let me know if you would like to check any other information.

Best,

Danilo

(30 Dec '14, 10:28) Mussolini

Hi Danilo,

your situation will not be solved by changing the MSS. Changing the segment size (or packet size or MTU size) will change the appearance, but it will not fix your problem.

More likely, you have a configuration issue on your Linux firewall / router. Try ethtool to disable the TCP offloading feature.

Now, let's have a look at your trace files:

The "Dump Linux" shows exactly the same behaviour. Let's look at a few interesting frames:

  • In frame 704 we see a packet of 2960 byte length from the server (IP length is 2676)
  • Clearly, two separate frames received by the system were concatenated either by hardware (TCP offloading) or by a software function
  • This oversized message is most probably split into two frames and transmitted without problems to the workstation.
  • Note that the workstation acknowledges the data in frame 706. Note the time difference between frames 704 and 706: This all happens within 1 millisecond. Typical LAN speed, no retransmission, everything is fine and speedy.

Conclusion: Whatever is happening on your Linux box affects both directions (from inside to outside and vice versa).

Next, a look at the Linux upload trace. Starting from frame 934 things go downhill. Try the display filter tcp.port == 39553 and set a time reference to frame 934.

  • Note that the round trip time calculated from frames 553 to 619 is 158 milliseconds
  • In 934 we see a large packet from the client. This is probably the TCP offloading engine that combined two packets from the network into one packet for the router to reduce processing and interrupt load. The large packet triggers the "ICMP fragmentation needed" message in 935
  • In frames 938 and 939 the same happens for another oversized chunk of data
  • The message displayed in frame 934 was not delivered to the server. It is never acknowledged.
  • In frame 1085 the server sends some data. The ACK-no from this frame tells the client, that data from frame 938 did not make it.
  • In frame 1154 the client finally realizes that the packet was lost, retransmits a single packet and waits for an acknowledgement. As no oversized packet was generated this time, an ACK arrives after 191 milliseconds. That is a bit longer than the RTT. The server probably was expecting more data and tried to delay the ACK.
  • Based on the timing I would dare say that the workstation is undergoing a TCP slow start. This is why only a single packet was sent and more data only follows after this packet was ACK'd.
  • Now the client tries to send two packets at once, (this is the TCP slow start at work). As seen in frame 1307 these packets get reassembled and trigger another ICMP message.
  • The sender is now assuming that a WAN link is seriously clogged up and wants to give the router a chance to work off the transmission queue. The next retransmission in 1840 follows after 1.4 seconds.
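
As a rough back-of-the-envelope check (not a measurement from the trace): if the sender is effectively reduced to delivering one segment per round trip or per retransmission timeout, throughput collapses to a few kB/s. The Python sketch below assumes an MSS of 1460 bytes and reuses the timings quoted in the list above (158 ms RTT, 191 ms until the ACK, 1.4 s retransmission gap).

```python
MSS = 1460  # bytes per segment, assumed (the handshake is not in the trace)

def one_segment_per_interval(segment_bytes, interval_seconds):
    """Throughput in bytes/s when only one segment gets through per interval."""
    return segment_bytes / interval_seconds

for label, interval in (("RTT", 0.158), ("delayed ACK", 0.191), ("timeout", 1.4)):
    rate = one_segment_per_interval(MSS, interval)
    print(f"{label:12s} {interval * 1000:6.0f} ms -> {rate / 1000:.1f} kB/s")
```

With these inputs the estimate lands between roughly 1 and 9 kB/s, which is the same order of magnitude as the 5 KB/s reported in the question.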

If you had Voice over IP in that trace file we could probably see the phone call going from the user to the help desk. :-)

When comparing the situation to the Windows-trace file you see that Microsoft is not much better: Retransmissions are triggered within 500 milliseconds.

The trace files that you provided clearly show that the problem resides within your Linux router / firewall. The router is sending ICMP messages (and drops frames) when it should not. Both Linux and Windows systems ignore these messages, because they did nothing wrong. As TCP is not aware of the packet drops, the sender has to recover from the packet loss by using extremely slow TCP mechanisms.

The same problem is discussed at a blog. According to the blog, the Linux command ethtool -K eth0 gso off turns off TCP offloading. This would cause your Linux box to process each packet individually.
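
If several interfaces need the same treatment, the command line can be generated with a few lines of Python. This is only a sketch that prints the commands instead of running them. The blog's command names gso only; adding gro and tso here is my assumption about which features might matter on this box, and eth0 is just the interface name from the command above.

```python
# Assumption: which offload features to switch off. These are the short
# feature names that ethtool -K accepts (gro, gso, tso).
OFFLOADS = ("gro", "gso", "tso")

def ethtool_disable_cmd(interface, features=OFFLOADS):
    """Build (but do not run) an 'ethtool -K <iface> <feature> off ...' call."""
    cmd = ["ethtool", "-K", interface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd

print(" ".join(ethtool_disable_cmd("eth0")))
print(" ".join(ethtool_disable_cmd("eth0", features=("gso",))))
```

Note that settings made with ethtool -K are lost on reboot, so a working combination still has to be made permanent in the network configuration.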

Happy hunting

(30 Dec '14, 13:24) packethunter

Hi Packet, how are you? Hope you had a great New Year's Eve.

Sorry for the delay, but I just came back to work and to my emails! ;)

Well, reading your last post (and now testing from inside the company, not remotely) I noticed some interesting output regarding offloading. I have eth1 (internal) and eth2 (external) interfaces; the ethtool output was like this:

# ethtool --show-offload eth2
Features for eth2:
rx-checksumming: on
tx-checksumming: off
    tx-checksum-ipv4: off
    tx-checksum-unneeded: off [fixed]
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: off
    tx-scatter-gather: off
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]

# ethtool --show-offload eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-unneeded: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]

I don't know why, but eth2 had RX enabled and TX disabled, while eth1 had both enabled. I don't know if that was the problem, but to be sure, and as you recommended, I disabled offloading on both cards and now things started working! Even without the iptables rule I mentioned before, things seem to be working and I can't see jumbo frames in Wireshark anymore. I will test this more thoroughly during the day, but it really sounds good. Is it a problem to run like this? This router is an Intel quad-core CPU machine.
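
For anyone comparing such listings by hand, a small Python sketch like the following can pull out the features that are switched on and not marked [fixed], i.e. the ones that could still be turned off. The sample text is a shortened excerpt of the eth2 output above; the function names are invented for this example.

```python
SAMPLE = """Features for eth2:
rx-checksumming: on
tcp-segmentation-offload: off
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
"""

def parse_offload(text):
    """Return {feature: (state, fixed)} from 'ethtool --show-offload' text."""
    features = {}
    for line in text.splitlines():
        if ":" not in line or line.startswith("Features for"):
            continue  # skip the header line and anything without a value
        name, _, value = line.partition(":")
        parts = value.split()
        if not parts or parts[0] not in ("on", "off"):
            continue
        features[name.strip()] = (parts[0], "[fixed]" in value)
    return features

def changeable_and_on(text):
    """Features that are currently on and not fixed by the driver."""
    return [name for name, (state, fixed) in parse_offload(text).items()
            if state == "on" and not fixed]

print(changeable_and_on(SAMPLE))
```

In the excerpt, generic-receive-offload shows up as one of the features that is on, which fits the receive-side merging discussed earlier in the thread.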

Well, I don't know how to thank you for the time you spent on this case and for the great explanations, you are like a monster of TCP. ;) Which makes me wonder how to be a TCP specialist like you.

Best Regards,

Danilo

(06 Jan '15, 10:40) Mussolini

question asked: 29 Dec '14, 06:48

question was seen: 2,995 times

last updated: 06 Jan '15, 14:42
