Hi there,

For the last 3 or 4 months I have been facing a really strange problem in a network of around 100 workstations running mixed OSes (Windows 7, Mac OS and Linux). Uploads to some HTTPS websites are very slow from Windows (and only from Windows!!) workstations. A good reproducible test is an upload to wetransfer.com, sendspace or similar sites, where I get speeds of around 5KB/s, taking something like 6 hours to upload a 60MB file. Doing the same test (at the same time) from a Mac or Linux station, I get the full link speed (which is 50Mb/s) and the upload finishes in less than a minute.

Trying to figure out the issue, I ran Wireshark on the router (a Debian Linux box running iptables), where I noticed lots of TCP retransmissions coming from the Windows host, as you can check in the link below:

I know what those retransmissions mean, and that this may be the symptom of some other problem, but I can't understand why it only happens to Windows workstations. This problem is seriously affecting the company's work, since it makes it impossible to send emails with attachments, use TeamViewer for remote access and so on.

Last but not least, I tested one of the workstations connected directly to the internet link with a public IP address, and the problem simply went away, so the issue seems to be directly related to my Linux firewall.

Thanks in advance for any help on that.

Best, Danilo

asked 29 Dec '14, 06:48 Mussolini edited 30 Dec '14, 13:40 packethunter
One Answer:
Hi Mussolini,

you got quite an interesting trace file. I will give my answer first, and then describe how I arrived at it.

Summary

It could be that the network card on your Debian box is reassembling TCP segments (i.e. individual IP packets) into one large segment and then fails to forward these packets. Fixing the problem would require a configuration change on your Linux firewall / router.

Detailed Analysis

Based on the display filter provided in the link to CloudShark, I assume that the Windows machine is using the IP address 192.168.8.26. Unfortunately we don't have a packet that could be used as a fingerprint. This system is visible with 2 TCP sessions in the trace file:
As a bonus - and already identified by kishan pandey as a potential indicator for trouble - we get 4 ICMP messages "fragmentation needed" (type 3, code 4). As kishan already pointed out, the "fragmentation needed" messages are triggered by abnormally large frames. Apply the filter

Background: Packet sizes

Unless jumbo frames are used, the maximum size of an IP packet on Ethernet is 1500 bytes. As 20 bytes are used by the IP header and another 20 bytes by the TCP header, this leaves 1460 bytes for application data. This value is called the "maximum segment size" (MSS). The packet size might be reduced to accommodate PPPoE, IPsec or other headers. (We ignore TCP or IP options to keep things simple.)

Both endpoints of the TCP connection have to know how much data can be stuffed into a packet, so they exchange the MSS during the handshake. You would see the value 1460 within your LAN, or 1452 if the remote site is using PPPoE.

Jumbo frames in the trace

Unfortunately, the trace does not show the connection start for the two clients, so we don't know the maximum segment size. Still, it is a safe bet to assume that the MSS is 1460 or less.

TCP reassembly

It would be great to have a trace file recorded on a separate device (using a SPAN port or a tap) rather than on the router itself. I am pretty sure that such a trace would show that 192.168.8.26 is sending packets with an IP length of 1500 or less. In other words: you do not have jumbo frames in your LAN.

Still, your trace shows jumbo frames. These are most likely generated in your Debian Linux box. This could be the work of TCP offloading in your network card (in the Windows world it is called "TCP chimney"), or a result of the software used in your Linux box.

Artefacts of the TCP reassembly

The reassembly within your Linux box becomes clear when looking at the IP Identification. Try to apply the IP ID and the IP Length as columns (Display filter

The jumbo frame now exists within the memory of your firewall / router.
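The packet size arithmetic from the background section can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the function names `mss_for_mtu` and `exceeds_mtu` are made up for this example, the 2960-byte packet is a hypothetical value (two 1460-byte segments merged under one 40-byte header), and 20-byte IP and TCP headers without options are assumed, as in the text.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption from the text)
TCP_HEADER = 20  # bytes, TCP header without options (assumption from the text)


def mss_for_mtu(mtu):
    """Return the largest TCP payload that fits into one IP packet of size `mtu`."""
    return mtu - IP_HEADER - TCP_HEADER


def exceeds_mtu(ip_length, mtu=1500):
    """Return True if a packet with this IP total length is too large for `mtu`."""
    return ip_length > mtu


if __name__ == "__main__":
    print(mss_for_mtu(1500))  # plain Ethernet
    print(mss_for_mtu(1492))  # Ethernet minus 8 bytes of PPPoE overhead
    # Hypothetical merged packet: 2 * 1460 bytes of payload + 40 bytes of headers.
    print(exceeds_mtu(2 * 1460 + IP_HEADER + TCP_HEADER))
```

A packet for which `exceeds_mtu` is true and which has the DF bit set cannot be forwarded as it is, which is the situation that triggers "fragmentation needed".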
When forwarding the packet to the external interface (or maybe when the NAT process is applied), the IP stack notices that the frame exceeds the maximum packet length for the interface and discards the packet. The source (192.168.8.26) is informed with an ICMP "fragmentation needed" message. You don't see an ICMP packet for every single frame, because the kernel throttles the number of ICMP packets.

Why Windows, and not Linux?

I could imagine that the problem also exists with Linux clients. Windows and Linux will probably react in different ways to the ICMP "fragmentation needed" message. As the sender did nothing wrong, the message is confusing at best. If you show us a trace file with both Windows and Linux boxes transmitting simultaneously, we can see the difference.

How to fix this?

Please check the configuration of your Linux box. At some level (either network card or network stack) multiple incoming TCP segments are combined into the jumbo frame. Either use the offloading mechanism of your external network card to transmit the large packet, or disable the segment reassembly for incoming data.

Good luck and happy hunting

PS: Just to satisfy my curiosity: I would be interested to know
answered 30 Dec '14, 05:35 packethunter edited 30 Dec '14, 12:17

Hi Mr. Hunter!! :)

First of all, I would like to thank you for your great explanation of this topic. It certainly improved my tiny knowledge. Well, to make things clear: you were right! Actually, I get the same behaviour on Linux workstations; only on Macs can I upload with no issues. I tried Linux again and got exactly the same slowness. Here is a dump taken while trying to upload from Linux:

I could send only this small dump, since CloudShark limits the size to 1.5M. Is there any other place to upload pcap files?

Before I read your great post, I found something that got my attention in this link, item 15.7, where it says "Path MTU Discovery doesn't work as well as it should anymore". Then I tried applying this iptables rule to set the MSS:
So then, for the first time, things started working!! The uploads worked as they should, but I still noticed some unstable behaviour during the process. The progress bar goes to 5MB and stops; sometimes it continues, sometimes it starts again from zero and then runs to the end. I guess this makes sense, since we set a very small segment size. Considering your explanation, I set the MSS to 1400, and so far it's working fine for both Windows and Linux stations. Below I uploaded a dump (in two parts because of the size) taken after the MSS was set:

Regarding your questions:
a) Sent above
b) Debian 7.6 / 1 SMP Debian 3.2.60-1+deb7u1 x86_64 GNU/Linux / Iptables v1.4.14 / transparent proxy with Squid
c) The iptables rule I just applied.

Thanks again, and let me know if you would like to check any other information.

Best, Danilo

(30 Dec '14, 10:28) Mussolini

Hi Danilo,

your situation will not be solved by changing the MSS. Changing the segment size (or packet size or MTU size) will change the symptoms, but it will not fix your problem. More likely, you have a configuration issue on your Linux firewall / router. Try the

Now, let's have a look at your trace files: The "Dump Linux" shows exactly the same behaviour. Let's look at a few interesting frames:
Conclusion: Whatever is happening on your Linux box affects both directions (from inside to outside and vice versa). Next, a look at the Linux upload trace. Starting from frame 934, things go downhill. Try the display filter
If you had Voice over IP in that trace file, we could probably see the phone call going from the user to the help desk. :-) When comparing the situation to the Windows trace file, you see that Microsoft is not much better: retransmissions are triggered within 500 milliseconds.

The trace files that you provided clearly show that the problem resides within your Linux router / firewall. The router is sending ICMP messages (and dropping frames) when it should not. Both Linux and Windows systems ignore these ICMP packets, because they did nothing wrong. As TCP is not aware of the packet drops, the sender has to recover from the packet loss by using extremely slow TCP mechanisms.

The same problem is discussed at a blog. According to the blog, the Linux command

Happy hunting

(30 Dec '14, 13:24) packethunter

Hi Packet,

how are you? Hope you had a great New Year's Eve. Sorry for the delay, but I have just come back to work and to my emails! ;) Well, reading your last post (and now testing from inside the company, not remotely), I noticed interesting output regarding the offload settings. I have eth1 (internal) and eth2 (external) interfaces, and the ethtool output was like this:
I don't know why, but eth2 had RX enabled and TX disabled, while eth1 had both enabled. I don't know if that was the problem, but to be sure, and as you recommended, I disabled offloading on both cards and now things started working!! Even without the iptables rule I mentioned before, things seem to be working and I can't see jumbo frames in Wireshark anymore. I will test it more thoroughly during the day, but it really sounds good. Is it a problem to run like this? This router is an Intel quad-core CPU machine.

Well, I don't know how to thank you for the time you spent on this case and for the great explanations; you are like a monster of TCP. ;) Which makes me wonder how to become a TCP specialist like you.

Best Regards, Danilo

(06 Jan '15, 10:40) Mussolini
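For readers who want to check which offload features are switched on, here is a minimal Python sketch. It assumes the `name: on` / `name: off` line format that `ethtool -k` prints; the function name `parse_offload_features` and the sample text are illustrative only and are not the output posted in this thread.

```python
def parse_offload_features(text):
    """Map each 'name: on|off' line of ethtool -k style text to a boolean."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, not a feature line
        words = value.split()
        # Ignore trailing annotations such as "[fixed]"; skip header lines.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


# Illustrative sample only (assumption), not taken from the thread.
SAMPLE = """\
Features for eth1:
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

if __name__ == "__main__":
    enabled = [name for name, on in parse_offload_features(SAMPLE).items() if on]
    print(enabled)
```

On a real router the text would come from running `ethtool -k` for each interface; receive-side features that merge segments are the ones to look at first when oversized frames show up in a capture taken on the box itself.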
I can see that the Windows workstation (192.168.8.26) is sending packets with a TCP length larger than the standard MTU size and with the DF bit set. Your router is discarding them and replying with ICMP "fragmentation needed" messages. I am not sure whether these ICMP messages are delivered to the source Windows system, because it keeps on sending packets larger than the MTU. You can try disabling the jumbo frame option on this system (192.168.8.26) and then give it another try.
Hi Kishan,

Thanks for the reply. Actually, I don't have jumbo frames enabled on this machine; it's just set to default. None of the other Windows workstations have jumbo frames enabled either, and they behave the same way.