Hi, I need help... I can't figure out the cause of a web application slowness issue.

The web application server is in Site A.

Site A <------------Tunnel-----------> Central <------------Tunnel-----------> Site B
Users from Site B have no issues, regardless of client operating system.

Site A <------------Tunnel-----------> Central <------------Tunnel-----------> Site C
Users from Site C experience slowness on Windows Vista and 7 only. No issues with Windows XP.

Initial assumptions:

Additional Notes:

Troubleshooting tried, but doesn't work:

Captured traffic from Site C and Site A, using "Time since previous frame" to check delays:
asked 12 Dec '13, 11:17 RebirthX edited 12 Dec '13, 12:15
3 Answers:
The segment size of 536 indicates that you are effectively using the default MTU of 576 (536 bytes of payload plus 40 bytes of IP and TCP headers). Did you check the SYN and SYN_ACK packets to see what MSS was negotiated in the 3-way handshake? If it is a normal offering (1380 or 1460), then you are suffering from failing Path MTU discovery, where, if all else fails, Windows falls back to 576...

Did you see any ICMP packets at the client or server side indicating that fragmentation is required?

Additional Information: The 3-way handshakes indicate that both clients offer an MSS of 1260 in their outbound SYN request. The SYN_ACK from the server shows an MSS of 1460 bytes in both cases. As the infrastructure consists of 'tunnels', it is good practice to adjust the MSS values in the SYN packets as they enter a VPN tunnel to avoid fragmentation problems. This does not happen. Please contact your ISPs and ask them to

answered 12 Dec '13, 14:45 mrEEde edited 12 Dec '13, 23:08

Hmm... Yeah... But I don't think this is actually the cause of the problem. The following is the TCP SYN/ACK of the XP machine:
The following is the TCP SYN/ACK of the Win7 machine:
(12 Dec '13, 21:35) RebirthX |
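For anyone following the numbers in this answer, here is a minimal sketch of the MSS/MTU arithmetic. It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and the function names are made up for illustration.

```python
# Sketch of the MSS <-> MTU relationship discussed above.
# Assumption: IPv4 header = 20 bytes, TCP header = 20 bytes, no options.
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_to_mtu(mss: int) -> int:
    """MTU needed to carry a full segment of the given MSS unfragmented."""
    return mss + IPV4_HEADER + TCP_HEADER


def mtu_to_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Values taken from the thread: observed segment size, client SYN, server SYN_ACK.
    for label, mss in [("observed", 536), ("client SYN", 1260), ("server SYN_ACK", 1460)]:
        print(f"{label}: MSS {mss} -> MTU {mss_to_mtu(mss)}")
```

Under these assumptions, the observed 536-byte segments correspond to an MTU of 576, while the MSS of 1260 offered by the clients would correspond to an MTU of 1300.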
Well, maybe the traffic is not so 'passthrough' as you expect it to be due to the config option ;-)) I suggest to capture at several places in parallel to rule out some things. Please sync the clocks of the capture devices.
Start with a capture at [1] and [4] in parallel, then compare the timestamps of the capture files. If you see the delay in [1] but not in [4], it's not the server, but something in the data path.

Continue with [2] and [3] in parallel, then compare the timestamps of the capture files. If you see the delay in [2] but not in [3], it is caused by the MPLS and you can hand over the problem to the ISP :-)

Continue with [1] and [3] in parallel, then compare the timestamps of the capture files. If you see the delay in [1] but not in [3], the problem is WANX[SiteC]. If you see the delay at [3], the problem is WANX[SiteA].

That's just a rough idea of how to find the problem, and I did not think through all possible combinations. I leave that up to you ;-))

Regards
answered 13 Dec '13, 04:52 Kurt Knochner ♦
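The comparison described above could be sketched roughly as follows. This is only an illustration, not a tool from the thread: it assumes each capture has already been exported as a list of (packet key, timestamp) pairs, where the key is something that identifies the same packet at both capture points (for example a combination of IP ID and TCP sequence number, which may not survive every tunnel or NAT device). The function names, the threshold, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = Tuple[str, float]  # (packet key, capture timestamp in seconds)


def deltas(frames: List[Frame]) -> Dict[Tuple[str, str], float]:
    """Time between consecutive frames, keyed by (previous key, current key)."""
    return {
        (prev_key, key): ts - prev_ts
        for (prev_key, prev_ts), (key, ts) in zip(frames, frames[1:])
    }


def delay_seen_only_at(
    point_a: List[Frame], point_b: List[Frame], threshold: float = 0.5
) -> List[Tuple[Tuple[str, str], float, float]]:
    """Frame pairs whose gap exceeds the threshold at A but not at B."""
    a, b = deltas(point_a), deltas(point_b)
    return [
        (pair, a[pair], b[pair])
        for pair in a
        if pair in b and a[pair] > threshold and b[pair] <= threshold
    ]


if __name__ == "__main__":
    # Made-up data: the client at [1] waits 3 s for the response,
    # while the server at [4] answers within 10 ms.
    capture_1 = [("request", 0.000), ("response", 3.050)]
    capture_4 = [("request", 10.020), ("response", 10.030)]
    for pair, gap_a, gap_b in delay_seen_only_at(capture_1, capture_4):
        print(f"{pair}: {gap_a:.3f}s at [1], {gap_b:.3f}s at [4]")
```

Because the sketch compares gaps within each capture rather than absolute timestamps, a constant clock offset between the capture devices cancels out. Synced clocks are still useful when you want to compare absolute times.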
Hi all, The reason it affects only Site C and not Site B is that Site B and Site C have different firewall rules in Site A's firewall. The TCP MSS, which is a separate issue, is not fixed yet. We tried using another server in the same subnet as the application server and did a capture, bypassing the server firewall.

answered 16 Dec '13, 05:08 RebirthX edited 16 Dec '13, 05:12
You wrote: Both site has MTU of 1500 (Tested using ping, result = 1472)
Did you specify the -f option on the ping command? What is the largest size that gets through?
"-f : Specifies that Echo Request messages are sent with the Don't Fragment flag in the IP header set to 1"
Has this ever performed well and suddenly started to slow down on Site C?
Yes.
I did ping -f -l 1472 [IP Addr.]
1473 results in "packet needs to be fragmented".
They only realised this when the APP Users recently changed from WinXP to Win7.
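The ping test above can be generalised into a search for the largest payload that passes with the Don't Fragment flag set. The following is a minimal sketch: `probe` is a hypothetical callback that would wrap something like `ping -f -l <size>` and return True when the echo succeeds. It is simulated here, so nothing is actually sent. The header sizes assume IPv4 without options.

```python
from typing import Callable

# Assumption: IPv4 header without options (20 bytes) plus ICMP echo header (8 bytes).
IPV4_HEADER = 20
ICMP_HEADER = 8


def largest_unfragmented_payload(
    probe: Callable[[int], bool], low: int = 0, high: int = 9000
) -> int:
    """Binary search for the largest payload size for which probe(size) is True.

    Assumes probe(low) is True and that success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def path_mtu_from_payload(payload: int) -> int:
    """Path MTU implied by the largest ICMP echo payload that passes with DF set."""
    return payload + ICMP_HEADER + IPV4_HEADER


if __name__ == "__main__":
    # Simulated path matching the result reported above: 1472 passes, 1473 does not.
    def simulated_probe(size: int) -> bool:
        return size <= 1472

    payload = largest_unfragmented_payload(simulated_probe)
    print(payload, path_mtu_from_payload(payload))
```

With the values reported here, a largest payload of 1472 corresponds to a path MTU of 1500 for ICMP echo. That does not by itself show that TCP segments of the same size get through.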
Hmm, still very mysterious... What is the largest segment leaving the WinXP client? What is the largest segment leaving the Win7 client? Is the ip.flags.df bit set? Do you see any retransmissions occurring in the client traces?
At the server, what is the largest segment that arrives? Is the IP don't fragment bit set when those packets arrive? What is the TTL value?
Win XP Largest Segment Leave = 536
Win7 Largest Segment Leave = 536
Server Largest Segment Arrive = 536
Win XP's IP DF Flag = 0x02
Win7's IP DF Flag = 0x02
They seem to be all the same, but only Win7 and Vista are slow.
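As a side note on the flag values reported above, here is a small sketch for decoding them. It assumes that 0x02 is the 3-bit IPv4 flags field as some Wireshark versions display it (reserved, Don't Fragment, More Fragments, from most to least significant bit). The function name is made up for illustration.

```python
def decode_ip_flags(flags: int) -> dict:
    """Decode a 3-bit IPv4 flags value (assumed layout: reserved, DF, MF)."""
    return {
        "reserved": bool(flags & 0x4),
        "dont_fragment": bool(flags & 0x2),
        "more_fragments": bool(flags & 0x1),
    }


if __name__ == "__main__":
    # 0x02 is the value reported for both the WinXP and the Win7 client.
    print(decode_ip_flags(0x02))
```

Under that assumption, 0x02 means Don't Fragment is set and the packet is not itself a fragment, for both clients.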
By the way, I've tried accessing another Web server in the same subnet.
The Largest Segment is 1460.
Does that mean that the TCP MSS might be a configuration issue in the server itself?
It is a Win Server 2003 virtualized on ESXi, by the way.