This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

web site works on one ISP but not on another


I have two ISPs: ISP-1 at 4.5 Mbps and ISP-2 at 50 Mbps. When my website is on ISP-1 everything is fine, but when I move the website to ISP-2 there are delays. For example, when browsing to the website over ISP-2, the page sometimes loads only halfway and sometimes loads completely, but the browser's spinning globe/circle keeps spinning.

I captured packets at the client, the web server, and the firewall. The RTT was about 20 ms lower than when connected to ISP-1, but the page still has trouble loading. There's no latency on the ISP-2 link, no latency on the client, and no latency on the server. The ISP also checked for asymmetric routing and found no issues. Partway through the page load I start seeing lost packets, and the server quickly sends retransmissions. The server keeps retransmitting even though the client has received the packet. Then the connection turns into a long series of keep-alives from the client to the server, even though the server has not finished sending the page.

The only difference, other than the ISPs, is that ISP-1 (the one that works) is connected to a 100 Mbps port on the firewall and ISP-2 is connected to a 1 Gbps port on the firewall.

Is it possible that, because the site is on a 1 Gbps port, it is receiving data faster than it can handle and the application cannot process the requests? I say application because the server has more than enough resources, and based on performance tests run on the server I've ruled out hardware as the issue.

Any ideas?

asked 11 Mar '15, 16:53

alexltk0506

To answer that question: it's certainly not the bandwidth of your 1G link being "too fast". TCP flow control uses the client's advertised receive window to limit the transfer rate to what the client can actually receive. Could you provide a link to the packet capture of your ISP-2 attempts?

You can upload here and just reply with the URL, assuming the data is not confidential: https://appliance.cloudshark.org/upload/

(11 Mar '15, 18:24) Quadratic
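As a rough illustration of why a faster firewall port can't overrun the client: a TCP sender may have at most min(cwnd, rwnd) bytes unacknowledged at any time, so its throughput is bounded by roughly window / RTT, whatever the port speed. The sketch below uses hypothetical values (a 65,535-byte receive window and a 40 ms RTT) and deliberately ignores window scaling, loss, and slow start.

```python
def effective_window(cwnd_bytes: int, rwnd_bytes: int) -> int:
    """A sender may have at most min(cwnd, rwnd) bytes in flight."""
    return min(cwnd_bytes, rwnd_bytes)

def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput: one window of unacknowledged data per round trip."""
    return window_bytes * 8 / rtt_s

if __name__ == "__main__":
    # Hypothetical values: large congestion window, 64 KiB receive window, 40 ms RTT.
    win = effective_window(cwnd_bytes=10**6, rwnd_bytes=65535)
    bps = max_throughput_bps(win, 0.040)
    print(f"{bps / 1e6:.1f} Mbit/s")  # 13.1 Mbit/s, far below a 1 Gbps port
```

The bound depends only on the window the receiver advertises and the round-trip time, which is why a 1 Gbps port by itself doesn't push more data at the client than it has room for.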

2 Answers:


From the trace file, it appears that TCP acknowledgements from 192.168.100.100 aren't being received by 104.67.244.90. For example, look at packet 27, where the 1380 bytes of TCP payload from packet 26 are acknowledged with a (relative) ACK value of 7190. Despite the acknowledgement, the data is retransmitted in packet 30 after 600 ms, then again in packet 154 1200 ms later, then in 156, 158 and 160. If you look at the sequence/ACK numbers of the first TCP stream (tcp.stream eq 0), for example, it looks to me like 104.67.244.90 is not seeing 192.168.100.100's packets.

Are you able to trace this from 104.67.244.90's perspective? If you compare two traces, one from the client and one from the server, my guess is that the server never actually receives the acknowledgements, and therefore retransmits the data from packet 26 (for example) repeatedly.
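A minimal sketch of the check described above, assuming the capture has already been exported to simple per-packet records (the `Segment` class and its fields are hypothetical, not a Wireshark API). It flags data segments that the server resends after the capture already shows an ACK covering them, and lists the gaps between resends of one segment; gaps that double, like the 600 ms and 1200 ms here, are what TCP's exponential retransmission-timeout backoff looks like. The records are modelled on packets 26, 27, 30 and 154; the client-side sequence number, the server's own ACK value, and packet 27's timestamp are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    frame: int    # frame number in the capture
    time: float   # seconds since an arbitrary reference
    src: str      # sending IP address
    seq: int      # relative sequence number
    length: int   # TCP payload length in bytes
    ack: int      # relative acknowledgement number

def acked_but_retransmitted(segments, sender):
    """Frames where `sender` resends payload that the peer's ACKs (as captured) already cover."""
    highest_ack = 0  # highest ACK number seen from the peer so far
    flagged = []
    for s in sorted(segments, key=lambda s: s.frame):
        if s.src != sender:
            highest_ack = max(highest_ack, s.ack)
        elif s.length > 0 and s.seq + s.length <= highest_ack:
            flagged.append(s.frame)
    return flagged

def resend_intervals(segments, sender, seq):
    """Seconds between successive transmissions of the segment starting at `seq`."""
    times = [s.time for s in sorted(segments, key=lambda s: s.frame)
             if s.src == sender and s.seq == seq and s.length > 0]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]

SERVER, CLIENT = "104.67.244.90", "192.168.100.100"
trace = [
    Segment(26, 0.000, SERVER, 5810, 1380, 1),   # 1380 bytes ending at 7190
    Segment(27, 0.020, CLIENT, 1, 0, 7190),      # ACK covering packet 26
    Segment(30, 0.600, SERVER, 5810, 1380, 1),   # first retransmission
    Segment(154, 1.800, SERVER, 5810, 1380, 1),  # second retransmission
]

print(acked_but_retransmitted(trace, SERVER))  # [30, 154]
print(resend_intervals(trace, SERVER, 5810))   # [0.6, 1.2]
```

Run against the client-side capture, a hit means the ACK left the client. Run against a server-side capture of the same exchange, those frames would not be flagged if the ACK never arrived, which is the kind of client/server comparison asked for here.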

answered 12 Mar '15, 17:14

Quadratic

Interesting. I will look into that. Here's the server capture. It was captured at the same time as the previous client capture: https://www.cloudshark.org/captures/306defab8f1b

Thank you

(12 Mar '15, 17:27) alexltk0506

I wonder, then, why this only happens when connected to one ISP and not the other. I will probably need to get the ISP involved again, but I don't know how much help they will be.

(12 Mar '15, 17:40) alexltk0506

Have these packet captures been modified? I notice a few things:

  • If you compare the two, they have no IP ID field values in common. The IP ID field would normally be preserved across the network (routers would not edit it). That makes me think either the traces have been modified or there's something between these two systems beyond plain IP routing infrastructure.

  • There's a time offset of about 2 seconds between the trace files. For example, matching on the TCP source/destination port combination, I would call the packet at 16:58:20.487396 in one trace the same packet as 16:58:22.419223 in the other trace.

Having said that, it still really looks like packets aren't getting to the server. One simple way to show that: with both traces merged into one file, filter on "eth.addr == 00:e0:4c:20:57:55", press Ctrl+Shift+M (to mark all packets from one side), then apply the filter "tcp.srcport == 1121 && frame.marked == 1". Note the number of packets it displays (either in the bottom status bar or under Statistics > Summary). Now compare that with the result of the display filter "tcp.srcport == 1121 && frame.marked == 0". If these really are apples-to-apples, both should give the same packet count, but they do not.

The key is that the server trace isn't seeing all of the client-originated messages. There's packet loss in that one direction.

(12 Mar '15, 18:05) Quadratic
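When the IP IDs can't be trusted, the same comparison can be made by matching packets on TCP fields. This sketch assumes both captures have been exported to lists of dicts with hypothetical keys (`time`, `sport`, `dport`, `seq`, `ack`, `len`), and that nothing between the two capture points rewrites ports or sequence numbers. It estimates the clock offset between the traces as the median time difference of matched packets, and lists client packets (source port 1121) that never show up in the server trace. The sample values, including destination port 80, are made up for illustration.

```python
from statistics import median

def match_key(p):
    # TCP fields assumed NOT to be rewritten between the two capture points.
    return (p["sport"], p["dport"], p["seq"], p["ack"], p["len"])

def compare_traces(client_pkts, server_pkts, client_port):
    """Return (median clock offset in seconds, client packets missing from the server trace)."""
    server_times = {}
    for p in server_pkts:
        server_times.setdefault(match_key(p), p["time"])  # keep the first sighting of each key
    offsets, missing = [], []
    for p in client_pkts:
        if p["sport"] != client_port:
            continue  # only client-originated packets from this connection
        t = server_times.get(match_key(p))
        if t is None:
            missing.append(p)
        else:
            offsets.append(t - p["time"])
    return (median(offsets) if offsets else None), missing

client = [
    {"time": 0.00, "sport": 1121, "dport": 80, "seq": 1, "ack": 1, "len": 400},
    {"time": 0.10, "sport": 1121, "dport": 80, "seq": 401, "ack": 7190, "len": 0},
    {"time": 0.20, "sport": 1121, "dport": 80, "seq": 401, "ack": 8570, "len": 0},
]
server = [
    {"time": 1.93, "sport": 1121, "dport": 80, "seq": 1, "ack": 1, "len": 400},
    {"time": 2.13, "sport": 1121, "dport": 80, "seq": 401, "ack": 8570, "len": 0},
]

offset, missing = compare_traces(client, server, 1121)
print(round(offset, 3))                # 1.93
print([p["time"] for p in missing])    # [0.1]
```

Retransmitted segments with identical fields share one key, so this only counts the first copy; it is a quick cross-check, not a replacement for the IP ID comparison on unmodified traces.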

I should note: if you do have access to unmodified traces, I suggest using the IP ID field once the two captures are merged into one file. Right-click the ID field of the IP header itself, choose "Apply as Column", and sort by that column. You should see each IP ID value appear twice (the same packet, once from the client trace and once from the server trace); any IP ID value that appears only once indicates a packet that left one system but was not received by the other.

(12 Mar '15, 18:17) Quadratic
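The sort-by-IP-ID check can also be scripted. A minimal sketch, assuming the merged capture has been exported as `(trace, direction, ip_id)` tuples (a hypothetical format) and that each sender does not reuse an IP ID within the capture window (the field is only 16 bits, so it wraps on long captures). Keying on direction avoids false matches between IDs chosen independently by the client and the server. The IDs below are illustrative.

```python
from collections import defaultdict

def one_sided_ip_ids(merged):
    """Return (direction, ip_id, trace) for IDs seen in only one of the two traces."""
    seen = defaultdict(set)  # (direction, ip_id) -> trace labels that contain it
    for trace, direction, ip_id in merged:
        seen[(direction, ip_id)].add(trace)
    return sorted((d, i, next(iter(t))) for (d, i), t in seen.items() if len(t) == 1)

merged = [
    ("client", "c2s", 0x1A2B), ("client", "c2s", 0x1A2C), ("client", "c2s", 0x1A2D),
    ("server", "c2s", 0x1A2B), ("server", "c2s", 0x1A2D),
]

for direction, ip_id, trace in one_sided_ip_ids(merged):
    # A client-to-server ID present only in the client trace left the client but never arrived.
    print(direction, hex(ip_id), "only in", trace)  # c2s 0x1a2c only in client
```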

Yes, the trace files were modified in order to hide the client IP. The IP IDs are consistent between the client and server trace files except where there's packet loss, where I see an IP ID only once instead of twice. When I captured on the client and server, I also captured on the firewall. The firewall capture shows all packets sent from the client, including the ACK in packet 27 that the server didn't see. So the packets are reaching the infrastructure, but not the server. It has also been brought to my attention that packets from client to server take different numbers of hops: comparing the TTLs in the client and server trace files, packet 1, for example, takes 11 hops and packet 2 takes 14 hops.

(13 Mar '15, 02:36) alexltk0506
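The hop counts come from simple TTL arithmetic: each router decrements the IP TTL by one, so the same packet's TTL in the sender-side and receiver-side traces differs by the number of routed hops between the two capture points. From a single trace, a common heuristic is to assume the sender started at the nearest typical initial TTL (32, 64, 128 or 255; that table is an assumption about common OS defaults, not a rule). The TTL values in the example are hypothetical, chosen to reproduce the 11 and 14 hops mentioned above.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)  # typical OS defaults (assumption)

def hops_between(ttl_at_sender_capture: int, ttl_at_receiver_capture: int) -> int:
    """Routers decrement TTL once per hop, so the difference is the hop count."""
    return ttl_at_sender_capture - ttl_at_receiver_capture

def estimated_hops(observed_ttl: int) -> int:
    """Single-capture guess: assume the nearest common initial TTL at or above the observed value."""
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

print(hops_between(128, 117))                      # 11
print(estimated_hops(117), estimated_hops(114))    # 11 14
```

Packets of the same connection arriving with different hop counts suggest they are not all taking the same path between client and server, which is worth following up as a routing or policy issue rather than a server problem.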


Problem solved. The client has a firewall policy covering all websites on both ISP-1 and ISP-2. The policy uses ISP-1 as the main link, but a site that is on ISP-2 should use the ISP-2 link; those sites were getting stuck or delayed trying ISP-1 first. I created a policy strictly for the websites on ISP-2, set ISP-2 as its main link, and everything is working fine.
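A rough model of this fix, assuming the firewall evaluates policies top-down and applies the first one whose destination matches (a common design, but check the specific firewall's documentation). The `Policy` class, the policy names, and the 203.0.113.0/24 documentation prefix standing in for the ISP-2-hosted sites are all hypothetical. With only the general policy, traffic to those sites matches it and prefers ISP-1; placing a dedicated policy ahead of it makes ISP-2 the main link for them.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Policy:
    name: str
    destinations: list  # CIDR strings this policy applies to
    links: list         # outbound links in order of preference

def select_policy(policies, dst):
    """Return the first policy whose destinations contain dst (first-match semantics assumed)."""
    addr = ip_address(dst)
    for p in policies:
        if any(addr in ip_network(n) for n in p.destinations):
            return p
    return None

general = Policy("all-web", ["0.0.0.0/0"], ["ISP-1", "ISP-2"])
isp2_sites = Policy("isp2-web", ["203.0.113.0/24"], ["ISP-2"])

before = [general]             # single policy, ISP-1 preferred for everything
after = [isp2_sites, general]  # dedicated ISP-2 policy evaluated first

print(select_policy(before, "203.0.113.10").links[0])  # ISP-1
print(select_policy(after, "203.0.113.10").links[0])   # ISP-2
print(select_policy(after, "198.51.100.5").name)       # all-web
```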

Thanks for all your help.

Best regards,

Alex

answered 13 Mar '15, 10:50

alexltk0506