This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Exchange over VPN suddenly takes a nosedive: packet loss, but where?


Background:

We have a tool that monitors application performance across our network. On Friday 28 Mar, starting at around 2300, we noticed that the retransmission time for our VPN users using Exchange deviated from its baseline, rising from 28 ms to about 200 ms. Then on 31 Mar at 0951 it jumped to 1790 ms; needless to say, that’s when our users started complaining about poor email performance. Of note, on the 28th we attempted an IOS upgrade on our core routers, but it was rolled back at 2200. We do not host our own Exchange servers; they are on a separate network. Normal LAN users do NOT have the issue that VPN users do, and their performance did not take a hit; it is only VPN users. I am hoping some of the experts will spot something.

So, a user coming in over VPN comes across the internet, connects to this other network, which connects to our VPN concentrator which connects to one of our core routers which in turn connects back to the other network.

I’ve uploaded some packet captures for your viewing pleasure. The client IP is 192.168.1.1 and the server IP is 10.10.10.10. We were seeing some errors on the VPN concentrator, so it was replaced. I have only one capture taken from home before the concentrator was replaced.

One thing I noticed in that capture was that my MSS was 1160. I checked my MTU, and sure enough, my physical NIC was set to 1300 and my VPN NIC to 1200. I also have a trace taken after the VPN change, by which point I had changed my physical NIC to 1500 and my VPN NIC to 1400. With that setting I can run ping -f -l 1372 to my Exchange server. When I try to connect, my client advertises an MSS of 1360 and the server replies with 1160.
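The numbers above line up with standard header sizes. Below is a minimal Python sketch of the arithmetic, assuming IPv4 and TCP headers without options (20 bytes each) and an 8-byte ICMP echo header; the helper names are made up for illustration. Under those assumptions the largest payload for ping -f -l is the MTU minus 28 and the MSS is the MTU minus 40, so the server's 1160 corresponds to an effective 1200-byte MTU on its side of the connection (or a device rewriting the MSS); the thread does not say which.

```python
# Hypothetical helpers; header sizes assume IPv4 and TCP without options.
IPV4_HEADER = 20  # bytes
TCP_HEADER = 20   # bytes
ICMP_HEADER = 8   # bytes, ICMP echo request


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_HEADER


def max_df_ping_payload(mtu: int) -> int:
    """Largest `ping -f -l <size>` payload that fits without fragmenting."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def mtu_for_mss(mss: int) -> int:
    """MTU implied by an advertised MSS, under the same header assumptions."""
    return mss + IPV4_HEADER + TCP_HEADER


if __name__ == "__main__":
    vpn_mtu = 1400
    print(mss_for_mtu(vpn_mtu))          # 1360: the MSS the client advertises
    print(max_df_ping_payload(vpn_mtu))  # 1372: the largest DF ping that gets through
    print(mtu_for_mss(1160))             # 1200: the MTU behind the server's MSS
```

Note that 1160 is also exactly the MSS the client advertised earlier, when its VPN NIC was set to 1200.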

All the laptops have McAfee HIPS and Symantec SEP installed.

Also of note, we do use asymmetric routing.

So my questions are:
1. We see packet loss. What’s the source: our network, or the one hosting the Exchange servers?
2. Does MTU come into play?
3. I see a lot of retransmissions coming from the server. I do have a capture that was taken at the core router, but I couldn’t download it at home; it is a 100 MB file and the download kept failing. I will try to get it uploaded tomorrow.

Edit 7:

VPN Anon traffic

Just looking at this trace: if I use Expert Info and go to the first "previous segment not captured" (packet 197) and look at packet 196, it seems that the segment with sequence number 2421 has gone missing from stream 4. The time from the first loss to the last Dup ACK looks to be 538 ms.


If I apply the display filter tcp.seq==2421 && tcp.stream==4, I see only one packet, the retransmission, so it appears I am downstream from the packet loss.
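The reasoning in the last two paragraphs can be sketched in Python. This is a simplified illustration, not how Wireshark's TCP analysis works internally: the Segment record and the helper functions are hypothetical, sequence numbers are assumed to be relative (no wraparound), and only one direction of a stream is considered. A gap means the bytes between the expected and the actual sequence number were never seen at this capture point; if the only copy of that segment shows up after the gap, the original never passed here, which fits loss upstream of the capture point (or the capture itself dropping it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified record of one captured TCP segment (one direction).
    frame: int
    stream: int
    seq: int     # relative sequence number; wraparound is ignored
    length: int  # TCP payload length in bytes


def find_gaps(segments):
    """Return (stream, missing_seq, frame) wherever bytes were skipped."""
    next_expected = {}
    gaps = []
    for seg in segments:
        expected = next_expected.get(seg.stream)
        if expected is not None and seg.seq > expected:
            # Bytes expected..seg.seq-1 never reached this capture point
            # (or the capture itself dropped them).
            gaps.append((seg.stream, expected, seg.frame))
        end = seg.seq + seg.length
        if expected is None or end > expected:
            next_expected[seg.stream] = end
    return gaps


def copies_of(segments, stream, seq):
    """Frame numbers of every captured segment starting at `seq`."""
    return [s.frame for s in segments if s.stream == stream and s.seq == seq]


def lost_upstream(segments, stream, missing_seq, gap_frame):
    """True if no copy of the missing segment was seen before the gap,
    i.e. only a later retransmission (if any) passed this capture point."""
    return all(f > gap_frame for f in copies_of(segments, stream, missing_seq))


if __name__ == "__main__":
    # Toy data: only frames 196/197 and seq 2421 echo the post.
    trace = [
        Segment(195, 4, 1, 1210),
        Segment(196, 4, 1211, 1210),
        Segment(197, 4, 3631, 1210),  # gap: 2421..3630 not seen
        Segment(210, 4, 2421, 1210),  # the retransmission
    ]
    for stream, missing, frame in find_gaps(trace):
        print(stream, missing, frame, lost_upstream(trace, stream, missing, frame))
```

If the original had passed this capture point and been lost further along, there would be no gap here at all; instead the same sequence number would appear twice, which copies_of would show.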


I also have a capture taken at the core router, and I see the same pattern there: previous segment not captured, followed by a recovery process (Dup ACKs) and finally a retransmission. And I only ever see the retransmission.

So to me, based on Laura Chappell's wonderful book, that indicates that even at the core router we are still downstream from the loss, and I need to yell at the network guys who also provide the Exchange email service. Does that seem fair? Or am I totally lost and wandering in the wilds?

asked 27 Apr '14, 23:25


RTJ10

edited 28 Apr '14, 17:16


One Answer:


The VPN client is not the problem here. You probably have the same MTU/MSS values on the VPN gateway, so I would not change them.

Are the packet captures from the client PC?

If yes, every packet from the server arrives at the client twice; that is not a retransmission, it's a duplicate.

Move your capture point closer to the server e.g. a switch or router on the internal side of the VPN gateway.

answered 28 Apr '14, 01:57


Roland

Yes, the capture was done at the client. What would cause what you are describing? A routing issue or loop? A network diagram would look like this: Client PC -> VPN -> Core Router -> Black Hole -> Exchange server. Inside the black hole there's all kinds of stuff (FWs, IPS, routers, switches, etc.), none of which I will be able to capture on or between.

(28 Apr '14, 04:13) RTJ10

I can probably arrange for a capture at one or both of our core routers, but that is the extent of my access to the networking devices the traffic must flow through.

(28 Apr '14, 04:47) RTJ10

I was doing some additional reading, and some of it seemed to indicate that this might not be a duplicate-packet issue. I ran editcap -d on one of the files, and with the default window of 5 packets it removed 1136 packets as duplicates.
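For reference, here is a rough Python approximation of what editcap -d does according to its documentation: each packet's length and MD5 hash are compared with those of the previous four packets (-d is equivalent to -D 5), and a match is dropped. This is not editcap's source code; dedupe is a hypothetical helper working on raw frame bytes. Because whole frames are compared, only byte-identical copies (same IP ID, same TTL, same checksums) are removed; a genuine TCP retransmission usually differs, for example in the IP ID, and would be kept.

```python
import hashlib
from collections import deque


def dedupe(packets, window=5):
    """Approximate `editcap -D <window>`: drop a packet whose length and MD5
    match one of the previous window - 1 packets (window=5 mimics -d)."""
    recent = deque(maxlen=window - 1)  # (length, digest) of recent packets
    kept, dropped = [], 0
    for pkt in packets:
        key = (len(pkt), hashlib.md5(pkt).digest())
        if key in recent:
            dropped += 1
        else:
            kept.append(pkt)
        recent.append(key)  # in this sketch, dropped packets stay in the window too
    return kept, dropped


if __name__ == "__main__":
    frames = [b"seg-2421", b"seg-2421", b"ack", b"seg-3581"]
    kept, dropped = dedupe(frames)
    print(dropped)  # 1
```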

(28 Apr '14, 06:55) RTJ10

If you took the packet capture on the client, without port mirroring and with no VLANs, then I would say that the duplicates are genuine. Does the original packet capture, before you sanitized it, look the same? Packet captures from the core routers would be nice.

(28 Apr '14, 11:47) Roland

I do have one from the core router, but I can't share it because I can't anonymize it for some reason. I was checking, and it seems that the IP ID and TTL are all the same; doesn't that indicate duplicates?

(28 Apr '14, 16:53) RTJ10

Do you see duplicate packets on the core router as well? Check the TTL, IP ID and VLAN ID.
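Here is a minimal Python heuristic for the check suggested above. It assumes you have already pulled the IP ID, TTL, VLAN ID and TCP sequence number/length of two captured packets carrying the same segment; the Copy record and the classify helper are hypothetical. A retransmission is a new IP datagram, so a sender that increments the IP ID gives it a new value. A copy made in the network keeps the IP ID and TTL, and a copy that looped back past the capture point keeps the IP ID but arrives with a lower TTL. Stacks that do not vary the IP ID make the duplicate and retransmission cases indistinguishable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Copy:
    # Hypothetical summary of one captured IPv4/TCP packet.
    ip_id: int
    ttl: int
    vlan: Optional[int]  # None if the frame was untagged
    tcp_seq: int
    payload_len: int


def classify(first: Copy, second: Copy) -> str:
    """Heuristic verdict for two captured packets; not a definitive test."""
    if (first.tcp_seq, first.payload_len) != (second.tcp_seq, second.payload_len):
        return "different segments"
    same_id = first.ip_id == second.ip_id
    if same_id and first.ttl == second.ttl:
        if first.vlan != second.vlan:
            # Same packet seen on two VLANs, e.g. an artifact of where it was captured.
            return "same packet captured on two VLANs"
        return "duplicate"
    if same_id:
        # Same datagram, fewer hops left: it passed the capture point twice.
        return "same packet, different TTL (possible loop)"
    return "retransmission"
```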

(29 Apr '14, 04:35) Roland

@Roland Yes, at the core router there are duplicates too, with the same IP ID, TTL, etc. I know some Juniper devices have something called packet protector that clones packets as a way to cope with expected high packet loss. Is it possible that something is cloning the packets? I do know that when I capture, say, a plain file copy between my workstation and our datacenter, I do not see the dups.

(12 May '14, 11:53) RTJ10

I don't know if something is cloning the packets, but if all other services are working fine with the VPN connection apart from Exchange, I would also look at the server. If the packet capture looks fine there you can pinpoint the issue to the black hole.

(14 May '14, 09:43) Roland