This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Dup Acks and retransmission, only when sending particular data

Hi all, I'm relatively new here (since I started troubleshooting this problem, I have read a lot of posts :)). I'd like to ask whether you could offer some suggestions about what might be going on here.

We have an IP VPN connection between two sites, managed by a provider. Recently I configured a backup job (in Veeam, for those who are interested) to back up servers from one site to the other, but the job would fail after only 256 KB. Changing the job to a lower compression setting made it work; I can't remember why I changed the compression in the first place, however. I opened a case with the backup vendor, but in the end they had me run Wireshark at both sites because it turned out to be a connection error. Wireshark shows a lot of duplicate ACKs and retransmissions.

I am 100% sure that all our firewalls are turned off, and the provider claims not to have any in our path.

But now comes the strange part: when I tried to upload the Wireshark capture from our datacenter site to the backup vendor, the upload failed as well. I ran Wireshark again during the FTP upload, and it shows (as far as I can interpret it) the same pattern: a short data stream, then dup ACKs and retransmissions, and eventually a RST.

The provider that manages our IPVPN also manages our primary internet connection. Both services are delivered through one 1 Gbps Ethernet uplink (tagged VLANs), so by uploading the file I use part of the same physical route as the IPVPN traffic. To be sure it was not our firewall, I bypassed it completely: I plugged a server directly into our internet VLAN and gave it a public IP. I did the upload again, with the same failure. Trying other files: no problem! We have a backup internet link from another provider; I switched to that, and all uploads went fine. So the problem is definitely with our one provider, or so it would seem. But we can't figure out why we only have problems with uploading this one file (and with making the backup). I have also been having network printer issues over their IPVPN, where spooled jobs simply do not print in about 10% of cases. Sending the capture file via Windows file shares over their IPVPN fails too, while other copies work fine. I have no clue what could cause this.

We thought of MTU sizes, but I can successfully do a ping -l 1472 -f over the IPVPN. All the servers involved use an MTU of 1500.
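For reference, the 1472 in that ping is plain header arithmetic. The sketch below assumes IPv4 without options and a standard 8-byte ICMP echo header; the function names are made up for illustration. Under those assumptions, a successful unfragmented ping with 1472 bytes of payload means a full 1500-byte IP packet crossed the path.

```python
# Header sizes, assuming IPv4 without options and a plain ICMP echo request.
IPV4_HEADER = 20
ICMP_HEADER = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_payload_for_mtu(mtu: int, tcp_header: int = 20) -> int:
    """Largest TCP payload per packet for the same MTU (no TCP options)."""
    return mtu - IPV4_HEADER - tcp_header


print(icmp_payload_for_mtu(1500))  # 1472, the value passed to ping -l
print(tcp_payload_for_mtu(1500))   # 1460
```

Note that this only says something about packet size; it says nothing about whether packets with particular contents get through.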

But I remember that we changed the MTU of our print server to as little as 500 bytes, and that was a huge improvement (though still not a solution). I'm not sure whether this is a related problem, of course, but it seems so to me.

Below is a screenshot of a small part of the capture taken during the backup job; the capture is full of this. [screenshot]

asked 29 Jun '16, 01:07

Tijs

I don't dare post this as an Answer yet, but what you write reminds me of cases where particular bit sequences caused some network cards or their drivers to malfunction, so that a packet containing such a pattern never got through.

So I'd suggest taking several captures of transfer failures of different files, looking at the first packet that could not get through (the one that needed retransmission) in each of them, and checking whether the contents of these packets have anything in common.
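A minimal sketch of such a comparison, assuming the payload bytes of each "first failing" packet have already been exported from the captures by hand. The function names and the example payloads are invented for illustration; nothing here is taken from the real captures.

```python
def byte_runs(payload: bytes, n: int) -> set:
    """Return the set of all n-byte runs contained in payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


def common_byte_runs(payloads: list, n: int = 8) -> set:
    """Return every n-byte run that occurs in all of the given payloads."""
    if not payloads:
        return set()
    return set.intersection(*(byte_runs(p, n) for p in payloads))


# Hypothetical payloads of the first packet that failed in two transfers.
failed = [
    bytes.fromhex("0011deadbeefcafef00d2233"),
    bytes.fromhex("aabbccdeadbeefcafef00dff"),
]
for run in sorted(common_byte_runs(failed, n=8)):
    print(run.hex())
```

A run shared by every failing packet is only a candidate. Checking that it is absent from packets that did get through would make the case much stronger.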

If you capture at the sending machine, I would also strongly recommend capturing outside of it in parallel (using the monitoring mode of a switch or similar means) to rule out any weird behaviour of the sending machine itself.

(29 Jun '16, 04:15) sindy

sindy may be right.

My first thought was there's some other security device (IDS?) on that primary path which is detecting something "suspicious" and killing the connection on you. Obviously this isn't your firewall but maybe something the IPVPN provider provides (as a service)?

The next set of steps would be to do hop-by-hop captures: capture outside the sending machine, then outside the first router, then outside the next hop, and so on. Obviously you'll need your ISP's help...

(29 Jun '16, 06:12) JeffMorriss ♦

Thanks for your comments! I just got off the phone with our ISP; luckily it's not a big company, and I spoke to one of their senior network guys. They are willing to help us troubleshoot. Sadly, I only have one file (the Wireshark capture) with which I can reproduce the problem, which makes for a weak case, of course: 'I can't upload this one file, all others are working fine'. Luckily they want to help us anyway.

Sindy, great comment about network cards. I was troubleshooting from a VM before, so let me try a physical machine or laptop next. On the other hand, when I configured that same VM with the VLAN and subnet of our other provider, everything worked smoothly... hmm.

The next step (with our ISP) is to bypass our hardware altogether: plug a laptop or something similar directly into their equipment and try to reproduce the problem.

JeffMorriss, indeed, if it isn't a network card problem, then it must be an IDP, right? But the network guy I just hung up with swears they don't have anything like that. That's why I posted this question: what else could it be, or is he mistaken and should we look for a firewall somewhere?

Oh well, will let you know what we find. In the meantime, any suggestion is welcome.

(29 Jun '16, 07:01) Tijs

"when I configured that same VM with the VLAN and subnet of our other provider, everything worked smoothly..."

If this is the "poisoning by bit pattern" case I suggested, part of the destination IP or MAC address may be part of the bit pattern if the physical NIC of the server is affected. But more likely the issue does not affect your local NIC: whatever actually happens, including "poisoning by bit pattern", may happen anywhere between the sending and the receiving application.

"Sadly I only have one file (the wireshark capture) with which I can reproduce the problem"

No, you don't have just this single case - you can capture another unsuccessful synchronisation attempt, and then try to upload the resulting capture file the same way (important: without any kind of compression) and capture this attempt too. If you find that the "real" packet which breaks the original "sync" session is the same one which then breaks the "capture file upload" session when saved inside the capture file (I hope this description makes sense to you), you almost surely know that the poisonous bit pattern exists. Non-compressed pcap files contain bit-verbatim copies of the captured Ethernet frames (minus the leading sync pattern and the CRC).
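As a rough illustration of that check, the sketch below walks the records of an uncompressed capture file and reports which saved frames contain a given byte sequence. It assumes the classic pcap layout (24-byte global header, 16-byte record headers); pcapng files are not handled, and the function names are made up for this example.

```python
import struct

# Classic pcap magic numbers as they appear on disk, mapped to byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def iter_pcap_frames(data: bytes):
    """Yield (frame_number, frame_bytes), numbering frames from 1."""
    endian = PCAP_MAGICS.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    offset = 24  # skip the global file header
    number = 1
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # file is truncated in the middle of a record
        yield number, frame
        offset += incl_len
        number += 1


def frames_containing(data: bytes, needle: bytes) -> list:
    """Return the numbers of all saved frames that contain needle."""
    return [n for n, frame in iter_pcap_frames(data) if needle in frame]
```

Usage would be along the lines of `frames_containing(open("sync.pcap", "rb").read(), suspect_bytes)`, where the file name and `suspect_bytes` are placeholders for your own capture and the payload of the packet that broke the original session.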

As @JeffMorriss wrote, to identify the guilty box (whatever the actual issue is), you'll have to slice the path by capturing at different points during repeated attempts to upload the trouble-causing file, each time splitting the remaining path into halves as shown below:

client - box1 - box2 - ... - boxN-1 - boxN - server
1.     ^                                   ^
2.     ^                   ^

Capturing as per 1. should show that the critical packet has not reached the server.
If capturing as per 2. shows that the critical packet from the client has reached the capturing point closer to the server, then continue with 3., otherwise with 4., and so on:

client - box1 - box2 - ... - boxN-1 - boxN - server
3.                         ^               ^
4.                   ^     ^
(29 Jun '16, 07:54) sindy
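The halving procedure described in the comment above is a binary search over the devices in the path. A minimal sketch, assuming exactly one device drops the packet and does so on every attempt; `packet_seen_after` is a hypothetical stand-in for "repeat the upload and look at a capture taken just after that device", and the box names are placeholders.

```python
from typing import Callable, Sequence


def find_dropping_hop(hops: Sequence[str],
                      packet_seen_after: Callable[[int], bool]) -> str:
    """Return the device in hops (given in path order) that drops the packet.

    packet_seen_after(i) must report whether the critical packet shows up
    in a capture taken just after hops[i]. The packet is assumed to leave
    the client and to be missing after the last hop.
    """
    lo, hi = 0, len(hops) - 1  # the culprit is somewhere in hops[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if packet_seen_after(mid):
            lo = mid + 1  # packet survived up to hops[mid]; look further on
        else:
            hi = mid      # packet already gone; hops[mid] or an earlier box
    return hops[lo]


# Hypothetical path in which box4 silently drops the packet.
path = ["box1", "box2", "box3", "box4", "box5"]
print(find_dropping_hop(path, lambda i: i < path.index("box4")))  # box4
```

With N devices this needs about log2(N) repeated uploads rather than N, which matters when every capture point has to be arranged with the ISP.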

Well, thanks again for offering your insights. It appears that neither our ISP nor I are that proficient in packet capture, so we decided to first try to find the problem through trial and error...

I work for an IT company, and the ISP I mentioned is actually our partner, so many of our customers also use their services for internet connectivity. This allowed me to log in remotely to a couple of our customers' servers and try the same upload. The upload failed for some of our customers too, but went fine for others. It turns out that our ISP has equipment mainly in two datacenters, and the customers where it failed go over the same hardware as we do. Still, the guy was puzzled why in heaven's name this one file could not be uploaded.

@Sindy, could a poisonous bit stream also affect Cisco hardware (I mean, have you ever heard of it)? And if so, what could we do about it? Now that we have narrowed it down, I'm secretly hoping he discovers that some IDP feature was turned on by accident or whatever... I'll see if I can ask him what hardware they use exactly.

(30 Jun '16, 00:02) Tijs

I have never heard of a poisonous bit pattern affecting Cisco hardware, but I've also not heard of many other things that exist :-)

My hands-on experience was with an ADSL box which could not survive a transit VPN session (it rebooted within 2 minutes of the start of each session established between a software client on a PC and a remote server, so from the perspective of the affected box the VPN session was just a bi-directional UDP stream), and I have read about a well-documented case of a particular NIC model on the internet.

If it is confirmed to be a hardware/firmware issue (i.e. not a configuration one), what to do about it depends on the lifecycle stage of the guilty equipment and on how much the vendor cares about their reputation. Some boxes can be fixed by the vendor within weeks, while others can only be trashed.

(30 Jun '16, 02:07) sindy