
TCP connection caught in retransmission


Hello

I'm working on a project where we have an embedded unit using lwIP 1.2. It's old, but it has been working OK for us for quite a while. Recently, however, we have run into a problem with lost connections. From the log, it seems that we start to get TCP retransmissions. Those retransmissions happen again and again without being resolved, and we eventually run into a buffer allocation problem on the embedded unit, which then hangs the entire stack on the unit.

I'm no TCP expert, but looking at the first few lines of the log file found here, it seems that lwIP in the embedded unit (192.168.0.100) never gets past a lost sequence number (1876719045). As I understand it, the PC (192.168.0.1) resends that sequence number, which turns out to be a pure ACK, but lwIP won't let it go and keeps insisting on getting an ACK for 1876719045.

Can someone please confirm my analysis, or correct me if I'm wrong? Is 192.168.0.1 doing something wrong that I haven't noticed? Or do the later retransmissions stem from something else?

Thanks a million :)

asked 01 Apr '16, 07:02


FredrikT

You should take a trace as close as possible to the device 192.168.0.1. From a quick look, I would say something is going wrong on the device 192.168.0.1, maybe in the application or the OS. A buffer shortage is a possible cause, too. If I were you, I would investigate that device.

At the end, it does not seem to answer the ARP requests, which suggests that either something is wrong with this device or we simply didn't capture the replies.

(01 Apr '16, 13:17) Christian_R

One Answer:


It looks like your capture was taken somewhere between the embedded unit and the PC. Based on your capture, I see issues on both sides, but more on the embedded unit. Early on, the PC is not responding to simple SYN requests. But later in the capture, the embedded unit is not responding to simple ACKs.

Is there any non-switch device between the two machines?

Based on IP, they seem to be on the same local network. However, if that's the case, the response times should be fairly quick. There should not be multi-second response times from the embedded unit if it is local to the PC.
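
To put numbers on those response times, a rough Python sketch like the one below could help. It measures how long each data segment from one side waits for the first ACK from the other side that covers it. The `Segment` record and the `ack_delays` helper are made up for illustration; the sketch assumes you have already exported timestamp, source address, sequence number, acknowledgment number and payload length for one connection from the capture, and that every segment has the ACK flag set.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """Hypothetical record for one TCP segment exported from a capture."""
    time: float   # capture timestamp in seconds
    src: str      # sender IP address
    seq: int      # TCP sequence number
    ack: int      # TCP acknowledgment number (ACK flag assumed set)
    length: int   # TCP payload length in bytes


def seq_geq(a: int, b: int) -> bool:
    """True if sequence number a is at or after b, modulo 2**32."""
    return ((a - b) & 0xFFFFFFFF) < 0x80000000


def ack_delays(segments: List[Segment], sender: str,
               receiver: str) -> List[Tuple[int, Optional[float]]]:
    """For each data segment from sender, return (seq, delay).

    delay is the time until the first later segment from receiver whose
    ACK number covers the whole payload, or None if no such ACK appears.
    Retransmissions show up as repeated seq values in the result.
    """
    results = []
    for i, seg in enumerate(segments):
        if seg.src != sender or seg.length == 0:
            continue  # skip the other direction and pure ACKs
        end = (seg.seq + seg.length) & 0xFFFFFFFF
        delay = None
        for later in segments[i + 1:]:
            if later.src == receiver and seq_geq(later.ack, end):
                delay = later.time - seg.time
                break
        results.append((seg.seq, delay))
    return results
```

On a healthy local network the delays should be in the millisecond range. Multi-second values, or `None` for segments that are never acknowledged, point at the side that should have sent the ACK or at something in between.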

I think the buffer error and the hanging of the unit is evident at the end of the capture, when the PC is sending ARP requests and not getting any replies. I think that by that time, the embedded unit is completely hosed and cannot respond. I also think that this is due to the fact that it's embedded, and does not have the appropriate level of buffer storage to process all those retransmissions.

So I would do a couple things to troubleshoot this further:

  1. Capture data on, or as close as possible to, both the embedded unit and the PC. This way, you will be able to verify whether those high response times come from the connectivity between the two or from the unit itself. If they come from the unit, it is likely some OS or application-related issue on the unit.
  2. Take a look at the performance of any network devices between the PC and embedded unit. You want to verify that no such device is causing a degradation in the performance.
  3. Verify that there were no recent changes to the embedded unit.
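
Once you have captures from both ends, the comparison in step 1 can be sketched roughly as below. This is only an illustration under assumptions: `missing_downstream` is a made-up helper, and each packet is reduced to a key you choose yourself (for example a tuple of source address, IP ID, sequence number and payload length) that identifies the same packet in both captures.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def missing_downstream(sent: Iterable[Hashable],
                       received: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Return packet keys seen more often near the sender than near the receiver.

    sent:     keys from the capture taken close to the sending host
    received: keys from the capture taken close to the receiving host
    The value is how many copies did not show up at the receiving side.
    """
    sent_counts = Counter(sent)
    recv_counts = Counter(received)
    return {
        key: count - recv_counts[key]
        for key, count in sent_counts.items()
        if count > recv_counts[key]
    }
```

If this comes back empty for both directions, the network between the two hosts is probably delivering everything, and the unit itself becomes the main suspect. Keys that do show up here were dropped somewhere between the two capture points.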

Let us know what you find.

answered 04 Apr '16, 12:09


jeantunis