This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Lots of retransmissions and out of order frames

I took a sniffer trace today at an entry point to a NetApp SAN. In the trace we see an extreme number of out-of-order frames as well as TCP retransmissions.

Since multiple devices are talking to the SAN, how should I approach this to determine why the retransmissions and out-of-order frames are occurring?

Thank you, KMNRuser

retransmissions

asked 20 Oct '10, 11:21

kmnruser

2 Answers:

First of all, if you're capturing traffic to a SAN (and a professional device like a NetApp, as opposed to a low-end SOHO box), chances are that there were fewer retransmissions than you think, because your capture may have dropped lots of frames for performance reasons. In fact, unless your sniffing device is a real monster, you will almost certainly have a lot of capture drops: frames that the capture device could not record because it couldn't write them out fast enough before the next ones came in.
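To put a number on "real monster", here is a rough back-of-the-envelope sketch of the load a capture box faces at full line rate. It assumes (none of this is from the post) a full-duplex link with both directions captured, the classic pcap file format with its 16-byte per-packet record header, the whole frame being written to disk, and 20 bytes of Ethernet preamble and inter-frame gap per frame that occupy the wire but are never captured. The function name `capture_load` is made up for illustration.

```python
# Assumptions (not from the post): full-duplex link, both directions captured,
# classic pcap format, whole frame written to disk.
PCAP_RECORD_HEADER = 16  # bytes of per-packet header in a classic pcap file
WIRE_OVERHEAD = 20       # preamble/SFD (8) + inter-frame gap (12), never captured

def capture_load(link_bps: float, frame_bytes: int, directions: int = 2):
    """Return (packets/s, bytes/s written to disk) for a link saturated with one frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD) * 8
    pps = link_bps / bits_on_wire * directions
    write_bytes_per_s = pps * (frame_bytes + PCAP_RECORD_HEADER)
    return pps, write_bytes_per_s

for size in (64, 1518):  # smallest and largest standard (untagged) Ethernet frames
    pps, wbytes = capture_load(1e9, size)
    print(f"{size:>4} B frames: {pps:,.0f} pkt/s, {wbytes / 1e6:,.0f} MB/s to disk")
```

At 1 Gbit/s this works out to roughly 240-250 MB/s of sustained writes in either case, and close to three million packets per second for minimum-size frames; a 10G link multiplies both by ten.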

Wireshark looks at sequence numbers to detect out-of-order segments and retransmissions, so if your capture drops lots of frames you will get lots of those messages. A good way to tell real packet loss on the network apart from a frame that only your capture dropped is to look at the acknowledgments. If Wireshark never saw a segment, but an ACK covering it arrives within roughly one RTT of the connection (with no retransmission in between), the receiver evidently got the data and you are probably looking at a capture drop.
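The acknowledgment check can be sketched in Python. This is a simplified model, assuming you have already extracted the data segments of one direction of a single TCP connection (relative sequence number, payload length, capture time) plus the cumulative ACKs coming back; it ignores sequence-number wraparound and SACK. Instead of an explicit RTT window it uses the ordering the RTT rule relies on: if the receiver acknowledges the missing bytes before any retransmission of them shows up, the original must have arrived and only the capture missed it. The names `Segment`, `Ack` and `classify_gaps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Segment:
    time: float   # capture timestamp in seconds
    seq: int      # relative sequence number of the first payload byte
    length: int   # payload length in bytes

@dataclass
class Ack:
    time: float   # capture timestamp in seconds
    ack: int      # cumulative ACK: next byte the receiver expects

def classify_gaps(segments: List[Segment], acks: List[Ack]) -> List[Tuple[int, int, str]]:
    """Find holes in one direction's sequence space and guess why each is missing."""
    segments = sorted(segments, key=lambda s: s.time)
    acks = sorted(acks, key=lambda a: a.time)
    results = []
    highest: Optional[int] = None  # end of the highest data seen so far
    for i, seg in enumerate(segments):
        if highest is not None and seg.seq > highest:
            gap_start, gap_end = highest, seg.seq
            # First later segment carrying the first missing byte (a retransmission)
            retx = next((s.time for s in segments[i + 1:]
                         if s.seq <= gap_start < s.seq + s.length), None)
            # First ACK confirming that the receiver has every missing byte
            acked = next((a.time for a in acks if a.ack >= gap_end), None)
            if acked is not None and (retx is None or acked < retx):
                verdict = "capture drop"  # acknowledged without ever being resent
            elif retx is not None:
                verdict = "real loss"     # the sender had to resend the data
            else:
                verdict = "unknown"       # neither an ACK nor a resend was captured
            results.append((gap_start, gap_end, verdict))
        end = seg.seq + seg.length
        highest = end if highest is None else max(highest, end)
    return results

# Hypothetical example: bytes 101-200 never appear in the capture, yet they are ACKed.
segs = [Segment(0.000, 1, 100), Segment(0.001, 201, 100)]
print(classify_gaps(segs, [Ack(0.002, 301)]))
```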

If you have real retransmissions and out-of-order segments, try to determine in which direction they occur: are the packets lost on their way to the SAN or to the client? Does the problem affect one conversation or many? Which conversations have the highest count of lost packets? That is easy to determine: apply the display filter tcp.analysis.lost_segment, open Statistics > Conversations, select the TCP tab, check "Limit to display filter", then sort by packets.
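Once you have the flagged packets, the direction question is a simple aggregation. The sketch below assumes the packets flagged by Wireshark's TCP analysis were exported as (source, source port, destination, destination port, flag) records, for example from tshark's `-T fields` output; the record layout, flag names, addresses and helper names are all hypothetical. Because Wireshark puts the lost-segment flag on the segment that follows the gap, the direction of a flagged packet is the direction in which data went missing.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

class Flagged(NamedTuple):
    """One packet carrying a Wireshark TCP analysis flag (hypothetical export format)."""
    src: str
    sport: int
    dst: str
    dport: int
    flag: str  # e.g. "lost_segment", "retransmission", "out_of_order"

def rank_directions(packets: Iterable[Flagged], flag: str) -> List[Tuple[Tuple[str, str], int]]:
    """Count packets with `flag` per direction (src:port -> dst:port), highest first."""
    counts = Counter(
        (f"{p.src}:{p.sport}", f"{p.dst}:{p.dport}")
        for p in packets
        if p.flag == flag
    )
    return counts.most_common()

def split_by_san(packets: Iterable[Flagged], flag: str, san_ip: str) -> Tuple[int, int]:
    """Return (flagged packets heading to the SAN, flagged packets coming from it)."""
    to_san = from_san = 0
    for p in packets:
        if p.flag != flag:
            continue
        if p.dst == san_ip:
            to_san += 1
        elif p.src == san_ip:
            from_san += 1
    return to_san, from_san

SAN = "10.0.0.5"  # hypothetical SAN address; 2049 is the NFS port
sample = [Flagged(SAN, 2049, "10.0.0.20", 51000, "lost_segment")] * 3 + [
    Flagged("10.0.0.20", 51000, SAN, 2049, "lost_segment"),
    Flagged(SAN, 2049, "10.0.0.21", 51001, "retransmission"),
]
for (src, dst), n in rank_directions(sample, "lost_segment"):
    print(f"{src} -> {dst}: {n}")
print("to SAN / from SAN:", split_by_san(sample, "lost_segment", SAN))
```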

Usually (if there are no capture drops) my money is on the typical situation where a SAN attached to a Gigabit (or 10G) link transmits lots of data to a client on a 100 Mbit/s link. The poor access switch that has to step the 1G/10G traffic down to 100 Mbit/s gets slammed by the sheer amount of data the SAN fires at it and becomes massively congested. The result is packet loss towards the client, often at a ratio of 60-90%.
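Why a speed step-down hurts so much can be illustrated with a simple fluid model. This is an illustration, not a simulation of any particular switch: it assumes a single burst arriving at a constant rate into an egress port with a fixed, initially empty buffer (the buffer size in the example is made up; real access switches vary widely), and it ignores TCP congestion control, which in practice throttles the sender after the first losses. While the burst lasts, the queue grows at the difference between the ingress and egress rates; whatever does not fit in the buffer is dropped. The function name is hypothetical.

```python
def fluid_drop_fraction(burst_bytes: float, in_bps: float, out_bps: float,
                        buffer_bytes: float) -> float:
    """Fraction of one burst dropped at a speed step-down (simple fluid model)."""
    if in_bps <= out_bps:
        return 0.0  # the egress port keeps up, nothing queues
    duration = burst_bytes * 8 / in_bps           # seconds the burst lasts
    backlog = (in_bps - out_bps) * duration / 8   # bytes queued at burst end with an unlimited buffer
    dropped = max(0.0, backlog - buffer_bytes)    # whatever does not fit in the real buffer
    return dropped / burst_bytes

# Hypothetical numbers: 10 MB burst from a 1 Gbit/s SAN port to a 100 Mbit/s client
# port with a 1 MB egress buffer.
print(f"{fluid_drop_fraction(10e6, 1e9, 100e6, 1e6):.0%}")
```

The exact percentage depends entirely on the made-up inputs; the useful part is the shape: the dropped fraction grows with the speed ratio and the burst size and shrinks with the buffer size.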

answered 20 Oct '10, 13:45

Jasper ♦♦

Did this just start happening? Were any changes made to your network? Are the retransmissions happening with all of the IP addresses communicating with the SAN, or just a few? I would try to isolate where the problem is in terms of a switch or router: are the devices using the storage on the same switch as the SAN, or are different switches involved?

It could be something simple, such as a NIC in the SAN not running at the proper speed and duplex, a bad NIC in the SAN, a bad switch port, a bad switch, or a switch that needs to be rebooted. Do you have a diagram of your network, so you can start looking for components that are common to the devices experiencing the retransmissions?

answered 20 Oct '10, 13:54

robert obrinsky