We have a couple of Mitel phones connected to a Brocade switch stack that are having issues: the phones reset and drop randomly. When I sniff the traffic between the PBX and the phones with Wireshark, I see lots of TCP Dup ACK messages, which seem to correspond to the phone outages. Does anyone have ideas about what could be causing this? Capture here: https://www.cloudshark.org/captures/5ea68460fb66 I stripped it down to just the traffic between one set and the PBX. Multiple random phones seem to be affected. The phone and the PBX are connected to the same L2 switch.
asked 26 Sep '14, 16:09 webman
2 Answers:
How was this capture taken? Via a SPAN port, I guess? Did the phones reset during the time of capture? I'm asking because there is no TCP session reset in the trace. If you filter for "tcp.stream==1" you'll see a pattern of duplicate ACKs with suspicious delta timings of almost 10 seconds for every second ACK. Maybe this is some weird way for the devices to do a session keep-alive. The other stream ("tcp.stream==0") has some curious retransmissions, starting in frame 126. Curious, because they start coming in 48 seconds (!!!) after the original packet (which does not get acknowledged for some reason), which is a retransmission timeout I have never seen before. From what I see, the TCP stacks of the devices do strange things. If the captures were created on a SPAN port, my next move would be to capture at the PBX with a real full-duplex TAP, because when things are weird you need to make sure the SPAN session is not responsible for some of it.
answered 26 Sep '14, 17:48 Jasper ♦♦
Sorry Jasper, but I don't see the 48-second retransmission timer, and I don't agree. TCP stream 1 is an idle connection with the PBX sending zero-length keepalives, so all those duplicate ACKs are a false alert and can be ignored. The phone is ACKing with tcp.ack 2, but the PBX keeps on sending empty packets with tcp.seq 1. The problem starts in packet 125, with the phone sending 176 bytes that don't get through, so slow retransmission kicks in with increasing retransmit intervals (0.239 - 0.719 - 1.679 - 3.593). This is an indication that we had a connectivity issue in both directions between 16:03:38.188 and 16:03:41.782 UTC. While this is not nice on a local LAN, it is certainly not a reason to 'reset or drop' the phone.
answered 27 Sep '14, 06:14 mrEEde
If it's a connectivity issue in both directions and it's on the same switch, then that would have to be an overloaded port on either the PBX or the phone, or a problem with the L2 switch. Correct? (27 Sep '14, 06:53) webman
Right @mrEEde, I overlooked that packet 112 has an empty TCP payload; I just looked at the sequence number. It was probably just too late in the night for me :-) (27 Sep '14, 09:13) Jasper ♦♦
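To make the retransmission pattern in mrEEde's answer easier to see, here is a minimal Python sketch. It assumes the four quoted values (0.239, 0.719, 1.679, 3.593) are offsets in seconds measured from the original send in packet 125. That reading is an assumption, though it fits the quoted outage window of roughly 3.594 seconds. The function names are hypothetical and only serve the illustration.

```python
# Offsets in seconds as quoted in the answer. ASSUMPTION: each value is
# measured from the original transmission (packet 125), not from the
# previous retransmission.
offsets = [0.239, 0.719, 1.679, 3.593]


def backoff_gaps(offsets):
    """Return the wait before each retransmission (hypothetical helper)."""
    gaps = [offsets[0]]  # first gap is measured from the original send
    for prev, cur in zip(offsets, offsets[1:]):
        gaps.append(round(cur - prev, 3))
    return gaps


def growth_ratios(gaps):
    """Return how much each wait grew relative to the previous one."""
    return [round(later / earlier, 2) for earlier, later in zip(gaps, gaps[1:])]


gaps = backoff_gaps(offsets)
ratios = growth_ratios(gaps)

print("gaps between attempts:", gaps)
print("growth ratios:", ratios)
```

Under that assumption the wait roughly doubles on every attempt, which is what an exponentially backed-off retransmission timer would be expected to produce. The gaps also add up to the last offset, so the sender was retrying for about 3.6 seconds in total.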
It was done with a SPAN port. I will try a TAP and see if the results change. Thanks.