I have a file server/domain controller from which installing applications is painfully slow, even within the same network segment. Running packet captures on both the server and a workstation, I get a flood of NBSS Continuation Message and TCP Out-Of-Order messages in Wireshark while I run an application install from the server on the workstation. I'm not the best network analyst and really don't know where to go from here, but clearly there is a problem. When I run the same install on another workstation in another location on our network, from another server, and capture the same traffic, I do not see any of the NBSS Continuation messages and only a handful of Out-Of-Order messages, which appear to be normal. The install kicks off within seconds at that location, while at the troublesome location it takes up to 5 minutes. Any help would be awesome and I can give more details upon request! Examples:
asked 09 Dec '15, 12:08 kmart472 edited 09 Dec '15, 12:54 sindy
One Answer:
OK, so the answer is, yet again, "broken antivirus/security software causes a mysterious misbehaviour of the system" (which is several severity steps above the other popular one, "some kinds of antivirus/security software prevent Wireshark from capturing traffic"). Details are available in the comments to the question. answered 10 Dec '15, 13:45 sindy
Please publish the captures somewhere (like Google Drive or a similar service) and provide links to them here, or use CloudShark. You cannot expect useful advice based on a text dump of a few packets' headers.
If you are afraid of leaking sensitive information, try using TraceWrangler, but it is possible that SMB level information will be necessary to identify the issue.
When you compare the captured files from the server and the client, can you see any differences?
You may also want to look here to see whether it gives you some hint.
Thank you for your reply. I was unaware of the procedure of sharing information here but here are the links to the captures, named accordingly.
When I compare the two, I see the same captures but I'll go back to when I said I'm not the best analyzer...
Client: https://drive.google.com/file/d/0B7UB6_r51Gd5NzVfYnhoVTEwOHc/view?usp=sharing
Server: https://drive.google.com/file/d/0B7UB6_r51Gd5UjZJMWNnZEdIaXc/view?usp=sharing
Just some points, not an answer:
TCP Out-Of-Order messages indicate (at least in your case) packets which, as far as the capture shows, have arrived for the first time ever, but later than packets which should normally have arrived after them. As an example, take the sequence starting from frame 141 in "client.pcapng". This packet is the last "normal" one. The following one in the same direction of the same session, frame 144, is marked with "previous segment not captured" because there is a gap between the sequence number of the last payload byte in frame 141 and the sequence number of the first payload byte in frame 144. This means that the packets carrying the payload bytes in between have been lost. As the ACK to frame 144 still acknowledges the sequence number of the last payload byte of frame 141 (and also informs the sender about reception of the contents of frame 144 using SACK, the SLE and SRE fields), it is clear that the loss really happened and that the packets were not just missed by the Wireshark capturing process. Based on the information in the ACKs sent in the opposite direction, the sending side then starts retransmitting the missing part of the stream, but as Wireshark sees those (actually retransmitted) packets for the first time, it marks them not as "retransmitted" but as "out-of-order". This goes on until the gap is filled completely.
When you look into the server capture, you can see the same gap there, which means that the packets were not lost on the network between the two machines; the sending machine never actually sent them. That is not a common thing to see.
Look at the timestamps. While it took the sender under a millisecond to retransmit the missing part, the gap between frames 141 and 144 (where the missing part was supposed to be transmitted) is 200 ms long, and the recipient actually took those 200 ms to send the first ACK to frame 141. Normally this would tell me that the sender "thought" it had sent the packets we see as lost, and then stopped sending until it got at least a single ACK. However, in this case the sender's reaction to the first ACK after the 200 ms pause was to send the next packet in its queue rather than to retransmit the first unacknowledged packet. This looks to me as if the sender had lost the MAC address of the destination, but in that case I would expect to see some ARP requests.
Time gaps like this one explain the slowness of file transfer, but now it's time to find out what causes them. So questions:
Are the machines physical or virtual?
Did you capture directly on the machines involved in the communication, or on some other machine connected to SPAN ports of a switch?
Do the machines have only the Ethernet ports on which you were capturing?
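For readers who want to reproduce the reasoning in the points above, here is a minimal Python sketch of how a sequence-number gap and the long pause can be spotted in one direction of one TCP stream. It is not Wireshark's actual analysis code, whose heuristics also take timing and ACKs into account. The `Segment` class, the function names, and all values in `sample` (including frame numbers 146 and 147) are hypothetical, chosen only to resemble the exchange described.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    frame: int     # frame number in the capture
    time: float    # capture timestamp, seconds
    seq: int       # relative sequence number of the first payload byte
    length: int    # payload length in bytes


def classify(segments):
    """Label segments of one direction of one stream, in capture order."""
    labels = {}
    highest = None  # sequence number just past the highest byte seen so far
    for s in segments:
        if highest is None or s.seq == highest:
            labels[s.frame] = "in order"
        elif s.seq > highest:
            # Bytes between `highest` and `s.seq` have not been seen yet.
            labels[s.frame] = "previous segment not captured"
        else:
            # Seen for the first time, but behind data already seen.
            labels[s.frame] = "fills earlier gap"
        end = s.seq + s.length
        highest = end if highest is None else max(highest, end)
    return labels


def longest_pause(segments):
    """Return (earlier_frame, later_frame, seconds) for the largest time gap."""
    a, b = max(zip(segments, segments[1:]), key=lambda p: p[1].time - p[0].time)
    return a.frame, b.frame, b.time - a.time


# Hypothetical data: one normal segment, a 200 ms pause, a segment that
# skips ahead, then two segments that fill the hole.
sample = [
    Segment(141, 0.0000, 1, 1460),
    Segment(144, 0.2000, 4381, 1460),
    Segment(146, 0.2004, 1461, 1460),
    Segment(147, 0.2008, 2921, 1460),
]

for frame, label in classify(sample).items():
    print(frame, label)
print(longest_pause(sample))
```

The segments labelled "fills earlier gap" correspond to what Wireshark shows as out-of-order in this capture: they are retransmissions from the sender's point of view, but the capture sees them for the first time.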
Thank you for such a detailed, investigative reply to this issue! I greatly appreciate it!
Answers:
- These are both physical machines with Cisco switches in between them.
- I captured directly on the file server and a client workstation.
- The server has a second NIC that is disabled and not teamed with the primary connection.
So the server is not retransmitting the packets because it thinks the client got them due to the time the first ACK took to send back to the server? Would adjusting frame size on the server do any good or would you suspect a network issue?
Thank you again for the time you are taking with this!
This is something I cannot explain exactly, but the two explanations which seem most likely to me are:
that you have some kind of anti-virus or firewall software which is rotten and damages some TCP transmissions.
that the network card driver itself is rotten.
Wireshark (actually, WinPcap) gets the packets about to be sent from the driver, from a place where the higher software layers gave them to the driver for delivery. So if Wireshark does not get the packet, it can be a problem of the capture, but in this case, the whole exchange confirms that the packets really haven't been sent.
So I'd recommend identifying any firewall or anti-virus software running on the machine which sent frames 141 and 144 as mentioned above (I don't know the mapping of the IP addresses to the "client" and "server" machines), deactivating it temporarily, and trying the installation again while capturing at both ends, just in case.
Yes, the server is not retransmitting the packets because it wasn't informed that they had been lost. The weird thing is that after the server gets the ACK informing it that the packets have been lost (low seq), the first packet it sends is not a retransmission of the lost packets but some packet following them, and only then does it start the retransmission. It may be related to the issues above. If deactivating the software doesn't help and the network card driver is up to date, come back.
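The two-ended comparison used in this thread (does the same hole appear in the capture taken on the sending machine?) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: each capture is reduced to a list of hypothetical `(seq, length)` pairs for one direction of one stream, in capture order, and the function names and verdict strings are invented for the example.

```python
def first_pass_gaps(segments):
    """Return [start, end) byte ranges that were skipped when first seen."""
    gaps = []
    highest = None  # sequence number just past the highest byte seen so far
    for seq, length in segments:
        if highest is not None and seq > highest:
            gaps.append((highest, seq))
        end = seq + length
        highest = end if highest is None else max(highest, end)
    return gaps


def locate_loss(sender_capture, receiver_capture):
    """For each gap seen at the receiver, check whether the sender shows it too."""
    sender_gaps = set(first_pass_gaps(sender_capture))
    verdicts = {}
    for gap in first_pass_gaps(receiver_capture):
        if gap in sender_gaps:
            # The hole already exists where the sender's capture taps in.
            verdicts[gap] = "missing at sender too"
        else:
            verdicts[gap] = "lost between the machines"
    return verdicts


# Hypothetical data: the receiver sees bytes 1461..4380 arrive late.
receiver = [(1, 1460), (4381, 1460), (1461, 1460), (2921, 1460)]
sender_same_gap = list(receiver)  # the situation described in this thread
sender_complete = [(1, 1460), (1461, 1460), (2921, 1460), (4381, 1460)]

print(locate_loss(sender_same_gap, receiver))
print(locate_loss(sender_complete, receiver))
```

A "missing at sender too" verdict points at the sending host itself (driver, filter software, and so on) rather than at switches or cabling, which is the conclusion drawn above.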
Thank you so much Sindy! This appears to be an issue with a particular version of Symantec Endpoint Protection. Same exact model servers in other offices with different versions of SEP clients seem to run fine so this was rather confusing, yet so obvious at the same time. Thank you again for all your help! I learned a lot about packet capturing from you!