I'm trying to diagnose some issues between an Azure VM and a Google Compute Engine VM. Every now and then, the Azure server reports that it cannot connect over HTTP to the GCE machine. There are no errors logged on the GCE machine.
I ran a PCAP for a while, and if I filter with _ws.expert.severity >= note, over 2% of all packets are flagged. 0.1% of all TCP packets are flagged as a retransmission. Apart from that, it seems that there's a repeating pattern of "TCP Previous segment not captured, TCP Dup ACK, then TCP Out-Of-Order". I see those groups of 3 packets repeated all over, with apparently no real effects, like increased http.time.
Does this sound typical? Could the fact that it's a VM under KVM on GCE be causing some confusion here?
asked 11 Feb '15, 16:52
In an ideal world there would be zero, but in the real world there are always some errors, no matter what type of connection it is (which, by the way, you did not mention). Without knowing the link type, I'd say 0.1% retransmissions is more than OK.
Regarding the rest of your reported problems:
To troubleshoot your problem, I suggest running dumpcap with ring buffer files (see the dumpcap man page) and with a capture filter for the destination IP address and port 80. Then monitor the error logs of the Azure server (with a script), and as soon as you see the error messages, stop dumpcap. Then take a look at the last capture file and try to find failed TCP connections (RESET, etc.) and/or HTTP error messages.
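The capture-until-error approach could be scripted roughly as follows. This is only a sketch under assumptions: the interface name `eth0`, the peer address `203.0.113.10`, the log path, and the error marker string are all placeholders, and the helper names (`build_dumpcap_cmd`, `follow`, `capture_until_error`) are made up for illustration. The dumpcap options used are `-i`, `-f`, `-w`, and `-b` with `filesize:` (in kB) and `files:`. The filter uses `host` rather than `dst host` so that replies from the GCE machine are captured as well.

```python
import subprocess
import time


def build_dumpcap_cmd(interface, peer_ip, port=80,
                      out_file="gce_http.pcapng",
                      file_kb=102400, num_files=10):
    """Build a dumpcap command line that writes to a ring buffer.

    All default values are illustrative assumptions, not recommendations.
    """
    # Capture filter (BPF syntax): traffic to/from the peer on the HTTP port.
    capture_filter = f"host {peer_ip} and tcp port {port}"
    return [
        "dumpcap",
        "-i", interface,
        "-f", capture_filter,
        "-b", f"filesize:{file_kb}",  # switch to the next file after this many kB
        "-b", f"files:{num_files}",   # keep at most this many files, then overwrite
        "-w", out_file,
    ]


def follow(path, poll_seconds=1.0):
    """Yield lines appended to a log file, similar to `tail -f`."""
    with open(path, "r", errors="replace") as log:
        log.seek(0, 2)  # start at the end of the file, ignore old entries
        while True:
            line = log.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def capture_until_error(cmd, log_path, marker):
    """Run dumpcap and stop it once `marker` shows up in the log."""
    proc = subprocess.Popen(cmd)
    try:
        for line in follow(log_path):
            if marker in line:
                break
    finally:
        proc.terminate()  # ask dumpcap to stop so it can close the current file
        proc.wait()


if __name__ == "__main__":
    # Hypothetical values: replace with your interface, GCE address,
    # application log path, and the text of the real error message.
    cmd = build_dumpcap_cmd("eth0", "203.0.113.10")
    capture_until_error(cmd, "/var/log/app/error.log", "cannot connect")
```

Once the script exits, the most recently written ring buffer file should contain the traffic around the failure. Size the ring buffer so that it comfortably covers the delay between the failed connection and the moment the error appears in the log.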
answered 12 Feb '15, 06:15
Kurt Knochner ♦