Hi everyone,

I am performing a file transfer using some commercial off-the-shelf (COTS) TCP libraries in a C# program. I am transferring a 5 MB randomly generated file. The transmission path traverses a virtual NIC (TAP/TUN), goes over a radio, and comes back out of another virtual NIC. Sometimes the file is transmitted correctly, but sometimes it is missing one contiguous chunk of data. The chunk can be anywhere in the middle and varies in size. The example I am using has a missing chunk of 16141 bytes. I have Wireshark captures from both the sender and the receiver. I would love to know whether this indicates an issue with the underlying virtual NIC and radio hardware, or something that is happening in the stack. Here are the notes I have taken on my observations of the issue and the Wireshark captures. I hope to be able to post the captures here once this gets posted.

Notes:

The file is contiguous until:
46 46 41 85 C4 ED 24 9D @ Offset 1883080
- Packet #1799 in Wireshark sender capture (01D0)
- Note: This is a smaller packet than the norm. The packet data is truncated right after the 9D.

Then the next bytes should be:
3B 89 04 83 B6 34 9C 14 @ Offset 1883087
- Packet #1799 in Wireshark sender capture (01E0)
- No receive packet with that data

But in the received file it is:
A3 D5 0F D9 10 B6 43 F3 @ Offset 1883087
- Packet #1814 in Wireshark sender capture (0030)
- Packet #6804 in Wireshark receive capture (0030)

This received string of bytes is found in the original file @ offset 1899228.

1899228 - 1883087 = 16141 bytes of contiguous missing data
File size: 5242880 bytes (sent) - 5226739 bytes (received) = 16141 bytes, so this lines up too.

It looks like just this one big chunk is missing. Everything else, before and after, lines up.

asked 20 Jun '13, 10:58 Tom Kuhn
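For anyone who wants to repeat the comparison described in the notes without searching a hex editor by hand, here is a minimal Python sketch. It assumes exactly one contiguous chunk is missing and that both files fit in memory; the function name `find_missing_chunk` and the file names in the usage note are made up for illustration.

```python
def find_missing_chunk(sent: bytes, received: bytes):
    """Return (offset, length) of a single contiguous gap, or None.

    Assumes the received data equals the sent data with exactly one
    contiguous run of bytes removed.
    """
    missing = len(sent) - len(received)
    if missing <= 0:
        return None  # nothing is missing (or the received file is longer)

    # Walk forward until the two files stop agreeing.
    prefix = 0
    while prefix < len(received) and sent[prefix] == received[prefix]:
        prefix += 1

    # Everything after the gap must line up again, shifted by `missing`.
    if sent[prefix + missing:] == received[prefix:]:
        return prefix, missing
    return None  # the difference is not one clean contiguous gap


if __name__ == "__main__":
    original = bytes(range(256)) * 4
    damaged = original[:100] + original[150:]
    print(find_missing_chunk(original, damaged))
```

To use it on real files, read both with `open("sent.bin", "rb").read()` and `open("received.bin", "rb").read()` (hypothetical names) and pass the results in. If a few bytes right after the gap happen to equal the first bytes of the gap, the reported offset can land slightly later than the true start; the length stays the same.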
3 Answers:
If the transport is TCP and there is data missing in the middle, then it must be the application. TCP is a connection-oriented protocol with mechanisms to make sure that data is received properly. If data were lost in transport or in the vNIC, TCP would make sure it is retransmitted. Or the TCP/IP stack is broken... but then you would have other applications having problems too.

answered 20 Jun '13, 11:05 SYN-bit ♦♦

It turns out I cannot add attachments. (20 Jun '13, 11:10) Tom Kuhn

@Tom Kuhn, as long as the contents aren't sensitive (and as you were going to post them here they shouldn't be), you can upload the capture to CloudShark and post a link to the capture back here. (20 Jun '13, 11:36) grahamb ♦

That's interesting. Since I do not have access under the hood of the actual socket, it is hard to see. My application is not catching any exceptions and there do not seem to be any error events. As for other applications, this file transfer software is the only network application running on each of these laptops. The laptops only communicate with each other.

One other thing of note is that the sender transmitted 15 packets (that were not received) before this final packet that started the continuation of incoming data (247 bytes) to the receiver...

1814 14.981036000 20.0.1.22 20.40.1.12 TCP 301 [TCP Window Full] syscomlan > zenginkyo-1 [ACK] Seq=1899229 Ack=1 Win=17520 Len=247

(20 Jun '13, 12:08) Tom Kuhn
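The sequence number in that packet can be tied back to the file offsets from the question. The sketch below assumes Wireshark is showing relative sequence numbers (the Ack=1 suggests it is), so the first payload byte has Seq=1, and it assumes the TCP stream carries only the file bytes with no application header in front. The function name `stream_offset` is just for illustration.

```python
def stream_offset(relative_seq: int) -> int:
    """Map a Wireshark relative sequence number to a 0-based stream offset.

    Assumption: relative sequence numbering, where the first payload
    byte after the handshake has Seq=1.
    """
    return relative_seq - 1


# Packet #1814 in the sender capture: Seq=1899229, Len=247
resume_offset = stream_offset(1899229)

# Offset where the received file stops matching the original (from the question)
last_good_offset = 1883087

print(resume_offset)                     # 1899228
print(resume_offset - last_good_offset)  # 16141
```

Under those assumptions, the packet where reception resumes starts at file offset 1899228, the same offset at which the received bytes were found in the original file, and the gap comes out to the same 16141 bytes.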
Are we looking at the same TCP session here? The timestamps differ, but there are more curiosities.

Sender Trace: The client is running a Windows platform (TTL=128) and so is the server at a CaeElect MAC address. The client port number is 1065, the window size offered by the client is 16384, and the 'MSS' option is 1460 (02:04:05:b4), followed by NOP, NOP, SACK.

Receiver Trace: The client port number is 2025, it is 5 hops away (TTL=123), the advertised window size is 32768, the MSS option is now in little endian: b4:05:04:02, SACK is removed and there are 2 EOLs...

So if the traces were really taken at the same time, there must be a proxy sitting at the CaeElect MAC address that is terminating the TCP session towards the client and starting a new one towards the server. And it is probably this proxy that is dropping the packets.

answered 20 Jun '13, 14:20 mrEEde2 edited 20 Jun '13, 14:26

The send and receive captures were running on separate machines at the same time. I guess you could say there is a proxy of sorts, but I am not sure what it really is... more of a serial-to-TCP converter. Since this data goes over a UHF radio, a third-party hardware and software solution is used. Take the incoming capture. The hardware acts as a bridge between the radio's serial interface and the computer's USB (I think of it as a USB serial interface). In addition, there is a driver with a TUN/TAP virtual adapter. When receiving, the thought is: serial data from the radio to the hardware -> convert to USB -> laptop USB -> some stack converts (de-serializes) the serial data back into TCP data -> pushes it out of the stack. Unfortunately, we do not have access to the source for this. (20 Jun '13, 17:52) Tom Kuhn

I have finally had the chance to perform some additional captures. I am looking through the first new capture and wanted to report what I am seeing in this one. I would like some thoughts on whether this is another manifestation of a possible bad TCP stack design.
In this instance, there was no missing data chunk, but there was an error in some of the data in a single packet. I did a small file transfer of only 100 kB of random data. This time all the data was transferred, except that one small portion (15 bytes) inside a single packet seems to have changed. The rest of the two files is identical.

Sender File : http://www.cloudshark.org/captures/63e4629d73b1
Receiver File : http://www.cloudshark.org/captures/5b0a0c21a339

It is packet #45 in the sender file and #206 in the receiver file. Sequence number 39421. If anyone could shed light on this, that would be helpful. (03 Jul '13, 06:32) Tom Kuhn
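When the two files have the same length, as in this second test, a simple byte-by-byte diff is enough to pin down the altered region. This is a minimal sketch; the function name `differing_runs` is invented for illustration, and it assumes both files fit in memory.

```python
def differing_runs(a: bytes, b: bytes):
    """Return a list of (offset, length) runs where a and b differ.

    Both inputs must have the same length.
    """
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")

    runs = []
    start = None  # start offset of the run currently being tracked
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # a run that extends to the end of the data
        runs.append((start, len(a) - start))
    return runs


if __name__ == "__main__":
    original = bytes(1000)
    altered = bytearray(original)
    altered[200:215] = b"\xff" * 15  # simulate 15 changed bytes
    print(differing_runs(original, bytes(altered)))
```

With random file contents, a byte inside the damaged region can match the original by chance, so one corrupted span may show up as two or three adjacent runs.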
Modified payload bytes are never a good sign ;-) So, now you need to find out which component in your setup modifies the bytes.
There are several bad signs in your capture file, as mentioned by others.
Looking at that list, I think it's safe to say that the TCP/IP implementation of one of the components is 'kind of broken' ;-)) It could be the virtual NIC (rather unusual), it could be the USB driver or the RF radio part. Hard to tell...

Regards

answered 03 Jul '13, 07:00 Kurt Knochner ♦ edited 03 Jul '13, 07:04

Thanks Kurt. I have a few different vendors to get in touch with, it seems. Unfortunately I do not know what is under the hood on the USB-to-TCP side of things. But this does provide a great amount of additional help for someone like me who couldn't tell the difference between a TCP window size and a glass window size. (03 Jul '13, 12:42) Tom Kuhn
I'm not sure, but I believe one is measured in bytes and the other one in inches ;-) (03 Jul '13, 12:53) Kurt Knochner ♦
Ah Thanks! Good to Know...
Sender Capture: http://cloudshark.org/captures/45f3ed36d2a5
Receiver Capture: http://cloudshark.org/captures/1a8fff96948c
I moved your comment to a more appropriate place under your question.
Here is a graphical representation of what's going on. All traffic on the Linux virtual NIC is forwarded out of the physical port (and vice versa).
Thanks for providing the picture. The data flows through 2 cascaded TCP connections, both of which seem to be happy with the amount of data flowing, as they both terminate in an orderly way with 2 FIN packets that get ACKed, which means all lost packets have been dutifully retransmitted. That leaves only one conclusion: the missing data actually gets lost in the IP stack 'in the middle' that is receiving the data on socket_left (from 20.0.1.22, port 1065) and is sending the data over socket_right towards 20.40.1.12:5020 (using client port number 2025). Whatever this TCP/IP stack is, it doesn't seem to be very mature, given the wrong 'endianness' of the MSS option in its SYN packet.
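To make the MSS remark concrete: the option is encoded as kind 2, length 4, followed by a 16-bit value in network (big-endian) byte order. The sketch below decodes the two byte strings quoted from the traces; the function name `parse_mss_option` is made up for illustration, and it only handles a lone 4-byte MSS option rather than a full options list.

```python
import struct


def parse_mss_option(raw: bytes):
    """Decode a single TCP MSS option (kind=2, length=4, 16-bit value).

    Returns the MSS as an int, or None if the bytes are not a
    well-formed MSS option.
    """
    if len(raw) < 4:
        return None
    kind, length = raw[0], raw[1]
    if kind != 2 or length != 4:
        return None
    (mss,) = struct.unpack("!H", raw[2:4])  # "!" = network byte order
    return mss


sender_side = bytes.fromhex("020405b4")    # as seen in the sender trace
receiver_side = bytes.fromhex("b4050402")  # as seen in the receiver trace

print(parse_mss_option(sender_side))          # 1460
print(parse_mss_option(receiver_side))        # None
print(parse_mss_option(receiver_side[::-1]))  # 1460
```

The receiver-side bytes only decode once all four are reversed, which fits the reading that the stack in the middle wrote the whole option in the wrong byte order.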
Thank You. Your thoughts echo mine also. Thank you for your time in looking at this.