First, let me state that I am extremely new to networking. I am using an embedded ARM device with some custom hardware, an Ethernet driver chip, a third-party TCP/IP stack, and a third-party SMTP client application for that stack. My ultimate goal is just to be able to send emails. I was previously seeing odd behavior: connecting to a 'hostmonster.com' email server and sending emails to myself would work sometimes and not others. I started my search by setting SMTP aside and just issuing a TCP connect sequence to the mail server. I saw similar behavior: it would work fine once, but after resetting my embedded device I started getting lots of TCP retransmission and out-of-order warnings and other oddities. I set up hMailServer on my laptop to take 'hostmonster.com' out of the equation, but I still got the same TCP retransmission and out-of-order warnings, and even some duplicate ACKs. The device always connects to the server, but it seems to me that for a hardwired connection entirely on my desk I should be seeing zero retransmissions or other oddities. Is it possible that a slow (<50 MHz) ARM processor could be the cause of those retransmissions when connected to a much faster server and much faster internet in general? Here is the condensed log from Wireshark: http://lookoutportablesecurity.com/tcp.txt asked 30 Apr '14, 14:45 wes000000 |
3 Answers:
I don't think that these are real retransmissions; I think the packets were just traced twice. Looking at the delta times of the suspected packets, the 'retransmissions' occur within the same millisecond.
I suggest you look at the ip.id of those packets to verify whether it's the same packet traced twice or a real retransmission. answered 01 May '14, 00:50 mrEEde |
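The ip.id check suggested in this answer can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Packet` record and the `classify_repeat` helper are hypothetical names, and the field values are assumed to have been copied by hand from the Wireshark columns ip.src, ip.id, tcp.seq and tcp.len. The idea is that a packet traced twice is the very same IP datagram and so keeps its ip.id, while a real retransmission is normally sent as a new datagram with a different ip.id. If the sending stack always sets ip.id to 0, the check is inconclusive.

```python
from collections import namedtuple

# Hypothetical record; the fields mirror the Wireshark columns
# frame time, ip.src, ip.id, tcp.seq and tcp.len.
Packet = namedtuple("Packet", "time src ip_id seq length")


def classify_repeat(first, second):
    """Compare two packets that look like the same TCP segment.

    Returns one of:
      "different segment"  - sender, sequence number or length differ
      "not applicable"     - no payload, e.g. a pure ACK
      "capture duplicate"  - same IP datagram seen twice by the capture
      "retransmission"     - same TCP data sent again in a new IP datagram
    """
    same_segment = (
        first.src == second.src
        and first.seq == second.seq
        and first.length == second.length
    )
    if not same_segment:
        return "different segment"
    if first.length == 0:
        # Pure ACKs legitimately repeat the same sequence number.
        return "not applicable"
    if first.ip_id == second.ip_id:
        return "capture duplicate"
    return "retransmission"
```

The delta time between the two packets is a useful second signal: a repeat within the same millisecond points towards a capture duplicate, whereas a repeat after a few hundred milliseconds points towards a retransmission timer firing.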
I guess your ARM processor might be too slow when talking to your mail server. If you look at packet 170, you see that the mail server sends "S: 220 WESLEY-LAPTOP ESMTP" to the client, but there is no acknowledgement or answer coming back. I guess this triggers a retransmission after 200ms in packet 171, because the server assumes that packet 170 got lost since it didn't get acknowledged. The only reason I can think of in a network as small and simple as the one you're describing is that the client is having trouble answering in time. The same happens again in packets 1086 and 1087, again with over 200ms delta. So again, my money is on the ARM processor being too slow (or the code not being fast enough, however you want to look at it :-)) answered 30 Apr '14, 16:54 Jasper ♦♦ |
As @mrEEde indicated, there are duplicate frames in the capture file coming from 10.0.0.65 (the SMTP server). I don't think they are re-transmissions. I believe they are just duplicates due to the way the capture file was taken (probably on the SMTP server itself or on a mirror port of the switch duplicating the server traffic). I believe the capture file was taken on the server, because of the duplicate frames 125,126. Frame 125 was captured before padding (on the server), frame 126 was captured after padding.
So, those duplicate frames are not going to cause a problem on the embedded device, because there are no duplicates (re-transmission) on the wire. After you remove the duplicate frames, you'll get this:
Everything is well until frame 173, where the client ACKs frame 170. But then, the server does not answer, which makes the client close the connection in frame 614. The same thing happens in the second attempt.
According to the hint of @mrEEde, the client should send the HELO/EHLO after it has received frame 170. However, it does not. It simply ACKs the data in frame 173 and then closes the connection in frame 614. So, the problem is (probably) within the SMTP library of the embedded device or the software that uses the SMTP library.
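For reference, the exchange the client is expected to perform after the greeting can be sketched in Python as follows. This is a minimal illustration of the SMTP opening dialogue (wait for the 220 greeting, then send EHLO), not the code of the third-party embedded client; the function names `read_reply` and `open_session` and the client name passed in are assumptions made for this example.

```python
import socket


def read_reply(reader):
    """Read one SMTP reply (possibly multi-line); return (code, lines)."""
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("server closed the connection")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        # "250-..." means more lines follow; "250 ..." ends the reply.
        if len(line) < 4 or line[3] != "-":
            return int(line[:3]), lines


def open_session(sock, client_name):
    """Wait for the 220 greeting, then send EHLO and return the reply."""
    reader = sock.makefile("rb")
    code, lines = read_reply(reader)
    if code != 220:
        raise RuntimeError("unexpected greeting: %r" % (lines,))
    # This is the step missing in the capture: after the greeting
    # the client only ACKs and never sends its HELO/EHLO.
    sock.sendall(("EHLO %s\r\n" % client_name).encode("ascii"))
    return read_reply(reader)
```

With a connected socket, for example one returned by `socket.create_connection((host, 25))`, a call such as `open_session(sock, "arm-device")` should produce the EHLO that is absent after frame 173 in the trace.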
There isn't much we can analyze in the capture file. The best thing is to enable debugging on the embedded device or check the software logic if possible. Regards answered 01 May '14, 16:56 Kurt Knochner ♦ edited 02 May '14, 00:47
The capture file was generated using the same computer the server is running on. Is that bad practice generally? And I will post the pcap file next week (sorry for the delay) but the computer with the server on it as well as my ARM device is unavailable until then. (01 May '14, 20:14) wes000000
"Everything is well until frame 173, where the client ACKs frame 170. But then, the server does not answer, which makes the client close the connection in frame 614." Wouldn't the next expected packet be the client's HELO packet? (01 May '14, 23:04) mrEEde
Ah, sure. It was very late when I answered the question. Thanks for the hint. So, yes the client should send the HELO. As a result, it's a problem with the embedded smtp client, instead of the server as I said in my answer. Sorry for the confusion. (02 May '14, 00:36) Kurt Knochner ♦
Not in general. Sometimes there is no other option than to capture on one of the involved systems. But sometimes you'll get false results, as in your case. We have had similar problems in other questions. Duplicate frames when captured on the server, but there was no common reason observable (yet). Could be a driver issue or some software on the capturing system (VPN, Firewall, IDS/IPS, Endpoint Security, etc.). Anyway, please see the very good comment of @mrEEde and my comment above. The problem is most likely not the server, but the client, because it does not send the HELO/EHLO. (02 May '14, 00:41) Kurt Knochner ♦
Thank you guys for all the help! We are currently in the process of debugging and working with the client software and are in communication with the support team responsible for the third party smtp client and stack. Thanks again! (03 May '14, 17:19) wes000000 |
A 200ms delay between two packets is far too long for capture duplicates; those usually have a delta of only a couple of microseconds. But without a capture file and a description of the capture setup, duplication can't be completely ruled out.