I have a web server hosted on AWS that sends email through an email server hosted in a separate data center. I can telnet into the email server with no issues from a server outside AWS, but when connecting from my AWS web server I have two issues.
Web server trace (10.2.2.2): https://www.cloudshark.org/captures/d17da6802b39
Email server trace (10.1.1.1): https://www.cloudshark.org/captures/d6c1e750ddf8

asked 22 Dec '16, 22:23 vze80
One Answer:
Could you please try again after turning off TCP timestamps?

answered 23 Dec '16, 00:32 soochi

TCP timestamps (RFC 1323) are enabled in this TCP implementation. They must be disabled at the OS level. On Windows, for example, the command is: netsh int tcp set global timestamps=disabled
(23 Dec '16, 04:25) soochi

Soochi, turning off TCP timestamps on the server did indeed clear up the issue instantly. On Linux the command is "echo 0 > /proc/sys/net/ipv4/tcp_timestamps". To disable them permanently, add "net.ipv4.tcp_timestamps = 0" to /etc/sysctl.conf. I'm not entirely sure what the problem was, but this post and the associated links were helpful in beginning to understand the issue: http://serverfault.com/questions/235965/why-would-a-server-not-send-a-syn-ack-packet-in-response-to-a-syn-packet
Thanks.
(25 Dec '16, 19:31) vze80

Glad to hear that the workaround worked! Comparing the traces email-server_anon and web-client_anon, I observed the following:
1. The timestamp set by the server is changed by some device in the path before it arrives at the client. This can be clearly seen in TCP streams 2-9. All of these streams were reset by the client because the SYN-ACK arrived after 30 seconds.
2. In TCP streams 10-13 the SYN-ACK arrived after 16 seconds. These sessions were also reset by the client.
3. In stream 14 the SYN-ACK arrived after 8 seconds, in stream 15 after 4 seconds, and in stream 16 after 2 seconds.
4. In stream 16 the timestamp from the server is increased by some device. This happens only to SYN-ACK packets, not to later ones.
5. The device modifying the timestamp seems to sit near the client. Each time, the timestamp is modified according to the arrival of the corresponding SYN-ACK. I measured a value of around 8000 when the SYN-ACK arrived with a delay of about 30 seconds.
I believe the underlying problem is the delay, which has some other cause that should be investigated further.
I also noticed that the client supports an MTU of up to 9k, but by the time the SYN arrives at the server the MSS has been reduced (most probably by the Internet-facing device at the client location). Please provide a new trace with timestamps disabled.
(26 Dec '16, 07:56) soochi

Please ignore my previous comment; its statements are wrong. The timestamp is not modified in the path. The server retransmits packets with different timestamps, and only the last retransmission of each session arrives at the client. Anyway, it would be interesting to look at a capture with timestamps disabled, since the issue does not occur then. It also seems that packets are lost from server to client and not in the other direction.
(26 Dec '16, 13:57) soochi

The client resides on AWS, which has a lot of unique networking architecture. Each instance has a public and a private IP address (each with a different MTU). AWS also blocks all inbound ports for all services (including ICMP) unless they are specifically opened in the security group (firewall). This could potentially cause problems when the MTU size changes along the network path (http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-drop-issues.html). One of the first fixes I tried (before disabling TCP timestamps) was allowing ICMP packets to the client, which had no effect. I'm running another capture now and will upload it shortly.
(27 Dec '16, 08:36) vze80

Successful SMTP/TCP packet capture from client and server:
Mail-Server (10.1.1.1): https://www.cloudshark.org/captures/b92ad0f5933c
Mail Client (10.2.2.2): https://www.cloudshark.org/captures/ca3ef01a45e4
Since disabling TCP timestamps on the AWS client, all of the SMTP and MySQL connection hangs seem to have been resolved. There does seem to be an occasional RST from the AWS mail client in the most recent SMTP packet capture linked here, but the connection seems to recover and continue. I'm not sure what would be causing that.
(27 Dec '16, 12:21) vze80

Could you please anonymize the capture again so that it includes the complete TCP header? The packets are cut at 54 bytes, which removed the options.
(27 Dec '16, 13:40) soochi

Sorry about that. I'm getting "Access Violations" with TraceWrangler, so I was only able to re-anonymize the mail-server capture. I hope that is more helpful.
Mail-Server (10.1.1.1): https://www.cloudshark.org/captures/f13d1636d8ee
(27 Dec '16, 18:19) vze80
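For anyone wanting to confirm the effect without reading a capture, the handshake delays discussed in the comments (SYN-ACKs arriving 2 to 30 seconds late) can be observed from the client by timing TCP connects. The following is only a minimal sketch: the function names `time_connect` and `summarize` and the 35-second default timeout are illustrative assumptions, not something used in this thread.

```python
import socket
import time


def time_connect(host, port, timeout=35.0):
    """Return the seconds taken to complete the TCP handshake, or None on failure.

    The default timeout is an assumption chosen to sit just above the
    30-second delays reported in the comments.
    """
    start = time.monotonic()
    try:
        # create_connection returns once the three-way handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # covers timeouts, refused connections and unreachable hosts
        return None


def summarize(samples):
    """Count completed and failed connects and report the slowest handshake."""
    ok = [s for s in samples if s is not None]
    return {
        "ok": len(ok),
        "failed": len(samples) - len(ok),
        "slowest": max(ok) if ok else None,
    }
```

Calling `summarize([time_connect(host, 25) for _ in range(20)])` with the mail server's address as `host`, once with timestamps enabled and once with them disabled, should make a difference in handshake times visible if the behaviour matches what was seen in the traces.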
Did you try again after disabling TCP timestamps?
Please search for how to disable them on your operating system.
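On Linux, the current state of the setting can be checked before and after the change. This is a minimal sketch that reads the /proc path and the sysctl.conf key quoted in the comments; the function names `tcp_timestamps_enabled` and `persistent_setting` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# Path quoted in the comments above; Linux only.
TCP_TIMESTAMPS_PATH = "/proc/sys/net/ipv4/tcp_timestamps"


def tcp_timestamps_enabled(path=TCP_TIMESTAMPS_PATH):
    """Return True or False for the running kernel setting, or None if unreadable."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    try:
        # 0 means disabled; any non-zero value means timestamps are enabled
        return int(value) != 0
    except ValueError:
        return None


def persistent_setting(conf_text, key="net.ipv4.tcp_timestamps"):
    """Return the last value assigned to key in sysctl.conf-style text, or None."""
    result = None
    for line in conf_text.splitlines():
        line = line.strip()
        # skip blank lines and comment lines
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            result = value.strip()
    return result
```

For example, `tcp_timestamps_enabled()` reports the live value, while `persistent_setting(Path("/etc/sysctl.conf").read_text())` shows whether the change would survive a reboot, assuming the setting lives in that file rather than in a drop-in directory.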