We are having random connection crashes on one of the HTTP clients (X.X.X.X) connecting to an app server (Y.Y.Y.Y). There are two ASA firewalls in between performing NAT. Data flows normally for a while, then suddenly stops. A packet capture shows that, from the client's perspective, the connection was never terminated. Here is what I found so far:
Has anyone seen this kind of mysterious RST before? Is it possible that the firewall would send a RST in response to a FIN while not forwarding it to the other host? Why is it triggered for only a few connections (a couple of hours after the app is restarted) rather than for all of them?
asked 13 May '13, 21:07 xkgt
One Answer:
Sounds to me like a session timeout on the ASA firewall. Because the ASA keeps state for each session, it also has to manage these state records so that its tables do not fill up. This is why every firewall has a timeout associated with each session: when a session is idle for too long, it is removed from the session table. The timeout can be configured (either globally or per port). What is the time difference between the last packet from the client and the FIN packet from the server? You can solve this either by increasing the timeout on the firewall or by using TCP keepalives on the client and server so that the session on the firewall does not go idle.
answered 13 May '13, 22:49 SYN-bit ♦♦
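For the TCP keepalive suggestion, here is a minimal Python sketch of what enabling keepalives on a socket could look like. The helper name `enable_keepalive` and the default values (60 s idle, 10 s interval, 5 probes) are assumptions for illustration only; the idle time would have to be shorter than whatever idle timeout the firewall actually uses. The three `TCP_KEEP*` options are platform-specific, so the sketch skips any that are not available.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket (illustrative helper)."""
    # SO_KEEPALIVE is the portable switch that enables keepalive probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The options below are platform-specific (present on Linux);
    # skip any that this platform's socket module does not expose.
    for name, value in (("TCP_KEEPIDLE", idle),      # seconds idle before first probe
                        ("TCP_KEEPINTVL", interval),  # seconds between probes
                        ("TCP_KEEPCNT", count)):      # failed probes before giving up
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The helper would be called on the client socket before connecting, or on each accepted socket on the server side. Whether the application in question exposes its sockets this way is not known from the thread.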
That was my first suspect too, but the whole stream lasts just a couple of seconds. There is a 1-second gap between the last batch of ACKs received by the server and the moment it decides to close the connection by sending out FIN,ACK.
Also, isn't the ASA expected to close the connection gracefully by sending a RST to both sides? It beats me why I didn't get it on the client side.
And this RST doesn't happen for every connection. There were 2605 new connections made in a 10-minute window, 5211 FINs (note 1 missing FIN back from the client) and only 1 RST. The client-side capture doesn't show a RST at all.
Say the firewall does send out this RST in both directions, what happens if this packet fails to reach the client?
--edit-- Merged two comments to one
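A per-stream tally like the one quoted above (new connections, FINs, RSTs) can be sketched in Python. This is only an illustration under assumptions: the input is taken to be tab-separated `tcp.stream` and `tcp.flags` values, for example exported with `tshark -r capture.pcap -Y tcp -T fields -e tcp.stream -e tcp.flags`, where `capture.pcap` is a placeholder name. The function names `tally_flags` and `suspicious_streams` are hypothetical.

```python
from collections import defaultdict

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10  # TCP flag bits

def tally_flags(lines):
    """Count initial SYNs, FINs and RSTs per stream from 'stream<TAB>flags' lines."""
    per_stream = defaultdict(lambda: {"syn": 0, "fin": 0, "rst": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty rows
        stream, flags_hex = line.split("\t")
        flags = int(flags_hex, 16)  # accepts values such as "0x0011"
        counts = per_stream[int(stream)]
        if flags & SYN and not flags & ACK:
            counts["syn"] += 1  # SYN without ACK marks a new connection
        if flags & FIN:
            counts["fin"] += 1
        if flags & RST:
            counts["rst"] += 1
    return dict(per_stream)

def suspicious_streams(tally):
    """Stream ids that saw a RST or did not see exactly two FINs."""
    return sorted(s for s, c in tally.items() if c["rst"] > 0 or c["fin"] != 2)
```

Running the same tally on the client-side and server-side captures and comparing the results would show which streams look different on each side of the firewalls. Retransmitted FINs would inflate the counts, so the output is a starting point for manual inspection rather than a verdict.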
OK, if it is not the timeout, there must be something else "special" about this session. That would require further analysis of the whole tracefile. Are you able to share the file on www.cloudshark.org or does it contain sensitive data?
What does the firewall log say? Can you raise the logging level? Are you running the latest firmware?
When a session has timed out on the ASA and the ASA then receives a packet that matches the flushed session, it is not able to send the client a RST, because it no longer knows who the client is (that information was kept in the session table entry that has been flushed).
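The session table behaviour described above can be illustrated with a toy model in Python. This is not how the ASA is implemented; the class `SessionTable`, its method names, and the `"drop"` result are assumptions made purely to show why a packet arriving after the entry has been flushed cannot be matched to a client.

```python
class SessionTable:
    """Toy stateful-firewall table mapping a flow key to the time of its last packet."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds a flow may stay idle
        self.sessions = {}

    def open(self, flow, now):
        """Record a new session for this flow."""
        self.sessions[flow] = now

    def expire(self, now):
        """Flush every session that has been idle longer than the timeout."""
        stale = [f for f, last in self.sessions.items()
                 if now - last > self.idle_timeout]
        for f in stale:
            del self.sessions[f]

    def handle(self, flow, now):
        """Forward a packet for a known flow; otherwise there is no state left to use."""
        self.expire(now)
        if flow not in self.sessions:
            return "drop"  # entry is gone, so the peer mapping is unknown
        self.sessions[flow] = now  # traffic refreshes the idle timer
        return "forward"
```

With a 30-second timeout, a flow opened at t=0 that sends packets at t=10 and t=35 stays in the table, while a packet at t=100 finds no entry. Keepalive probes work by producing packets like the ones at t=10 and t=35.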
I knew I would have to dive into the firewall logs. I was just postponing the inevitable by looking for other answers. The firewall is maintained by a third party and it doesn't even have logging configured. I need to persuade them to turn on logging and set up a log server. I will post back here if there are any findings.
Well, the traces should be able to tell you more as well. Can you scrub the ip addresses and payload (if necessary) and share (either public or privately, see my profile for address)?