Hello people, I have a strange issue. I have an Apache server using mod_proxy in front of an application server. It works, but at random times I get a connection refused from the server, which results in a 503 for the user. Traffic volume doesn't matter: it can happen with a single user making a single page request. It happens about once every 10,000 requests. Capturing with Wireshark on the client side, I see the following: I send a SYN, then a TCP Retransmission after 3 seconds, and then suddenly an RST,ACK.
On the server side, no connection attempt is seen. There is a firewall in the middle separating the DMZ from the private network. Supposedly that firewall doesn't filter packets, it just lets them through. The application server is working fine. The OS doesn't show transmission errors. It doesn't coincide with a full GC. Can anybody give me a hint about what else I can look at? Thank you. asked 17 Sep '13, 15:25 Meli
One Answer:
If someone has this problem, be sure to check the Ethernet interface, not only RAM, CPU, ulimit, and TCP settings. In our case we overlooked ifconfig, as the networking guys said they didn't see anything strange. However, surprise! We found RX frame errors, which are CRC failures caused by something wrong with the network interface. Thank you. answered 18 Sep '13, 16:03 Meli
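For anyone who wants to watch these counters without eyeballing ifconfig output, here is a minimal sketch. It assumes a Linux host, where the same per-interface counters are exposed in /proc/net/dev. The function name parse_rx_errors and the SAMPLE text are made up for illustration; the sample numbers are not from our server.

```python
import os


def parse_rx_errors(proc_net_dev_text):
    """Parse /proc/net/dev-style text into {iface: receive error counters}."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:  # 8 receive columns + 8 transmit columns
            continue
        counters[name.strip()] = {
            "rx_errs": int(fields[2]),   # receive "errs" column
            "rx_drop": int(fields[3]),   # receive "drop" column
            "rx_frame": int(fields[5]),  # receive "frame" column
        }
    return counters


# Illustrative sample only (hypothetical values).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  987654    4321   17    0    0    17          0         0   123456    2100    0    0    0     0       0          0
"""

if __name__ == "__main__":
    path = "/proc/net/dev"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    for iface, c in parse_rx_errors(text).items():
        note = "  <-- check NIC/cable/switch port" if c["rx_errs"] or c["rx_frame"] else ""
        print(f"{iface}: errs={c['rx_errs']} drop={c['rx_drop']} frame={c['rx_frame']}{note}")
```

Running it twice a few minutes apart and comparing the numbers shows whether the errors are still increasing, which matters more than the absolute count.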
Can you provide the actual packet capture including the full TCP headers of these messages (http://www.cloudshark.org/)? Also, can you capture between the server and the firewall? A firewall in between always makes me suspicious, especially with a client-side capture (where the RSTs did not necessarily originate from the server). It's possible that some kind of logic on the firewall (application-layer rules, or even session limiting) is causing this. It is odd to see one SYN go unanswered and the other get an RST, though.
I'll try to cut/filter it and post it; the file is huge. We captured TCP on the server too; at that moment we should have seen an incoming connection with port 50854, but we couldn't find it. From what I've read, retransmissions would continue at 3, 6, etc. seconds, and we don't see such a pattern either, because of the reset. We mitigated the issue by setting retry=0 in mod_proxy, but that's not a solution: once in a while a 503 still occurs.
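To make the "3, 6, etc." pattern concrete, here is a rough sketch of the SYN retransmission schedule I would expect if nothing answered at all. It assumes a 3-second initial timeout that doubles on every retry; the real initial timeout and retry count depend on the OS and its settings, and the function name syn_retransmit_times is just for illustration.

```python
def syn_retransmit_times(initial_rto=3.0, retries=5):
    """Return seconds after the first SYN at which each retransmission is sent.

    Assumes simple exponential backoff: the wait doubles after every retry.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto      # wait out the current timeout, then retransmit
        times.append(elapsed)
        rto *= 2            # double the timeout for the next attempt
    return times


if __name__ == "__main__":
    for attempt, t in enumerate(syn_retransmit_times(), start=1):
        print(f"retransmission {attempt}: t = {t:.0f}s after the first SYN")
```

With these assumptions the waits are 3, 6, 12 seconds and so on, so the retransmissions land at 3, 9 and 21 seconds after the first SYN. In our capture only the first one at 3 seconds appears, and then the RST,ACK ends the attempt before the next one would be due.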