Hi all, I'm facing a huge number of TCP reset errors on my network. I am not deeply familiar with TCP and its behavior, so I've attached a screenshot of a packet capture. I'd like you to analyze it and tell me whether you observe any abnormalities that could lead to this behavior. The situation: our web server gets 6000-10000 hits per second (yes, it is that high), and around 20000 connections are in the TCP TIME_WAIT state right now. So the problems we are facing are: clients get RTOs on a number of requests (each should be answered within 3 seconds at most), we are retransmitting a large number of packets, and we are getting these RESETs from the client. Our bandwidth is well dimensioned for this kind of traffic, so that is not the problem. Here is the packet trace! First of all, I want to cut down the RESETs. Please help, I need to solve this very soon. If you need any more information, I'm ready to share it. Thanks, Arjun

asked 20 Nov '15, 05:42 Arjun Singh, edited 20 Nov '15, 05:43
One Answer:
From the screenshot alone, it seems that the HTTP server assumes that a TCP connection from the same source socket has already been established but not properly closed yet, so it sends only an ACK in response to the received SYN, not its own SYN. For deeper analysis, you'd have to post .pcap (or .pcapng) capture files taken simultaneously on both the HTTP server side and on a client PC accessing the web, covering a period that includes an occurrence of the issue for that client. As your artwork in the picture suggests that you are concerned about privacy, you may want to anonymize the captures using TraceWrangler before handing them out. Or read this answer on another site and take the measures suggested there.

answered 20 Nov '15, 06:20 sindy, edited 20 Nov '15, 07:00

Thanks sindy for the reply and the anonymizer tool. If I'm understanding correctly, are you saying that the client machine has an unclosed socket at its end and my server is trying to reconnect to that socket? Another question: is it normal to get a SYN in the middle of a TCP session, after the handshake? Shouldn't it appear only in the early stage (the 3-way handshake)? I'm providing my side of the pcap for now; we have asked the client to take a matching capture at the same time. Can you take a look and help me understand what the problem is? 10.10.10.10 - client, 10.248.187.181 - our HTTP server. (20 Nov '15, 12:27) Arjun Singh

You are right that the SYN should come only once, at the beginning of a session, yet I am not a TCP expert either. But:
So open your anonymized capture and use a display filter.
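For example (a sketch - the capture filename is a placeholder, and port 44157 is the client source port discussed further down in this thread), the same kind of filter can be applied with tshark:

```sh
# Show only the traffic between the two hosts named above,
# narrowed to one client source port:
tshark -r server_side.pcapng \
       -Y 'ip.addr == 10.10.10.10 && ip.addr == 10.248.187.181 && tcp.port == 44157'
```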
I was looking for some difference between the two cases, and what I found is not very convincing:
So we can see that the tcp stack in your server treats these two quite similar cases differently, but I don't dare to guess why exactly. ... to be continued (20 Nov '15, 13:25) sindy

I wanted the capture from the client side because I expected to see some lost FIN packets, and I also didn't know that you knew the public IP address of your client. As it looks now, the capture from their side won't bring much additional information. My opinion is that you have to try the settings suggested at the link I sent you before and hope that they help, because otherwise you would have to (let someone?) dive deep into the tcp stack of your server's kernel and change its treatment of remote socket reuse. (20 Nov '15, 13:47) sindy

Hm, I am afraid you will need a kernel recompilation... see this thread for an explanation why. In short, the window during which the kernel tcp stack still treats new packets from the same socket as late arrivals is 60 seconds, and it is not configurable: net/tcp/tcp.h says:
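The definition in question is the TCP_TIMEWAIT_LEN constant (in mainline trees it sits in include/net/tcp.h); a quick way to confirm it in your own kernel sources (a sketch, assuming a checked-out tree):

```sh
# The TIME-WAIT interval is a compile-time constant, not a sysctl:
#   #define TCP_TIMEWAIT_LEN (60*HZ)  /* how long to wait to destroy TIME-WAIT
#                                        state, about 60 seconds */
grep -n 'TCP_TIMEWAIT_LEN' include/net/tcp.h
```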
And please do not ask me why your server does not have the same issue with the first reuse in the capture. Your other option would be not to touch the server and instead insert a load balancer in front of it, which would spread the traffic among several servers or, if you feel brave, just among several ports of your current server to which you would attach the httpd. The assumption is that a load balancer's tcp stack should be better prepared for this (or not be stateful at all, just passing the packets through while changing the tcp port number), so at each of the server's sockets the SYN of the next connection would not arrive earlier than 60 seconds after the FIN/ACK/FIN/ACK of the previous one. And maybe the simplest thing to implement, if you encounter the issue only with this single client: could you ask them to use more public addresses at their NAT device? Looking at the reuse rate in this capture, three public IPs should be enough to get rid of the problem. If they would also broaden the range of ports the NAT device is allowed to use (the capture shows that they use a range from 39xxx through 60xxx, which quite matches your 20000 connections in TIME_WAIT state), just two public IPs would be enough for the traffic volume seen in the capture. Yet not knowing the client application, I cannot predict whether, if the connections stopped failing, their total count would decrease because fewer retries would be necessary, or whether it would increase because the client would be able to push more requests through the (then wider) bottleneck of their NAT device. (20 Nov '15, 14:52) sindy

Thanks for the analysis sindy, I'm going through it and trying to work out a feasible solution for us. A few questions:
What do you say? (21 Nov '15, 20:15) Arjun Singh

"and it took about another 36 seconds until another SYN arrived from the same socket - frame 148368, which was evaluated as part of the previous session and as such ACKed with a local sequence number from that session, which the client evaluated as an error and responded to with an RST." Within 75 seconds I see this socket used three times... why? And I have a question: isn't a SYN's SEQ equal to the last received ACK, or 0? So let's say we generated 3 SYNs for a session and our server responded to the second one, with the first one still lingering in the path. Now I want to know: would the SEQ NUM be the same for all 3 SYNs, or different? If it is the same, isn't there any mechanism in the TCP stack to reject an already-ACKed SYN by checking the SEQ number? And if all 3 SYNs are different, wouldn't they be considered 3 different connections? Hmm... I'm thinking of SACK, but it's already ON. (21 Nov '15, 20:28) Arjun Singh

A few responses:
(22 Nov '15, 01:49) sindy

3. No, they are not the same.
Look here for details. So try the following steps, one or both as necessary, on the machine which has been anonymized as 10.248.187.181:
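Judging by the rest of the thread (both settings come up again below), the steps were presumably along these lines - a sketch, assuming a Linux box; the values are illustrative:

```sh
# Allow reuse of TIME_WAIT sockets for new outgoing sessions
# (as discussed further down, this turns out not to apply to incoming traffic):
sysctl -w net.ipv4.tcp_tw_reuse=1

# Shorten the FIN-WAIT-2 timeout from its default of 60 seconds:
sysctl -w net.ipv4.tcp_fin_timeout=10
```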
(22 Nov '15, 01:50) sindy
because during those 75 seconds, the available pool of ~20000 sockets has been cycled through three times, i.e. the client(s) attempted to open (at least!) ~60000 tcp sessions during that time. At least, because there may have been even more attempts which we cannot see, as they never made it through the 20000-port limit.
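To put numbers on that (a back-of-envelope sketch, assuming the ~20000-port NAT pool seen in the capture and the 60-second TIME_WAIT discussed above):

```sh
# Observed demand: the pool cycled three times in 75 seconds
echo $(( 3 * 20000 / 75 ))   # => 800 new sessions per second

# Sustainable rate: each port must sit out 60 s of TIME_WAIT before reuse
echo $(( 20000 / 60 ))       # => ~333 new sessions per second

# 800/s demanded vs ~333/s sustainable is why ports get reused "too early".
```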
See the TCP RFC; but I haven't found anything there regarding the treatment of a mid-session SYN. In any case, the very meaning of the SYN flag being set to 1 in a packet is to declare that the absolute SEQ NUM in that packet is the initial (reference) sequence number for the session, i.e. a relative 0. So it would be a very bad idea to use a different absolute SEQ NUM in retransmissions of this packet. (22 Nov '15, 02:53) sindy

Just out of curiosity, have you set tcp_tw_recycle = 1? Maybe it has been written here and I missed it. Personally I think the smartest solution is to add one or more client IPs as @sindy suggested, because that is independent of operating-system implementations. But here is another nice article: http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html#netipv4tcp_tw_recycle (22 Nov '15, 03:05) Christian_R

@arjun-singh, forget what I wrote about tcp_tw_reuse. It is only relevant to outgoing sessions, which is not your case. The article @Christian_R mentioned gives a much clearer view than the one I gave before; I even doubt whether I've understood the meaning of tcp_fin_timeout properly. @Christian_R, in the great article you've mentioned (I think one would have a hard time finding something to add to it, thank you for the link!) there is one important paragraph related to the use of tcp_tw_recycle:
So until @arjun_singh tells us whether there is a NAT on the client side, we cannot say whether use of tcp_tw_recycle would help or actually make things worse. (22 Nov '15, 03:57) sindy

The client agreed to share captures... so I will revert soon. In the meantime, I'm sharing the topology at our end. (23 Nov '15, 13:23) Arjun Singh

OK, so at least we know that the limited pool of ports is at the customer side, not yours, because you've taken the capture between your border router and the load balancer's "outer" side. The client's topology is important. I now bet that there is a NAT at their side, and also that your HAProxy has tcp_tw_recycle enabled, because I had a look at the tcp timestamp option after all. Still looking at the three tcp sessions coming to us from port 44157, I can see the following timestamps in the SYN packets establishing them:

So we can now be almost sure that tcp_tw_recycle is enabled at the HAProxy, but as part of the solution it has to be disabled, in order to permit clients behind the customer's NAT to get through. Disabling tcp_tw_recycle will imply a need for a guard time > 60 seconds before the socket pair (local ip:local port, remote ip:remote port) can be reused for a new session. So without extending the number of IP addresses and/or ports on the client side or on our side, it will be even worse. If neither you nor the client can extend the number of public IP addresses, try to agree with the customer on assignment of the maximum possible pool of ports (1025-65534) at their NAT and see whether it is enough. In theory, extending the number of ports at your side would help too, but I have no idea how to make the client's web browsers distribute the http requests evenly among several ports at your side, as neither Google Chrome nor Mozilla Firefox use SRV DNS records up to now :-( So if the maximum possible pool of ports at the customer side is still not enough, in my opinion only two possibilities remain: a direct (VPN or other private) connection between your network and the client's, or a move to IPv6.
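To check those SYN timestamps yourself, you can pull the TSval of each SYN out of the capture (a sketch; the filename is a placeholder):

```sh
# List the timestamp option value (TSval) of every initial SYN coming
# from client port 44157; with several workstations behind one NAT the
# values jump between unrelated clock bases, which is exactly what trips
# tcp_tw_recycle's per-peer timestamp check:
tshark -r server_side.pcapng \
       -Y 'tcp.flags.syn == 1 && tcp.flags.ack == 0 && tcp.srcport == 44157' \
       -T fields -e frame.number -e tcp.options.timestamp.tsval
```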
(23 Nov '15, 14:46) sindy

Hi guys, first of all thank you for staying with me. Last night we faced a sudden surge of spurious retransmissions on both of our servers. I got hold of live captures from both sides. After analysing them I found that sindy was right: the client has NAT enabled, because I'm seeing private IPs in the capture. But I did not enable tcp_tw_recycle; its value is 0 on our load balancer. I'm attaching both captures here, please have a look. Our capture >> http://www93.zippyshare.com/v/zd8f670O/file.html Client capture >> http://www93.zippyshare.com/v/busirUxE/file.html 10.10.10.10 >>> client public IP. @sindy, we tried a VPN in the first place, but it didn't work out; there were many more timeouts than over the public network. Still, the VPN is my first priority and I'll go for it after this issue gets solved. I'm putting the IP address / port range extension suggestion on the table tonight, so let's see :) Till then, can you guys tell me why these spurious retransmissions happened? They came suddenly for 5 minutes, then everything went back to normal. My other question, whose answer I couldn't clearly find anywhere in your links: with tcp_tw_reuse a server can reuse ports that are in TIME_WAIT state; now, if the current session is not over and the socket is reused for a new request, how is tcp going to know that a packet belongs to the NEW session and not the OLD one whose TIME_WAIT counter is still running? And if you say timestamps, how can we be so sure about the synchronization of timestamps across those servers, since you already quoted a mismatch among the three timestamps where our LB got confused? Thanks. (25 Nov '15, 02:25) Arjun Singh

Hi Arjun, I cannot download huge captures on the road, but to answer those of your questions which don't require analysis of the new captures:
let's recall that a "server" can have two meanings: the machine (and its tcp stack as a whole), and the role in a particular tcp session, i.e. the side which accepts the session (as opposed to the tcp client, which initiates it).
Having said that: I understand from the article found by @Christian_R that tcp_tw_reuse is only meaningful for a tcp session client, as it permits the client to open a new session towards the same remote socket from the same local socket while the previous session using that socket pair is still in TIME_WAIT state. In other words, enabling tcp_tw_reuse has no effect on sessions coming to your servers from tcp clients. Going deeper into it (and not related to your current investigation), tcp_tw_reuse on the client side without knowing that tcp_tw_recycle is enabled on the server side is a bad idea, because an attempt by a client with tcp_tw_reuse enabled to establish a session reusing the same socket pair will fail if tcp_tw_recycle is not enabled at the server side. Make your own conclusions about how useful these settings actually are, given that they apply to the whole tcp stack and cannot be narrowed down to a list of remote IP addresses. (25 Nov '15, 03:31) sindy
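If you want to double-check what a given box is actually running with, all three settings can be read back in one go (a sketch for a Linux host; reading changes nothing):

```sh
# Print the current values on the load balancer / servers:
sysctl net.ipv4.tcp_tw_recycle net.ipv4.tcp_tw_reuse net.ipv4.tcp_fin_timeout
```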
I'd like to draw your attention to what I wrote before: it is possible that the NAT at the customer side is actually throttling the number of their workstations' requests which reach you, so while the VPN was in use there may have been no difference from the customer's perspective, while there was one from yours (more session-establishment requests made it from the client's workstations to your load balancer, so you saw more tcp resets than you do now). Now if your servers and some of the client's network elements are e.g. in the same datacenter, or you can organize a direct communication channel between your border router and the client's network some other way, you may not even need a VPN to tunnel their network's connection to your border router through the public internet. To get rid of the tcp reset issue, it is enough to use a NAT which hides the IP addresses of the customer's network behind a couple of private addresses which are not used in your own network. This NAT may be set up at their end as well as at yours. (25 Nov '15, 03:57) sindy

sindy?? How is moving to IPv6 going to help us? I changed the tcp_fin_timeout on our LB to 10. I'm analyzing the traffic but can't see much effect. (27 Nov '15, 08:09) Arjun Singh

@Arjun Singh Your "answer" has been converted to a comment as that's how this site works. Please read the FAQ for more information. (27 Nov '15, 08:25) grahamb ♦

@Arjun-Singh, I guess I may have misunderstood the meaning of tcp_fin_timeout as well, so let's forget about it. As for IPv6, the idea behind it is that NAT would not be necessary, because an IPv6 address is not as scarce a resource as a public IPv4 address - IPv6 addresses are allocated to customers in blocks of /64 subnets. So the tcp sessions would be established between your load balancer and the individual workstations' addresses at the customer side, and each workstation there would be able to use ~60000 tcp ports instead of fighting with all the others for the same ~60000 ports at the public side of the NAT device. That would allow a port reuse rate of ~1000 sessions per second per workstation (i.e. ~60000 sessions per 60 seconds, the TIME_WAIT duration) at the client side, without any need for tcp_tw_recycle on the load balancer side or tcp_tw_reuse on the workstation side. But as I dug further (finding out that while the TIME_WAIT duration is set to 60 seconds in linux, it is 120 seconds in Windows), I came across this article, which suggests that the remedy for you could be much simpler: never call close() first at the server side. It means that after sending the answer to an http request, your servers should not actively close the tcp session (as they do now) but instead let the client send the FIN packet first (and if the client fails to do so in reasonable time, terminate the session by sending an RST packet rather than a FIN packet). The idea looks simple and logical: web browsers seem to prefer keeping already-established tcp connections open and reusing them for further requests, so it is quite possible that if your application stops actively terminating connections after answering a request, the number of newly established connections will drop a lot. (27 Nov '15, 13:46) sindy
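Whichever way you go, you can verify from a capture which side is the active closer today - TIME_WAIT always lands on the side that sends the first FIN of a stream (a sketch; the filename is a placeholder):

```sh
# For each tcp stream, keep only the first FIN seen (packet order),
# then count first-FINs per sender. If most come from 10.248.187.181,
# the server side is the active closer and is the one piling up
# TIME_WAIT sockets:
tshark -r server_side.pcapng -Y 'tcp.flags.fin == 1' \
       -T fields -e tcp.stream -e ip.src \
  | awk '!seen[$1]++ { print $2 }' \
  | sort | uniq -c
```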
Your question is similar to this one: https://ask.wireshark.org/questions/41982/bad-ack-response-to-syn?page=1&focusedAnswerId=41993#41993