This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Huge Number of TCP Reset Errors


Hi All,

I'm facing a huge number of TCP reset errors on my network. I'm not deeply familiar with TCP and its behavior, so I've attached a screenshot of a packet capture. Could you analyze it and tell me whether you observe any abnormalities that could lead to this behavior?

The problem: our web server is getting 6000-10000 hits per second - yes, it is that high - and around 20000 connections are in the TCP TIME_WAIT state right now. So the problems we are facing are: clients hit an RTO on a number of requests (each should be answered within 3 seconds at most), we are retransmitting a large number of packets, and we are getting these RESETs from the client. Our bandwidth is well dimensioned for this kind of traffic, so that is not the problem.

Here is the packet trace (screenshot linked below). First of all, I want to cut down the RESETs.

Please help, guys, I need to solve this very soon. If you need any more information, I'm ready to share it.

Thanks, Arjun

http://i63.tinypic.com/2luc7ir.jpg

asked 20 Nov '15, 05:42


Arjun Singh

edited 20 Nov '15, 05:43

(21 Nov '15, 12:15) Christian_R

One Answer:


From the screenshot alone, it seems that the HTTP server assumes that a TCP connection from the same source socket was established earlier but never properly closed, so it responds to the received SYN with only an ACK, not its own SYN.

For deeper analysis, you'd have to post .pcap (or .pcapng) capture files taken simultaneously on the HTTP server side and on a client PC accessing the web, covering a period that includes an occurrence of the issue for that client. As the redactions in your picture suggest you are concerned about privacy, you may want to anonymize the captures using TraceWrangler before handing them out.
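If it helps, such a capture can be taken with Wireshark's own dumpcap; a minimal sketch, where the interface name, capture filter and file name are placeholders to adapt:

    # sketch only: adjust interface, filter and file name to your setup
    dumpcap -i eth0 -s 0 -f "tcp port 80" -w server_side.pcapng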

Or read this answer on another site and take the measures suggested there.

answered 20 Nov '15, 06:20


sindy

edited 20 Nov '15, 07:00

thanks sindy for the reply and the anonymizer tool.

If I'm understanding correctly, are you saying that the client machine has an unclosed socket at its end and my server is trying to reconnect to that socket? Another question: is it normal to get a SYN in the middle of a TCP session, after the handshake has completed? Shouldn't it belong to the early stage only (the 3-way handshake)? I'm providing my side of the pcap for now; we have asked the client to take a capture over the same time window. Can you take a look and help me understand what the problem is?

10.10.10.10 - Client

10.248.187.181 - Our HTTP server

Pcap File

(20 Nov '15, 12:27) Arjun Singh

You are right that the SYN should come only once, at the beginning of a session, yet I am not a TCP expert either. But:

  • any packet in either direction may be lost and thus retransmitted, and the travel times of any two packets between the same source and destination may differ significantly. So you cannot rule out the initial SYN arriving twice or more if your first SYN,ACK did not reach the session initiator fast enough. And because of the variable delay, you may already have ACKed a received SYN, the remote party may have received your ACK and sent you the first data packet (or even several, as an ACK for every single packet is not mandatory), and a retransmitted SYN may still arrive after all that.
  • the ACK is not just a boolean value (nothing like a NAK exists in TCP); instead, the sender of the ACK indicates to the remote party the sequence number of the next byte it expects, i.e. how far it has successfully received.

So open your anonymized capture and use the display filter

    (ip.src == 10.10.10.10 and tcp.srcport == 44157 and tcp.dstport == 80) or (tcp.srcport == 80 and tcp.dstport == 44157 and ip.dst == 10.10.10.10)

You should see that:

  • the first TCP session started with a SYN in frame 61303 and ended with the ACKs to FINs in frames 61315 and 61316.
  • it then took about 32 seconds until the client side used the same socket for a new session, starting with a SYN in frame 101255 and ending with the ACKs to FINs in frames 101333 and 101334.
  • and it took another ~36 seconds until one more SYN arrived from the same socket - frame 148368 - which was evaluated as part of the previous session and as such ACKed with a local sequence number from that session; the client evaluated that as an error and responded with an RST.

I looked for some difference between the two cases, and what I found is not very conclusive:

  • the absolute sequence number in frame 101255 was 0x90ab3f05, which is slightly lower than the sequence number 0x91d3af78 of the FIN in frame 61315,
  • the absolute sequence number in frame 148368 was 0x4b081d10, which is much lower than the sequence number 0x90ab44f3 of the FIN in frame 101334.

So we can see that the TCP stack in your server treats these two quite similar cases differently, but I don't dare to guess exactly why.

... to be continued

(20 Nov '15, 13:25) sindy

I wanted the capture from the client side because I expected to see some lost FIN packets, and I also didn't know that you knew your client's public IP address. As it looks now, the capture from their side won't bring much additional information.

My opinion is that you should try the settings suggested at the link I sent you before and hope that they help, because otherwise you would have to (let someone?) dive deep into the TCP stack of your server's kernel and change its treatment of remote socket reuse.

(20 Nov '15, 13:47) sindy

Hm, I am afraid you will need a kernel recompilation... see this thread for an explanation why.

In short, the window during which the kernel TCP stack still treats new packets from the same socket as late arrivals is 60 seconds, and it is not configurable: include/net/tcp.h says

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state, about 60 seconds */
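(As a side note, you can watch how many sockets are sitting in TIME_WAIT at any moment; a quick sketch using iproute2's ss:

    # count sockets currently in the TIME_WAIT state
    ss -tan state time-wait | wc -l

That should roughly match the ~20000 connections you mentioned.)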

And please do not ask me why your server does not have the same issue with the first reuse in the capture.

Your other option would be to leave the server untouched and insert a load balancer in front of it, which would spread the traffic among several servers or, if you feel brave, just among several ports of your current server to which you would attach the httpd.

The assumption is that a load balancer's TCP stack should be better prepared for this (or not stateful at all, just passing the packets through while changing the TCP port number), so that at each of the server's sockets, the SYN of the next connection would not arrive earlier than 60 seconds after the FIN/ACK/FIN/ACK of the previous one.
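To illustrate the idea (a sketch under assumptions, not a tested configuration - the backend addresses and ports are placeholders for several httpd listeners): with HAProxy, fanning one frontend out over several backend sockets could look like

    frontend web_in
        bind *:80
        mode http
        default_backend web_pool

    backend web_pool
        mode http
        balance roundrobin
        # placeholders: several httpd instances/ports on the same host
        server srv1 10.248.187.181:8081
        server srv2 10.248.187.181:8082
        server srv3 10.248.187.181:8083

Each backend socket then sees only a third of the connection (and thus TIME_WAIT) rate.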

And maybe the simplest option to implement, if you encounter the issue only with that single client: could you ask them to use more public addresses on their NAT device? Looking at the reuse rate in this capture, three public IPs should be enough to get rid of the problem. If they also broadened the range of ports the NAT device is allowed to use (the capture shows they use a range from 39xxx through 60xxx, which matches your 20000 connections in TIME_WAIT state quite well), just two public IPs would be enough for the traffic volume seen in the capture.
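The arithmetic behind that claim: the capture shows the same socket pair being reused after roughly 32-36 seconds, which is inside the 60-second TIME_WAIT window. With three public IPs the pool of distinct socket pairs triples, so each pair would come up for reuse only about every 100 seconds, comfortably beyond TIME_WAIT; two IPs plus a wider port range would stretch the interval similarly.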

Yet, not knowing the client application, I cannot predict whether, if the connections stopped failing, their total count would decrease (because fewer retries would be necessary) or increase (because the client could push more requests through the then-wider bottleneck of their NAT device).

(20 Nov '15, 14:52) sindy

thanks for the analysis sindy, I'm going through it and trying to work out a feasible solution. A few questions:

  1. we are using the HAProxy load balancer. I just want to understand: in this case, will the load balancer temporarily keep all TCP segments in its own buffer and, only after checking all the timers, sequence numbers and out-of-order frames, forward them to the server? Is that how it will work?

  2. another solution might be asking the client to extend the port range (1025-65534) to overcome the reuse problem.

  3. TIME_WAIT is interesting... I want to know: is it tcp_tw_reuse - are they the same thing? Altering that key might solve the socket exhaustion problem, but I think it will increase the resets.

What do you say?

(21 Nov '15, 20:15) Arjun Singh

"and it took another about 36 seconds until another SYN has arrived from the same socket - frame 148368, which has been evaluated as part of the previous session and as such ACKed with a local sequence number from that session, which the client has evaluated as an error and has responded with an RST."

Within 75 seconds I see this socket used three times... why? And I have a question: isn't a SYN's sequence number either the last received ACK or 0? So let's say we generated 3 SYNs for one session and our server responded to the second one, with the first one still lingering in the path. I want to know: would the SEQ NUM be the same for all the SYNs or different? If it is the same, isn't there a mechanism in the TCP stack to reject an already-ACKed SYN by checking the SEQ number? And if all 3 SYNs are different, wouldn't they be considered 3 different connections? Hmm... I'm thinking of SACK, but it's already on.

(21 Nov '15, 20:28) Arjun Singh

A few responses:

  1. We are at Wireshark Q&A, aren't we? So take a capture on the machine running HAProxy (simultaneously on all interfaces that may be involved) and you'll see the answer; or ask the HAProxy authors - I could only guess. But your mention of HAProxy changes a lot of things. Where exactly was the capture you've posted taken: between the client and HAProxy, or between HAProxy and the server? Maybe the 20000-port pool is configured on your HAProxy and not at the client side? Can you provide a drawing of the complete architecture of your site (covering the whole path between the internet-facing interface and the server)?

  2. yes, that is what I've recommended already, but it may not be enough, as I've also already written: currently, the machine from which the requests come is cyclically reusing some 20000 ports. If they start using a pool of 63000 ports, it may solve the problem of mistreated new sessions, but only if it is the only problem in the whole chain. The capture at your end only shows the requests that made it from the real client to your server through the limitation of the 20000 available ports somewhere between the real client and your server. There may have been a ton of connection attempts rejected already at the client side due to unavailability of a free port, and these would start reaching your site once the number of ports got tripled.

(22 Nov '15, 01:49) sindy

3. No, they are not the same.

  • TIME_WAIT is a state of the TCP connection state machine.

  • TCP_TIMEWAIT_LEN is a constant in the kernel source defining how long the state machine should remain in the TIME_WAIT state, but it seems it can actually be overridden by setting tcp_fin_timeout.

  • tcp_tw_reuse is a "boolean" value (0/1) changing the way the TCP stack treats SYN packets coming from a remote socket that is still in the TIME_WAIT state.

Look here for details. So try the first of the following steps, or both if necessary, on the machine which has been anonymized to 10.248.187.181 (an equivalent sysctl form is sketched after the list):

  • Use
    cat /proc/sys/net/ipv4/tcp_fin_timeout
    and if it returns 60, do
    echo 15 > /proc/sys/net/ipv4/tcp_fin_timeout
    and see whether things improve.

  • Only if that does not help, your next try would be
    cat /proc/sys/net/ipv4/tcp_tw_reuse
    and if it returns 0, use
    echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
    to effectively override the TIME_WAIT state - with the caveat that if a late packet from a previous session arrives, it will be treated as part of the new session and may corrupt it.
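The same knobs can also be driven with sysctl, which makes it easier to keep them across reboots. A sketch (the value 15 is merely something to try, not a recommendation):

    # read the current values
    sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_tw_reuse
    # change at runtime
    sysctl -w net.ipv4.tcp_fin_timeout=15
    # to persist, add the same assignment to /etc/sysctl.conf:
    #   net.ipv4.tcp_fin_timeout = 15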

(22 Nov '15, 01:50) sindy

Within 75 seconds I see this socket used three times... why?

Because during those 75 seconds, the available pool of ~20000 ports was cycled through three times, i.e. the client(s) attempted to open (at least!) ~60000 TCP sessions in that time. "At least" because there may have been even more attempts which we cannot see, as they never made it through the 20000-port limit.
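To put numbers on it: 3 × 20000 = 60000 sessions in 75 seconds is roughly 800 new sessions per second, so each individual port gets reused about every 25 seconds - far inside the 60-second TIME_WAIT window, which is exactly why the server keeps treating fresh SYNs as late packets of old sessions.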

I want to know: would the SEQ NUM be the same for all the SYNs or different?

See the TCP RFC; I haven't found anything there regarding the treatment of a mid-session SYN. In any case, the very meaning of the SYN flag set to 1 in a packet is to declare that the absolute SEQ NUM of that packet is the initial (reference) sequence number of the session, i.e. a relative 0. So it would be a very bad idea to use a different absolute SEQ NUM in retransmissions of that packet: retransmitted SYNs carry the same sequence number as the original.
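You can check this on your own trace; a sketch with tshark (the file name is a placeholder, and relative sequence numbering is switched off so that absolute ISNs are printed):

    tshark -r server_side.pcapng \
           -o tcp.relative_sequence_numbers:false \
           -Y "tcp.flags.syn == 1 and tcp.analysis.retransmission" \
           -T fields -e frame.number -e tcp.seq

Every retransmitted SYN should print the same tcp.seq as its original.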

(22 Nov '15, 02:53) sindy

Just out of curiosity, have you set tcp_tw_recycle = 1? Maybe that has been written here already and I missed it.

Personally, I think the smartest solution is to add one or more client IPs, as @sindy suggested, because it is independent of operating-system implementations.

But here is another nice article: http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html#netipv4tcp_tw_recycle

(22 Nov '15, 03:05) Christian_R

@arjun-singh, forget what I wrote about tcp_tw_reuse: it is only relevant to outgoing sessions, which is not your case. The article @Christian_R mentioned gives a much clearer view than the one I gave before. I even doubt whether I've understood the meaning of tcp_fin_timeout properly.

@Christian_R, in the great article you've mentioned (I think one would have a hard time finding anything to add to it, thank you for the link!) there is one important paragraph related to the use of tcp_tw_recycle:

When the remote host is in fact a NAT device, the condition on timestamps will forbid all of the hosts except one behind the NAT device to connect during one minute because they do not share the same timestamp clock. In doubt, this is far better to disable this option since it leads to difficult to detect and difficult to diagnose problems.

So until @arjun_singh tells us whether there is a NAT at the client side, we cannot say whether using tcp_tw_recycle would help or actually make things worse.

(22 Nov '15, 03:57) sindy

The client agreed to share captures... so I will revert soon. In the meantime, I'm sharing the topology at our end.

http://s000.tinyupload.com/?file_id=98475239441193171483

(23 Nov '15, 13:23) Arjun Singh

OK, so at least we know that the limited pool of ports is at the customer side, not yours, because you've taken the capture between your border router and the load balancer's "outer" side.

The client's topology is important. I now bet that there is a NAT at their side, and also that your HAProxy has tcp_tw_recycle enabled, because I had a look at the TCP timestamp option after all. Still looking at the three TCP sessions coming to us from port 44157, I can see the following timestamps (TSval) in the SYN packets establishing them:

  • 3925502261
  • 3925511625
  • 3925426321

As the second one is higher than the first, the load balancer recognized it as part of a new session, so it responded with a SYN,ACK for a new session. As the third is lower than the second, it
a) means that this third request actually comes from a different machine than the second one,
b) caused HAProxy to answer with a plain ACK for the existing session (probably not "thinking" deeply enough to notice that the timestamp is so low that it cannot be a late packet from the session in TIME_WAIT state).
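These TSval values can be pulled straight from the capture; a sketch with tshark (the file name is a placeholder):

    tshark -r server_side.pcapng \
           -Y "tcp.flags.syn == 1 and tcp.srcport == 44157" \
           -T fields -e frame.number -e tcp.options.timestamp.tsval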

So we can now be almost sure that tcp_tw_recycle is enabled on the HAProxy, but as part of the solution it has to be disabled, in order to permit clients behind the customer's NAT to get through. Disabling tcp_tw_recycle will imply a need for a guard time > 60 seconds before a socket pair (local ip:local port, remote ip:remote port) is reused for a new session. So without extending the number of IP addresses and/or ports at the client side or at our side, it will be even worse.

If neither you nor the client can extend the number of public IP addresses, try to agree with the customer on the assignment of the maximum possible pool of ports (1025-65534) at their NAT and see whether it is enough.

In theory, extending the number of ports at your side would help too, but I have no idea how to make the clients' web browsers distribute the HTTP requests evenly among several ports at your side, as neither Google Chrome nor Mozilla Firefox uses SRV DNS records to date :-(

So if the maximum possible pool of ports at the customer side is still not enough, in my opinion only two possibilities remain:

  • to migrate to IPv6

  • to set up a VPN between you and the customer so that the pool of "outer" addresses on the customer's NAT box can be augmented with a couple of private addresses. Details on request.

(23 Nov '15, 14:46) sindy

hi guys, first of all thank you for bearing with me..

Last night we faced a sudden surge of spurious retransmissions on both of our servers. I got hold of live captures from both sides. After analysing them I found that sindy was right: the client has NAT enabled, because I'm seeing a private IP in the capture. But I did not enable tcp_tw_recycle; its value is 0 on our load balancer. I'm attaching both captures here, please have a look..
our ip_local_port_range is 32768-61000,
tcp_tw_recycle = 0
tcp_tw_reuse =
tcp_fin_timeout = 60

Our capture >> http://www93.zippyshare.com/v/zd8f670O/file.html

Client Capture >> http://www93.zippyshare.com/v/busirUxE/file.html

10.10.10.10 >> client public IP
10.248.187.181 >> our internal IP (private)
192.168.201.56 >> client internal IP (private)
50.60.70.80 >> our public IP

@sindy we tried a VPN in the first place, but it didn't work out; there were many more timeouts than over the public internet. Still, the VPN is my first priority and I'll go for it after this issue is solved. I'm putting the IP address / port-range extension suggestion on the table tonight, so let's see.. :) Till then, can you guys tell me why these spurious retransmissions happened? They came suddenly, lasted 5 minutes, then everything went back to normal.

My other question, whose answer I couldn't clearly find in any of your links: with tcp_tw_reuse, a server can reuse ports that are in the TIME_WAIT state. Now, if the current session is not over and the socket is reused for a new request, how is TCP going to know it is a NEW session, not the OLD one, while that session is still sitting in the TIME_WAIT timer? And if the answer is timestamps, how can we be so sure about the synchronization of timestamps across the servers, when you already quoted a three-timestamp mismatch that confused our LB?

thanks

(25 Nov '15, 02:25) Arjun Singh

Hi Arjun, I cannot download huge captures on the road, but let me answer the questions which don't require analysis of the new captures:

with tcp_tw_reuse, a server can reuse ports that are in the TIME_WAIT state

let's recall that "server" can have two meanings:

  • a role in a session (a "client" is the party which actively opens the session, i.e. sends the initial SYN of a TCP session, while a "server" is the party which receives that SYN and responds with SYN,ACK)

  • a powerful piece of hardware intended to serve requests from many clients.

Having said that: my understanding of the article found by @Christian_R is that tcp_tw_reuse is only meaningful for a TCP session client, as it permits it to open a new session towards the same remote socket from the same local socket while the previous session using that socket pair is still in the TIME_WAIT state. In other words, enabling tcp_tw_reuse has no effect on sessions coming to your servers from TCP clients. Going deeper (and not related to your current investigation), enabling tcp_tw_reuse at the client side without knowing that tcp_tw_recycle is enabled at the server side is a bad idea, because the client's attempt to establish a session reusing the same socket pair will fail if tcp_tw_recycle is not enabled at the server side. Draw your own conclusion about how useful these settings actually are, given that they apply to the whole TCP stack and cannot be narrowed down to a list of remote IP addresses.

(25 Nov '15, 03:31) sindy

we tried a VPN in the first place, but it didn't work out; there were many more timeouts than over the public internet.

I'd like to draw your attention to what I wrote before: it is possible that the NAT at the customer side is actually throttling the number of their workstations' requests which reach you. So while the VPN was in use, from the customer's perspective there may have been no difference, while there was one from your perspective (more session-establishment requests made it from the client's workstations to your load balancer, so you could see more TCP resets than now).

Now, if your servers and some of the client's network elements are e.g. in the same datacenter, or you can organize a direct communication channel between your border router and the client's network some other way, you may not even need a VPN to tunnel their network's connection to your border router through the public internet. To get rid of the TCP reset issue, it is enough to use a NAT which hides the IP addresses of the customer's network behind a couple of private addresses that are not used in your own network. This NAT may be set up at their end as well as at yours.

(25 Nov '15, 03:57) sindy

sindy?? how is moving to IPv6 going to help us? I changed tcp_fin_timeout on our LB to 10. I'm analyzing the traffic but can't see much effect.

(27 Nov '15, 08:09) Arjun Singh

@Arjun Singh

Your "answer" has been converted to a comment as that's how this site works. Please read the FAQ for more information.

(27 Nov '15, 08:25) grahamb ♦

@Arjun-Singh, I guess I may have misunderstood the meaning of tcp_fin_timeout as well, so let's forget about it.

As for IPv6, the idea is that NAT would no longer be necessary, because an IPv6 address is not as scarce a resource as a public IPv4 address - IPv6 addresses are allocated to customers in blocks of /64 subnets. So the TCP sessions would be established between your load balancer and the individual workstations' addresses at the customer side, and each workstation would be able to use ~60000 TCP ports of its own instead of fighting with all the others for the same ~60000 ports at the public side of the NAT device. That would allow each workstation a port reuse rate of about 1000 sessions per second (~60000 sessions per 60 seconds, the TIME_WAIT duration) without any need for tcp_tw_recycle on the load balancer side or tcp_tw_reuse on the workstation side.

But as I dug further (finding out that while the TIME_WAIT duration is set to 60 seconds in Linux, it is 120 seconds in Windows), I came across this article, which suggests that the remedy for you could be much simpler: never call close() first at the server side. It means that after sending the answer to the HTTP request, your servers should not actively close the TCP session (as they do now) but instead let the client send the FIN packet first (and if the client fails to do so in reasonable time, terminate the TCP session by sending an RST packet rather than a FIN). The idea looks simple and logical: whichever side closes first is the side that ends up in TIME_WAIT, so this way the TIME_WAIT lands on the clients instead of on you. Moreover, web browsers prefer to keep already established TCP connections open and reuse them for further requests, so it is well possible that if your application stops actively terminating connections after answering a request, the number of newly established connections will drop a lot.
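To make the idea concrete, here is a minimal sketch in C of what the linked article proposes, under my own assumptions (it is not the article's code, and the 5-second grace period is an arbitrary choice): after writing the response, wait for the client's FIN, and abort with an RST only if it never comes.

    /* sketch only: `c` is a connected socket on which the HTTP response
     * has already been written in full */
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    static void close_client_first(int c)
    {
        struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };  /* grace period */
        char buf[4096];
        ssize_t n;

        /* bounded wait for the client's FIN: read() returns 0 on peer close */
        setsockopt(c, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
        while ((n = read(c, buf, sizeof buf)) > 0)
            ;                       /* discard any trailing request data */

        if (n != 0) {
            /* client never closed: SO_LINGER with l_linger = 0 makes close()
             * send an RST instead of a FIN, so this side skips TIME_WAIT */
            struct linger lg = { .l_onoff = 1, .l_linger = 0 };
            setsockopt(c, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
        }
        close(c);                   /* FIN after the client's FIN, or RST */
    }

When the client does close first, the TIME_WAIT lands on the client side; the RST branch is the deliberate trade-off the article describes for clients that never close.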

(27 Nov '15, 13:46) sindy