This is our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

My server serves many clients concurrently (between 300 and 800 at any moment).

I wrote a server and client implementation, and clients are getting disconnected for reasons I don't understand. Occasionally even I get disconnected for this unknown reason, and it is ruining the quality of the service.

I logged in to the server with WinSCP and ran a test like this:

I started uploading a 200 MB file to the server, and at 20% the server disconnected me. WinSCP asked me to enter my user name and password again. You can see the last moments in the log below:

Cloudshark log

The disconnection happened at time 5:42. The original capture was more than 1300 seconds long, so this version is a split excerpt.

The server runs CentOS 5 (64-bit) with 1 Gbit/s of bandwidth.

I can't even upload a file to the server without getting disconnected. What must I do to fix this?

Edit: There was a logical deadlock in my software. The deadlock prevented it from reading from and writing to its sockets. The network buffers filled up in that state, and in the end Linux killed socket connections to resolve the problem. That is why other software was affected too.
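The buffer build-up described in the edit can be reproduced in miniature. The following is only a sketch, not the original server code (which is not shown in this thread): one end of a local socket pair never reads, standing in for the deadlocked application, and the other end writes until the kernel refuses more data. The function name, chunk size and safety limit are made up for illustration.

```python
import socket


def bytes_until_buffers_full(chunk_size=4096, limit=64 * 1024 * 1024):
    """Write to a peer that never reads and return how many bytes the
    kernel accepted before a further send would have blocked."""
    writer, stalled_reader = socket.socketpair()
    writer.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while sent < limit:  # safety limit so the loop always terminates
            try:
                sent += writer.send(chunk)
            except BlockingIOError:
                # The kernel buffers are full because the peer is not reading.
                return sent
        return sent
    finally:
        writer.close()
        stalled_reader.close()
```

Calling `bytes_until_buffers_full()` returns a machine-dependent byte count. The sketch uses a local socket pair rather than TCP; on a real TCP connection the same back-pressure appears to the remote sender as a receive window that shrinks to zero.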

asked 31 Aug '13, 15:41

mumeka

edited 06 Feb '15, 06:41


I can't even upload a file to the server without getting disconnected.

O.K., your problem seems to affect different applications on that server (your own application, WinSCP, etc.), which leads me to conclude that one of the following is the cause:

  • The server itself is somehow overloaded from time to time (CPU load)
  • The interface of the server is broken (or the driver). Check with netstat -ni and kernel logs (dmesg)
  • The switch or switch port of the server is broken or overloaded (flooding). Check the switch port statistics and the switch logs.
  • There is a traffic burst in the local network that overloads local network components. Check with capture files taken at the server
  • Some other system in the path to the server (firewall, load balancer, router) is overloaded from time to time. Ask the admins
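For the interface check in the list above, the error and drop counters that `netstat -ni` reports can also be read from `/proc/net/dev` on Linux. The sketch below is one possible way to parse that file; the function names are hypothetical, and it assumes the usual Linux layout of two header lines followed by eight receive and eight transmit counters per interface.

```python
def interface_error_counters(proc_net_dev_text):
    """Parse the text of /proc/net/dev into per-interface error/drop counters."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if len(fields) < 16:
            continue
        values = [int(field) for field in fields[:16]]
        # Receive columns: bytes packets errs drop ... (indexes 0-7)
        # Transmit columns: bytes packets errs drop ... (indexes 8-15)
        counters[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return counters


def suspicious_interfaces(counters):
    """Return the names of interfaces with any non-zero error or drop counter."""
    return sorted(name for name, c in counters.items() if any(c.values()))
```

On the server, the input would be the contents of `/proc/net/dev`, for example `interface_error_counters(open("/proc/net/dev").read())`. Counters that keep growing between two readings are more telling than a single non-zero value.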

What must I do to fix this?

Well, that's a lot of possible problems and you will not be able to identify all of them by looking at network traces (capture files), especially if you capture the traffic at the client (as in your sample on cloudshark).

The best way to eliminate (possibly) faulty components is to run some tests locally (client and server in the same subnet), to see if a file upload (scp) gets interrupted as well.

If YES: take a look at

  • the switch and/or switch-port of the server
  • the server interface
  • the server load
  • iptables on the server (rate limiting or similar)

If NO: take a look at

  • other components in the path to the server (firewalls, router, etc.)

To answer your question:

Why receiving RST because of many DUP ACKs?

The client closes the connection with a RESET after many retries. It (basically) gives up because there is no longer any answer from the server.

Regards
Kurt

answered 31 Aug '13, 16:52

Kurt Knochner ♦

Your answer is very helpful. My first server machine was in another country; because of this problem, I rented another machine in my own country. So this is a second server with the same settings. It is therefore unlikely that the hardware is at fault; it must be some configuration error. I tried disabling the firewalls, but it did not help. I think the same-subnet test could be helpful, but I need a machine in the same subnet first, so it will have to wait a bit.

(04 Sep '13, 16:44) mumeka

My first server machine was in another country; because of this problem, I rented another machine in my own country. So this is a second server with the same settings.

O.K. this would certainly rule out hardware related issues, as you say. So, you should concentrate on the CPU load and probably on some network parameters of CentOS 5. Which one? Good question.

Did you think about a migration to CentOS 6?

BTW: Anything suspicious in the logs?

(05 Sep '13, 05:54) Kurt Knochner ♦

The trace file provided was taken at the client side and shows that one full-size segment, tcp.seq==399361, is never acknowledged by the server, while at the same time we still see packets in the reverse direction. So we can assume we still have connectivity and that it is this single packet that is causing the problem.

"What must i do to fix this?" Hello, as Kurt mentions, you should have a look at the server side and see whether CentOS saw the retransmitted packets and, if so, whether the seq/ack numbers are correct (compare the real sequence numbers in both the client and server traces). Sometimes the TCP SACK option confuses devices, so it might be worth trying to disable it on the server side to see if it helps.
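The observation about the unacknowledged segment can be checked mechanically once the sequence and acknowledgment numbers have been exported from a trace. The sketch below is an illustration only: the function names are hypothetical, the input is assumed to be plain lists of numbers already extracted from the capture, and sequence number wrap-around is ignored.

```python
from collections import Counter


def first_unacked_segment(sent_segments, acks_seen):
    """sent_segments: iterable of (seq, length) sent by one side.
    acks_seen: iterable of ACK numbers received from the other side.

    Returns the lowest (seq, length) not fully covered by the highest
    cumulative ACK, or None if everything was acknowledged."""
    highest_ack = max(acks_seen, default=0)
    pending = [(seq, length) for seq, length in sent_segments
               if seq + length > highest_ack]
    return min(pending, default=None)


def retransmission_counts(sent_segments):
    """Return {seq: number_of_retransmissions} for segments sent more than once."""
    counts = Counter(seq for seq, _ in sent_segments)
    return {seq: n - 1 for seq, n in counts.items() if n > 1}
```

With illustrative input such as segments `[(397901, 1460), (399361, 1460), (400821, 1460), (399361, 1460)]` and ACKs that never go beyond 399361, the first function points at the segment starting at 399361. Running the same check on a server-side trace would show whether that segment ever arrived there.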

Add this line to /etc/sysctl.conf:

    net.ipv4.tcp_sack = 0

Then run "/sbin/sysctl -p /etc/sysctl.conf" to load the settings into the running kernel.
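To confirm that the value actually reached the running kernel, it can be read back from `/proc/sys`, where each dotted sysctl name maps to a file path. The helper below is a small sketch with hypothetical function names; the `root` parameter exists only so the function can be tried against a test directory.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as net.ipv4.tcp_sack to its /proc/sys path."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl setting as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()
```

On the server, `read_sysctl("net.ipv4.tcp_sack")` should return "0" once the setting has been loaded.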

answered 31 Aug '13, 22:45

mrEEde

I tried disabling SACK, but it did not help. I will try to get server-side logs as well, to compare the ACK values and to see whether the server is receiving the retransmitted packets.

(04 Sep '13, 16:25) mumeka