My server handles many clients concurrently (between 300 and 800 at any moment). I wrote both the server and the client implementation, and clients are getting disconnected for a reason I don't understand. Occasionally even I get disconnected for this unknown reason, and it is ruining the quality of the service.

I logged in to the server with WinSCP and ran a test: I started uploading a 200 MB file to the server, and at 20% the server disconnected me. WinSCP asked me to enter my user name and password again. You can see the last moments in the log below; the disconnection happened at time 5:42. The full log covered 1300+ seconds, so this version has been trimmed. The server runs CentOS 5 64-bit with 1 Gbit bandwidth. I can't even upload a file to the server without getting disconnected. What must I do to fix this?

Edit: There was a logical deadlock in my software. The deadlock prevented it from reading from and writing to its sockets. The network buffers filled up in this state, and Linux eventually killed the socket connections to resolve the problem. That is why other software was affected too.

asked 31 Aug '13, 15:41 mumeka
edited 06 Feb '15, 06:41
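A symptom like the one described in the edit (an application that stops reading its sockets while the kernel buffers fill up) can usually be spotted on Linux by looking at the per-socket queue sizes in `/proc/net/tcp`. The following is only a minimal sketch, not what was actually used here: the function names and the 64 KB threshold are made up for illustration, and the address decoding assumes a little-endian host such as the x86-64 server mentioned above.

```python
import socket
import struct
from pathlib import Path

ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp


def decode_addr(hex_addr):
    """Turn '0100007F:0016' into '127.0.0.1:22' (assumes a little-endian host)."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return "%s:%d" % (ip, int(port_hex, 16))


def parse_queues(proc_text):
    """Return (local, remote, tx_queue, rx_queue) for established IPv4 sockets."""
    rows = []
    for line in proc_text.splitlines():
        fields = line.split()
        # Data rows start with an index such as '0:'; the header row does not.
        if len(fields) < 5 or not fields[0].endswith(":"):
            continue
        if fields[3] != ESTABLISHED:
            continue
        tx_hex, rx_hex = fields[4].split(":")
        rows.append((decode_addr(fields[1]), decode_addr(fields[2]),
                     int(tx_hex, 16), int(rx_hex, 16)))
    return rows


def unread_backlog(rows, threshold):
    """Keep sockets whose receive queue (bytes not yet read by the app) is large."""
    return [row for row in rows if row[3] >= threshold]


if __name__ == "__main__":
    proc_file = Path("/proc/net/tcp")
    if proc_file.exists():
        # 64 KB is an arbitrary example threshold, not a recommended value.
        for row in unread_backlog(parse_queues(proc_file.read_text()), 64 * 1024):
            print("local=%s remote=%s tx_queue=%d rx_queue=%d" % row)
```

If many established connections show a receive queue that stays large over several runs, the application is probably not draining its sockets, which would be consistent with a deadlock like the one described.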
2 Answers:
O.K. your problem seems to be affecting different applications on that server (own application, winscp, etc.), which leads me to the conclusion that there is a problem with either of these
Well, that's a lot of possible problems and you will not be able to identify all of them by looking at network traces (capture files), especially if you capture the traffic at the client (as in your sample on cloudshark). The best way to eliminate (possibly) faulty components is to run some tests locally (client and server in the same subnet), to see if a file upload (scp) gets interrupted as well. If YES: take a look at
If NO: take a look at
To answer your question:
The client closes the connection after many retries with a RESET. It (basically) gives up because there is no answer from the server anymore.

Regards

answered 31 Aug '13, 16:52 Kurt Knochner ♦
The trace file provided was taken at the client side and shows that one full-size segment, tcp.seq==399361, is never acknowledged by the server, while at the same time we still see packets in the reverse direction. So we can assume we still have connectivity and that it is this single packet that is causing the problem.

"What must i do to fix this?"

Hello, as Kurt mentions, you should have a look at the server side and see whether CentOS saw the retransmitted packets and, if so, whether the seq/ack numbers are correct. (Compare the real sequence numbers in both the client and the server trace.) Sometimes the TCP SACK option confuses devices, so it might be worth disabling it on the server side to see if that helps.
answered 31 Aug '13, 22:45 mrEEde

I tried disabling SACK, but it did not help. I will try to get server-side logs as well, to compare ack values and to see whether the server is receiving the retransmitted packets. (04 Sep '13, 16:25) mumeka
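The comparison of client-side and server-side traces suggested above can be sketched roughly as follows. This is only an illustration under several assumptions: the segment lists have already been exported from both capture files as (sequence number, length) pairs using real (absolute) sequence numbers, sequence-number wraparound is ignored, and the function names and the 1460-byte segment length are hypothetical.

```python
def unacked_segments(sent, acks):
    """Segments from the sender that the peer's highest ACK does not cover.

    sent: list of (seq, length) data segments seen leaving the client.
    acks: list of ACK numbers seen coming back from the server.
    """
    highest_ack = max(acks) if acks else 0
    # A set removes retransmitted duplicates of the same segment.
    return sorted({(seq, length) for seq, length in sent
                   if seq + length > highest_ack})


def never_arrived(client_sent, server_seen):
    """Segments present in the client trace but absent from the server trace."""
    return sorted(set(client_sent) - set(server_seen))


if __name__ == "__main__":
    # Made-up example data; only the sequence number 399361 comes from the trace
    # discussed above, the lengths and neighbouring segments are assumptions.
    client_sent = [(397901, 1460), (399361, 1460), (399361, 1460), (400821, 1460)]
    server_acks = [397901, 399361, 399361]
    server_seen = [(397901, 1460), (400821, 1460)]

    print("not acknowledged:", unacked_segments(client_sent, server_acks))
    print("missing at server:", never_arrived(client_sent, server_seen))
```

If a segment shows up as unacknowledged but is present in the server trace, the server received it and did not acknowledge it; if it is missing from the server trace as well, it was most likely dropped somewhere on the path.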
Your answer is very helpful. My first server machine was in another country; because of this problem, I rented another machine in my own country. So this is a second server with the same settings. That makes a hardware problem unlikely, so it must be some configuration error. I tried disabling the firewalls, but it did not help. I think the same-subnet test could be helpful, but I need a machine in the same subnet first, so it will have to wait a bit.
O.K., this would certainly rule out hardware-related issues, as you say. So, you should concentrate on the CPU load and probably on some network parameters of CentOS 5. Which ones? Good question.
Did you think about a migration to CentOS 6?
BTW: Anything suspicious in the logs?