I'm not sure if this is the right place? I have an FTP server running on an Azure Virtual Machine. My customers send PDF files to the server during the day. Most of these transfers go well (approx. 4000 files are uploaded fine and 4-5 files fail). The FTP server is FileZilla Server, and the error message I get is "426 Connection closed; aborted transfer of "/affs.pdf"". Could the TCP reset have anything to do with the error I'm experiencing? Link to Cloudshark: https://www.cloudshark.org/captures/4c89922b45bb asked 06 Sep '16, 06:06 JanBokstedt edited 06 Sep '16, 06:32
Sure it could, the question is why the RST itself happens, and that seems to be related to the ftp daemon's behaviour.
Screenshots are almost useless for any analysis, so please publish the capture at Cloudshark (which is preferred at this site) or at any file sharing service and edit your Question with a login-free link to it.
The point is that in the screenshot you can see two different clients attempting to open a TCP session to the server's port 50044, and the question is whether the server has a problem with that. In general, there is nothing wrong if two TCP sessions share an IP:port tuple at the local side as long as their remote IP:port tuples differ, but the ftp daemon you use may think otherwise.
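To illustrate why TCP itself has no problem here: a session is identified by the full 4-tuple, so two sessions to the same local IP:port remain distinct as long as the remote side differs. A minimal Python sketch, not how any real stack is implemented; the addresses are the ones discussed in this thread, while both client ports and the labels are placeholders chosen for illustration.

```python
from typing import Dict, NamedTuple


class Session(NamedTuple):
    """A TCP session as seen from the server: the full 4-tuple is the key."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def demo_sessions() -> Dict[Session, str]:
    """Build a toy session table with two peers on the same local port."""
    table: Dict[Session, str] = {}
    # Same local IP:port, different remote IP:port -> two distinct keys.
    # The client ports below are placeholders, not taken from the capture.
    expected = Session("100.74.32.152", 50044, "77.243.43.254", 51000)
    stranger = Session("100.74.32.152", 50044, "40.69.0.11", 51001)
    table[expected] = "ftp-data session of the FTP client"
    table[stranger] = "session from an unrelated host"
    return table


if __name__ == "__main__":
    for session, label in demo_sessions().items():
        print(session, "->", label)
```

Whether the ftp daemon keeps such a table per 4-tuple or only per listening port is exactly the open question.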
A complete capture should reveal more information on what made 40.69.0.11 start setting up a TCP session to the same port as 77.243.43.254 did. If the ftpd has indicated the same port to both clients, it's its own fault; if it hasn't, one of the clients may have chosen that port by mistake, in which case it would be that client's fault. Or some timeout issue may play a role.
An additional note from the limited view given by the screenshot: the client 77.243.43.254 has requested passive mode (frame #7) and the server has responded (frame #8) with the listening IP of 191.235.221.213 and port 50044 (195, 124), but the IP isn't the one shown in your capture, so there's some routing/NAT going on somewhere (or you've "anonymized" the capture).
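For reference, the port in a passive-mode reply is encoded as two bytes, so (195, 124) means 195 * 256 + 124 = 50044. A small Python sketch of that decoding, assuming the usual comma-separated form of the 227 reply; the exact text of frame #8 is not quoted in this thread, so the sample string below is reconstructed from the values above, and `parse_pasv` is just an illustrative helper name.

```python
import re
from typing import Tuple


def parse_pasv(reply: str) -> Tuple[str, int]:
    """Extract (ip, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no (h1,h2,h3,h4,p1,p2) group found in reply")
    numbers = [int(n) for n in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each field must fit in one byte")
    h1, h2, h3, h4, p1, p2 = numbers
    ip = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2  # high byte first, then low byte
    return ip, port


if __name__ == "__main__":
    # Sample reply reconstructed from the values mentioned above (assumption).
    print(parse_pasv("227 Entering Passive Mode (191,235,221,213,195,124)"))
```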
The server is an Azure VM. The internal IP is 100.74.32.152 and the public IP is 191.235.221.213. So yes - there is some NAT going on.
Do you need more of the capture? I have access to a complete capture (25 megabytes), but I'm not sure how much you need.
Your answer has been converted to a comment as that's how this site works. Please read the FAQ for more information.
I'd say three hundred packets before and three hundred packets after the part you've shown in the screenshot should do. However, as you say there is a NAT, it is yet another candidate culprit, because unless it is a 1:1 NAT, it must be ftp-aware (so that it dynamically opens port-forwarding pinholes for ftp-data sessions according to the contents of ftp messages), and this functionality may be broken too.
A more complete Wireshark capture has been uploaded here: https://www.cloudshark.org/captures/de5054561de7 Packet 421 contains the error message.
The FTP server is hosted on an Azure virtual machine. I checked with Microsoft, and they have confirmed that endpoints and firewall are set up correctly. In the Azure portal I have defined standalone endpoints. I guess an Azure endpoint is equivalent to a port forwarding, so the NAT is 1:1 then.
Do you know if an Azure LoadBalancer is FTP-Aware?
No, I have no knowledge of the Azure LB.
Which means that FTP-awareness is not necessary, as any packet arriving at public_ip:port_n is forwarded to private_ip:port_n, i.e. there is no need to analyse FTP messages, create dynamic public_ip:port_y to private_ip:port_x mappings, and modify the port number in the FTP messages. Normally, the NAT device would have to modify the IP address indicated inside the "227 Entering Passive Mode" response before forwarding the packet so that the client could initiate the TCP session for ftp-data properly, but in your case, it seems the server is configured to send the public IP in the 227, so the NAT doesn't have to modify it. (To make the picture complete: some clients can detect server-side NAT by the difference between the IP to which they have opened the control session and the IP indicated in the 227.)
Your capture shows that 40.69.0.11 does not attempt to talk to the FTP server at control level, so its attempt to open a TCP session to port 50044 is not triggered by previous FTP control communication.
As the TCP session is established by 40.69.0.11 acting as a client accessing a "high" server port (50044) from a "low" client port (1336), and as it actively sends a RST right after successfully establishing the session, my guess is that there is some malware running at 40.69.0.11, port-scanning the public IP of your NAT, and that either the FTP server application itself doesn't like two clients using the same ip:port tuple, or some security algorithm running on the machine is allergic to that happening.
So I'd recommend capturing for a longer time, to catch more events where the FTP server sends a RST to the ftp-data connection, and checking whether the event signature is the same, i.e. that these ftp-data connections are terminated using a RST soon after a TCP session from an IP not belonging to the FTP client gets established to the same local port (and then terminated using a RST) while a "legal" session is in progress.
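Checking a long capture for that signature by eye is tedious, so here is a rough Python sketch of the idea. It assumes you have already exported the packets into simple records (time, addresses, ports, TCP flags); the `Pkt` record, the `find_collisions` name, the 5-second window and the rule "an IP that never talked to port 21 is not an FTP client" are all my own simplifications, not anything Wireshark provides.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Pkt:
    """One exported packet; a hypothetical record format for this sketch."""
    time: float
    src: str
    sport: int
    dst: str
    dport: int
    flags: FrozenSet[str]


def find_collisions(pkts: List[Pkt], server_ip: str,
                    control_port: int = 21,
                    window: float = 5.0) -> List[Tuple[float, int, str, str]]:
    """Return (time, local_port, ftp_client, foreign_ip) for each suspicious RST.

    An event is reported when the server resets a session towards a known FTP
    client shortly after a host without any control session sent a SYN to the
    same local port.
    """
    # Hosts that talked to the control port count as real FTP clients.
    control_clients = {p.src for p in pkts
                       if p.dst == server_ip and p.dport == control_port}
    syns: Dict[int, List[Tuple[float, str]]] = {}  # local port -> (time, remote ip)
    hits = []
    for p in sorted(pkts, key=lambda pkt: pkt.time):
        if (p.dst == server_ip and p.dport != control_port
                and p.flags == frozenset({"SYN"})):
            syns.setdefault(p.dport, []).append((p.time, p.src))
        elif p.src == server_ip and "RST" in p.flags and p.dst in control_clients:
            for syn_time, ip in syns.get(p.sport, []):
                foreign = ip != p.dst and ip not in control_clients
                if foreign and 0 <= p.time - syn_time <= window:
                    hits.append((p.time, p.sport, p.dst, ip))
    return hits
```

If every failed transfer shows up in the result and the successful ones don't, the port-scanning theory gains a lot of weight.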
If this proves to be true, you may have to implement some security device, implementing dynamic blacklisting techniques, to protect your FTP server from this type of port-scanning activity while keeping your FTP service accessible from any public IP. The dynamic blacklisting can, however, only reduce the number of these collisions, not eliminate them completely.
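To show what I mean by dynamic blacklisting, here is a minimal Python sketch of the bookkeeping only. It assumes something else (a firewall hook, a log parser) reports each "stray" connection, i.e. one to a data port from an IP without a control session; the class name, the thresholds and the ban duration are arbitrary illustrative choices, not the behaviour of any particular product.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class DynamicBlacklist:
    """Ban an IP after `max_hits` stray connections within `window` seconds."""

    def __init__(self, max_hits: int = 3, window: float = 60.0,
                 ban_time: float = 3600.0) -> None:
        self.max_hits = max_hits
        self.window = window
        self.ban_time = ban_time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned: Dict[str, float] = {}  # ip -> time the ban expires

    def record_stray_connection(self, ip: str, now: float) -> None:
        """Note one stray connection and ban the IP if it crossed the limit."""
        hits = self._hits[ip]
        hits.append(now)
        # Forget hits that fell out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.max_hits:
            self._banned[ip] = now + self.ban_time

    def is_banned(self, ip: str, now: float) -> bool:
        """True while the ban for this IP has not expired yet."""
        expiry = self._banned.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._banned[ip]
            return False
        return True
```

Note that the first few stray connections from a new IP still get through before the ban kicks in, which is why this can only reduce the collisions.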
To me it looks as if my VM (100.74.32.152) sends a TCP reset to the customer/client IP (77.243.43.254). The TCP reset gets an acknowledgement from the IP 40.69.0.11, which is a Microsoft IP address. I have trouble understanding why the TCP reset is initiated by the VM and why the acknowledgement doesn't come from the destination IP.
Unfortunately Microsoft can't explain why this IP is involved in the conversation.
Think about the events in correct order. Yes, you are right that the RST from the 40... address comes after your server has sent a RST itself, but your server has sent the RST after it has received an ACK packet to its SYN,ACK packet from both the 40... and the 77...
So whether the 40... actually acknowledges the RST which you have sent to the 77... or whether it sends its RST spontaneously is not so important because the Bad Thing has happened already before.
(Plus, if the 40...'s RST packet were an acknowledgement of the 100...'s RST packet, its absolute ack number would have to match the absolute seq number of the 100...'s RST packet, which it doesn't.)
So I keep my opinion that the 40... has been hijacked and is port-scanning the internet without knowledge of its administrator.
It could theoretically also be a man in the middle situation: any machine on the path between your site and the real 40... has the possibility to spoof packets, impersonating the 40..., and intercept and drop the responses.
Yet another, even more theoretical, possibility is that the real source is not 40... but something else and that the NAT changes the source address to 40...
If you want to know more, you have to capture at the public side of the NAT simultaneously with capturing at the FTP server, and compare the traces.