NFS client is unable to copy a file to the NFS server. The NFS client is 10.206.5.158. The NFS server is 10.206.0.200. When the copy is attempted, the session appears to go into a cycle of TCP retransmissions. The client appears to transmit multiple packets, while the server receives a single large packet. The MTUs in the path have been checked and all appear to be 1500. Can someone take a look and see if they can spot the issue in the traces? asked 12 Apr '17, 12:24 SloDog |
2 Answers:
The problem is that the initial transmits arriving at the server still have a tcp.checksum of 0, which indicates that TCP checksum offload (TCO) is enabled at the client. If the client is AIX (which I guess it is), then the offload functions can be disabled using
    chdev -l ent1 -a large_send=no -a chksum_offload=no -P
    shutdown -Fr
Regards Matthias answered 12 Apr '17, 14:29 mrEEde |
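To make the checksum argument concrete, here is a minimal Python sketch of how a receiver validates a TCP checksum over IPv4 (RFC 1071 one's-complement sum over a pseudo-header plus the segment). The helper names, the port numbers in `build_segment`, and the payload are hypothetical and only for illustration; the two IP addresses are the ones from the question. It is a sketch of the general check, not of what the server in these traces actually runs.

```python
import socket
import struct

CLIENT_IP = "10.206.5.158"  # NFS client from the question
SERVER_IP = "10.206.0.200"  # NFS server from the question


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and payload."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return internet_checksum(pseudo_header + segment)


def checksum_is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver-side check: with the stored checksum included, the result is 0."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0


def build_segment(payload: bytes, checksum: int = 0) -> bytes:
    """Hypothetical 20-byte TCP header (no options) followed by the payload."""
    header = struct.pack(
        "!HHIIBBHHH",
        1023, 2049,   # source and destination ports (illustrative values)
        1, 0,         # sequence and acknowledgment numbers
        5 << 4,       # data offset: 5 words, no options
        0x18,         # flags: PSH + ACK
        65535,        # window
        checksum,     # checksum field
        0,            # urgent pointer
    )
    return header + payload


# A segment whose checksum field was left at 0 for the NIC to fill in.
unfilled = build_segment(b"x" * 1448)
correct = tcp_checksum(CLIENT_IP, SERVER_IP, unfilled)
filled = build_segment(b"x" * 1448, correct)

print("checksum the NIC should have written:", hex(correct))
print("filled segment valid:", checksum_is_valid(CLIENT_IP, SERVER_IP, filled))
print("unfilled segment valid:", checksum_is_valid(CLIENT_IP, SERVER_IP, unfilled))
```

A segment that reaches the receiver with the checksum field still at 0 fails this check unless the correct checksum happens to be 0, which is why an unfilled checksum seen on the receiving side is suspicious, while the same value seen in a capture taken on the sending host is usually just an offload artifact.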
I can tell you what is happening, but not exactly why. This description is for the client-side capture, but the server side is very similar.

There are 2 TCP connections that are re-used over and over. Each time, there is a 3-way handshake, a data transfer, then a Reset. This is repeated on the other connection (tuple), then again on the first connection. This bouncing between the two continues 11 times.

The data transfers are interesting in that there are always just 3 packets every 1.5 seconds. The pattern is that the first two are retransmissions of earlier data and the third is one new data packet. This means that every data packet is transmitted three times (taking 4.5 seconds) and is only acknowledged after the third transmission.

This pattern can be seen in this zoom-in on the TCP Stream chart. After an initial burst of some small packets (which occur once each) and then 5 large ones (retransmitted at least 3 times each), we settle into the pattern of threes. The blue horizontal line passes through three packets, and the ACK line follows the third packet each time.

Why would the receiver only ACK the third ones (when we see the first two in the server-side trace)? In the server trace, the pattern is slightly different. Every 1.5 seconds there are 2 packets: the first is of size 1436 (later 1448) and the second is double that at 2872 (later 2896). The server is perhaps not accepting these double-sized packets and hence is only ACKing the smaller ones that precede them. The reason for this is very likely the invalid checksum on the large ones and the good checksum on the 1448 ones. This explains why the receiver is ignoring the large packets.

The difficult part for you is now to determine what it is in the receiver (server) that is deciding to combine the two received 1436 (or 1448) packets into a single 2872 (or 2896) packet with an incorrect checksum. This is the likely "root cause" of your problem.

Why is the sender transmitting just 3 packets every 1.5 seconds? My guess is that the sender has a retransmission timeout of 1.5 seconds. It is observing that it gets an ACK only after retransmitting each packet twice. It therefore perceives that this is an extremely "congested" network and keeps its transmit window very small (just 3 packets). Bear in mind that it sees an ACK to just a single packet, even though it has sent 3 packets in each round trip.
answered 12 Apr '17, 22:53 Philst edited 12 Apr '17, 23:49
Thanks. This confirms what we were seeing. (13 Apr '17, 09:55) SloDog |
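As a rough back-of-the-envelope check of the pattern described above, the following Python sketch works out what "3 packets every 1.5 seconds, only one of them new" means for the copy. The interval, burst size and segment size come from the description of the traces; the function names and the 1 MiB file size are assumptions made for illustration, and the model ignores the handshakes and Resets between connections.

```python
import math

BURST_INTERVAL_S = 1.5   # one burst every 1.5 seconds (assumed retransmission timeout)
PACKETS_PER_BURST = 3    # two retransmissions plus one new data packet
NEW_PACKETS_PER_BURST = 1
SEGMENT_BYTES = 1448     # size of the later data packets in the traces


def seconds_until_acked() -> float:
    """A packet is sent in three consecutive bursts before it is ACKed."""
    return PACKETS_PER_BURST * BURST_INTERVAL_S


def goodput_bytes_per_second() -> float:
    """Only one packet per burst carries data the receiver has not seen yet."""
    return NEW_PACKETS_PER_BURST * SEGMENT_BYTES / BURST_INTERVAL_S


def seconds_to_copy(file_bytes: int) -> float:
    """Idealized transfer time: one new segment per burst, nothing else."""
    bursts = math.ceil(file_bytes / (NEW_PACKETS_PER_BURST * SEGMENT_BYTES))
    return bursts * BURST_INTERVAL_S


print("time until a packet is ACKed:", seconds_until_acked(), "s")
print("goodput:", round(goodput_bytes_per_second(), 1), "bytes/s")
print("time to copy 1 MiB:", seconds_to_copy(1024 * 1024), "s")
```

Under these assumptions the session moves less than 1 KB of new data per second, so even a 1 MiB file would need on the order of 18 minutes, which is consistent with the copy appearing to be stuck.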
I would disable offloading or update the driver, too.
I have my systems team checking the setting.
Thank you.
All offload parameters have been set to off on both ends. The same behavior is still being observed. Does anyone have any additional ideas?
Just to confirm Matthias's assumption: the server is Linux, the client is AIX.