Hi, I have a repeatable event with a specific job run on SLES10 box acessing an EMC celerra. I see very high READ latency (Max RTT). How can I see which file(s) is responsible for this latency. Here are two examples: I used the following filter with tshark: -q -z rpc,rtt,100003,3,'nfs.nfsstat3!=70'
Note the very high Max RTT for READ procedure.
NFS Version 3 RTT Statistics:
Filter: nfs.nfsstat3!=70
Procedure Calls Min RTT Max RTT Avg RTT
NULL 0 0.00000 0.00000 0.00000
GETATTR 1517132 0.00000 0.58692 0.00105
SETATTR 21 0.00088 0.00954 0.00155
LOOKUP 11581651 0.00001 0.98236 0.00036
ACCESS 887419 0.00000 0.18176 0.00024
READLINK 11462 0.00013 0.43970 0.00689
READ 376117 0.00009 8854.46070 0.46602
WRITE 748 0.00033 0.08924 0.01823
CREATE 8 0.00111 0.02262 0.00607
MKDIR 1 0.00633 0.00633 0.00633
SYMLINK 0 0.00000 0.00000 0.00000
MKNOD 0 0.00000 0.00000 0.00000
REMOVE 7 0.00021 0.00865 0.00342
RMDIR 1 0.00117 0.00117 0.00117
RENAME 2 0.00112 0.00113 0.00112
LINK 0 0.00000 0.00000 0.00000
READDIR 601 0.00016 0.03231 0.00037
READDIRPLUS 8254 0.00015 0.21962 0.00372
FSSTAT 0 0.00000 0.00000 0.00000
FSINFO 27 0.00013 0.00383 0.00071
PATHCONF 0 0.00000 0.00000 0.00000
COMMIT 3 0.00393 0.01380 0.00765
NFS Version 3 RTT Statistics:
Filter: nfs.nfsstat3!=70
Procedure Calls Min RTT Max RTT Avg RTT
NULL 0 0.00000 0.00000 0.00000
GETATTR 1425693 0.00000 0.59928 0.00099
SETATTR 21 0.00038 0.00070 0.00056
LOOKUP 11613451 0.00001 0.68981 0.00029
ACCESS 814987 0.00000 0.04566 0.00022
READLINK 11505 0.00003 0.37694 0.00704
READ 364942 0.00011 7943.28530 2.72942
WRITE 845 0.00029 0.57347 0.01104
CREATE 8 0.00053 0.01336 0.00282
MKDIR 1 0.00073 0.00073 0.00073
SYMLINK 0 0.00000 0.00000 0.00000
MKNOD 0 0.00000 0.00000 0.00000
REMOVE 5 0.00077 0.01270 0.00509
RMDIR 1 0.00067 0.00067 0.00067
RENAME 2 0.00044 0.00051 0.00048
LINK 0 0.00000 0.00000 0.00000
READDIR 601 0.00016 0.10647 0.00156
READDIRPLUS 8254 0.00014 0.11047 0.00389
FSSTAT 0 0.00000 0.00000 0.00000
FSINFO 3 0.00026 0.00032 0.00028
PATHCONF 0 0.00000 0.00000 0.00000
COMMIT 3 0.00241 0.00939 0.00486
asked 07 Feb ‘11, 09:53
debugme
6●3●3●6
accept rate: 0%
edited 07 Feb ‘11, 10:22
SYN-bit ♦♦
17.1k●9●57●245
Thank you SYNbit. rpc.time > 3600 returns a number of frames with NFS [Illegal Segments], i'm not sure what that means ... reassembly issues, lock / permission issues. Are there doc's out there which explain what NFS [Illegal Segments] means?
thanks
There are no docs available on each and every field. As this is Wireshark generated (because of the square brackets around Illegal Segments), it is probably a problem in the re-assembly. Maybe because of a problem with the packets themselves (which might coincide with the problems you are facing) or maybe a bug in Wireshark.
Are you able to share the trace-file? If so, I can have a look if I can pinpoint it. See my profile for my e-mail address...
If there is a pcap file available we can take a look at it for you. The biggest issues with NFS is that v3 (pretty common) has a read block limit of 32KB. So per RTT, you can only transfer 32KB.
If the problem is CPU or IO issues within EMC, the trace file can point it out. If it's an intermittent problem, I would look at the usual suspects as a start (pkt loss, CPU, AntiVirus locking up files)