I am trying to use tshark to reassemble and extract NFS payloads. Because of the large amount of data I am processing (and some security concerns) I cannot do this processing offline, so I am trying to get tshark to run at, or as close as possible to, wire speed. I initially tried using tshark to do the packet capture itself, but it was dropping too many packets, so I am using another pcap-based tool to do the capture (successfully capturing and writing packets at wire speed) and piping its output through tshark to another process that works on the payloads. The whole setup looks something like this:

    pcap-packet-capture-tool | tshark -i - -n -T fields -e nfs.data | my_program

In my experiments, tshark significantly lags wire speed (1 Gb/s). Its actual rate is roughly 25 MB/s, and this lag manifests itself as tshark taking extra time after the capture is complete to finish, at roughly a 1:1 rate; i.e., if I capture for 30 minutes, tshark takes a total of an hour to finish.

I have looked into speeding it up myself by not converting the data into a hex representation and just printing the binary, but the improvement was marginal. For more significant gains I would either need a better understanding of the code (to know what to adjust or strip out), or perhaps I am missing some crucial parameters that would significantly speed tshark up. Please let me know if you have suggestions on either front. I would also be happy to provide any extra information if this is not enough to troubleshoot the issue.

I should add that this is running in a virtualized Linux environment on a relatively modern/powerful server, and I have already disabled host lookup (as I know that can significantly slow down packet captures). Thank you!

UPDATE: Thank you for the quick responses. Below are some clarifications.
Update 2: I added my own benchmarks in an answer.

Update 3: I have used the tshark that gets installed with apt-get (1.8.2) and I have also built the latest (1.8.4) myself; the results from both are similar. I am running it in an Ubuntu 12.10 VM under VMware Fusion on OS X Mountain Lion, on a MacBook Air with an i7 processor and 8 GB of RAM. The VM has 3 GB of RAM and 4 vCPUs.
asked 05 Dec '12, 14:11 notorious-pc... · edited 06 Dec '12, 10:05
One Answer:
Some questions/suggestions:

You did not use the option ...

UPDATE: Replying to your update in the question. I did some tests with dd, and the buffer size is a crucial factor. First, dd with its default buffer size (you did not mention your buffer size for dd!!):
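Something along these lines (test.pcap stands in for my HTTP capture file; cpipe's -vt option prints the pipe throughput):

    # dd's default block size is 512 bytes
    dd if=test.pcap | cpipe -vt > /dev/null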
That's pretty slow! The buffer of dd is too small. Now cat:
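Again measured through cpipe, with the same stand-in file:

    cat test.pcap | cpipe -vt > /dev/null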
That's much better. Now dd with a decent buffer size:
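For example, with a 4 MByte block size (the exact size here is an assumption):

    dd if=test.pcap bs=4M | cpipe -vt > /dev/null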
Nearly 1 GByte/s (avg). That's not bad :-) HINT: This is obviously using the filesystem cache of Linux. My SSD is only capable of reading at a max speed of 400-500 MByte/s (per specs and as measured). Now, let's bring in tshark:
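Measuring how fast tshark consumes its stdin, something like this (tshark's default summary output is discarded):

    dd if=test.pcap bs=4M | cpipe -vt | tshark -i - -n > /dev/null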
O.K., that's odd: only 10% of the input stream. But that is still fast enough for a 1 GBit/s link, as the output of cpipe is in Byte/s, so it's ~90 MByte/s. Now tshark, filtering on only some fields:
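E.g. (the field choice is illustrative):

    dd if=test.pcap bs=4M | cpipe -vt | tshark -i - -n -T fields -e http.host > /dev/null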
O.K., a bit better, with an (avg) peak of up to 110 MByte/s. So, why is your tshark not that fast? I did my test in a VMware VM on a laptop, and the file is on a really fast SSD (400 MByte/s). HOWEVER, I used just HTTP traffic. There is a pretty good chance that the NFS dissector consumes much more resources and is thus so much slower. As I don't have a large NFS capture file, I cannot test it. However, you can test your environment with a large HTTP capture file (easy to create) and then compare your results with mine. If tshark is still much slower, then it's related to your system (CPU, I/O, etc.) or to the tshark version (mine: 1.8.3 on Ubuntu 12.04). If your system is much faster with HTTP, then it's the NFS dissector, and there is probably nothing you can do, except speeding up the dissector by improving the code or by using an even faster system (CPU) ;-)

BTW: If you run tshark for a long time at a high data rate, it will build up internal state (hash tables, lists, etc.) and become slower over time, as it takes longer to add/extract data to those data structures!

Regards

answered 05 Dec '12, 16:09 Kurt Knochner ♦

So I did your set of tests to see how I compare; they are below. In short, it seems my system's capabilities are similar to yours, but when I add in tshark it's much slower. I think my pcap is a bit smaller (50 MB), but I don't think that should make this huge a difference. Writing with dd with no block size specified is quite a bit faster for me:
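That is, something like this (my 50 MB file; capture.pcap is an illustrative name):

    dd if=capture.pcap | cpipe -vt > /dev/null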
Using a bigger block size anyway:
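Again along these lines (block size assumed to match yours):

    dd if=capture.pcap bs=4M | cpipe -vt > /dev/null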
OK, that's pretty slow, but still above wire speed. Also, looking later in the process (MB 38 and on), the rate is pretty comparable to your results. Here comes tshark:
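That is, roughly the same pipeline as yours, against my file:

    dd if=capture.pcap bs=4M | cpipe -vt | tshark -i - -n > /dev/null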
A similar situation with the fields:
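Roughly:

    # (-e nfs.data in place of the HTTP field for the NFS case)
    dd if=capture.pcap bs=4M | cpipe -vt | tshark -i - -n -T fields -e http.host > /dev/null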
Interestingly, it goes at the same speed for NFS reassembly, so it seems that whatever is making my tshark slow, fixing it should also help the NFS case. Do you have any thoughts on what might cause this (about 2x) slowdown? Do you need more explicit details about my system/setup to have a better idea of where to start looking? Thanks again. (06 Dec '12, 08:26) notorious-pc...

What is your tshark version? Please post the output of:
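    tshark -v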
What is your OS version?
Yes, every detail you can give (e.g. VMware version, host OS version, virtual OS version, etc.) may help. BTW: I converted your answer to a comment, as that's how this site works. (06 Dec '12, 09:14) Kurt Knochner ♦

Upon some further investigation, I was able to get tshark to read in data at up to 200 MB/s in a VM with 32 GB of RAM. This showed that the throughput measurement is not accurate, because tshark will continue to run after it has read everything and finish reassembly much later. For example, I fed tshark a 1 GB pcap, which it consumed in 5 seconds (probably buffering most of it), but tshark actually ran for 3 minutes; if you average that 1 GB over 180 seconds, it's actually more like 5 MB/s. (06 Dec '12, 11:23) notorious-pc...
If you are building from source, you are better off building from trunk, checking it out from SVN; that way you get access to the latest improvements, if any. Building a profiling build might help (I don't know how to do that) in finding out where the bottleneck is.
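A rough sketch of that workflow (the anonymous SVN URL is the one Wireshark documented in that era; the -pg/gprof profiling approach is one common option, not something tested here):

    svn checkout http://anonsvn.wireshark.org/wireshark/trunk/ wireshark
    cd wireshark
    ./autogen.sh
    ./configure CFLAGS="-pg"   # build with gprof instrumentation
    make
    # run tshark on a sample capture, then inspect the resulting
    # gmon.out with: gprof ./tshark gmon.out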