I am trying to use tshark to reassemble and extract NFS payloads. Because of the large amount of data I am processing (and some security concerns) I cannot do this processing offline, so I am trying to get tshark to run at, or as close as possible to, wire speed. I initially tried using tshark to do the packet capture itself, but it was dropping too many packets, so I am using another pcap-based tool to do the capture (successfully capturing and writing packets at wire speed) and piping its output through tshark to another process that works on the payloads. The whole setup looks something like this:

    pcap-packet-capture-tool | tshark -i - -n -T fields -e nfs.data | my_program

In my experiments, tshark significantly lags wire speed (1 Gb/s). Its actual rate is roughly 25 MB/s, and this lag manifests itself as tshark taking extra time after the capture is complete to finish, at roughly a 1:1 rate; i.e., if I capture for 30 minutes, tshark takes a total of an hour to finish.

I have looked into speeding it up myself by not converting the data into a hex representation and just printing the binary, but the improvement was marginal. For more significant gains I would either need a better understanding of the code (to know what to adjust or strip out), or perhaps I am missing some crucial parameters that would significantly speed tshark up. Please let me know if you have suggestions on either front. I would also be happy to provide any extra information if this is not enough to troubleshoot the issue.

I should add that this is running in a virtualized Linux environment on a relatively modern/powerful server, and I have already disabled host lookup (as I know that can significantly slow down packet captures). Thank you!

UPDATE: Thank you for the quick responses. Below are some clarifications.
Update 2: I added my own benchmarks in an answer.

Update 3: I have used the tshark that gets installed with apt-get (1.8.2) and I have also built the latest (1.8.4) myself; the results from both are similar. I am running it in an Ubuntu 12.10 VM under VMware Fusion on OS X Mountain Lion, on a MacBook Air with an i7 processor and 8 GB of RAM. The VM has 3 GB of RAM and 4 vCPUs.
asked 05 Dec '12, 14:11 notorious-pc... · edited 06 Dec '12, 10:05
One Answer:
Some questions/suggestions:

You did not use the option ...

UPDATE: Replying to your update in the question. I did some tests with dd, and the buffer size is a crucial factor. First, dd with its default buffer size (you did not mention your buffer size for dd!!):
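Something along these lines (test.pcap stands in for my HTTP capture file; cpipe's -vt option prints the pipe throughput):

    # dd's default block size is 512 bytes
    dd if=test.pcap | cpipe -vt > /dev/null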
That's pretty slow! The buffer of dd is too small. Now cat:
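Again measured through cpipe, with the same stand-in file:

    cat test.pcap | cpipe -vt > /dev/null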
That's much better. Now dd with a decent buffer size:
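For example, with a 4 MByte block size (the exact size here is an assumption):

    dd if=test.pcap bs=4M | cpipe -vt > /dev/null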
Nearly 1 GByte/s (avg). That's not bad :-) HINT: This is obviously using the filesystem cache of Linux. My SSD is only capable of reading at a max speed of 400-500 MByte/s (per specs and as measured). Now, let's bring in tshark:
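Measuring how fast tshark consumes its stdin, something like this (tshark's default summary output is discarded):

    dd if=test.pcap bs=4M | cpipe -vt | tshark -i - -n > /dev/null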
O.K., that's odd: only 10% of the input stream. But that is still fast enough for a 1 GBit/s link, as the output of cpipe is in Byte/s, so it's ~90 MByte/s. Now tshark, filtering on only some fields:
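E.g. (the field choice is illustrative):

    dd if=test.pcap bs=4M | cpipe -vt | tshark -i - -n -T fields -e http.host > /dev/null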
O.K., a bit better, with an (avg) peak of up to 110 MByte/s. So, why is your tshark not that fast? I did my test in a VMware VM on a laptop, and the file is on a really fast SSD (400 MByte/s). HOWEVER, I used just HTTP traffic. There is a pretty good chance that the NFS dissector consumes much more resources and is thus so much slower. As I don't have a large NFS capture file, I cannot test it. However, you can test your environment with a large HTTP capture file (easy to create) and then compare your results with mine. If tshark is still much slower, then it's related to your system (CPU, I/O, etc.) or to the tshark version (mine: 1.8.3 on Ubuntu 12.04). If your system is much faster with HTTP, then it's the NFS dissector, and there is probably nothing you can do, except speeding up the dissector by improving the code or by using an even faster system (CPU) ;-)

BTW: If you run tshark for a long time at a high data rate, it will build up internal state (hash tables, lists, etc.) and become slower over time, as it takes longer to add/extract data to those data structures!

Regards

answered 05 Dec '12, 16:09 Kurt Knochner ♦

So I did your set of tests to see how I compare; they are below. In short, it seems my system's capabilities are similar to yours, but when I add in tshark it's much slower. I think my pcap is a bit smaller (50 MB), but I don't think that should make this huge a difference. Writing with dd with no block size specified is quite a bit faster for me:
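That is, something like this (my 50 MB file; capture.pcap is an illustrative name):

    dd if=capture.pcap | cpipe -vt > /dev/null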
Using a bigger block size anyway:
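Again along these lines (block size assumed to match yours):

    dd if=capture.pcap bs=4M | cpipe -vt > /dev/null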
OK, that's pretty slow, but still above wire speed. Also, looking later in the process (MB 38 and on), the rate is pretty comparable to your results. Here comes tshark:
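That is, roughly the same pipeline as yours, against my file:

    dd if=capture.pcap bs=4M | cpipe -vt | tshark -i - -n > /dev/null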
A similar situation with the fields:
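Roughly:

    # (-e nfs.data in place of the HTTP field for the NFS case)
    dd if=capture.pcap bs=4M | cpipe -vt | tshark -i - -n -T fields -e http.host > /dev/null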
Interestingly, it goes at the same speed for NFS reassembly, so it seems that whatever is making my tshark slow, fixing it should also help the NFS case. Do you have any thoughts on what might cause this (about 2x) slowdown? Do you need more explicit details about my system/setup to have a better idea of where to start looking? Thanks again. (06 Dec '12, 08:26) notorious-pc...

What is your tshark version? Please post the output of:
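    tshark -v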
What is your OS version?
Yes, every detail you can give (e.g. VMware version, host OS version, virtual OS version, etc.) may help. BTW: I converted your answer to a comment, as that's how this site works. (06 Dec '12, 09:14) Kurt Knochner ♦

Upon some further investigation, I was able to get tshark to read in data at up to 200 MB/s in a VM with 32 GB of RAM. This showed that the throughput measurement is not accurate, because tshark will continue to run after it has read everything and finish reassembly much later. For example, I fed tshark a 1 GB pcap, which it consumed in 5 seconds (probably buffering most of it), but tshark actually ran for 3 minutes; if you average that 1 GB over 180 seconds, it's actually more like 5 MB/s. (06 Dec '12, 11:23) notorious-pc...
If you are building from source, you are better off building from trunk, checking it out from SVN; that way you get access to the latest improvements, if any. Building a profiling build might help (I don't know how to do that) in finding out where the bottleneck is.
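A rough sketch of that workflow (the anonymous SVN URL is the one Wireshark documented in that era; the -pg/gprof profiling approach is one common option, not something tested here):

    svn checkout http://anonsvn.wireshark.org/wireshark/trunk/ wireshark
    cd wireshark
    ./autogen.sh
    ./configure CFLAGS="-pg"   # build with gprof instrumentation
    make
    # run tshark on a sample capture, then inspect the resulting
    # gmon.out with: gprof ./tshark gmon.out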