This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

tshark / dumpcap stops writing files during capture (Red Hat Linux)


All,

I have a system set up to run essentially full packet capture using tshark in a multi-file/ring-buffer arrangement. It had been running stably for months, but started failing after system updates about 24 hours ago.

The failure does not generate any logging and the dumpcap process does not quit or hang. It appears to keep running but it just stops writing in the middle of a file.

I have no logging preceding the event at all. I have a simple cron job in place to look for crashed processes, but in this case the process never hangs or quits, so it's not catching the problem either.
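(For anyone trying to catch this failure mode, a watchdog needs to check that the newest capture file is still growing, not just that the process is alive. A minimal sketch, assuming the ring files live under $CAPTURES_DIR with the "foo" prefix from the command below; the 300-second threshold is arbitrary:)

#!/bin/bash
# Hypothetical watchdog sketch: alert when the capture has stalled even
# though the process is still running. CAPTURES_DIR and the "foo" prefix
# are assumed to match the capture command; adjust to taste.
newest=$(ls -t "$CAPTURES_DIR"/foo* 2>/dev/null | head -n 1)
if [ -n "$newest" ]; then
    # Seconds since the newest ring file was last modified
    age=$(( $(date +%s) - $(stat -c %Y "$newest") ))
    if [ "$age" -gt 300 ]; then
        echo "capture stalled: $newest last written ${age}s ago" | logger -t capture-watchdog
    fi
fi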

I haven't found any evidence of a root cause. This can happen when the network is relatively quiet and no special activity is happening on the computer, but I have also provoked it while doing file accesses on pcap files that have already been written (slicing previous ring-buffer files for analysis).

The command line being used to call tshark is:

tshark -q -i eth1 -b filesize:1000000 -b files:60000 -w "$CAPTURES_DIR/foo" -n -f "not host xxx.xxx.xxx.xxx" &

The $CAPTURES_DIR variable is defined in the script calling tshark and the excluded IP is a co-located box from a sister organization. To be 'polite', we don't sniff their traffic.

Running tshark 1.10.14 (which calls dumpcap 1.10.14) on Red Hat Enterprise Linux 7.2.

The hardware being used is beefy and single-purpose. There should be no concerns about load: the capture happens across fast SSDs and the CPU is <=10% utilized during most captures.

Is anyone else seeing this? Help?

asked 02 Feb '16, 07:48

OvrSteer


2 Answers:


Solution: Increase the buffer size

I ended up migrating the command to use dumpcap directly, as suggested, but I have no evidence that this fix wouldn't also work if you still invoke the capture through tshark.

The old command:

tshark -q -i eth1 -b filesize:1000000 -b files:60000 -w "$CAPTURES_DIR/foo" -n -f "not host xxx.xxx.xxx.xxx" &

was replaced by the following:

dumpcap -q -B 100 -i eth1 -b filesize:1000000 -b files:60000 -w "$CAPTURES_DIR/foo" -f "not host xxx.xxx.xxx.xxx" &

In this case, the -B 100 switch before the first -i switch sets a global capture buffer size of 100 MiB, up from the default of 2 MiB. You can also set the buffer per interface (by placing -B after the corresponding -i statement), but since I'm only capturing off of one interface, I set it globally. I don't know if that's the most sensible size, but it has worked reliably for over 24 hours. The documentation also states that hardware may silently limit the buffer size, so I don't know whether I'm only getting a portion of that.
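For reference, the per-interface form just moves -B after the -i it applies to. It should be equivalent here with a single interface, though I haven't tested this variant myself:

dumpcap -q -i eth1 -B 100 -b filesize:1000000 -b files:60000 -w "$CAPTURES_DIR/foo" -f "not host xxx.xxx.xxx.xxx" &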

The -n switch is not supported in dumpcap since it's not sensible for the capture process to try to do DNS lookups...

I had never had trouble with the 2 MiB buffer before, and the write failures occurred both under heavy load (middle-of-the-day captures) and late at night when the network was relatively quiescent; still, it seemed to be worse with load. I don't know the root cause of the instability (the patching/rebooting may have been a red herring), but in any case, if your captures aren't as stable as you'd like, try increasing the buffer.

At this point, I think I'm fixed so I'm marking this as the answer. Thanks for the help!

answered 09 Feb '16, 06:01

OvrSteer


Please use dumpcap directly instead; it doesn't have the memory issues that tshark does (minor these days, but still there).

See also https://blog.packet-foo.com/2013/05/the-notorious-wireshark-out-of-memory-problem/

answered 02 Feb '16, 08:02

Jasper ♦♦

Thanks! Is there a way to suppress DNS lookups like the -n flag in tshark does?

Also, I have yet to see a tshark-specific memory leak. This hardware has been running very stably for months, and it replaced older hardware with essentially the same configuration. That one was also stable (just not sized to retain as many captures as the current hardware).

(02 Feb '16, 11:56) OvrSteer

It's not a memory leak, it's deliberate retention of state. If your traffic is such that there is no state retained, then you won't see the memory increase.

Try to continue using tshark and monitor the process memory to see if it is a memory issue.
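Something as simple as logging the resident set size once a minute would show whether memory is growing. A rough sketch (the log path is arbitrary; swap in dumpcap for tshark as appropriate):

while sleep 60; do
    # Append timestamp plus the RSS (KiB) of every tshark process
    echo "$(date '+%F %T') $(ps -o rss= -C tshark | tr '\n' ' ')" >> /tmp/tshark-rss.log
done &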

(02 Feb '16, 14:57) grahamb ♦

dumpcap doesn't do DNS lookups, it only writes packets to disk with no dissection (which would be required to be able to do the DNS lookups in the first place).

(02 Feb '16, 15:10) Jasper ♦♦

So to sum up: still no evidence of memory utilization problems. Typical utilization on a 16 GiB system is about 500 MiB across all processes.

tshark continued to fail just as above: files simply stop being written in the middle of a write. Sometimes it works for hours, sometimes minutes.

I switched to using dumpcap directly just to see if that made any difference, but it did not. (The only difference is omitting the -n switch on the command line.) It captured files, but still experienced intermittent failure.

I rolled back to the previous kernel version, which had been running stably before, but it made no difference.

The one thing I noticed is that while the failure happens at random times, it can be provoked on demand with maybe a 50% repeat rate. I have a script that gathers previously recorded capture files and then has tcpdump slice them for specific time ranges and IP addresses. Running it may cause tshark/dumpcap to fail. Still investigating, but it seems this issue may be happening at least partly because of some underlying instability. I'm just not sure what that might be, as the patches didn't touch much (this is a minimal system) and the kernel has already been backed out. If I find the root cause, I'll update this.
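For context, the slicing step itself is unremarkable; per completed ring file it's roughly the following, with the time range handled by choosing which ring files to read (the paths and host address here are placeholders, not my real values):

# Read one completed ring file, write out only one host's traffic
tcpdump -r "$ring_file" -w "$out_dir/slice-$(basename "$ring_file")" host 10.0.0.5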

(04 Feb '16, 06:16) OvrSteer

Is the script accessing the file that is currently being written to by dumpcap, or is it skipping that one and only processing "closed" files?

Maybe there's a file access sharing problem. I have a tool that accesses a running dumpcap file ring, and before reading any of the files (which would be possible even for the active one) I check whether I can access them exclusively. If that fails, I know the file is still being written to, and I skip it.
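On Linux you could approximate that check by asking whether any process still has the file open; fuser(1) can do that. A sketch (my tool does this differently, so treat the details as illustrative; it needs to run as root or as the dumpcap user to see the open file handles):

for f in "$CAPTURES_DIR"/foo*; do
    if fuser -s "$f" 2>/dev/null; then
        continue    # still open by dumpcap, skip it for now
    fi
    # "$f" is closed; safe to read it here
done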

(04 Feb '16, 06:26) Jasper ♦♦

It moves the file currently being written, but doesn't read it until the write is complete; it only reads finished files. This hasn't been a problem in the past, but I'm not ruling anything out.

I'm off on a tangent now to see if there's something being captured that causes dumpcap to fail, but the failures truly show no errors. No errors are being thrown to the console and no logs show any indication of failure...

(04 Feb '16, 13:57) OvrSteer

So in case someone is having similar issues, I think I have the fix.

I'm using dumpcap directly now, but the syntax is the same as tshark's. I went ahead and overrode the default buffer with a -B 100 flag (a 100 MiB buffer, set globally ahead of the interfaces). So far, so good: captures have been stable. I'll report back tomorrow if I get 24 hours of reliability.

(08 Feb '16, 13:27) OvrSteer

I'm experiencing an identical issue capturing with dumpcap on Ubuntu. I've had the same issue on Windows using an AirPcap Nx device as well, which surprised me.

The capture usually continues without issue for several hours, sometimes days, then the packet count halts and never increments again. I'm capturing 802.11 traffic on a wireless interface in monitor mode.

Qualcomm Atheros AR9462 wireless device
Dumpcap 1.10.6 (v1.10.6 from master-1.10)

I gave the -B 100 flag a try last night and left it running, but this morning right after I got into work I saw the count stop. No sign of a crashed process, dmesg for wlan3 doesn't report any concern.

Here is the command used:

dumpcap -i wlan3 -B 100 -I -b filesize:100000 -f "ether host xx:xx:xx:xx:xx:xx || ether host xx:xx:xx:xx:xx:xx" -n -w output.pcapng

(17 Feb '16, 11:28) Allion_KRS