Tshark generates a core dump

Question

Hi all, I'm using Wireshark to capture data on a real-time system. I use the command:

tshark -i 5 -P -w /tmp/oh.pcap -b filesize:65535 -b files:5

But after a few hours, I got a core dump. I think the first of the 5 rotation files was deleted, so tshark stopped running and dumped core. To check this, I tried again with a very small file size:

 tshark -i 5 -P -w /tmp/oh.pcap -b filesize:1 -b files:5

After a few seconds, it also generated a core dump:

The file "/tmp/oh_00185_20131009164918.pcap" doesn't exist

Could you please help me understand why this happens and how to solve it? Is it related to the real system? This is a telecom system and tshark is running in real time, non-stop, so even if I increase the file size or the number of files, the problem could still appear someday in the future. Thanks a lot.

Answer 1

I can confirm that behavior on Ubuntu 12.04 with tshark 1.10.0. Please file a bug report at https://bugs.wireshark.org with a detailed description and a reference to this question.

172 tshark: The file "/tmp/oh_00035_20131009165554.pcap" doesn't exist.
Segmentation fault (core dumped)

Possible reason:

> It means, the current file which was dissecting was deleted, so I think maybe because the speed of dissector is less than the speed of dumpcap.

As you mentioned, the file that tshark was reading from was deleted (rotated) while it was in use. This only happens if there is enough traffic to make the rotation too fast.

> I also cannot use dumpcap because I need to go through the dissector.

O.K. even if the current problem did not exist, you would still run into the memory problem already discussed, as tshark will then run for a very long time and accumulate internal state.

> This is a telecom system and tshark is running in real time, non-stop,

Problem: You are trying to use tshark as a long term, real time, monitoring solution. However tshark was not built with that in mind, and that's the reason why you hit all sorts of problems (see your other questions).

Solution: Isn't the proposed solution I suggested in your other question a possible way to go? That construct will run forever (if built the right way) and it will not cause any problems with disk space, RAM, etc. And it's the only way (I can think of) to use tshark to solve your problem, as far as I understand it, based on your other questions.

Regards
Kurt

answered 09 Oct '13, 08:03

Kurt Knochner ♦

edited 09 Oct '13, 08:18

OK, thanks for your confirmation. About your proposed solution: I think it could work, and my partner also agrees it is a good approach, but he is still considering it because he really doesn't want to stop or restart the service even for a few seconds, since a hundred values reach us every second. So he asked me to run tshark non-stop for at least 1 day to check the problem with memory first, and then he will think about that solution. I had been running tshark for just 4 hours when I hit this problem.

(09 Oct '13, 08:34) hoangsonk49

> because he really doesn't want to stop or restart the service even for a few seconds, since a hundred values reach us every second.

As I mentioned in the solution, you should run two 'overlapping' (in time) processes. That way you will miss no values at all, but then you'll have to deal with duplicate messages. I mentioned a solution for that as well.

> So he asked me to run tshark non-stop for at least 1 day to check the problem

I don't think you can use tshark that way, because eventually you will run into the memory problem. You won't be able to predict when it will happen, as that depends on the amount of traffic, which you can't predict either. It might happen after 12 hours, 16 hours or even 20 hours. No way to tell in advance. So, if you can't afford to miss a single value, you cannot use tshark that way, as you will have to monitor and restart tshark. During that time span you will lose several hundred values.

(09 Oct '13, 08:43) Kurt Knochner ♦

Yes, I know it will happen, but we'd like to reproduce it and check the statistics of the memory change.

One more question, Kurt, to verify that the bug is caused by the difference in processing speed between dumpcap and the dissector. If that is true, the "distance" between Dumpcap and Dissector will increase as time goes by, right? It also means we have a delay in processing the incoming data. For example, assume the rate of incoming data is 100 messages/second and the rate of dissecting is 90 messages/second. Then the number of messages in the queue grows by 10 every second, so after 12 hours 432000 messages are waiting for the dissector. [The core dump is generated if the size of the messages in the queue is greater than (filesize * files).]

Based on that, I did a test: I ran tshark for 14 hours with a large buffer (filesize:65535, files:10000, enough to run without a core dump) and wrote a log of the dissector output. In theory, if I then send data to dumpcap, it should be delayed [waiting for thousands of other messages to be dissected]. I used tail -f to watch the log file. But in fact, when I send my data to dumpcap, it appears in the log right after sending, as if it were running in real time. I don't understand why. I will let tshark keep running for 12 more hours, so about 1 day in total, in order to increase the delay.

(09 Oct '13, 19:08) hoangsonk49
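
The backlog arithmetic in the comment above can be checked with a short calculation. This is only a rough sketch and not anything taken from tshark: the function names are made up for illustration, the average on-disk message size of 200 bytes is a hypothetical value, and reading filesize:65535 as 65535 * 1024 bytes is an assumption.

```python
def backlog_messages(in_rate, out_rate, seconds):
    """Messages still waiting after `seconds`, given per-second rates."""
    return max(0, (in_rate - out_rate) * seconds)


def seconds_until_overrun(in_rate, out_rate, ring_bytes, avg_msg_bytes):
    """Rough time until the backlog no longer fits in the ring buffer.

    Returns None when the dissector keeps up (out_rate >= in_rate).
    """
    growth = (in_rate - out_rate) * avg_msg_bytes  # backlog bytes added per second
    if growth <= 0:
        return None
    return ring_bytes / growth


if __name__ == "__main__":
    # 100 msg/s in, 90 msg/s out, 12 hours, as in the example above.
    print(backlog_messages(100, 90, 12 * 3600))

    # Assumptions: filesize:65535 is 65535 * 1024 bytes, files:5,
    # and a hypothetical average of 200 bytes per message on disk.
    ring = 65535 * 1024 * 5
    print(seconds_until_overrun(100, 90, ring, 200) / 3600, "hours")
```

If the two rates really differed like this, the delay seen with tail -f would keep growing, which is not what the 14 hour test showed.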

I have just run another test similar to the one above. So far tshark has been running for ~22h, which in theory means a large number of messages is queued for dissection. But in fact, when I send my data to dumpcap, I can still see it appear in the dissector output right after sending. It is as if there is no delay and no large message queue, so it runs in real time. This contradicts our expectation, which is also the suspected cause of the bug. I really don't understand why. Tomorrow I will print the number of messages in the queue to get a clearer view of the problem. I will let tshark keep running for the next 14 hours and check again in the morning. If you have any ideas or suggestions, please share them so that I can understand this bug better.

(10 Oct '13, 03:53) hoangsonk49

> Yes, I know it will happen, but we'd like to reproduce it and check the statistics of the memory change

While this might be an interesting exercise, what kind of information do you expect to get that solves your real world problem?

> the "distance" between Dumpcap and Dissector will increase as time goes by, right?

Sounds reasonable, and I encourage you to analyze this problem. If you find anything relevant, please update here for the benefit of others.

(10 Oct '13, 05:45) Kurt Knochner ♦

This is now bug 9258.

(10 Oct '13, 07:59) JeffMorriss ♦

It seems to me that this problem can be worked around with a pretty simple shell script, much of which has already been suggested to address your actual problem, hoangsonk. Use dumpcap with a set duration, put a timestamp in a script variable that becomes part of the output file name, then once the file is written call tshark on it within that script, again controlling the file name through a script-defined variable and ensuring tshark itself is not called until after the file is written.

Want it to run forever? Run the script in a cron job (assuming Linux), where the cron job's frequency is equal to the capture file duration, and the duration is based on your memory constraints for tshark's post-processing of the capture file.

Also, Kurt has touched on this already, but just to emphasize: there's really no choice here between dumpcap and tshark for the capture process. For a 'day long trace' in a telecom environment, even if this is just signaling, even if you're filtering it down to just IN/Camel, and even if you're a small operator, tshark memory usage will break it eventually.

As this isn't the first time a 'rolling capture' concept like this has been asked about on the board I might write something up as a base solution for it to distribute at some point. I have a couple workable solutions that do this kind of automated capture process, with pcap file retention policies, protocol statistics, reporting, etc., though none I could present here. There does seem to be a communal need or desire for this though.

(10 Oct '13, 21:17) Quadratic
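
The capture-then-dissect cycle Quadratic describes could look roughly like the sketch below. It is written in Python rather than shell and is only an illustration: the helper names, the /tmp/oh prefix, interface 5 and the 60 second duration are assumptions borrowed from the question, and the only options used are dumpcap's -i, -a duration: and -w and tshark's -r.

```python
import subprocess
import time


def capture_name(prefix, now=None):
    """Timestamped capture file name in the form <prefix>_YYYYMMDD_HHMMSS.pcap."""
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    return f"{prefix}_{stamp}.pcap"


def dumpcap_cmd(interface, seconds, path):
    # -a duration:N makes dumpcap stop by itself after N seconds.
    return ["dumpcap", "-i", str(interface), "-a", f"duration:{seconds}", "-w", path]


def tshark_cmd(path):
    # -r reads a finished capture file instead of a live interface.
    return ["tshark", "-r", path]


def run_cycle(interface, seconds, prefix):
    """Capture one file, then dissect it only after dumpcap has exited."""
    path = capture_name(prefix)
    subprocess.run(dumpcap_cmd(interface, seconds, path), check=True)
    subprocess.run(tshark_cmd(path), check=True)
    return path
```

A call such as run_cycle(5, 60, "/tmp/oh") handles one file. Note that run_cycle blocks while tshark dissects, so nothing is captured during that time; a setup that must not miss packets would have to start the next capture before dissecting the previous file, along the lines of the overlapping processes Kurt mentions earlier in the thread.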

> While this might be an interesting exercise, what kind of information do you expect to get that solves your real world problem?

I expect to learn how much tshark's memory grows per day with our incoming data. With that, we can estimate how long it takes to reach "out of memory" and plan how often tshark should be restarted. Because this is a real-time service, it should be restarted as rarely as possible.

> the "distance" between Dumpcap and Dissector will increase as time goes by, right?

This morning I again sent my data into dumpcap, and I can still see it appear right after sending. It is as if it runs in real time, with no delay, which probably also means there is no large message queue. That is good for my system but not good for me, because now I really don't understand why.

I also printed the number of packets arriving at tshark before dissection. It seems to me that dumpcap always writes data into the .pcap file, and then each time tshark picks up a number of packets from the .pcap via a pipe to dissect together. This number depends on the filesize. For example: filesize:50k <--> 100-200 packets, filesize:100k <--> 300-400 packets ... But if I increase the filesize above 600M, the number of packets is always around 2000, even with filesize = 6GB or 20GB ... Because that number is stable, I think it is also the number of packets arriving at dumpcap (if not, it should increase when I increase the filesize). I'm not sure, because it could also be the maximum number of packets that tshark can get via the pipe even if the input rate is greater. I'm checking the tshark and dumpcap code to find out what is happening. If you have any ideas or experience with that, please help.

> This is now bug 9258.

Only the core dump is fixed; tshark still stops by itself, and that is a separate bug which has not been solved.

(11 Oct '13, 00:08) hoangsonk49

@Quadratic: thanks for your suggestion. I totally agree with you, but in any case we need an optimal number of restarts, which is why we have to try running tshark until it runs out of memory. As for the ring buffer making tshark stop by itself because a file was deleted, we have to find the cause: if it truly is a problem of different processing speeds, it is a serious problem whenever the input rate is greater than the output rate. Even if we apply Kurt's suggestion, we might lose the incoming data when dumpcap stops but tshark is still running until EOF (due to the difference in processing speed between dumpcap and the dissector). In that case, we need to consider how to decide.

(11 Oct '13, 00:29) hoangsonk49

> Even if we apply Kurt's suggestion, we might lose the incoming data when dumpcap stops but tshark is still running until EOF

That's true. I was not aware of that problem, and until it is solved, my proposed method is not reliable!!

(11 Oct '13, 00:45) Kurt Knochner ♦

By the way, do you know the relationship between the filesize option and the memory used by tshark? I did some tests and clearly see a difference. Same network, same code, same files:5, but with filesize:65535 %Memory increases slowly, about 0.1% (of 8GB) in 30 mins, whereas when I increase filesize by 10 times, to 655350, %Memory increases by about 0.1% (of 8GB) in ~2 mins. Does it make sense based on your understanding? Thanks a lot.

(11 Oct '13, 02:00) hoangsonk49
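
To put the two growth rates from the comment above side by side, they can be converted into MB per hour and into a naive time until memory is exhausted. This is a sketch under a strong assumption, namely that growth stays linear, which the thread itself suggests may not hold because memory use depends on traffic. The function names are made up for illustration.

```python
def hours_until_full(percent_step, minutes_per_step, percent_used=0.0):
    """Hours until %MEM reaches 100, assuming strictly linear growth."""
    steps_left = (100.0 - percent_used) / percent_step
    return steps_left * minutes_per_step / 60.0


def growth_mb_per_hour(percent_step, minutes_per_step, total_mb):
    """Memory growth in MB per hour on a machine with total_mb of RAM."""
    return total_mb * percent_step / 100.0 * 60.0 / minutes_per_step


if __name__ == "__main__":
    total_mb = 8 * 1024  # 8 GB of RAM, as in the comment above
    for label, minutes in (("filesize:65535", 30), ("filesize:655350", 2)):
        print(label,
              round(growth_mb_per_hour(0.1, minutes, total_mb), 1), "MB/h,",
              round(hours_until_full(0.1, minutes), 1), "hours to 100%")
```

With 0.1% per 30 minutes this gives roughly 500 hours to reach 100%, and with 0.1% per 2 minutes roughly 33 hours, so the two settings would call for very different restart intervals if the trend held.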

> Does it make sense based on your understanding?

No, but I did not check the code yet. How do you measure memory usage?

(11 Oct '13, 02:03) Kurt Knochner ♦

I use the "top" command and look at the memory percentage for tshark.

(11 Oct '13, 02:32) hoangsonk49
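
Since top shows %MEM with one decimal place, a step of 0.1% on an 8GB machine is about 8 MB, which is coarse for tracking slow growth. On Linux, one alternative is to sample the VmRSS field of /proc/<pid>/status at intervals. The sketch below assumes Linux and a known tshark process ID; the function names are made up for illustration.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Resident set size of a running process in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())


def growth_kb_per_hour(samples):
    """Average growth between the first and last (unix_time, rss_kb) sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) * 3600.0 / (t1 - t0)
```

Calling read_rss_kb with the tshark PID once a minute and appending (time.time(), rss) pairs to a list would give growth_kb_per_hour enough data for a rate estimate over the whole run.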
