Hi all, I'm using Wireshark to capture data on a real-time system. I use the command
But after a few hours, I got a core dump. I think the first of the 5 rotation files was deleted, so tshark stopped running and generated a core dump. So I tried again with a very small file size
And after a few seconds, it also generated a core dump:
Could you please help me understand why this happens and how to solve it? Is it related to the real-time system? This is a telecom system and tshark runs in real time, non-stop, so even if I increase the file size or the number of files, the problem could still appear someday in the future. Thanks a lot. asked 09 Oct '13, 03:03 hoangsonk49
One Answer:
I can confirm that behavior on Ubuntu 12.04 with tshark 1.10.0. Please file a bug report at https://bugs.wireshark.org with a detailed description and a reference to this question.
Possible reason:
As you mentioned, the file that tshark was reading from was deleted (rotated) while it was in use. This only happens if there is enough traffic to make the rotation too fast.
O.K. even if the current problem did not exist, you would still run into the memory problem already discussed, as tshark will then run for a very long time and accumulate internal state.
Problem: You are trying to use tshark as a long-term, real-time monitoring solution. However, tshark was not built with that in mind, and that's the reason why you hit all sorts of problems (see your other questions).
Solution: Isn't the solution I proposed in your other question a possible way to go? That construct will run forever (if built the right way) and it will not cause any problems with disk space, RAM, etc. And it's the only way (I can think of) to use tshark to solve your problem, as far as I understand it, based on your other questions.
Regards
answered 09 Oct '13, 08:03 Kurt Knochner ♦ edited 09 Oct '13, 08:18
OK, thanks for your confirmation. About your proposed solution, I think it could be a possible way, and my partner also agrees it is a good way, but he is still considering it because he really doesn't want to stop or restart the service even for a few seconds, since about a hundred values reach us per second. So he asked me to first run tshark non-stop for at least 1 day to check the memory problem, and then he will think about that solution. I have just run tshark for 4 hours and got this problem. (09 Oct '13, 08:34) hoangsonk49
As I mentioned in the solution, you should run two 'overlapping' (in time) processes. That way you will miss no values at all, but then you'll have to deal with duplicate messages. I mentioned a solution for that as well.
I don't think you can use tshark that way, because eventually you will run into the memory problem. However, you won't be able to predict when it will happen, as that depends on the amount of traffic, which you can't predict either. It might happen after 12 hours, 16 hours or even 20 hours. No way to tell in advance. So, if you can't afford to miss a single value, you cannot use tshark that way, as you will have to monitor and restart tshark. During that time span you will lose several hundred values. (09 Oct '13, 08:43) Kurt Knochner ♦
Yes, I know it will happen, but we'd like to reproduce it and check the statistics of the memory change. And one more question, Kurt, to verify that the bug happens because of the difference in processing speed between dumpcap and the dissector. If that is true, it means the "distance" between dumpcap and the dissector will increase as time goes by, right? It also means we have a delay in processing the incoming data. For example: assume the rate of incoming data is 100 messages/second and the rate of dissecting is 90 messages/second. Then, every second, the number of messages in the queue increases by 10. As a result, after 12 hours, 432000 messages are waiting for the dissector [the core dump is generated if the size of the messages in the queue is greater than (filesize * files)]. From that point, I did a test: I ran tshark for 14 hours with a buffer (filesize:65535, files:10000, enough to run without a core dump) and wrote a log of the dissector output. In theory, if I then send data to dumpcap, it should be delayed [waiting for thousands of other messages to be dissected]. Then I used tail -f to watch the log file. But in fact, when I send my data to dumpcap, I can see it appear in the log right after sending, as if it were running in real time. I don't understand why. I will let tshark keep running for 12 more hours, so about 1 day in total, in order to increase the delay. (09 Oct '13, 19:08) hoangsonk49
I have just run another test, similar to the one above, and by now tshark has been running for ~22 h in total. In theory, there should be a large number of messages in the queue waiting to be dissected. But in fact, when I send my data to dumpcap, I can still see it appear in the dissector output right after sending. It's as if there is no delay and no large message queue, so it can run in real time. This seems to conflict with our expectation, which is also the possible reason for the bug. I really don't understand why. Tomorrow, I will print out the number of messages in the queue to get a clear view of this problem. So I will let tshark keep running for the next 14 hours and check again in the morning. If you have any idea or suggestion, please contribute so that I can better understand this bug. (10 Oct '13, 03:53) hoangsonk49
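The backlog reasoning in the comments above can be checked with a few lines of arithmetic. This is only a sketch under the thread's own assumptions (100 messages/s arriving, 90 messages/s dissected, both constant); the function names are made up for illustration and are not part of tshark. It computes how large the queue would be after 12 hours and how long a newly sent message would then wait, which is the delay the tail -f test was expected to reveal.

```python
def backlog_after(seconds, in_rate, out_rate):
    """Messages still waiting after `seconds`, assuming constant rates."""
    return max(0, (in_rate - out_rate) * seconds)


def expected_delay(backlog, out_rate):
    """Seconds a newly arrived message waits behind `backlog` messages."""
    return backlog / out_rate


# Figures taken from the example in the comment above.
queued = backlog_after(12 * 3600, in_rate=100, out_rate=90)
delay = expected_delay(queued, out_rate=90)
print(queued)      # 432000
print(delay / 60)  # 80.0 (minutes)
```

Under these assumed rates, a message sent after 12 hours should appear in the log roughly 80 minutes later. Seeing it appear immediately suggests the assumed rates do not describe this traffic, i.e. the dissector may not be slower than dumpcap on average.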
While this might be an interesting exercise, what kind of information do you expect to get that solves your real world problem?
Sounds reasonable, and I encourage you to analyze this problem. If you find anything relevant, please update here for the benefit of others. (10 Oct '13, 05:45) Kurt Knochner ♦
This is now bug 9258. (10 Oct '13, 07:59) JeffMorriss ♦
It seems to me that this problem can be worked around with a pretty simple shell script, much of which has already been suggested to address your actual problem, hoangsonk. Use dumpcap with a set duration, pass a timestamp as a script variable to be incorporated into the name of the output file, then, once the file is written, call tshark on it within that script, again controlling the file name as a script-defined variable and ensuring tshark itself is not called until after the file is written. Want it to run forever? Run the script in a cron job (assuming Linux), where the cron frequency equals the capture file duration, and the duration is based on your memory constraints for tshark's post-processing of the capture file.
Also, Kurt has touched on this already, but just to emphasize: there's really no choice here between dumpcap and tshark for the capture process. For a 'day long trace' in a telecom environment, even if this is just signaling and even if you're filtering it down to just IN/Camel, and even if you're a small operator, tshark memory usage will break it eventually.
As this isn't the first time a 'rolling capture' concept like this has been asked about on the board, I might write something up as a base solution for it to distribute at some point. I have a couple of workable solutions that do this kind of automated capture process, with pcap file retention policies, protocol statistics, reporting, etc., though none I could present here. There does seem to be a communal need or desire for this though. (10 Oct '13, 21:17) Quadratic
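The rolling-capture idea described above could look roughly like the following. It is a minimal sketch, not Quadratic's actual script: it is written in Python rather than shell, and the function names, the "capture_" file prefix, the interface name and the output directory are all hypothetical. It only relies on dumpcap's -i, -a duration: and -w options and tshark's -r option.

```python
import subprocess
import time


def capture_commands(interface, duration_s, out_dir, now=None):
    """Build the dumpcap and tshark command lines for one capture slice."""
    # The timestamp becomes part of the file name, as suggested in the thread.
    stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(now))
    pcap = f"{out_dir}/capture_{stamp}.pcap"
    dumpcap = ["dumpcap", "-i", interface, "-a", f"duration:{duration_s}", "-w", pcap]
    tshark = ["tshark", "-r", pcap]
    return dumpcap, tshark


def run_slice(interface, duration_s, out_dir):
    """Capture one slice, then dissect it only after dumpcap has exited."""
    dumpcap, tshark = capture_commands(interface, duration_s, out_dir)
    subprocess.run(dumpcap, check=True)  # returns once the file is complete
    subprocess.run(tshark, check=True)   # dissector output goes to stdout


# Example call (hypothetical values); cron would start one slice per period:
# run_slice("eth0", 300, "/tmp")
```

Because each tshark process exits after reading one finite file, its memory use is bounded by the slice duration instead of growing for as long as the capture runs.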
I expect to learn how much tshark's memory grows per day with our incoming data. From that, we can estimate how long it takes to reach "out of memory" and plan how often tshark should be restarted. Because this is a real-time service, it should be restarted as rarely as possible.
This morning, I again sent my data into dumpcap, and I could still see it appear right after sending. It looks like real time, with no delay. That probably also means there is no large message queue. That is good for my system, but not good for me, because now I really don't understand why. I also printed out the number of packets coming into tshark before dissection. It seems to me that dumpcap always writes data into the .pcap file, and then each time tshark picks up a batch of packets from the .pcap via a pipe and dissects them together. The batch size depends on the filesize. For example: filesize:50k <--> 100-200 packets, filesize:100k <--> 300-400 packets ... But if I increase the filesize above 600M, the number of packets is always around 2000, even with filesize = 6 GB or 20 GB ... Because that number is stable, I think it is also the number of packets coming into dumpcap (if not, this number should increase when I increase the filesize). I'm not sure, because it could also be the maximum number of packets tshark can get via the pipe, even though the input rate might be greater. I'm checking the tshark and dumpcap code to find out what is happening. If you have any idea or experience with that, please help.
Only the core dump generation is solved; tshark still stops automatically, and that is another bug which has not been solved. (11 Oct '13, 00:08) hoangsonk49
@Quadratic: thanks for your suggestion. I totally agree with you, but in any case we need an optimal number of restarts, which is why we have to try running tshark until it runs out of memory. As for the ring buffer making tshark stop because a file was deleted, we have to find out the reason: if it truly is a problem of different processing speeds, it is a serious problem whenever the input rate is greater than the output rate. Even if we apply Kurt's suggestion, we might lose incoming data when dumpcap stops but tshark keeps running until EOF (due to the difference in processing speed between dumpcap and the dissector); in that case, we need to consider how to make a decision. (11 Oct '13, 00:29) hoangsonk49
That's true. I was not aware of that problem, and until it is solved, my proposed method is not reliable!! (11 Oct '13, 00:45) Kurt Knochner ♦
By the way, do you know the relationship between the filesize option and the memory used by tshark? I did some tests and clearly see a difference. Same network, same code, same files:5, but with filesize:65535, %Memory increases slowly, about 0.1% (of 8GB) in 30 mins, while when I increase filesize by 10 times to 655350, %Memory increases about 0.1% (of 8GB) in ~2 mins. Does that make sense based on your understanding? Thanks a lot. (11 Oct '13, 02:00) hoangsonk49
No, but I did not check the code yet. How do you measure memory usage? (11 Oct '13, 02:03) Kurt Knochner ♦
I use the "top" command and look at the percentage for tshark. (11 Oct '13, 02:32) hoangsonk49
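To turn the %Memory readings above into a restart interval, a rough extrapolation is enough. The sketch below assumes the growth stays linear, which nothing in this thread confirms, and it uses 100% of RAM as the limit although a real system would fail earlier; the function name is hypothetical.

```python
def hours_until_limit(growth_pct, interval_min, start_pct=0.0, limit_pct=100.0):
    """Hours until %MEM reaches limit_pct, assuming linear growth."""
    intervals = (limit_pct - start_pct) / growth_pct
    return intervals * interval_min / 60


# Growth figures reported in the thread (0.1% of 8GB per interval).
print(hours_until_limit(0.1, 30))  # filesize:65535  -> about 500 hours
print(hours_until_limit(0.1, 2))   # filesize:655350 -> about 33 hours
```

Since top only shows %MEM to one decimal place, readings taken over a longer window would give a more trustworthy slope than a single 0.1% step.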
I think it is because the dissector is slower than dumpcap when it writes data into the .pcap. So, even with a big file size, as I learned in junior high school: dead time = distance / difference of velocities. So if we use "-w" and "-P" together, we will get the core dump sooner or later. If so, there is no way out for me in this case. Please tell me "You are wrong". Thanks
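The "dead time" formula above can be written out as a small helper. This is a simplified model, not how tshark decides to delete files: it treats the ring buffer as one continuous region of filesize * files, assumes constant rates, and the example rates are invented purely for illustration.

```python
def dead_time(ring_size, write_rate, read_rate):
    """Seconds until the writer laps the reader, or None if it never does.

    ring_size: total ring buffer capacity (file size * number of files)
    write_rate, read_rate: throughput per second in the same unit as ring_size
    """
    if write_rate <= read_rate:
        return None  # the reader keeps up, so no file is overwritten under it
    return ring_size / (write_rate - read_rate)


# Hypothetical rates: dumpcap writes 10 units/s, the dissector reads 9 units/s.
print(dead_time(1 * 5, 10, 9))      # filesize:1, files:5     -> 5.0
print(dead_time(65535 * 5, 10, 9))  # filesize:65535, files:5 -> 327675.0
print(dead_time(65535 * 5, 9, 10))  # reader faster           -> None
```

The model reproduces the qualitative observation that a larger filesize only postpones the failure, as long as the writer is faster than the reader.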
Isn't that already answered in your other question?
It's the same memory problem, because tshark runs for a very long time. Please use dumpcap instead.
I think it is not a memory problem, because if I use a file size of 1k, it generates a core dump after a few seconds, not after a long time, and the reason given is "The file "/tmp/oh_00185_20131009164918.pcap" doesn't exist". This file was replaced by a new file because we use file rotation. It means the file currently being dissected was deleted, so I think the dissector may be slower than dumpcap. I also cannot use dumpcap because I need to go through the dissector.
And one more thing: I don't hit the problem when I remove the option "-P" (meaning no dissector is running). I tried the user version (binary downloaded from the website and installed on my PC) to capture the local area network with the command: "tshark -i 1 -w D:\sonnh.pcap -b filesize:1 -b files:5". Then I got the same problem. So, is this a bug in tshark when using "-P" and "-w" together?
Well, you said:
Now, which is true: a few seconds or a few hours?
As I said in the first question, I tried 2 cases: (1) file size: 65535 --> got a core dump after a few hours, and (2) file size: 1 --> got a core dump after a few seconds. Then I removed the option "-P" and tried with file size: 1 --> no problem.
Ah, O.K. I missed the second part. My fault.
Some questions:
What is your OS and Wireshark version?
How much empty space do you have on /tmp?
Is there any automatic process that cleans /tmp regularly?
I tried 2 cases. First on my server: CentOS, Wireshark 1.10.3, about 200 GB of free space. I'm sure that is enough, because I'm quite familiar with generating rotation files with tshark. Then I tried on my PC: Win 7, about 150 GB free on D:, Wireshark version 1.10.3 for users, not developers. There is no other automatic process that cleans /tmp. When I change the destination to another location, I get the same problem. P/s: /tmp still contains the 5 newest .pcap files, but the file with ID 185, "oh_00185_20131009164918.pcap", which was mentioned in the log, was deleted (I guess by tshark).
Does this only happen under Windows?
No, on both Windows and Linux.
Where did you get 1.10.3 from? The download page only contains 1.10.2. !?!
I got it from svn, branch trunk-1.10, and when I check tshark -version it shows 1.10.3
Did you change the code?
Are you using two-pass mode?
Yes, I did change the code, but when I got the problem, I downloaded the original version and got the same thing. Then I downloaded the binary package for users and installed it on Win7. With the command "tshark -i 1 -P -w D:\sonnh.pcap -b filesize:1 -b files:5", the problem appears. I think you can try it yourself with these options: "-P -w" and "-b filesize:1 -b files:5".
P/s: You can see the difference when you remove the option "-P". If the filesize increases, the dead time is longer, but you still get a core dump.
@berose: I don't know how to set two-pass or one-pass mode. I use tshark with this command: "tshark -i 1 -P -w D:\sonnh.pcap -b filesize:1 -b files:5"
Two-pass mode is specified with the -2 command line argument. Since you're not using that, you're in one-pass mode. I asked because the code handles the two cases slightly differently.
Hi all, as discussed, tshark has a problem with file deletion when using the ring buffer. The possible reason is the difference in processing speed between tshark and dumpcap (see here). It is a serious problem because it stems from how tshark works. dumpcap always generates a pcap file (including a temp file) and tshark reads the file to get data. So, is there any reason for using a file instead of a queue in RAM when I don't want a pcap file (i.e. I don't use the option "-w")? In that case, dumpcap would write data into the queue and tshark could use multiple threads to take data as fast as possible. Data could be freed after leaving the queue. It could increase the performance of tshark because:
Please correct me if I said something wrong. I don't have much experience, so I don't know whether it is possible to deploy in practice or only in theory. Thanks a lot.
I don't think that passing the packets via RAM will increase the speed of tshark at all, because the problem is not reading the file, but analyzing the frames (dissecting them) and that won't be any faster if you pass the packets via RAM. You will simply build an ever growing queue in RAM until there is no RAM left. As I said in another comment, this is a structural problem and there is (probably) no good solution, other than faster dissection speed, which can be accomplished by a faster CPU and probably some dissector code optimization.
Multi-threading might help if you want to open several capture files in parallel within the same Wireshark/tshark process. However, using multiple threads to dissect the frames in one capture file is quite hard, as the whole system uses a lot of shared data structures. As soon as you go multi-threaded you need either local copies of those data structures or a lock/protection mechanism. That will however cause quite a lot of overhead and lock wait time, and I don't think it's easy to implement. Think of IP fragmentation, TCP reassembly and the like, where you have dependencies between several consecutive frames. In a multi-threaded dissector engine this can create massive problems if the frames are processed in the wrong order because one thread is faster than the other.
See also here: http://wiki.wireshark.org/Development/multithreading
As I already mentioned, Wireshark/tshark was not built with real-time analysis in mind and thus it does not work well in that area.
As you need exactly that, tshark/Wireshark might not be the ideal tool for you. Maybe you can just use parts of the dissector engine to build your own real-time monitoring system for the protocol you want to monitor. If so, please contribute that code for the benefit of others.
Hi all, thanks for your comments. After considering, I decided on a way to prevent tshark from stopping because of the difference in processing speed: I removed the ring buffer option "-b files:" and kept the option "-b filesize". After a pcap file has been processed, the program renames it to "*.bak". A cron job runs every day to clean up the .bak files. It is more feasible because:
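The rename-and-clean scheme described above might be sketched as follows. The post does not show the actual code, so this is an assumption-laden illustration: the function names are hypothetical, ".bak" is appended to the full file name (the post only says "*.bak"), and files are considered stale after one day based on their modification time.

```python
import time
from pathlib import Path


def mark_processed(pcap_path):
    """Rename a fully processed capture file to <name>.bak and return the new path."""
    src = Path(pcap_path)
    dst = src.with_name(src.name + ".bak")
    src.rename(dst)
    return dst


def clean_backups(directory, max_age_s=24 * 3600, now=None):
    """Delete *.bak files older than max_age_s; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    for bak in Path(directory).glob("*.bak"):
        # Only files whose last modification is older than the limit are removed.
        if now - bak.stat().st_mtime > max_age_s:
            bak.unlink()
            removed.append(bak.name)
    return sorted(removed)
```

A daily cron entry would call clean_backups on the capture directory. Because only renamed files are ever deleted, a file that is still being dissected cannot disappear underneath the reader, which is the failure the ring buffer caused.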