Setup

I have a week's worth of pcapng captures and I'm trying to filter out specific data into reports organized into tables. I'm using a batch script that calls tshark on the command line to go through the files automatically, one by one. Unfortunately, this process is going very slowly. It's almost 1:1 time to process. That is, at the rate I'm currently going, it will take my computer running 24/7 for a week to extract the week's worth of data. And that's only for this one report! I have other reports to do, and if they each take a whole week to collect the data, this will not work out.

Part of the issue is that I'm using a .lua script, and I know that's slowing it down. Unfortunately, it's necessary as we're using a non-standard protocol. I won't ask about lua coding here, though.

My question is, is there a way I can "smartly" set the filters so that tshark can filter out the packets I don't want to look at more quickly and speed up the process? For example, I know to use "short-circuit" logic operators, && and ||, so once it gets a definite false on && or a definite true on || it won't look at the rest. What other things might help speed it up?

Here's an example of the command I'm running:
myprotocol.field1, myprotocol.field2, myprotocol.field3 are fields defined in my .lua script.

Some things I'm looking for:

If I put the -R filter first, does that change processing speed?
If in my filter, instead of doing a [! (bunch of || statements)] I did && (bunch of ! statements), would that improve speed? [Actually I think I'll try this. I think I'm kind of stopping the short circuit from working this way.]
Is there any other reordering I could use that might improve speed?
Are there any other methods or general tips that could improve speed?

Thank you.

asked 13 May '16, 11:40 Trashman
edited 13 May '16, 13:39 sindy
4 Answers:
What is the

You could try running

You might also try to disable name resolution by specifying the

What is the percent of traffic that matches the filter vs. the total traffic in each capture file? If it's a much smaller subset, then perhaps swapping the filter order would help? In other words, if you're not matching very many

-R "(myprotocol.BYTE_COUNT > 0 && (ip.addr == 192.168.1.1 || ip.addr == 192.168.1.2 || ip.addr == 192.168.2.1 || ip.addr == 192.168.2.2) && !(arp || icmp || icmpv6 || dns || nbns || browser || dhcpv6 || igmp || dhcpfo || bootp || http || ip.flags == 0x01 || ip.fragment.count >= 2))"

You could possibly replace

Perhaps you could modify your Lua script to add a new field whenever it meets your filter criteria above so that you'd only have to specify a filter of

Other suggestions on the Wireshark wiki's Performance page may or may not be helpful to you, but you might want to have a look just in case.

answered 13 May '16, 13:02 cmaynard ♦♦
edited 13 May '16, 13:06

No, D is a local drive. I definitely would not want to do this job over the network.

I can't find a way to consistently run tshark at high priority in batch. While it's running, I can click on it in Task Manager and raise its priority. But as soon as my batch job moves to the next file, the next tshark is invoked at normal priority, and I can't sit here for that long tending to it. Also, I can't get the "start" command to work with all the parameters I'm passing, plus I'm using STDOUT to read the data into another program, which I don't think I can do with the start command anyway.

Now, I've read that STDOUT is slow and that there's a suppression switch to make tshark faster by not using it. But my only option other than STDOUT is to write to a file, and the files this would write out are huge! The program reading them in automatically compresses the data so it doesn't take up too much space on my drive (I have a 2 TB drive and it's 75% used up from this data).
I've tried a few different combinations of rearranging my filters. It actually seems like the one I started with is the fastest (slightly). arp is definitely my most common unwanted packet type, so it being first makes sense. I've tried not filtering, only filtering arp, not doing the ip filtering, and a few different orders. All changes so far have made it slower. Of course, there are many permutations of those options, so I can keep trying.

Yeah, I'm not sure if it really short-circuits or not. I guess I was assuming it does, since && and || short-circuit in C++. But the order does seem to affect the speed slightly, so maybe it does.

As for the IPs, I just gave those as examples. I have to cherry-pick certain IPs that are not contiguous, and I specifically only want the data from some of them and not the others for a given report.

The Performance page looks like mostly GUI changes. I can't see much applicable to tshark specifically, and none that apply to me. But thanks for looking into it. I'm sure some of these tips will be good on other projects.

(13 May '16, 13:17) Trashman

If nothing else, I'd highly recommend disabling name resolution. As for changing priority, in the past I've also used WMIC to do this. If you're interested, you can have a look at the

(13 May '16, 13:30) cmaynard ♦♦

Putting

(13 May '16, 13:49) sindy

Name resolution is off. I downloaded your dumppcap.bat file and am looking it over now. This is a big one, so it may take me a while! I've never used WMIC before, so it's new to me.

Right now I'm "solving" my problem by running 8 instances of my batch routine (each pointed at a different directory) in parallel in different windows. Doing this, it should be done over the weekend rather than taking a whole week. Each instance is, of course, running slower than if I only had one running, but I'm still saving massive amounts of time overall doing it this way. Hooray for multi-core processors!
(13 May '16, 13:52) Trashman

@sindy, Yes, I'm using an older version of Wireshark out of necessity. When we first made the "myprotocol" lua, newer versions didn't like it. We recently made changes that make it work in newer versions, but now my batch scripts don't work with the newer version, of course, for the very reason you pointed out about requiring the -2. I tried a simple switch to -Y and it didn't give me the results I wanted, so I'm looking into that, too. I will have to fix them eventually.

(13 May '16, 14:57) Trashman

@cmaynard, Thank you for the WMIC tip. It's kind of a clunky thing for me to do, but I ran a 9th batch script in parallel that sets all my "tshark.exe" processes to "High Priority" every 10 seconds. Since I'm running so many at a time, it's not making a huge difference (they're fighting each other for resources), but a few seconds are shaved off each pass, so it should save some hours over the weekend. Since I want to get this report done this weekend, that's what I'll have to do for now. I'll try a more elegant solution for my next reports.

(13 May '16, 14:59) Trashman
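The two workarounds discussed in these comments (several tshark instances in parallel, each started at high priority while its STDOUT is still read by another program) can be sketched in a single script. This is only a sketch under assumptions, not what anyone in the thread actually ran: the function names, the `handle_line` callback and the default of 8 workers are illustrative. The tshark options used are `-n` (no name resolution), `-r`, `-T fields`, `-e`, and a filter flag that defaults to `-Y`; the older tshark described above would need `-R` instead, and newer versions need `-2` together with `-R`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# HIGH_PRIORITY_CLASS is only defined on Windows; 0 means "no special flags" elsewhere.
HIGH_PRIORITY = getattr(subprocess, "HIGH_PRIORITY_CLASS", 0)


def build_command(capture, display_filter, fields, filter_flag="-Y", tshark="tshark"):
    """Build a tshark argument list; -n disables name resolution."""
    cmd = [tshark, "-n", "-r", str(capture), filter_flag, display_filter, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def run_one(capture, display_filter, fields, handle_line, **options):
    """Start one tshark at high priority and stream its stdout line by line."""
    cmd = build_command(capture, display_filter, fields, **options)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                          creationflags=HIGH_PRIORITY) as proc:
        for line in proc.stdout:
            # handle_line is a hypothetical callback; it is called from worker
            # threads, so it must be safe to call concurrently.
            handle_line(capture, line.rstrip("\n"))
    return proc.returncode


def run_all(captures, display_filter, fields, handle_line, workers=8, **options):
    """Process several captures at once; threads suffice since tshark does the work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, c, display_filter, fields, handle_line, **options)
                   for c in captures]
        return [f.result() for f in futures]
```

Because each process is created with the priority flag already set, no separate loop is needed to re-prioritize tshark.exe every few seconds. Output is consumed as it arrives, so nothing huge has to be written to disk first.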
Another thought:
So, a suggestion: limit your filter to just:
(Assuming that

answered 13 May '16, 14:35 JeffMorriss ♦

That's interesting. When I do my next set of reports, I'll give this a try.

I was thinking that arp and the other protocols I filtered out are "higher level" than "myprotocol" (which is wrapped inside ethernet/ip/udp/myprotocol). I guess I had an implicit assumption that it would filter out "higher" layers without having to try to dissect "lower" layers, and that this was saving me some time. I also assumed that, since it needs lua to dissect "myprotocol", I was somehow bypassing some calls to my script by doing so.

But now that I think about it, that would take some pretty complex logic. It would have to specifically rank all possible protocols and then check for them in some priority order. It would also have to preprocess the filter directive and re-sort its terms internally. I'm sure it doesn't do that. So, by necessity, it has to dissect first.

However, oddly, I did cut out large portions of my filter in some of my attempts and it seemed to make it slower, so I'm not sure this will work. That being said, the first thing I cut out of the filter was the myprotocol.BYTE_COUNT. I didn't try leaving that in and taking the rest out. Maybe that will work. The IPs do have to always be in the filter or I won't get the right data.

(13 May '16, 14:52) Trashman

A couple more (fairly obvious) optimizations:
(16 May '16, 06:22) JeffMorriss ♦

I can't strip it down to just BYTE_COUNT; I have to have the fields I'm looking for in my table columns specified by the -e option (e.g. myprotocol.field1, field2, field3). The other issue is that this is just one report. The other reports I have to run will be using different fields altogether, so the lua script is actually pretty central to this operation. I'll have to work on my C skills to rewrite this in C.

(16 May '16, 07:30) Trashman

I wonder if it's more efficient to simply define the columns you want in Wireshark, optionally in a specific profile, and then just have

Adding Wireshark columns and creating profiles is relatively easy and self-explanatory. If you need help with the

The

(16 May '16, 11:45) cmaynard ♦♦
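Since every report needs its own cherry-picked IPs while the overall filter shape stays the same, the filter string could be generated rather than edited by hand. The following is a minimal sketch that reproduces the shape of the `-R` filter quoted in the first answer; the function name `build_filter` and its parameters are assumptions made for illustration, not part of tshark or of the thread.

```python
def build_filter(required, ips, excluded):
    """Compose '(required && (ip list) && !(excluded list))' as a filter string.

    required -- an expression that must hold, e.g. 'myprotocol.BYTE_COUNT > 0'
    ips      -- the non-contiguous IP addresses wanted for this report
    excluded -- protocol names or expressions to reject
    """
    ip_clause = " || ".join(f"ip.addr == {ip}" for ip in ips)
    excluded_clause = " || ".join(excluded)
    return f"({required} && ({ip_clause}) && !({excluded_clause}))"


# Example values taken from the filter shown earlier in the thread.
example = build_filter(
    "myprotocol.BYTE_COUNT > 0",
    ["192.168.1.1", "192.168.1.2"],
    ["arp", "icmp"],
)
print(example)
```

This only changes how the filter text is produced. As discussed above, tshark still has to dissect each packet before the filter is evaluated, so it is not expected to change processing speed by itself.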
TribeLab Workbench would help. Although it uses tshark for filtering, it runs multiple instances in parallel. It will also do the whole thing unattended and merge the filtered files if you want.

answered 13 May '16, 15:01 PaulOfford

I'm using a MATLAB script to do this same thing. I ran 8 instances in parallel unattended this weekend and combined the outputs into a single large table. It worked quite well. I got the results I needed from this one report. Just the tshark part was slower than I wanted, and I have more reports to run. Everything else worked efficiently.

(16 May '16, 07:30) Trashman
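For readers without MATLAB, the "combine the outputs into a single large table" step might look roughly like this in Python. It is a sketch under assumptions: it expects the text produced by `tshark -T fields`, which separates fields with tabs by default, and the function name `merge_tables` and the extra `source` column are illustrative choices, not something described in the thread.

```python
def merge_tables(named_outputs, field_names):
    """Merge tab-separated tshark outputs into one list of row dictionaries.

    named_outputs -- iterable of (source_label, text) pairs, one per tshark run
    field_names   -- column names in the same order as the -e options used
    """
    rows = []
    for source, text in named_outputs:
        for line in text.splitlines():
            if not line:
                continue  # skip blank lines
            values = line.split("\t")
            row = {"source": source}
            row.update(zip(field_names, values))
            rows.append(row)
    return rows


# Example with made-up values standing in for two parallel runs.
table = merge_tables(
    [("dir1", "1\t2\n3\t4\n"), ("dir2", "5\t6\n")],
    ["myprotocol.field1", "myprotocol.field2"],
)
print(len(table))
```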
Have a look at SteelCentral Packet Analyzer from Riverbed, Wireshark's primary sponsor. It used to be called Pilot, and it allows operations over multiple large capture files.

answered 14 May '16, 03:11 grahamb ♦

Not sure how much this package costs, but I'll check it out. Thanks for the tip.

(16 May '16, 07:31) Trashman

You get a 30 day "free" trial in exchange for your email address.

(16 May '16, 12:01) grahamb ♦
Doing !XXX&!XXX... etc. made it slower, not faster! Oh well, it was worth a shot.