What does the processing speed of dissectors depend on?

Question

Hi guys, I'm comparing the rate at which incoming packets are captured with the rate at which they are dissected, to see how the two differ. I run the following tshark command on Windows and browse YouTube to generate network traffic:

tshark -i 1 -P -w D:/sonnh.pcap -b filesize:1000 -b files:4

I changed the code to print the number of incoming packets (which are written to the .pcap file by dumpcap) and the number of outgoing packets (which are dissected by the dissectors). Note that dumpcap doesn't write packets one by one: it captures a group of packets (e.g. 10 packets at a time) and then writes this group to the .pcap file; the dissector then also takes a group of packets from the file to dissect. As I saw in the log, the number of packets in the dumpcap group and in the dissector group are different:

    Line    Number of Incoming - Dissected packets
    1.      17:16
    2.      11:12
    3.      7:7
    4.      19:17
    5.      17:14
    6.      230:235
    7.      2012:637
    8.      89:839
    9.      2444:37
    10.     1:500
    11.     92:55
    12.     21:16
    13.     0:18
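
One way to read a log like this is to keep a running backlog, that is, the packets written so far minus the packets dissected so far. The following is a minimal sketch in C; `struct log_line`, `running_backlog` and `sample_log` are hypothetical names introduced here for illustration and are not part of the Wireshark sources.

```c
#include <assert.h>
#include <stddef.h>

/* One log line: packets written by dumpcap and packets dissected by tshark. */
struct log_line {
    long incoming;
    long dissected;
};

/*
 * Fill backlog[i] with the number of packets written but not yet
 * dissected after log line i. Returns the backlog after the last line.
 */
long running_backlog(const struct log_line *lines, size_t n, long *backlog)
{
    long pending = 0;

    assert(lines != NULL && backlog != NULL);
    for (size_t i = 0; i < n; i++) {
        pending += lines[i].incoming - lines[i].dissected;
        backlog[i] = pending;
    }
    return pending;
}

/* The 13 lines of the log above. */
static const struct log_line sample_log[] = {
    {17, 16}, {11, 12}, {7, 7}, {19, 17}, {17, 14}, {230, 235},
    {2012, 637}, {89, 839}, {2444, 37}, {1, 500}, {92, 55},
    {21, 16}, {0, 18},
};
```

For the 13 lines above, the backlog is 0 after line 6, 1375 after line 7, 625 after line 8, 3032 after line 9 and 2557 after line 13. Read this way, most of the 839 packets dissected in line 8 were already pending from line 7, and within this excerpt the two totals (4960 incoming, 2403 dissected) have not yet converged.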

In theory, if tshark works well, the total number of incoming packets and the total number of dissected packets should be identical (100% dissected, 0% dropped). The number of incoming packets varies with the network speed; that is OK, nothing wrong there. But I wonder why the number of dissected packets differs so much from the number of incoming packets. As can be seen in line 1:

    Line    Number of Incoming - Dissected packets
    1.      17:16

One packet is left waiting for the dissector, and it is dissected in line 2:

    Line    Number of Incoming - Dissected packets
    2.      11:12

So in total the sums are identical. But in line 9:

    Line    Number of Incoming - Dissected packets
    9.      2444:37

Only 37 packets are dissected versus 2444 incoming. That is not the real capacity of the dissector, because in line 8:

    Line    Number of Incoming - Dissected packets
    8.      89:839

The dissector handles 839 packets versus 89 incoming. So the dissector is able to handle a large number of packets, but why does it dissect only 37 packets in line 9? From that point, I have some questions:

1. Why does the number of dissected packets vary every time?
2. What does the number of dissected packets depend on? (What makes the number of dissected packets differ?)
3. Why doesn't it take all incoming packets?
4. Does this mean the processing speed of the dissectors is lower than dumpcap's (I don't think so, because we don't have enough evidence)? If yes, is there any way to increase the number of dissected packets to match the number of incoming packets?

Sorry for asking in such code-level detail, but I hope someone on this forum who works as a developer can help me with their experience.

If you are an expert, or just have any idea, suggestion, or experience with this, please help me answer. Thank you so much.

Answer 1


I'm not sure your experiment is going to be useful, because you're probably getting thrown off by side effects that will have a more or less great impact on your measurements. Keep in mind that, while dumpcap is writing to file, tshark is reading that file at the same time. Meaning: you have file I/O from two processes - one is writing, one is reading, so there is going to be some serialization of who accesses the file when. Also, as you noticed, dumpcap will often buffer frames in memory and write them to disk in a bunch, which means that tshark will not be able to read them as soon as they really arrived.

All in all, dissecting packets may be slower than writing them to disk, but I doubt it. File I/O is usually a lot slower than in-memory processing, so I guess your test is basically a measurement of how fast your capture file is written and read again, biased by the serialization of access to the file.

Now, I'll sit and wait for Guy's answer - this is probably more in his area of expertise :-)

answered 14 Oct '13, 22:24

Jasper ♦♦

edited 14 Oct '13, 22:25

Hi Jasper, I agree with you about

while dumpcap is writing to file, tshark is reading that file at the same time

We can consider them as happening at the same time, but actually they follow an order: write-read-write-read... At least in my experiment, I can see this in the log.

dumpcap will often buffer frames in memory and write them to disk in a bunch, which means that tshark will not be able to read them as soon as they really arrived

In question number 4, I also said that I don't have enough evidence to say which is faster. But it doesn't matter, because my objective is not measurement. My concern is why the dissector takes a different number of packets each time. For example, in line 9 there are 2444 incoming packets, but the dissector takes only 37 outgoing packets to dissect, even though it is able to do more, as it dissected 839 packets in line 8. In lines 1 and 2, the numbers of incoming and outgoing packets are similar, but sometimes they are very different. It would make sense to me if dumpcap wrote x packets into the pcap file and the dissector then took y packets to dissect, with y = x or as close to x as possible. I'm trying to find out what determines the number y, and how to make it close to x. And one more question: how do I start debug mode to print the g_log output (of both tshark and dumpcap)? Thanks

(14 Oct '13, 23:54) hoangsonk49

Could the reason be related to the pipe? As far as I can check, the number of outgoing packets is read from a message on the pipe, whose header has a 1-byte indicator and a 3-byte message length:

/* convert header values */
pipe_convert_header((guchar*)header,4,indicator,&required);

--> This gives "required": the number of bytes to read

/* read the actual block data */
newly = pipe_read_bytes(pipe_fd, msg, required, err_msg);

--> This gives "msg": the value read from those "required" bytes. It is also the number of packets which are going to be dissected

(15 Oct '13, 03:00) hoangsonk49
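
The header layout described in this comment (1 indicator byte followed by a 3-byte message length) can be sketched as below. This is not copied from the Wireshark sources: `decode_header` and `parse_packet_count` are hypothetical helpers, and both the big-endian byte order of the length and the ASCII decimal encoding of the packet count in the payload are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: split a 4-byte header into its indicator byte
 * and a 24-bit payload length (assumed big-endian).
 */
void decode_header(const unsigned char header[4], char *indicator,
                   int *payload_len)
{
    assert(header != NULL && indicator != NULL && payload_len != NULL);
    *indicator = (char)header[0];
    *payload_len = (header[1] << 16) | (header[2] << 8) | header[3];
}

/*
 * Hypothetical helper: interpret a NUL-terminated payload as a decimal
 * packet count (assumed encoding).
 */
long parse_packet_count(const char *payload)
{
    assert(payload != NULL);
    return strtol(payload, NULL, 10);
}
```

Under these assumptions, the header only says how many bytes of payload follow; the packet count itself comes from the payload, so a payload of "839" would mean 839 packets announced.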

Keep in mind that dumpcap only alerts tshark to there being additional packets every once in a while (every 500 msec: see the DUMPCAP_UPD_TIME macro in dumpcap.c). So you may have times when tshark does less work than dumpcap simply because tshark is waiting for packets. Theoretically tshark should catch up after the next time tick.

(15 Oct '13, 10:30) JeffMorriss ♦
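
The effect of periodic notification can be illustrated with a toy model. It is only a sketch, not how tshark is implemented: `simulate_ticks` is a hypothetical function, and the fixed per-tick `capacity` of the consumer is an assumption. Packets that arrive during a tick are announced only at the end of that tick, so the consumer always works one tick behind.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: at tick i the consumer may process packets that arrived in
 * ticks 0..i-1 (they were announced at the end of each tick), up to
 * `capacity` packets per tick. processed[i] receives the per-tick count.
 * Returns the number of packets still pending after the last tick.
 */
long simulate_ticks(const long *arrived, size_t n, long capacity,
                    long *processed)
{
    long visible = 0;  /* announced to the consumer, not yet processed */

    assert(arrived != NULL && processed != NULL && capacity >= 0);
    for (size_t i = 0; i < n; i++) {
        long take = visible < capacity ? visible : capacity;

        processed[i] = take;
        visible -= take;
        visible += arrived[i];  /* announced at the end of this tick */
    }
    return visible;
}
```

With arrivals of 10, 2000, 50, 0, 0 packets per tick and a capacity of 800, the model processes 0, 10, 800, 800, 450 packets per tick and ends with nothing pending. The per-tick pairs look very different (2000 in versus 10 out, then 50 in versus 800 out), yet the totals match once the consumer has caught up.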

Thanks for your comment. As I understand it, DUMPCAP_UPD_TIME is there to avoid overloading slow displays. During this interval, dumpcap does nothing and waits for tshark and the display, right? Now, if I don't care about what is printed on the display, should I increase DUMPCAP_UPD_TIME (for example, to 750 ms or 1 s) so that dumpcap has a "longer delay" in which tshark can do its work? Would that cause any problem? Thanks.

(15 Oct '13, 18:55) hoangsonk49

I have just tried DUMPCAP_UPD_TIME = 1000 ms and DUMPCAP_UPD_TIME = 100 ms. With DUMPCAP_UPD_TIME = 1000, the number of dissected packets increases a lot, but the number of packets arriving at dumpcap each time increases even faster (I use a speed limiter set to 2000 kB/s to make sure that the network stays stable at about 2 MB/s). Something similar happens with DUMPCAP_UPD_TIME = 100. So overall, we still have the problem.

So you may have times when tshark does less work than dumpcap simply because tshark is waiting for packets

In line 9, 2444 packets have already been written by dumpcap and THEN only 37 packets are dissected by tshark (even though 2444 packets are ready for dissecting and tshark does not need to wait for anything), while the dissector can dissect 839 packets, as it did in line 8.

(16 Oct '13, 00:51) hoangsonk49

During this interval, dumpcap does nothing and waits for tshark and the display, right? Now, if I don't care about what is printed on the display, should I increase DUMPCAP_UPD_TIME (for example, to 750 ms or 1 s) so that dumpcap has a "longer delay" in which tshark can do its work?

Actually I think you're going for the opposite: you want tshark to dissect packets as soon as they're available, right? So there is less variability between how many packets dumpcap processes and how many tshark processes. To do that, decrease DUMPCAP_UPD_TIME to a small number.

(16 Oct '13, 06:23) JeffMorriss ♦

I have tried two cases, DUMPCAP_UPD_TIME = 1000 ms and DUMPCAP_UPD_TIME = 100 ms; let's see the difference. With DUMPCAP_UPD_TIME = 1000 ms, the number of incoming packets is 8000 ~ 19000 versus 1000 ~ 2000 outgoing packets. With DUMPCAP_UPD_TIME = 100 ms, the number of incoming packets is 2000 ~ 8000 versus 300 ~ 500 outgoing packets. So there is less variability between how many packets dumpcap processes and how many tshark processes, but if we look at the ratio of outgoing to incoming packets, the performance may have decreased: when we stop tshark (using -a duration:120), only dumpcap stops while tshark keeps running, and the delay is still long.

(17 Oct '13, 19:36) hoangsonk49


Answer 2

Does this mean the processing speed of the dissectors is lower than dumpcap's (I don't think so, because we don't have enough evidence)?

Well, just by applying logic, I would answer: Yes

Reason: Both dumpcap and tshark have the same amount of I/O work (if we ignore file system caches for a while). So that amount of time is the same (dumpcap writing data, tshark reading the same amount of data). Then tshark has a lot of additional work due to the dissection of frames. So yes, dissecting packets with tshark will always be slower than just writing the packets to disk with dumpcap. This is due to the way dumpcap 'delivers' packets to tshark: through a file they both use.

Thus, I believe the specific problem you have found is a structural problem and I don't know if there is a general or an easy solution for this. There will always be situations where dumpcap will be ahead of tshark, due to the time tshark needs to dissect the frames and it gets worse the longer tshark runs, due to larger lists and hash tables tshark needs to fill, search and possibly reorganize (hash table collisions - although I did not check where exactly that might happen!!). File system caching might help, as the I/O work of tshark (reading what dumpcap just wrote) has a much lower impact than for dumpcap (writing new data), but still....

I changed the code to print the number of incoming packets (which are written to the .pcap file by dumpcap) and the number of outgoing packets (which are dissected by the dissectors).

Can you please post the code change, so we can check if the changes are appropriate to measure what you are trying to measure ;-))

Regards
Kurt