Capture performance CentOS 6 vs CentOS 7

Question

I am attempting to continuously capture approximately 20 Mbit/s of RADIUS traffic with tshark. If I capture packets with tshark on CentOS 6.5, anywhere from 4% to 66% of the packets are dropped. If I do the same thing on CentOS 7, it never reports any dropped packets. I've even tried to force it to drop packets by doing expensive things such as writing large amounts of traffic out as XML. As far as I can tell, it is not dropping packets. My question is, does CentOS 7 have some sort of feature that makes dropping packets impossible? Or is it dropping packets and not telling me?

As an example, I execute commands like this:

tshark -i ens224 -c 100000 -w /tmp/delme.pcap
tshark -i ens224 -c 100000 -T pdml > /tmp/delme.pcap

For the first command CentOS 6 reports 4% dropped packets, CentOS 7 reports none. For the second command CentOS 6 reports 66% dropped packets but CentOS 7 reports none.

Note that both machines are running tshark 1.12.7 compiled from source.

Linux kernel versions for CentOS 6 and 7:

2.6.32-431.5.1.el6.x86_64
3.10.0-123.el7.x86_64

Libpcap versions for CentOS 6 and 7:

14:1.4.0-1.20130826git2dbcaa1.el6
14:1.5.3-3.el7

hardware:

2 CPU, 4GB Ram, 2GHz E7-4850 Xeon
4 CPU, 8GB Ram, 2GHz E7-4850 Xeon

Both capture on a VMXNET3 10G optical connection and use the same hard disk.

Accepted Answer

"My question is, does CentOS 7 have some sort of feature that makes dropping packets impossible?"

No, but it has two features that make dropping packets far less likely:

a kernel version that includes TPACKET_V3 for PF_PACKET sockets;
a libpcap version that uses TPACKET_V3 for PF_PACKET sockets.

Libpcap uses PF_PACKET sockets to capture on Linux 2.2 and later (Linux 1.x and 2.0 didn't have PF_PACKET sockets). The original PF_PACKET sockets delivered packets using the regular socket mechanisms, meaning libpcap (or any other program capturing traffic) had to make one recvmsg() call on that socket for every packet. This was more expensive than, for example, the way the BPF mechanism on *BSD and OS X works, where multiple packets are delivered on every read, so, with a high level of traffic, fewer system calls are made.
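To get a feel for the difference, here is a rough back-of-the-envelope sketch in Python. Only the 20 Mbit/s rate comes from the question; the 200-byte average packet size and the 32 packets per read are assumptions picked for illustration, not measurements.

```python
import math

RATE_BITS_PER_SEC = 20_000_000   # capture rate from the question
AVG_PACKET_BYTES = 200           # assumption: average packet size
PACKETS_PER_READ = 32            # assumption: packets delivered per batched read

# Packets arriving each second at the assumed average size.
packets_per_sec = RATE_BITS_PER_SEC // 8 // AVG_PACKET_BYTES

# Original PF_PACKET delivery: one receive call per packet.
per_packet_calls = packets_per_sec

# BPF-style delivery: one read returns a batch of packets.
batched_calls = math.ceil(packets_per_sec / PACKETS_PER_READ)

print("packets/sec:                 ", packets_per_sec)
print("syscalls/sec, one per packet:", per_packet_calls)
print("syscalls/sec, batched:       ", batched_calls)
```

The exact numbers matter less than the ratio: the per-packet scheme scales its system call count with the packet rate, while the batched scheme divides it by the batch size.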

Linux 2.4, I think, introduced the "turbopacket" mechanism (that's what the "T" in "TPACKET" stands for - "turbo"), which provides a memory-mapped buffer shared by the kernel and userland. With that, fewer copies are needed when delivering packets, and the packet-reading loop in userland can process multiple packets per wakeup (to wait for packets to arrive, userland makes a select(), poll(), or epoll_wait() call). Unfortunately, that mechanism provides a ring of fixed-size slots, and libpcap has to choose a slot size big enough for the largest possible packet. Earlier versions picked a slot size equal to the snapshot length provided, i.e. probably 64K-1 for the version of Wireshark you're using, which is quite wasteful - the ring ends up with too few slots to avoid overrun. Some later versions attempt to use the MTU to determine the slot size, but they can't always do that, and even when they can, the result can still be wasteful.
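The effect of the slot size on ring capacity can be sketched with some simple arithmetic. The ring size and per-slot overhead below are illustrative assumptions, not libpcap's exact values, and slots_in_ring is a hypothetical helper.

```python
# Simplified model of a ring of fixed-size slots, one packet per slot.
# All constants are illustrative assumptions, not libpcap's real values.

RING_BYTES = 2 * 1024 * 1024   # assumed total ring size (2 MiB)
SLOT_OVERHEAD = 128            # assumed per-slot header and padding


def slots_in_ring(slot_payload_bytes, ring_bytes=RING_BYTES, overhead=SLOT_OVERHEAD):
    """Number of packets the ring holds when every slot is sized for slot_payload_bytes."""
    return ring_bytes // (slot_payload_bytes + overhead)


snaplen_slots = slots_in_ring(65535)   # slots sized to a 64K-1 snapshot length
mtu_slots = slots_in_ring(1514)        # slots sized to an Ethernet frame at MTU 1500

print("slots when sized to the snapshot length:", snaplen_slots)
print("slots when sized to the MTU:            ", mtu_slots)
```

Under these assumptions the snapshot-length sizing leaves only a few dozen slots, so a short burst of packets is enough to fill the ring before userland gets to run.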

In Linux 3.2, TPACKET_V3, a significantly different turbopacket mechanism, was added. It's more like BPF, in that a buffer doesn't hold just one packet; it can have multiple packets packed into it. This makes much better use of the buffer memory for capturing, so far fewer packets get dropped.
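A rough sketch of why packing helps, again as simple arithmetic in Python. The ring size, block size, header sizes, alignment, and the 200-byte packet size are all assumptions chosen for illustration, and the helper names are hypothetical; the real kernel layout differs in detail.

```python
# Compare how many small packets fit in the same memory with one packet per
# fixed-size slot versus several packets packed into each block.
# All constants are illustrative assumptions.

RING_BYTES = 2 * 1024 * 1024   # assumed total buffer size (2 MiB)
BLOCK_BYTES = 128 * 1024       # assumed block size for the packed layout
BLOCK_HEADER = 48              # assumed per-block header size
PER_PACKET_OVERHEAD = 64       # assumed per-packet header size
PACKET_BYTES = 200             # assumed size of a typical small packet


def align(n, boundary=16):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def fixed_slot_capacity(slot_payload_bytes):
    """Packets held when each packet occupies one slot sized for slot_payload_bytes."""
    return RING_BYTES // align(slot_payload_bytes + PER_PACKET_OVERHEAD)


def packed_capacity(packet_bytes):
    """Packets held when packets of packet_bytes are packed back to back into blocks."""
    per_block = (BLOCK_BYTES - BLOCK_HEADER) // align(packet_bytes + PER_PACKET_OVERHEAD)
    return (RING_BYTES // BLOCK_BYTES) * per_block


fixed = fixed_slot_capacity(65535)      # slots sized to a 64K-1 snapshot length
packed = packed_capacity(PACKET_BYTES)  # small packets packed into blocks

print("packets held, fixed snapshot-length slots:", fixed)
print("packets held, packed blocks:              ", packed)
```

With packed blocks the capacity depends on the packets actually seen rather than on the largest packet that might arrive, which is why the same amount of memory absorbs much longer bursts.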