This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Capturing Kafka buffer length


Hi,

I'm wondering how to capture the Kafka producer's buffer length (i.e. whatever is passed to the socket and sent to the broker as a single chunk).

I tried running the following on a machine running Kafka producers:

sudo tcpdump -n -s 0 -w kafka.log -i eth0 'port 9092'

Then:

tshark -V -r kafka.log -o 'kafka.tcp.port:9092' -d tcp.port==9092,kafka -2 -Tfields -e kafka.bytes_len

The output looks a bit weird. There are numerous lines with numbers like:

 -1,23,2559,23,2572,23,2351,23,2171,23,4710,23,2335,23,3357,23,2449,23,2454,23,2273,23,2530,23,2417,23,2344,23,2616,23,2499,23,2213,23,2141,23,2575,23,2419,23,2552,23,2532,23,2308,23,2555,23,2247,23,2660,23,3399,23,2451,23,2772,23,2437,23,2631,23,2536,23,2374,23,2397,23,2472,23,2282,23,3334,23,2217,23,2553,23,2301,23,2547,23,2485,23,2654,23283

Can you confirm whether these numbers represent the real buffer size, or point me in a more correct direction?

Thanks!

asked 13 Mar '17, 01:25

spektom


2 Answers:


It would help if you could provide an example pcap file (through CloudShark, Dropbox, Google Drive or any other file-sharing service).

Looking at the Kafka protocol, I suspect that one Kafka PDU can contain multiple values and that one Kafka PDU can span multiple TCP packets. Because of reassembly, Wireshark (and tshark) will gather the whole PDU before parsing it. Since there are multiple values in that one PDU, there will also be multiple kafka.bytes_len fields, one for each value.
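To see this one-value-per-occurrence behaviour concretely, the `-T fields` output can be split per frame. Below is a minimal Python sketch, assuming tshark's default `-E separator` (a tab between fields) and default `-E aggregator` (a comma between multiple occurrences of the same field), run with `-e frame.number -e kafka.bytes_len`. `parse_fields_output` is a hypothetical helper name; the sample rows are the frame 7 and frame 10 lines from the capture output posted later in this thread.

```python
from typing import Dict, List


def parse_fields_output(text: str) -> Dict[int, List[int]]:
    """Map frame number -> every kafka.bytes_len occurrence in that frame.

    Assumes tshark -T fields output with two fields (frame.number,
    kafka.bytes_len), tab-separated, occurrences joined with ",".
    """
    result: Dict[int, List[int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Default field separator is a tab; partition tolerates a missing tab.
        frame, _, values = line.partition("\t")
        values = values.strip()
        if not values:
            # No kafka.bytes_len here, e.g. a TCP segment whose PDU is
            # only dissected once reassembly completes in a later frame.
            continue
        # Default aggregator joins multiple occurrences with a comma.
        result[int(frame)] = [int(v) for v in values.split(",")]
    return result


# Rows taken from the tshark output in this thread (frames 7 and 10).
sample = (
    "1\n"
    "6\n"
    "7\t-1,23,3937,23,5742,23,4154,23,4320,23,4252,23,4169,23,4962,23,7890,8689\n"
    "10\t-1,23,6406,2524\n"
    "11\n"
)

parsed = parse_fields_output(sample)
for frame, values in parsed.items():
    print(frame, len(values), values)
```

Only the frames in which a PDU was reassembled show up, and each of them carries several values rather than a single "buffer size".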

What do you mean by "real buffer size"?

answered 13 Mar '17, 02:33

SYN-bit ♦♦

The Kafka producer aggregates data into a buffer (its size is determined by the batch.size and linger.ms parameters) and then sends it to a broker. That buffer size, i.e. what is sent to the broker, is what I meant to capture. I've put a sample (1000 packets) here: https://www.cloudshark.org/captures/e92de4d1daf4

(13 Mar '17, 05:12) spektom


The output appears correct:

If you examine the attached capture with Wireshark, you will note that certain frames contain multiple instances of the field "kafka.bytes_len" with the values as shown in the tshark output (see below).

I suggest looking at the Wireshark Kafka dissection to determine whether there is a field that gives you the information you want ("real buffer size"). (I'm not familiar with the Kafka protocol.)

Partial tshark output from your capture file

Notes:
- In the current version of Wireshark, the preference name is "kafka.tcp.ports".
- The "-d" option (decode as) is not needed, since you are specifying the port with the "-o" option.
- I added "-e frame.number".

The "kafka.bytes_len" fields are shown only for the frames in which the complete kafka PDU is reassembled. Again, see the Wireshark dissection.

tshark  -r kafka.log.pcap -o "kafka.tcp.ports:9092"  -2 -Tfields -e frame.number -e kafka.bytes_len
1
2
3
4
5
6
7       -1,23,3937,23,5742,23,4154,23,4320,23,4252,23,4169,23,4962,23,7890,8689
8
9
10      -1,23,6406,2524
11

answered 13 Mar '17, 09:10

Bill Meier ♦♦

edited 13 Mar '17, 09:17

Thanks! I guess the closest I can get is kafka.message_set_size.

(13 Mar '17, 22:40) spektom
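Following up on kafka.message_set_size: one way to look at its distribution is to export only that field, e.g. `tshark -r kafka.log.pcap -o "kafka.tcp.ports:9092" -2 -T fields -e kafka.message_set_size > sizes.txt`, and summarise the values. This is a minimal Python sketch under stated assumptions: the field name is taken from the comment above and may differ between Wireshark versions; frames without the field produce empty lines; multiple occurrences are comma-joined (tshark's default aggregator). Whether this value really matches the producer batch bounded by batch.size is an assumption to check against the dissection. The helper names and sample values are hypothetical.

```python
import statistics
from typing import Dict, List


def message_set_sizes(text: str) -> List[int]:
    """Collect every kafka.message_set_size value from -T fields output."""
    sizes: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Frame without the field (no reassembled Kafka PDU in it).
            continue
        # Several occurrences in one frame are joined with "," by default.
        sizes.extend(int(v) for v in line.split(","))
    return sizes


def summarize(sizes: List[int]) -> Dict[str, float]:
    """Basic statistics; returns only a zero count for empty input."""
    if not sizes:
        return {"count": 0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": statistics.mean(sizes),
    }


# Hypothetical sample output, NOT taken from the posted capture.
sample = "4100\n\n3900,4200\n"

sizes = message_set_sizes(sample)
stats = summarize(sizes)
print(stats)
```

For the real export, `summarize(message_set_sizes(open("sizes.txt").read()))` would apply the same logic to the file produced by the tshark command above.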