I'm examining tcpdump captures with Wireshark/tshark and I'm seeing many packets whose Info column reads "Continuation or non-HTTP traffic" and many others that read "[TCP segment of a reassembled PDU]". I'm curious what the difference between the two is.
The trace comes from a simulation of client-server interaction using HTTP streaming. Each client opens an HTTP connection (using GET) and the server then sends back chunked data indefinitely. The size of the content is therefore unknown and cannot be given in a Content-Length header.
I'm quite confused because when I compare a "Continuation" packet with a "TCP segment" packet, they look nearly identical (the differences being minor details such as the timestamp). Can anyone shed some light on these two concepts for me?
Here is one of my captures. In this particular trace, the switch from "reassembled PDU" to "HTTP continuation" seems to start around frame 6054/6055. Note that many messages are duplicates that differ only in the port; this is because the simulation runs many clients (500 in this one, I believe).
asked 07 Nov '11, 18:31
edited 10 Nov '11, 15:43
This is merely a result of the TCP protocol preferences, which give you two different views of the same type of data. If you open Wireshark's Preferences and select the TCP protocol settings, you'll see an option called "Allow subdissector to reassemble TCP streams". Depending on whether it is checked or unchecked, you get either "reassembled PDU" or "continuation" messages in the Info column.
What this setting does is allow Wireshark to find packets that carry pieces of the same payload and combine them, reconstructing the payload for you (otherwise you'd have to export them one by one and assemble them yourself). You mostly need it when you care about the reconstructed payload. If you're more interested in packet timings etc., it is usually better to disable reassembly to get a clearer view of what happened when.
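For illustration only, here is a minimal Python sketch of what TCP reassembly does conceptually; it is not Wireshark's implementation. It assumes you already have the `(sequence number, payload)` pairs for one direction of one connection. The function name `reassemble` is hypothetical, and SYN/FIN sequence accounting, sequence-number wraparound and conflicting retransmissions are ignored.

```python
# Conceptual sketch of TCP payload reassembly by sequence number.
# Assumptions: `segments` holds (seq, payload) pairs for ONE direction of ONE
# connection; SYN/FIN accounting, wraparound and conflicting retransmissions
# are deliberately ignored. `reassemble` is a hypothetical name.
from __future__ import annotations


def reassemble(segments: list[tuple[int, bytes]], initial_seq: int) -> bytes:
    """Return the contiguous byte stream starting at initial_seq."""
    stream = bytearray()
    next_seq = initial_seq
    # Sort by sequence number so out-of-order arrivals are handled.
    for seq, payload in sorted(segments):
        if seq + len(payload) <= next_seq:
            continue  # pure retransmission: every byte already seen
        if seq > next_seq:
            break  # gap: a segment is missing, stop at the last contiguous byte
        # Partial overlap: keep only the bytes we don't have yet.
        stream += payload[next_seq - seq:]
        next_seq = seq + len(payload)
    return bytes(stream)
```

With the segments `(105, b"world")`, `(100, b"hello")` and a retransmitted `(100, b"hello")`, `reassemble(segs, 100)` gives `b"helloworld"`; if a segment is missing, the result stops at the gap.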
answered 07 Nov '11, 23:59
In fact, there are at least three different reassembly issues to consider with chunked HTTP transfer encoding, and you must check your preferences very carefully, especially if you are dealing with an 'endless' server connection that keeps sending chunks of messages.
First, an application-level message such as an HTTP request may or may not fit in a single TCP segment. If the HTTP header is big enough to be split across segments (that's rare, but it happens when a site sends lots of cookies and optional X- headers), you will see two or more packets in the Wireshark capture, period. The same can happen to HTTP response headers, and it mostly does happen to HTTP request/response bodies. Sometimes applications simply send the HTTP headers in one TCP segment and the HTTP body in the next.
But please note that those segments have nothing in common with chunks when chunked Transfer-Encoding is used, because that encoding lives at the application level, while TCP is the transport level of the OSI model. So even a single "chunk" of yours can span multiple segments, and one segment can carry the end of one chunk and the start of the next.
But that's not the whole story. A single TCP segment usually fits in one Ethernet frame, but it can be split across several frames as well (by IP fragmentation). Most of the time this does not happen, but on some badly configured Windows machines the maximum packet size (MTU) is bigger than the usual maximum that Ethernet switches can handle. To add more fun, transport-level packets must be ACKed by the other endpoint; sometimes the ACK is piggybacked on the next TCP data packet, and sometimes it is sent separately, while still on the same HTTP port.
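The point that chunks and segments are independent can be shown with a small Python sketch. The helpers `encode_chunk` and `split_into_segments` are hypothetical, and the 16-byte MSS is deliberately unrealistic (on Ethernet it is typically around 1460 bytes) so that the split is easy to see.

```python
# Sketch: HTTP chunk boundaries and TCP segment boundaries are unrelated.
# Assumptions: a tiny 16-byte MSS to make the split visible; the helper
# names are hypothetical. Real stacks negotiate the MSS at the handshake.
from __future__ import annotations


def encode_chunk(data: bytes) -> bytes:
    """One chunk of Transfer-Encoding: chunked: hex size, CRLF, data, CRLF."""
    return f"{len(data):x}\r\n".encode() + data + b"\r\n"


def split_into_segments(stream: bytes, mss: int) -> list[bytes]:
    """Cut a byte stream into MSS-sized pieces, the way a TCP sender may."""
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


body = encode_chunk(b"first message") + encode_chunk(b"second message")
segments = split_into_segments(body, mss=16)
for n, seg in enumerate(segments):
    print(n, seg)
```

Here the two chunks (18 and 19 bytes on the wire) become three segments: the first segment ends just before the first chunk's trailing CRLF, and the second segment carries that CRLF plus the start of the second chunk.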
So if you try to analyse Web application traffic at the TCP level, you'll get loads of useless sh#t most of the time. That's why you should use filters.
To help upper-level protocols collect and filter information, Wireshark dissectors have a notion of 'reassembly': a higher-level dissector returns a special code meaning 'hey, I need more data to properly dissect this packet', and processing is restarted when more data arrives.
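Here is a toy Python model of that 'need more data' idea. To keep it short, it uses a made-up length-prefixed protocol (2-byte big-endian length, then payload) rather than HTTP. `try_parse` and `StreamReassembler` are hypothetical names, and this is not Wireshark's C dissector API; it only mimics the behaviour of buffering until a message is complete.

```python
# Toy model of the "I need more data" reassembly loop.
# Assumption: a made-up protocol of 2-byte big-endian length + payload.
# NOT Wireshark's C API; `try_parse` and `StreamReassembler` are hypothetical.
from __future__ import annotations

import struct


def try_parse(buf: bytes) -> tuple[bytes | None, int]:
    """Return (message, bytes_consumed), or (None, bytes_still_needed)."""
    if len(buf) < 2:
        return None, 2 - len(buf)
    (length,) = struct.unpack(">H", buf[:2])
    if len(buf) < 2 + length:
        return None, 2 + length - len(buf)
    return buf[2:2 + length], 2 + length


class StreamReassembler:
    def __init__(self) -> None:
        self.pending = b""  # bytes received but not yet part of a full message

    def feed(self, segment: bytes) -> list[bytes]:
        """Append one TCP payload and return every message it completes."""
        self.pending += segment
        messages = []
        while self.pending:
            msg, n = try_parse(self.pending)
            if msg is None:
                break  # "hey, I need n more bytes": wait for the next segment
            messages.append(msg)
            self.pending = self.pending[n:]
        return messages
```

Feeding `b"\x00\x05hel"` returns nothing (the parser still wants 2 bytes); feeding `b"lo\x00\x02hi"` afterwards returns both `b"hello"` and `b"hi"`.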
If you turn off ALL reassembly options for the TCP and HTTP (and SSL) protocols, you'll see the naked packets as they are on the wire. You'll get the 'Continuation or non-HTTP traffic' message in the Info column whenever a packet carries data but contains neither an HTTP request header nor an HTTP response header. All packets without data will be tagged as plain TCP in the Protocol column; those are mostly ACKs, SYNs and FINs, so you can filter them out.
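The per-packet labelling described above can be approximated with a simple heuristic. This Python sketch is an assumption about the logic, not a copy of Wireshark's HTTP dissector: with reassembly off, each payload is judged on its own, and the sketch only checks whether it starts like a request line (from a list of common methods) or a status line.

```python
# Heuristic sketch of the labelling seen with all reassembly turned off:
# each TCP payload is classified on its own. The method list is an
# assumption (common methods only), not Wireshark's exact list.

HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ",
                b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ")


def info_label(payload: bytes) -> str:
    if not payload:
        return "TCP"  # pure ACK / SYN / FIN: no data at all
    if payload.startswith(b"HTTP/"):
        return "HTTP response"  # starts with a status line
    if payload.startswith(HTTP_METHODS):
        return "HTTP request"  # starts with a request line
    # Data, but no HTTP header at the start: e.g. the middle of a body.
    return "Continuation or non-HTTP traffic"
```

A payload such as `b"1a\r\n..."` from the middle of a chunked stream is labelled "Continuation or non-HTTP traffic", which matches what you see when reassembly is off.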
If you allow TCP to reassemble streams but leave the other options unchecked, the picture won't change much, because the upper-level protocols won't request reassembly.
If you allow HTTP to request reassembly of headers and bodies spanning multiple segments, you can already filter by application protocol. E.g. enter 'http' in the display filter and you can forget about all the [reassembled PDU] entries: they will all be marked as 'TCP' in the Protocol column, so the filter hides them.
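The same preferences can also be set for tshark on the command line with `-o name:value`. The Python sketch below only builds the argument list. The preference names are my assumption based on recent Wireshark releases (verify them with `tshark -G currentprefs`), the helper `tshark_args` and the capture file name are hypothetical, and `-Y` applies the display filter in newer tshark versions.

```python
# Sketch: build a tshark command line with the reassembly preferences set
# explicitly and the display filter "http" applied.
# Assumption: preference names as in recent Wireshark releases; check
# yours with `tshark -G currentprefs`. `tshark_args` is hypothetical.
from __future__ import annotations


def tshark_args(pcap: str, dechunk: bool) -> list[str]:
    prefs = {
        "tcp.desegment_tcp_streams": True,  # allow subdissectors to reassemble
        "http.desegment_headers": True,     # HTTP headers spanning segments
        "http.desegment_body": True,        # HTTP bodies spanning segments
        "http.dechunk_body": dechunk,       # reassemble chunked bodies
    }
    args = ["tshark", "-r", pcap, "-Y", "http"]
    for name, value in prefs.items():
        args += ["-o", f"{name}:{'TRUE' if value else 'FALSE'}"]
    return args
```

To run it, with tshark on your PATH: `subprocess.run(tshark_args("capture.pcap", dechunk=False), check=True)`. Keeping `dechunk` as a parameter reflects the trade-off discussed next.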
Now the tricky part: reassembling application-level chunks. If you analyse a protocol that depends on sending data in chunks, e.g. AJAX chat over HTTP, I'd suggest leaving that option unchecked, because chunk reassembly only finishes when the chunk with size '0' arrives, which in your case will never happen.
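A minimal chunked-body decoder in Python shows why: the body is only known to be complete once the zero-size chunk arrives. The function `dechunk` is hypothetical and simplified (no chunk extensions, no trailers, and the CRLF after each chunk's data is skipped rather than verified).

```python
# Sketch of a chunked-body decoder: the body is only complete after the
# zero-size chunk. Simplifications (assumptions): no chunk extensions,
# no trailer headers, CRLF after chunk data is skipped, not verified.
from __future__ import annotations


def dechunk(buf: bytes) -> tuple[bytes, bool]:
    """Return (body decoded so far, complete?) for a chunked byte buffer."""
    body = bytearray()
    pos = 0
    while True:
        eol = buf.find(b"\r\n", pos)
        if eol == -1:
            return bytes(body), False  # size line not fully received yet
        size = int(buf[pos:eol], 16)
        if size == 0:
            return bytes(body), True  # last chunk: the body is finished
        start = eol + 2
        if len(buf) < start + size + 2:
            return bytes(body), False  # chunk data still incomplete
        body += buf[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF
```

For an endless stream the second value stays `False` forever, so a dissector waiting for the complete body would never finish.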
However, if your application encodes HTTP bodies with gzip and uses chunked encoding just to send them as a stream, you'd better check the chunk reassembly option; otherwise decompressing (ungzipping) will fail.
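The gzip case can be demonstrated with Python's standard `gzip` module. The sketch assumes a 4 KiB pseudo-random body (seeded, so it is reproducible and barely compressible) whose compressed bytes are sent as 512-byte chunk payloads with the chunk framing already removed; both sizes are arbitrary, and `try_gunzip` is a hypothetical helper.

```python
# Sketch: a gzip'ed body must be de-chunked before it can be decompressed.
# Assumptions: 4 KiB seeded pseudo-random body, 512-byte chunk payloads
# (framing already removed); sizes are arbitrary. `try_gunzip` is hypothetical.
from __future__ import annotations

import gzip
import random
import zlib


def try_gunzip(data: bytes) -> bytes | None:
    """Decompress, or return None if data is not a complete gzip stream."""
    try:
        return gzip.decompress(data)
    except (EOFError, OSError, zlib.error):
        return None


rng = random.Random(0)  # one seeded generator, reused for every byte
original = bytes(rng.getrandbits(8) for _ in range(4096))
compressed = gzip.compress(original)
chunks = [compressed[i:i + 512] for i in range(0, len(compressed), 512)]

first_chunk_alone = try_gunzip(chunks[0])  # truncated stream: fails
whole_body = try_gunzip(b"".join(chunks))  # de-chunked body: works
```

Decompressing the first chunk alone fails because the gzip stream is truncated, while the joined (de-chunked) body decompresses back to the original.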
That was quite a lot of text, but I hope everything is clear now.
Also, if you want more advanced filtering options for HTTP responses, you may find it useful to install the following Lua script: Associating HTTP responses to requests in Wireshark. Should you have any questions about it, feel free to ask.
answered 08 Nov '11, 22:57