This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Analyzing HTTP protocol using Tshark


Hello! I'm trying to analyze HTTP requests and responses using Tshark and the following command.

 /usr/local/bin/tshark -R "http.response or http.request" \
                -T fields -E separator="|" \
                -e frame.time_epoch \
                -e ip.src \
                -e tcp.srcport \
                -e ip.dst \
                -e tcp.dstport \
                -e http.request.version \
                -e http.request.method \
                -e http.host \
                -e http.request.uri \
                -e http.user_agent \
                -e http.response.code \
                -e http.content_type \
                -e http.content_length \
                -e http.location \
                -e http.referer \
                -r input.pcap

It generally works fine, but sometimes it reports multiple requests on a single output line. For example:

1351717925.251286000|xxx.xxx.xx.xx|12345|xxx.xx.xx.xxx|80|
HTTP/1.1,HTTP/1.1|GET,GET|www.aaa.com,www.aaa.com|/upload/xxxxx,/upload/xxxxx|agent1|||||

Here's my question: if there is only one User-Agent value, how can I know which request it corresponds to? Is there any way to output a placeholder for an HTTP field that does not exist, like the following?

1351717925.251286000|xxx.xxx.xx.xx|12345|xxx.xx.xx.xxx|80|
HTTP/1.1,HTTP/1.1|GET,GET|www.aaa.com,www.aaa.com|/upload/xxxxx,/upload/xxxxx|agent1, 'no value'|||||

The actual responses each arrive in a single packet, but each packet is marked as "TCP segment of a reassembled PDU".

These packets are not parsed as HTTP by Tshark or Wireshark. Tshark parses the reassembled packet. This is why I got this strange result. Do you have any solution for this?

Thank you for your time.


asked 17 Dec '12, 03:23


fates

edited 17 Dec '12, 04:54


2 Answers:


tshark prints the fields packet-wise (as far as I know), so there should be only one request per line, unless there are really two requests in one packet. So, either this is a tshark bug or there are really two requests in one packet. The latter will happen when the client uses pipelining. In that case it's the same client software, as it's the same TCP connection (the same IP packet). So, even if the client does not send the User-Agent header twice, it will be the same client software (same User-Agent).

BTW: What is your tshark version (tshark -v) and OS version?

UPDATE:

I did some tests. tshark (V1.8.4 and V1.6.12) does report the HTTP requests packet-wise (as I 'guessed'). Maybe it's also doing reassembly if the request is really large, but I was not able to test that.

Anyway, reassembly is not related to your 'problem', as tshark will print several http requests in one output line if no reassembly is necessary (several requests in one packet - see below).

See the following capture file

https://www.cloudshark.org/captures/8da00a00215f

and the output of tshark

4|1355756983.988805000|192.168.158.139|2758|217.13.68.220|80|HTTP/1.1|GET|scripts.zeit.de|/static/js/iqd/adam.js|Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0|||||http://www.zeit.de/index

6|1355756983.996463000|192.168.158.139|2758|217.13.68.220|80|HTTP/1.1,HTTP/1.1,HTTP/1.1|GET,GET,GET|scripts.zeit.de,scripts.zeit.de,scripts.zeit.de|/static/js/jquery/1.4.2/jquery-1.4.2.min.js,/static/js/loader.js?282,/static/js/webtrekk/webtrekk_v3.js|Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0,Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0,Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0|||||http://www.zeit.de/index,http://www.zeit.de/index,http://www.zeit.de/index

9|1355756984.029324000|217.13.68.220|80|192.168.158.139|2758|HTTP/1.1|||||200|application/javascript|962||

As you can see, there are 3 requests in frame #6, as the client uses pipelining (Firefox -> about:config -> network.http.pipelining -> true).

As a result, tshark reports those three requests in one output line for frame #6. If you need to separate those three requests, split the output fields (split character: ','). Beware: ',' might be used in the URL as well! In that case use a different character: -E aggregator=
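A minimal sketch of that splitting step, written in Python (an assumption; any scripting language would do). The function name split_requests and its parameters are hypothetical and not part of tshark. It assumes the '|' separator from the command in the question and that the aggregator argument matches the character tshark actually used. A column with a single value is copied to every request, which is correct for per-packet fields such as ip.src and, for HTTP headers, relies on the same-client argument made above. A column whose value count is neither one nor the number of requests cannot be assigned reliably, so the sketch raises an error instead of guessing.

```python
def split_requests(line, separator="|", aggregator=","):
    """Split one tshark '-T fields' output line into one row per request.

    Hypothetical helper: 'separator' and 'aggregator' must match the
    characters used when tshark produced the line.
    """
    # Each column becomes a list of values; an empty column becomes [].
    columns = [
        col.split(aggregator) if col else []
        for col in line.rstrip("\n").split(separator)
    ]
    # The widest column tells us how many requests share this output line.
    count = max(len(values) for values in columns)

    rows = []
    for i in range(count):
        row = []
        for values in columns:
            if len(values) == count:
                row.append(values[i])      # one value per request
            elif len(values) == 1:
                row.append(values[0])      # single value, copied to every row
            elif not values:
                row.append("")             # field absent in this packet
            else:
                # e.g. 3 requests but 2 values: no way to tell which is missing
                raise ValueError("ambiguous column: %r" % (values,))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up line in the same shape as the output shown in the question.
    for row in split_requests("1.5|10.0.0.1|GET,GET|/a,/b|agent1|"):
        print(row)
```

If the URLs or User-Agent strings can contain commas, produce the capture output with a different aggregator character and pass the same character as the aggregator argument here.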

Regards
Kurt

answered 17 Dec '12, 03:37


Kurt Knochner ♦

edited 17 Dec '12, 07:51

Hello Kurt,

My Tshark version is 1.6.12. I have opened the pcap file with Wireshark. Please see the picture I uploaded again (it's a multiple response case). The actual responses each arrive in a single packet, but each packet is marked as "TCP segment of a reassembled PDU".

These packets are not parsed as HTTP by Tshark or Wireshark. Tshark parses the reassembled packet. This is why I got this strange result. Do you have any solution for this?

(17 Dec '12, 03:46) fates

Maybe I'm wrong and tshark does reassembly as well when it prints the fields. I'll have to test it myself.

Do you have any solution for this?

O.K., what exactly is the problem you need a solution for?

(17 Dec '12, 05:14) Kurt Knochner ♦

See the UPDATE in my answer.

(17 Dec '12, 07:27) Kurt Knochner ♦


As @Kurt explained, there can be multiple HTTP requests in one (reassembled) HTTP PDU. If not all the requests contain all the fields that you are looking for (like the User-Agent in your example), there is no way to correlate the fields. This is due to how the -T fields option works. Dissection populates the fields (and does not populate a field that is not there), and -T fields just shows you the available values for a particular field.

If you do want to see all fields within context, you will have to either write a Lua script to output the data for you (including "missing" data) or use the PDML output and an XML parser to extract the information you need.
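A rough sketch of the PDML route, using Python's standard xml.etree.ElementTree parser. The embedded sample is a simplified, hand-written stand-in for real PDML (which tshark can produce with -T pdml), and it assumes that each HTTP request in a frame appears as its own proto element named "http", with field elements carrying name and show attributes. The function name, the field list and the "no value" placeholder are hypothetical choices for illustration, so check them against your own PDML output.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample: one frame with two pipelined requests,
# the second of which has no User-Agent header. Real PDML has more detail.
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="http">
      <field name="" show="GET /upload/a HTTP/1.1">
        <field name="http.request.method" show="GET"/>
        <field name="http.request.uri" show="/upload/a"/>
      </field>
      <field name="http.host" show="www.aaa.com"/>
      <field name="http.user_agent" show="agent1"/>
    </proto>
    <proto name="http">
      <field name="" show="GET /upload/b HTTP/1.1">
        <field name="http.request.method" show="GET"/>
        <field name="http.request.uri" show="/upload/b"/>
      </field>
      <field name="http.host" show="www.aaa.com"/>
    </proto>
  </packet>
</pdml>"""

WANTED = ["http.request.method", "http.host",
          "http.request.uri", "http.user_agent"]


def requests_in_context(pdml_text, wanted=WANTED, missing="no value"):
    """Return one row per HTTP proto element, filling absent fields."""
    root = ET.fromstring(pdml_text)
    rows = []
    for proto in root.iter("proto"):
        if proto.get("name") != "http":
            continue
        found = {}
        # iter() also visits nested fields, e.g. those under the request line.
        for field in proto.iter("field"):
            name = field.get("name")
            if name in wanted and name not in found:
                found[name] = field.get("show", "")
        rows.append([found.get(name, missing) for name in wanted])
    return rows


if __name__ == "__main__":
    for row in requests_in_context(SAMPLE_PDML):
        print("|".join(row))
```

Because each row is built from a single proto element, a header that is missing from one request shows up as the placeholder in that row only, which is the correlation that -T fields cannot provide.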

answered 20 Dec '12, 03:07


SYN-bit ♦♦

edited 20 Dec '12, 03:07