tshark truncated records - why can’t we get the full record?

Question

I know how to see the full, non-truncated record in Wireshark (in my case it's the HTML page I'm trying to read, which contains long lines): it's a point-and-click operation that copies the value to the clipboard. But I want to do the same in tshark so that I can automate it. Searching Google, I found at least one other person asking this, and the answer was that the 240-character limit is hard-coded and not feasible to make dynamic, so you would have to make your own build of Wireshark.

On the other hand, the pcap files clearly store the full records. Simply having tshark read one pcap file and write it out to another, then opening the result in Wireshark, confirms that all the data is still there.

I just can't understand why this isn't a huge problem for people. Surely, if you want to analyze traffic, you sometimes (often) need to see the actual data being sent, and it is often in lines or records longer than 240 bytes. Can this not be fixed, or is there some workaround short of the enormous task of rebuilding Wireshark itself?

Answer 1

I took a quick look at the code, and it is indeed not easy to change the truncation of displayed fields. If you want to reconstruct data sent over a TCP session (HTTP, in your case), you are better off using tshark's "follow stream" option, -z follow.

From the manpage:

-z follow,prot,mode,filter[,range]
    Displays the contents of a TCP or UDP stream between two nodes. The data sent by the second node is prefixed with a tab to differentiate it from the data sent by the first node.

    prot specifies the transport protocol. It can be one of:
        tcp    TCP
        udp    UDP
        ssl    SSL

    mode specifies the output mode. It can be one of:
        ascii  ASCII output with dots for non-printable characters
        hex    Hexadecimal and ASCII data with offsets
        raw    Hexadecimal data

    Since the output in ascii mode may contain newlines, the length of each section of output plus a newline precedes each section of output.

    filter specifies the stream to be displayed. UDP streams are selected with IP address plus port pairs. TCP streams are selected with either the stream index or IP address plus port pairs. For example:
        ip-addr0:port0,ip-addr1:port1
        tcp-stream-index

    range optionally specifies which "chunks" of the stream should be displayed.

An example (on current tshark versions the display filter is passed with -Y; older versions used -R here, which is now a read filter that requires -2):

for stream in $(tshark -r http.cap -Y http.request -T fields -e tcp.stream | sort -n | uniq)
do
    echo "Processing stream $stream"
    tshark -r http.cap -q -z follow,tcp,raw,$stream > /tmp/stream-$stream
done

Which will result in:

===================================================================
Follow: tcp,raw
Filter: tcp.stream eq 0
Node 0: 192.168.1.43:50166
Node 1: 66.102.13.103:80
474554202f20485454502f312e310d0a486f73743a207777772e676f6f676c652e6e6c0d0a557365722d4167656e743a204d6f7a696c6c612f352e3020284d6163696e746f73683b20553b20496e74656c204d6163204f5320582031302e363b20656e2d55533b2072763a312e392e3229204765636b6f2f32303130303131352046697265666f782f332e360d0a4163636570743a20746578742f68746d6c2c6170706c69636174696f6e2f7868746d6c2b786d6c2c6170706c69636174696f6e2f786d6c3b713d302e392c2a2f2a3b713d302e380d0a4163636570742d4c616e67756167653a20656e2d75732c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c6465666c6174650d0a4163636570742d436861727365743a2049534f2d383835392d312c7574662d383b713d302e372c2a3b713d302e370d0a4b6565702d416c6976653a203131350d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6f6b69653a20505245463d49443d333634376265366563336465356231393a553d623732313065313434336139316337313a544d3d313236333539353637363a4c4d3d313236383832373637343a4c3d304a703665444c6c5a37654e465775473730456d72364138784f67413a533d77304631706163334a4753736b6c396a3b204e49443d33373d5a455f7979716654635957445a576847616463774b5438312d714a32447243436c336374672d614e4e477076484d435249415a493075483541546c52585473724a626e43306562634b5441495342326a335931774d48536e723847444468494f706852756e706a35786c4d2d69547265386469674846713771314535755a424b3b204b42443d306e6c2d330d0a0d0a
485454502f312e3120323030204f4b0d0a446174653a205468752c2031322041756720323031302030383a33343a353620474d540d0a457870697265733a202d310d0a43616368652d436f6e74726f6c3a20707269766174652c206d61782d6167653d300d0a436f6e74656e742d547970653a20746578742f68746d6c3b20636861727365743d5554462d380d0a436f6e74656e742d456e636f64696e673a20677a69700d0a5365727665723a206777730d0a436f6e74656e742d4c656e6774683a20353532340d0a582d5853532d50726f74656374696f6e3a20313b206d6f64653d626c6f636b0d0a0d0a1f8b08000000000002ffa53b695bdc38d2dff32b8cb369ecc5b80f20401b93071292c96c663207333bb30c9347b6655bb47c60bbb99afeef6f9564b9edee2664f70dc1b6ae52a96e95c4d14690f9d57d4eb5b84af8f1113eb52c4db2694993ec86ba7a946511a7bd9e7cdb2109ba0583ded0b43275184b49707c94d08a00b02adfa6d75376e3ea7e9656d0631b67d1b5bae4ea15bdabfa389da3f931294a5ab9bf9dbfdf3e0040
[…]

Which can be parsed by a script quite easily.
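As a rough illustration of that parsing, here is a bash sketch that splits the raw output of one stream into two binary files, one per direction. It assumes the layout shown above (a header of "=" separator, Follow:, Filter:, Node 0: and Node 1: lines, then one hex string per line) and, per the manpage text quoted above, that data from the second node is prefixed with a tab. The function names split_follow_raw and hex_to_bin and the .node0/.node1 suffixes are hypothetical, not part of tshark.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split `tshark -q -z follow,tcp,raw,N` output into one
# binary file per direction. Assumes the header/footer lines shown above and
# that node 1's lines start with a tab (as the manpage describes).
set -euo pipefail

# Decode one line of hex digits ("4745...") to raw bytes on stdout.
hex_to_bin() {
    # sed turns "4745" into "\x47\x45"; bash's printf %b expands the escapes.
    printf '%b' "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# split_follow_raw INPUT PREFIX -> writes PREFIX.node0 and PREFIX.node1
split_follow_raw() {
    local input=$1 prefix=$2 line
    : > "$prefix.node0"
    : > "$prefix.node1"
    while IFS= read -r line; do
        case $line in
            ''|=*|Follow:*|Filter:*|'Node 0:'*|'Node 1:'*)
                continue ;;                                      # header/footer
            $'\t'*)
                hex_to_bin "${line#$'\t'}" >> "$prefix.node1" ;; # sent by node 1
            *)
                hex_to_bin "$line" >> "$prefix.node0" ;;         # sent by node 0
        esac
    done < "$input"
}

# Run only when given arguments, e.g.: ./split-follow.sh /tmp/stream-0
if (( $# >= 1 )); then
    split_follow_raw "$1" "${2:-$1}"
fi
```

Run against one of the files written by the loop above, this would leave the request bytes in FILE.node0 and the response bytes in FILE.node1. If xxd is installed, xxd -r -p is a faster way to do the hex decoding than the printf-based hex_to_bin.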

Or use ascii mode, but then you end up with a lot of dots for the non-ASCII characters:

===================================================================
Follow: tcp,ascii
Filter: tcp.stream eq 0
Node 0: 192.168.1.43:50166
Node 1: 66.102.13.103:80
649
GET / HTTP/1.1^M
Host: www.google.nl^M
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6^M
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8^M
Accept-Language: en-us,en;q=0.5^M
Accept-Encoding: gzip,deflate^M
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7^M
Keep-Alive: 115^M
Connection: keep-alive^M
Cookie: PREF=ID=3647be6ec3de5b19:U=b7210e1443a91c71:TM=1263595676:LM=1268827674:L=0Jp6eDLlZ7eNFWuG70Emr6A8xOgA:S=w0F1pac3JGSskl9j; NID=37=ZE_yyqfTcYWDZWhGadcwKT81-qJ2DrCCl3ctg-aNNGpvHMCRIAZI0uH5ATlRXTsrJbnC0ebcKTAISB2j3Y1wMHSnr8GDDhIOphRunpj5xlM-iTre8digHFq7q1E5uZBK; KBD=0nl-3^M
^M
1212

HTTP/1.1 200 OK^M
Date: Thu, 12 Aug 2010 08:34:56 GMT^M
Expires: -1^M
Cache-Control: private, max-age=0^M
Content-Type: text/html; charset=UTF-8^M
Content-Encoding: gzip^M
Server: gws^M
Content-Length: 5524^M
X-XSS-Protection: 1; mode=block^M
^M
………ÿ¥;i[Ü8Òßó+.³iìÅ¸. @…..Élf2.3;³..G¶e[´|`»¹.þïo.d¹íî&d÷^MÁ¶®R©n.ÄÑF.ùÕ}Nµ¸Jøñ.>µ,M²iI.ì.ºz.e.§½.|Û!.º..ÞÐ´2u.KIp|.Ð..°*ß¦×Svãê~.VÐc.gÑµºäê.½«ú8.£ù1)JZ¹¿.¿ß>«8=þ æ8êËÒQé.,¯.oY.d·¶DÀ.MÎ>.õ.>}.uþÓtç$ûn´}âÅ£ÿìÜ¼ûY·&g.ü.ÍÃýÑÞ¡5Ú.íãsÿà.ZÞþúq<£kÛ({."½ËÙºAs+áãp.ú.Ë
[…]
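Most of those dots come from the fact that the body itself is compressed: the response headers say Content-Encoding: gzip. A minimal bash sketch for getting at the HTML, assuming you already have a file holding exactly one server response as raw bytes (for example, node 1's data decoded from the raw-mode hex), that the body is a single gzip stream, and that it is not chunked (this response carries a Content-Length header). It skips the headers up to the blank line and hands the rest to gzip. The function name body_from_response is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the decompressed body of ONE HTTP response.
# Assumptions: FILE holds the raw response bytes (status line, headers,
# body), the body is a single gzip stream, and it is not chunked.
set -euo pipefail

body_from_response() {
    local file=$1 line
    {
        # Header lines end in CRLF, so `read` leaves a trailing "\r"; the
        # blank line separating headers from body therefore reads as "\r".
        while IFS= read -r line && [[ $line != $'\r' ]]; do
            :
        done
        # bash's read stops right after the newline it consumed, so the
        # rest of stdin is exactly the body.
        gzip -dc
    } < "$file"
}

if (( $# >= 1 )); then
    body_from_response "$1"
fi
```

For example, ./gunzip-body.sh response.bin > page.html (both names hypothetical). This simple approach does not handle chunked transfer encoding or several responses on one keep-alive connection.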

All in all, if reconstructing HTML pages is what you're after, I think a quick search will turn up tools better suited to the job.

Answer 2

I ran into this problem and had trouble understanding how to get at the untruncated data in Wireshark (not tshark; I don't use it). Here is what I discovered. If you double-click a packet to open it in a separate detail window, highlight the truncated field, and then right-click, you don't get a context menu. Instead, I had to highlight the truncated field in the main three-pane view, right-click, and select Copy > Value (shortcut Ctrl+Shift+V), which copies the entire untruncated field value to the clipboard.