I'm using tshark to extract HTML pages from a capture file and to check how many of them contain a specific element. Currently I'm using the following parameters:
tshark.exe -r test.pcap -o "tcp.desegment_tcp_streams:TRUE" -R "tcp.stream==13 and http" -T pdml > test.session13.pdml
and get the HTML documents themselves inside a data-text-lines element, spread over many field elements in the output PDML. This means I need to concatenate the data in those elements to put the whole HTML back together. That's not a big problem, but I wonder: is there a better way to do this? That is, can tshark itself output the complete HTML?
thanks!
asked 21 Dec '10, 05:56
r0u1i
It's more complicated than that; it's really a validation rule (on the location of JavaScript references), so I need to get the plain HTML pages out.
Then you might want to take a look at tcpflow (http://www.circlemud.org/~jelson/software/tcpflow/).
I'm accepting it as 'NO, there's no better way in tshark' :)
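For anyone who stays with the PDML route from the question, the concatenation step might look roughly like the sketch below. It is not something tshark provides. It assumes each field element directly under the data-text-lines proto carries the raw bytes of one line in a hex-encoded value attribute; check that against your own PDML output, since the layout can differ between versions. The function name html_bodies_from_pdml is made up for this example.

```python
import xml.etree.ElementTree as ET


def html_bodies_from_pdml(source):
    """Yield one bytes object per data-text-lines proto in a PDML document.

    `source` is a file name or a binary file object.
    ASSUMPTION: every direct child <field> of the proto has a hex-encoded
    `value` attribute holding the raw bytes of one line of the body.
    """
    # iterparse streams the file, so large PDML dumps are not loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "proto" or elem.get("name") != "data-text-lines":
            continue
        chunks = []
        for field in elem.findall("field"):  # direct children only
            hex_value = field.get("value")
            if hex_value:
                chunks.append(bytes.fromhex(hex_value))
        yield b"".join(chunks)
        elem.clear()  # release the fields that were just processed


def count_bodies_containing(source, needle):
    """Count reassembled bodies that contain the byte string `needle`."""
    return sum(1 for body in html_bodies_from_pdml(source) if needle in body)
```

With the file from the question, count_bodies_containing("test.session13.pdml", b"<script") would count the pages that contain a script tag. A real validation rule on where the JavaScript references sit would need an HTML parser on each reassembled body rather than a substring test.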