Hello! I run tshark with -T fields and -e to convert pcaps to CSVs for later processing. One problem I run into is that tshark bundles all fields with the same name together. For example, -e tcp.srcport yields one value, -e ip.addr yields two values, and other fields may yield many values taken from different locations in the packet, all written consecutively. Parsing the result can be difficult or sometimes nearly impossible, for example if you have more than one ethernet/ip payload on top of TCP. Is it possible to do anything, e.g. add some flags to tshark, to help unravel the output? Thanks!

Update: Just to clarify: the question referred to complex application-level protocols that have multiple occurrences of the same field, and that may themselves appear several times in a single packet. IP addresses were just an initial example; of course, there are always two. And PDML is the best we can do, so the question is resolved.

asked 16 Nov '15, 06:55 dfrumkin
edited 16 Nov '15, 08:27
2 Answers:
Yes. Don't use ip.addr, because it matches both the source and the destination IP address, and tshark prints them together, joined according to the -E aggregator= setting. Instead, you could use ip.src and ip.dst.
If you want to use ip.addr, you should define the separator and the aggregator character/string to ease the parsing process.
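A minimal sketch of that suggestion, assuming a comma as the column separator and a pipe as the aggregator. The -E separator=, -E aggregator= and -E occurrence= options are real tshark flags; the helper names build_tshark_args and parse_line are hypothetical, and the sketch assumes field values never contain either character.

```python
# Hypothetical helpers; only the tshark flags themselves come from tshark.
SEPARATOR = ","   # assumed column separator (-E separator=,)
AGGREGATOR = "|"  # assumed joiner for repeated fields (-E aggregator=|)


def build_tshark_args(pcap_path, fields):
    """Build an argument list for tshark -T fields with explicit delimiters."""
    args = [
        "tshark", "-r", pcap_path, "-T", "fields",
        "-E", f"separator={SEPARATOR}",
        "-E", f"aggregator={AGGREGATOR}",
        "-E", "occurrence=a",  # a = all occurrences, f = first, l = last
    ]
    for name in fields:
        args += ["-e", name]
    return args


def parse_line(line, fields):
    """Split one output line into {field name: list of occurrences}."""
    cells = line.rstrip("\n").split(SEPARATOR)
    return {
        name: (cell.split(AGGREGATOR) if cell else [])
        for name, cell in zip(fields, cells)
    }
```

The argument list can be passed to subprocess.run, and each line of its output to parse_line. This keeps columns apart from repeated values, but it still does not say which layer each occurrence came from.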
Regards

answered 16 Nov '15, 07:07 Kurt Knochner ♦
edited 16 Nov '15, 07:20
There's the

answered 16 Nov '15, 07:04 grahamb ♦

How would it help? Don't I get all the occurrences by default? (16 Nov '15, 07:07) dfrumkin

Those flags (that's all that are available in this area currently) do allow you to pick the first or last. (16 Nov '15, 07:35) grahamb ♦
Thanks for the aggregator; it saves some hacks. However, the major problem is that fields of the same type are bundled together. In the example I gave in the original question, where you have multiple ethernet/ip payloads on top of TCP, how would I know what belongs to each one?
You won't, as there is no marker for the layer at which the IP address appears. Usually, they will be printed serially, one after the other.
If you need more insight, please run one of the following commands and parse the full output.
That way you will get everything in the order as it appears in the frames.
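The commands themselves are not preserved in this copy of the thread. Since the follow-up settles on PDML, here is a rough sketch of parsing PDML (as produced by tshark -T pdml) so that each value stays tied to the layer it came from. The sample document is hand-written for illustration and much simpler than real tshark output, and fields_by_layer is a hypothetical helper that only looks at proto elements directly under each packet.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PDML for illustration; not real tshark output.
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="eth">
      <field name="eth.src" show="00:11:22:33:44:55"/>
    </proto>
    <proto name="ip">
      <field name="ip.src" show="10.0.0.1"/>
      <field name="ip.dst" show="10.0.0.2"/>
    </proto>
    <proto name="tcp">
      <field name="tcp.srcport" show="5000"/>
    </proto>
    <proto name="ip">
      <field name="ip.src" show="192.168.1.1"/>
      <field name="ip.dst" show="192.168.1.2"/>
    </proto>
  </packet>
</pdml>"""


def fields_by_layer(pdml_text, wanted):
    """Return (packet no, layer index, protocol, field, value) tuples in packet order."""
    root = ET.fromstring(pdml_text)
    rows = []
    for pkt_no, packet in enumerate(root.iter("packet"), start=1):
        # Assumption: each layer is a <proto> element directly under <packet>.
        for layer_no, proto in enumerate(packet.findall("proto")):
            for field in proto.iter("field"):  # includes nested fields
                name = field.get("name")
                if name in wanted:
                    rows.append((pkt_no, layer_no, proto.get("name"),
                                 name, field.get("show")))
    return rows
```

Calling fields_by_layer(SAMPLE_PDML, {"ip.src", "ip.dst"}) gives the outer addresses at layer index 1 and the inner ones at layer index 3, which is the distinction the flat -T fields output loses.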
But maybe I'm misunderstanding your request. If so, please upload a small sample pcap somewhere (Dropbox, etc.) and post the link here, together with the desired output.
Thanks a lot, Kurt! Looks like pdml (pyshark?) is the way to go. You understood my question correctly; I was just looking for a simple solution where there is none.
Yes, maybe pyshark is an option as well....