I'm capturing and decoding traffic within a SOAP envelope. The source application passes an XML payload through WCF, which then converts all the XML reserved characters into HTML entities. So the less-than symbol (<) becomes &lt;
What I see in Wireshark is something like this:
The XML in the truncated line can be thousands of bytes long, but it contains fields that I need to filter and run statistics on. I want to convert the HTML entities back to their original less-than and greater-than symbols, and then filter as I would with any XML document in Wireshark, something like gs2message A=10%, gs2message B=20%, etc. Of course, this would mean XML inside XML, and I think the parser would have a fit. Why it wasn't put into a CDATA block to begin with, I don't know. But this is what I have to work with.
So can I load it into a CDATA block within Wireshark instead, and then reconstitute the XML for display, filtering, and stats? And wouldn't converting it back to real XML mess up my byte size statistics? If it is done within Wireshark, would that be with a dissector or a DTD file? Is it even possible to reconstitute the XML payload within Wireshark, or do I have to do it after the fact?
If I go outside of Wireshark, would something like Pilot work? Or do I need to write something custom in, say, Python? At the same time, I still want all the Frame, Ethernet, and TCP/IP data size information for bandwidth and latency analysis. It's just that the fields I need to filter on are locked up inside this escaped XML.
All advice is welcome.
asked 16 May '11, 18:33
edited 18 May '11, 14:16
I have a solution.
It turns out that my application payload data (the line above that says [truncated] <?xml......) is held in the field xml.cdata once the XML dissector has executed. By using a post-dissector (thereby ensuring that the HTTP and XML dissectors have already run), you can take the payload from xml.cdata, run a series of substitutions to put it back into XML form, and then you're ready to process your application data.
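To illustrate just the substitution part, here is a minimal Python sketch. It is not the post-dissector itself (that runs inside Wireshark), and it assumes the escaped text has already been pulled out of xml.cdata by some other means. The sample payload, its type attribute, and its note element are made up for the example; only the gs2message name comes from the question.

```python
import html
import xml.etree.ElementTree as ET

# Order matters: "&amp;" must be handled last. If it went first, a literal
# "&amp;lt;" in the data would become "&lt;" and then, wrongly, "<".
SUBSTITUTIONS = [
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&amp;", "&"),
]


def restore_xml(escaped: str) -> str:
    """Undo one level of entity escaping on a copy of the payload text."""
    for entity, char in SUBSTITUTIONS:
        escaped = escaped.replace(entity, char)
    return escaped


# Hypothetical escaped payload, standing in for the contents of xml.cdata.
escaped_payload = (
    "&lt;gs2message type=&quot;A&quot;&gt;"
    "&lt;note&gt;x &amp;lt; y&lt;/note&gt;"
    "&lt;/gs2message&gt;"
)

restored = restore_xml(escaped_payload)

# The restored text is ordinary XML again and can be parsed and queried.
root = ET.fromstring(restored)
message_type = root.get("type")

# The standard library performs the same unescaping in a single pass.
same_as_stdlib = html.unescape(escaped_payload) == restored
```

Because the substitutions run on a copy of the text, the frame and TCP lengths Wireshark reports for the packet are unaffected; only the decoded copy gets shorter than what was on the wire.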
The steps are:
Hope this was helpful
answered 10 Jun '11, 14:36