Using Lua and tshark, I'm attempting to obtain the XML payload from SOAP messages exchanged with my web service. I went with a listener approach (see program below), but it doesn't appear to be working properly: the s_xml_cdata print statement simply prints "RB14", and I don't receive the full XML. Admittedly I'm new to both tshark and Lua, so I may be making some rookie mistakes here. I scoured the Web for examples, but I've yet to come up with anything helpful. My ultimate goal is to save the SOAP XML to a flat file and/or redirect it to a named pipe.
asked 20 Jun '11, 11:51 sethlwilson edited 20 Jun '11, 12:41 Guy Harris
12 Answers:
Here's Lua that extracts the XML fields to a file (with a dotted line in between fields). I tested it against your pcap in tshark and Wireshark. For tshark, run: Note that the file is written in append mode to:
answered 29 Jun '11, 20:10 helloworld edited 30 Sep '11, 12:40
Great piece of code, but unfortunately it fails to extract the SOAP message if it was distributed over several TCP segments. Can anyone help to solve this issue? (29 Jan '13, 09:07) masgad
Well, we seem to be getting somewhere now -- the complete XML is printed, but in bits and pieces over several packets. Here's the output - files.me.com/sethlwilson/vlrmhm.
answered 30 Jun '11, 11:04 sethlwilson
By the way, the Web service whose traffic I captured (with filter "tcp port 8280") communicates on port 8280. (30 Jun '11, 11:10) sethlwilson
Yes, exactly. That's the same result I get. We're bypassing TCP reassembly of the HTTP segments, and writing the segments one at a time. If you commented out the dotted lines (debug), you'd get the original payload in the file. (30 Jun '11, 11:17) helloworld
Please don't add comments as answers; you should comment on specific posts. Otherwise, this answer list becomes unwieldy and difficult to follow. (30 Jun '11, 11:20) helloworld
Sorry, no problem, and I agree it looks like spaghetti now. So, yes, this is now giving me exactly what I've needed! So, does this mean that there is a bug with the TCP reassembly, or is it the Lua API? Just wondering if I need to open an artifact. Again, thanks for all your help! I was almost to the point of dropping the use of Wireshark in our solution. (30 Jun '11, 11:28) sethlwilson
One more question ... when I run this script on a live capture, only the request XML is successfully captured. The response appears mangled.
...
-- #13 ---------------------------------------------------
¨:&óÀ?A.?q¡f¦q¶˜_øç{ÛÔ,¹üz¹£#áõúçÍÞÖ~«·Éø^DFú›?ÅI£Öðî°™çù?×_ðÀª§tÍ pÎP ¡3Ï5r„"‚b™g$¿ úX;Îæüt(U«‰.…8'˜/Yšàiè3Ñ!UAåxîý=Ç?ªœv”÷U Ñ(qa9p?Á£á‚+Ÿ¾z„URÒž™í¥¬Tc•” ³3Q;` Å”êë¥-®¤ … ŽDî=Fhr‰²äÑCA<ÚFœBïàïãoóÃ>Æ2ì Üq)Ç+&q‡ëµ‰~Q¨¾,
-- #19 ---------------------------------------------------
(30 Jun '11, 12:47) sethlwilson
OK, I think I understand what's going on here - the data in the variable xmldata is gzip compressed. I suppose then that's because we're bypassing the HTTP dissector. (30 Jun '11, 13:56) sethlwilson
I doubt the bug is in TCP reassembly because the GUI has no trouble identifying the HTTP/XML packets. My guess is the bug lies in the Lua API (based on the fact that [...]). Sure, you're welcome! Helping you only helped me better understand how listeners work, which will help me when I start documenting it on the Lua API wiki (an ongoing project). (30 Jun '11, 16:31) helloworld
If Wireshark can determine that [...] (30 Jun '11, 16:39) helloworld
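The mangled response in the comments above is consistent with a gzip-compressed HTTP body being written out raw. A minimal sketch in plain Lua of one way to check for that: test whether a chunk starts with the gzip magic bytes 0x1f 0x8b. The name `looks_gzipped` is hypothetical and not part of the Wireshark Lua API. The check only means something at the very start of the body, not on a later TCP segment, and actually inflating the data would need a separate zlib binding, which is not shown here.

```lua
-- Hypothetical helper (not part of the Wireshark Lua API).
-- Returns true when `data` begins with the gzip magic bytes 0x1f 0x8b.
local function looks_gzipped(data)
  if type(data) ~= "string" or #data < 2 then
    return false
  end
  local b1, b2 = data:byte(1, 2)
  return b1 == 0x1f and b2 == 0x8b
end

-- Example: plain XML versus a chunk that starts with the gzip header.
print(looks_gzipped("<soapenv:Envelope>"))     -- plain text
print(looks_gzipped("\031\139\008\000rest"))   -- 0x1f 0x8b prefix
```

If the check reports compressed data, the simpler route is probably to let the HTTP dissector handle the body rather than decompressing in the script.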
For some reason, I had problems combining the proto variable function call with the tostring() call, so I broke them up and added a check in between.
answered 20 Jun '11, 15:56 NewbieBrian
I tried what you suggested, but the print function doesn't display all of what should be contained in xml.cdata. It appears to be truncating the string. (21 Jun '11, 07:29) sethlwilson
The middle display area of Wireshark has a limit of about 255 characters, so the truncation is expected. I transform the XML entities back to XML characters, then break up the line by carriage return line feeds, returning an array. I print out the array and I get CDATA files of over 750K pretty-printed inside Wireshark. I can post an example later. (21 Jun '11, 11:12) NewbieBrian
XML CDATA is often encoded with
This last line is a call to my function
So this function will take any long line and break it by the sep character, which for XML happens to be the greater than symbol. It returns an array of all your lines. Use a simple For loop to print your lines.
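The function itself is not shown here, so the following is only a sketch of the two steps described in this answer, in plain Lua: turning XML entities back into characters, then breaking a long line at a separator and returning an array. The names `unescape_xml` and `split_after` are hypothetical, and keeping the separator at the end of each piece is an assumption.

```lua
-- Hypothetical helpers; the names are illustrative, not from the answer.

-- Turn the five predefined XML entities back into literal characters.
-- &amp; is handled last so that "&amp;lt;" becomes "&lt;", not "<".
local function unescape_xml(s)
  s = s:gsub("&lt;", "<")
  s = s:gsub("&gt;", ">")
  s = s:gsub("&quot;", '"')
  s = s:gsub("&apos;", "'")
  s = s:gsub("&amp;", "&")
  return s
end

-- Break `text` after every occurrence of `sep`, keeping the separator
-- at the end of each piece. Returns an array of pieces.
local function split_after(text, sep)
  local lines = {}
  local start = 1
  while true do
    local pos = text:find(sep, start, true)  -- plain find, no patterns
    if not pos then break end
    lines[#lines + 1] = text:sub(start, pos)
    start = pos + 1
  end
  if start <= #text then
    lines[#lines + 1] = text:sub(start)      -- trailing text without sep
  end
  return lines
end

-- A simple for loop prints the pieces one per line.
local xml = unescape_xml("&lt;a&gt;&lt;b&gt;1 &amp; 2&lt;/b&gt;&lt;/a&gt;")
for i, line in ipairs(split_after(xml, ">")) do
  print(i, line)
end
```

Splitting on ">" is a display aid only; it does not parse the XML, so text that contains a literal ">" would also be broken there.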
Comments and improvements welcome!
answered 21 Jun '11, 11:28 NewbieBrian
Ok, but this doesn't solve the original problem, which is to grab the entire XML document. (21 Jun '11, 18:03) helloworld
I do appreciate your help. Here is my latest rendition incorporating your latest suggestion. The file:write(s_xml_cdata) call writes only 4 bytes of the XML, and that data is the value contained in the first set of tags. I'm running this with tshark using the following command: tshark -f "tcp port 8280" -X lua_script:C:\Users\setwil\xml.lua. My service is going over that port specifically. I have the following Wireshark package installed: Version 1.6.0 (SVN Rev 37592 from /trunk-1.6) Copyright 1998-2011 Gerald Combs [email protected] and contributors. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled (64-bit) with GTK+ 2.22.1, with GLib 2.26.1, with WinPcap (version unknown), with libz 1.2.5, without POSIX capabilities, without libpcre, without SMI, with c-ares 1.7.1, with Lua 5.1, without Python, with GnuTLS 2.10.3, with Gcrypt 1.4.6, without Kerberos, with GeoIP, with PortAudio V19-devel (built Jun 7 2011), with AirPcap. Running on 64-bit Windows 7, build 7600, with WinPcap version 4.1.2 (packet.dll version 4.1.0.2001), based on libpcap version 1.0 branch 1_0_rel0b (20091008), GnuTLS 2.10.3, Gcrypt 1.4.6, without AirPcap. Built using Microsoft Visual C++ 9.0 build 21022
This answer is marked “community wiki”. answered 21 Jun ‘11, 12:37 sethlwilson |
To dump XML documents from a tap/listener:
answered 21 Jun '11, 18:10 helloworld
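The code for this answer is not shown here, so the following is only a rough sketch of what a tap/listener that dumps the xml field might look like. The output file name `xml_dump.txt` and the helper `format_record` are assumptions; `Field.new`, `Listener.new`, `tap.packet`, `pinfo.number` and `fi.range` are from the Wireshark Lua API. On the 1.6 build discussed in this thread, fields that live in reassembled data reportedly did not come back complete, so treat this as illustrative rather than a fix.

```lua
-- Sketch only: the file name and helper name are assumptions.
local OUTPUT_PATH = "xml_dump.txt"  -- hypothetical output file

-- Pure helper: format one extracted field as a labelled record,
-- with a dotted line carrying the frame number above the payload.
local function format_record(frame_number, payload)
  return string.format("-- #%d %s\n%s\n",
                       frame_number, string.rep("-", 40), payload)
end

-- Register the tap only when running inside Wireshark/tshark,
-- where the Listener and Field globals exist.
if Listener and Field then
  local xml_field = Field.new("xml")
  local tap = Listener.new("frame", "xml")  -- only frames with an xml field

  function tap.packet(pinfo, tvb)
    local out = assert(io.open(OUTPUT_PATH, "a"))  -- append mode
    -- A frame may carry more than one xml field; collect them all.
    for _, fi in ipairs({ xml_field() }) do
      out:write(format_record(pinfo.number, fi.range:string()))
    end
    out:close()
  end
end
```

Saved under a hypothetical name such as dump_xml.lua, it would be loaded with the -X lua_script: option used elsewhere in this thread, together with -r for a saved capture.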
Thanks helloworld, now I'm getting somewhere! Funny thing though is that using the code below I only get a portion of the xml and the amount differs between the request and the response. I confirmed though that the offset given by the FieldInfo object is the same as that indicated in the Wireshark GUI. Do you think that there may be some stray non-printable characters in the tvb that are interfering with string conversion? Again I'm running this with tshark from cmd.exe.
answered 22 Jun '11, 09:05 sethlwilson
Something that I've just discovered is that for one packet that I captured, the fieldinfo.offset is greater than the tvb:len(). Here's the program output:
fieldinfo.name=xml
fieldinfo.len=6068
fieldinfo.offset=257
fieldinfo.range=3c736f6170656e763a456e76656c6f706520786d6c6e733a…
fieldinfo.generated=false
tvb:length=282
So it leaves me wondering to what buffer fieldinfo.offset refers. (22 Jun '11, 13:04) sethlwilson
An additional discovery: When comparing these results with those of the GUI I discovered that what's contained in tvb are the bytes from the frame and that its contents are what's printing to my file. (22 Jun '11, 14:32) sethlwilson
I guess the non-printable chars theory is plausible. Isn't that discernible from the hexdump in Wireshark/tshark? The [...] Do you have a pcap to share (or a way for me to recreate your symptoms)? (23 Jun '11, 19:46) helloworld
Sure I can share the pcap with you. How can I send it to you? (24 Jun '11, 06:42) sethlwilson
You can share the pcap via min.us, and post a link here to it. (24 Jun '11, 16:18) helloworld
Here is the pcap file in question: http://files.me.com/sethlwilson/cd4tw7 (27 Jun '11, 09:02) sethlwilson
Well, I finally managed to obtain the XML payload using the approach below, which probably isn't terribly efficient. During a live capture I capture a single request-response exchange and print the SOAP envelope contents of each. For the request, I get all of the XML, byte for byte; but for the response, some bytes are missing at the very end of the XML:
The missing bytes are ...
At first I thought I was unintentionally lopping off those bytes in my convoluted algorithm, but in fact those bytes are even missing in xml_fieldinfo.value ( xml_fieldinfo.len / 2 == string.length(xml_string) ). Why would the value member be truncated? Do you think that it's a bug, or is it an imposed limit? Program source:
answered 24 Jun '11, 12:54 sethlwilson
Yes, I think it's a bug that [...] (29 Jun '11, 21:16) helloworld
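The program for this answer is not shown here. Since the field data shows up earlier in the thread as a string of hex digits (fieldinfo.range=3c736f61...), one step such an approach plausibly needs is converting hex digit pairs back into characters. A minimal sketch in plain Lua, with a hypothetical helper name `hex_to_bytes`; it is not taken from the original program.

```lua
-- Hypothetical helper: convert a string of hex digit pairs to raw bytes.
local function hex_to_bytes(hex)
  assert(#hex % 2 == 0, "hex string must have an even number of digits")
  assert(not hex:find("%X"), "hex string contains a non-hex character")
  -- Parentheses drop gsub's second return value (the match count).
  return (hex:gsub("%x%x", function(pair)
    return string.char(tonumber(pair, 16))
  end))
end

-- The first bytes of the range value quoted earlier in this thread.
print(hex_to_bytes("3c736f6170656e763a456e76656c6f7065"))  -- <soapenv:Envelope
```

Note that two hex digits decode to one byte, so the decoded string is half as long as its hex form; this is worth keeping in mind when comparing lengths to look for truncation.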
Here is the pcap file in question: http://files.me.com/sethlwilson/cd4tw7 answered 29 Jun '11, 12:22 sethlwilson |
Hi helloworld, I tried your Lua program, but I'm not getting any output (temp.xml is never created) when replaying my pcap. Tap.packet() never fires.
answered 30 Jun '11, 07:30 sethlwilson
Something is wrong with your cmdline entry (try quotes around the path of -r). It works for me on OSX and XP. (30 Jun '11, 08:11) helloworld
Another possible problem is that a dissector is parsing out your TCP data (as HTTP?), which could cause the tap's filter to miss the packets. If so, you'd have to either change the tap filter or disable the custom dissector. If it's just HTTP, then change both the tap filter and the field name to "xml". (30 Jun '11, 08:32) helloworld
I just reproduced the symptom. In my case, my HTTP prefs did not include 56013 as an HTTP port, so these packets were falling thru as undissected data (allowing for "tcp and data" filter to catch them). Try the filter/fieldname change I previously mentioned. (30 Jun '11, 08:43) helloworld
Essentially, using Lua I want to be able to send to output the re-assembled xml like in the Wireshark GUI. It seems like this should be possible given the xml protocol exists right? How else would the GUI do it?