Using Lua and Tshark I'm attempting to obtain the XML payload from SOAP messages exchanged with my web service. I went with a listener approach (see program below), but it doesn't appear to be working properly. The s_xml_cdata print statement simply prints "RB14". I don't receive the full XML. Admittedly I'm new to both Tshark and Lua so I may be making some rookie mistakes here. I scoured the Web for any examples, but I've yet to come up with anything helpful. My ultimate goal is to save the SOAP XML to a flat file and/or redirect to a named pipe.
|
Here's Lua that extracts the XML fields to a file (with a dotted line in between fields). I tested it against your pcap in tshark and Wireshark.
For tshark, run:
Note that the file is written in append mode to:
Great piece of code, but unfortunately it fails to extract the SOAP message if it is spread over several TCP segments. Can anyone help solve this issue?
(29 Jan '13, 09:07)
masgad
|
Well, we seem to be getting somewhere now: the complete XML is printed, but in bits and pieces over several packets. Here's the output - files.me.com/sethlwilson/vlrmhm. By the way, the Web service whose traffic I captured (with filter "tcp port 8280") communicates on port 8280.
(30 Jun '11, 11:10)
sethlwilson
Yes, exactly. That's the same result I get. We're bypassing TCP reassembly of the HTTP segments, and writing the segments one at a time. If you commented out the dotted lines (debug), you'd get the original payload in the file.
(30 Jun '11, 11:17)
helloworld
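Since the segments arrive one at a time when reassembly is bypassed, one way to get a whole document is to buffer the pieces per connection and emit them once the closing tag shows up. This is only a minimal sketch in plain Lua, not the script from this answer: the names append_segment and take_document, the stream key, and the closing tag "</soapenv:Envelope>" are all assumptions, and it assumes segments arrive in order, without retransmissions, and uncompressed.

```lua
-- Buffers of raw payload pieces, keyed by a caller-chosen stream id
-- (for example "client-ip:port>server-ip:port"; the key format is up to you).
local streams = {}

-- Remember one more payload piece for the given stream.
local function append_segment(key, data)
  local parts = streams[key]
  if not parts then
    parts = {}
    streams[key] = parts
  end
  parts[#parts + 1] = data
end

-- Return the buffered text up to and including closing_tag, or nil if the
-- closing tag has not been seen yet. Anything after the tag stays buffered.
local function take_document(key, closing_tag)
  local parts = streams[key]
  if not parts then return nil end
  local joined = table.concat(parts)
  local _, finish = joined:find(closing_tag, 1, true)  -- plain-text search
  if not finish then return nil end
  local rest = joined:sub(finish + 1)
  streams[key] = (rest ~= "") and { rest } or nil
  return joined:sub(1, finish)
end

-- Usage: feed pieces as they arrive, then poll for a complete document.
append_segment("demo", "<soapenv:Envelope><body>")
append_segment("demo", "</body></soapenv:Envelope>")
print(take_document("demo", "</soapenv:Envelope>"))
```

Inside a tap, append_segment would be called from the packet callback with the payload bytes, and take_document checked right after it.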
Please don't add comments as answers; you should comment on specific posts. Otherwise, this answer list becomes unwieldy and difficult to follow.
(30 Jun '11, 11:20)
helloworld
Sorry, no problem and I agree it looks like spaghetti now. So, yes this is now giving me exactly what I've needed! So, does this mean that there is a bug with the TCP reassembly or is it the Lua API? Just wondering if I need to open an artifact. Again, thanks for all your help! I was almost to the point of dropping the use of Wireshark in our solution.
(30 Jun '11, 11:28)
sethlwilson
One more question ... when I run this script on a live capture, only the request XML is successfully captured. The response appears mangled. ... -- #13 --------------------------------------------------- ¨:&óÀ?A.q¡f¦q¶˜_øç{ÛÔ,¹üz¹£#áõúçÍÞÖ~«·Éø^DFú›ÅI£Öðî°™çù?×_ðÀª§tÍ pÎP ¡3Ï5r„"‚b™g$¿ úX;Îæüt(U«‰.…8'˜/Yšàiè3Ñ!UAåxîý=Ǫœv”÷U Ñ(qa9p?Á£á‚+Ÿ¾z„URÒž™í¥¬Tc•” ³3Q;` Å”êë¥-®¤ … ŽDî=Fhr‰²äÑCA<ÚFœBïàïãoóÃ>Æ2ì Üq)Ç+&q‡ëµ‰~Q¨¾, -- #19 ---------------------------------------------------
(30 Jun '11, 12:47)
sethlwilson
OK, I think I understand what's going on here: the data in the variable xmldata is gzip compressed. I suppose that's because we're bypassing the HTTP dissector.
(30 Jun '11, 13:56)
sethlwilson
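A quick way to confirm the gzip theory from inside the script is to look for the gzip magic bytes (0x1f 0x8b) or a Content-Encoding header. This is just a sketch: the function names are made up, and it assumes you have already separated the HTTP header text from the body, which the script in this thread may not do.

```lua
-- True if the body starts with the two gzip magic bytes 0x1f 0x8b.
local function looks_gzipped(body)
  return #body >= 2 and body:byte(1) == 0x1f and body:byte(2) == 0x8b
end

-- True if the raw HTTP header text declares a gzip content encoding.
-- Header names are case-insensitive, so compare in lower case.
local function header_says_gzip(headers)
  return headers:lower():find("content%-encoding:%s*gzip") ~= nil
end

print(looks_gzipped("\031\139\008rest-of-stream"))   --> true
print(looks_gzipped("<soapenv:Envelope>"))           --> false
print(header_says_gzip("Content-Encoding: gzip\r\n")) --> true
```

Plain Lua has no built-in inflate, so a positive result here only tells you to skip the payload or to hand it to something that can decompress it.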
I doubt the bug is in TCP reassembly because the GUI has no trouble identifying the HTTP/XML packets. My guess is the bug lies in the Lua API (based on the fact that
Sure, you're welcome! Helping you only helped me better understand how listeners work, which will help me when I start documenting it in the Lua API wiki (an ongoing project).
(30 Jun '11, 16:31)
helloworld
If Wireshark can determine that
(30 Jun '11, 16:39)
helloworld
|
For some reason, I had problems combining the proto variable function call with the tostring() call, so I broke them up and added a check in between.
I tried what you suggested, but the print function doesn't display all of what should be contained in xml.cdata. It appears to be truncating the string.
(21 Jun '11, 07:29)
sethlwilson
The middle display area of Wireshark has a limit of about 255 characters, so the truncation is expected. I transform the XML entities back to XML characters, then break up the line by carriage return/line feeds, returning an array. I print out the array, and I get CDATA files of over 750K pretty-printed inside Wireshark. I can post an example later.
(21 Jun '11, 11:12)
NewbieBrian
|
XML CDATA is often encoded with
This last line is a call to my function
So this function will take any long line and break it at the sep character, which for XML happens to be the greater-than symbol (>). It returns an array of all your lines. Use a simple for loop to print your lines.
Comments and improvements welcome!
Ok, but this doesn't solve the original problem, which is to grab the entire XML document.
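For anyone reading without the original code, the two steps described here (turn the entities back into characters, then break the long line after each separator) might look roughly like this in plain Lua. The names decode_entities and split_after are placeholders, not the functions from this answer.

```lua
-- Decode the five predefined XML entities back to literal characters.
-- Unknown entities are left untouched (the callback returns nil for them).
local function decode_entities(s)
  local map = { lt = "<", gt = ">", quot = '"', apos = "'", amp = "&" }
  return (s:gsub("&(%a+);", function(name) return map[name] end))
end

-- Split s after every occurrence of sep (plain text, not a pattern),
-- keeping sep at the end of each piece. Returns an array of pieces.
local function split_after(s, sep)
  assert(sep ~= "", "sep must not be empty")
  local lines = {}
  local start = 1
  while true do
    local _, finish = s:find(sep, start, true)
    if not finish then break end
    lines[#lines + 1] = s:sub(start, finish)
    start = finish + 1
  end
  if start <= #s then
    lines[#lines + 1] = s:sub(start)  -- trailing text with no separator
  end
  return lines
end

-- Usage: one line per tag.
local cdata = "&lt;a&gt;&lt;b&gt;x&lt;/b&gt;&lt;/a&gt;"
for i, line in ipairs(split_after(decode_entities(cdata), ">")) do
  print(i, line)
end
```

Decoding in a single gsub pass matters: a doubly escaped "&amp;lt;" comes out as "&lt;" rather than being decoded twice.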
(21 Jun '11, 18:03)
helloworld
|
I do appreciate your help. Here is my latest rendition incorporating your latest suggestion. The file:write(s_xml_cdata) call writes only 4 bytes of the XML, and that data is the value contained in the first set of tags. I'm running this with tshark using the following command: tshark -f "tcp port 8280" -X lua_script:C:\Users\setwil\xml.lua. My service communicates over that port specifically. I have the following Wireshark package installed:
Version 1.6.0 (SVN Rev 37592 from /trunk-1.6)
Copyright 1998-2011 Gerald Combs [email protected] and contributors. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compiled (64-bit) with GTK+ 2.22.1, with GLib 2.26.1, with WinPcap (version unknown), with libz 1.2.5, without POSIX capabilities, without libpcre, without SMI, with c-ares 1.7.1, with Lua 5.1, without Python, with GnuTLS 2.10.3, with Gcrypt 1.4.6, without Kerberos, with GeoIP, with PortAudio V19-devel (built Jun 7 2011), with AirPcap.
Running on 64-bit Windows 7, build 7600, with WinPcap version 4.1.2 (packet.dll version 4.1.0.2001), based on libpcap version 1.0 branch 1_0_rel0b (20091008), GnuTLS 2.10.3, Gcrypt 1.4.6, without AirPcap.
Built using Microsoft Visual C++ 9.0 build 21022
|
To dump XML documents from a tap/listener:
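The original listing did not survive in this copy of the thread, so here is a rough sketch of what such a tap can look like. It assumes the "xml" field and filter discussed elsewhere in this thread, uses temp.xml as the output file, and relies on FieldInfo.range, which may behave differently on older Wireshark builds. The helper write_record is a made-up name.

```lua
-- Write one XML payload to an open file handle, preceded by a dashed
-- separator line that records the frame number.
local function write_record(out, frame_number, payload)
  out:write(string.format("-- #%d %s\n", frame_number, string.rep("-", 40)))
  out:write(payload, "\n")
end

-- Listener and Field exist only inside Wireshark/tshark, so registration
-- is skipped when this file is run by a plain Lua interpreter.
if Listener and Field then
  local xml_field = Field.new("xml")            -- extractor for the XML field
  local tap = Listener.new(nil, "xml")          -- nil selects the default "frame" tap
  local out = assert(io.open("temp.xml", "a"))  -- append mode

  -- Called once for every packet that matches the tap filter.
  function tap.packet(pinfo, tvb)
    -- A packet can carry several XML fields, so collect all of them.
    for _, info in ipairs({ xml_field() }) do
      if info.range then
        write_record(out, pinfo.number, info.range:string())
      end
    end
  end

  -- Called when tshark finishes (or the GUI redraws); flush what we have.
  function tap.draw()
    out:flush()
  end
end
```

Load it with -X lua_script:<path to the script>. If nothing is written, check that the packets are actually dissected as XML in the first place.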
|
Thanks helloworld, now I'm getting somewhere! Funny thing though is that using the code below I only get a portion of the xml and the amount differs between the request and the response. I confirmed though that the offset given by the FieldInfo object is the same as that indicated in the Wireshark GUI. Do you think that there may be some stray non-printable characters in the tvb that are interfering with string conversion? Again I'm running this with tshark from cmd.exe.
Something that I've just discovered is that for one packet that I captured, the fieldinfo.offset is greater than the tvb:len(). Here's the program output:
fieldinfo.name=xml
fieldinfo.len=6068
fieldinfo.offset=257
fieldinfo.range=3c736f6170656e763a456e76656c6f706520786d6c6e733a...
fieldinfo.generated=false
tvb:length=282
So it leaves me wondering to what buffer fieldinfo.offset refers.
(22 Jun '11, 13:04)
sethlwilson
An additional discovery: When comparing these results with those of the GUI I discovered that what's contained in tvb are the bytes from the frame and that its contents are what's printing to my file.
(22 Jun '11, 14:32)
sethlwilson
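Plugging the numbers from the output above into a simple bounds check shows why slicing the tap's tvb with the field's offset and length cannot work for that packet. The function name is made up, and the conclusion that the offset refers to some other (probably reassembled) buffer is a reading of these numbers rather than something the API documentation confirms here.

```lua
-- True if a field of the given offset and length lies entirely inside a
-- buffer of tvb_len bytes.
local function field_fits_in_tvb(offset, len, tvb_len)
  return offset >= 0 and offset + len <= tvb_len
end

-- Values reported above: offset=257, len=6068, tvb length=282.
print(field_fits_in_tvb(257, 6068, 282))  --> false (257 + 6068 = 6325 > 282)
```

When this check fails, reading the bytes through the field's own range is likely to be safer than indexing into the frame's tvb.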
I guess the non-printable chars theory is plausible. Isn't that discernible from the hexdump in Wireshark/tshark? The Do you have a pcap to share (or a way for me to recreate your symptoms)?
(23 Jun '11, 19:46)
helloworld
Sure I can share the pcap with you. How can I send it to you?
(24 Jun '11, 06:42)
sethlwilson
You can share the pcap via min.us, and post a link here to it.
(24 Jun '11, 16:18)
helloworld
Here is the pcap file in question: http://files.me.com/sethlwilson/cd4tw7
(27 Jun '11, 09:02)
sethlwilson
|
Well, I finally managed to obtain the XML payload using the below approach which probably isn't too terribly efficient. During a live capture I capture a single request-response relay and print their soap envelope contents. As for the request, I get all of the xml, byte for byte; but as for the response, there are some bytes missing at the very end of the xml:
The missing bytes are ...
At first I thought I was unintentionally lopping off those bytes in my convoluted algorithm, but in fact those bytes are even missing in xml_fieldinfo.value ( xml_fieldinfo.len / 2 == string.length(xml_string) ). Why would the value member be truncated? Do you think that it's a bug, or is it an imposed limit? Program source:
Yes, I think it's a bug that
(29 Jun '11, 21:16)
helloworld
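For reference, turning the hex form of the field (as printed for fieldinfo.range earlier in this thread) back into text takes only a few lines of plain Lua. This is a sketch with a made-up function name, not the program from this answer, and it assumes the input really is a string of hex digit pairs.

```lua
-- Convert a string of hex digit pairs ("3c73...") into the bytes it encodes.
local function hex_to_string(hex)
  assert(#hex % 2 == 0, "hex string must have an even number of digits")
  return (hex:gsub("%x%x", function(pair)
    return string.char(tonumber(pair, 16))
  end))
end

-- Prefix of the range value shown earlier in the thread.
local text = hex_to_string("3c736f6170656e763a456e76656c6f7065")
print(text)   --> <soapenv:Envelope
print(#text)  --> 17
```

The decoded text is always half as long as the hex string, so a short result means the hex input itself was already truncated.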
|
Hi helloworld, I tried your Lua program, but I'm not getting any output (temp.xml is never created) when replaying my pcap. Tap.packet() never fires.
Something is wrong with your cmdline entry (try quotes around the path of -r). It works for me on OSX and XP.
(30 Jun '11, 08:11)
helloworld
Another possible problem is that a dissector is parsing out your TCP data (as HTTP?), which could cause the tap's filter to miss the packets. If so, you'd have to either change the tap filter or disable the custom dissector. If it's just HTTP, then change both the tap filter and the field name to "xml".
(30 Jun '11, 08:32)
helloworld
I just reproduced the symptom. In my case, my HTTP prefs did not include 56013 as an HTTP port, so these packets were falling thru as undissected data (allowing for "tcp and data" filter to catch them). Try the filter/fieldname change I previously mentioned.
(30 Jun '11, 08:43)
helloworld
|
Essentially, using Lua I want to be able to send to output the re-assembled xml like in the Wireshark GUI. It seems like this should be possible given the xml protocol exists right? How else would the GUI do it?