This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Extracting SOAP XML Payload

Using Lua and Tshark I'm attempting to obtain the XML payload from SOAP messages exchanged with my web service. I went with a listener approach (see program below), but it doesn't appear to be working properly. The s_xml_cdata print statement simply prints "RB14". I don't receive the full XML. Admittedly I'm new to both Tshark and Lua so I may be making some rookie mistakes here. I scoured the Web for any examples, but I've yet to come up with anything helpful. My ultimate goal is to save the SOAP XML to a flat file and/or redirect to a named pipe.

xml_cdata_f = Field.new("xml.cdata")
xml_tag_f = Field.new("xml.tag")

local tap = Listener.new(nil, "xml" )

function tap.packet(pinfo, pvb, xml)

    print("\nTap Hit!")

    local s_xml_cdata = tostring(xml_cdata_f())
    local s_xml_tag = tostring(xml_tag_f())

    print("\n" .. s_xml_cdata)
    print("\n" .. s_xml_tag)
end

function tap.draw(xml) end

function tap.reset(xml) end

asked 20 Jun ‘11, 11:51

sethlwilson

edited 20 Jun ‘11, 12:41

Guy Harris

Essentially, using Lua I want to be able to output the reassembled XML as the Wireshark GUI shows it. It seems like this should be possible given that the xml protocol exists, right? How else would the GUI do it?

(21 Jun ‘11, 09:58) sethlwilson


12 Answers:

Here's Lua that extracts the XML fields to a file (with a dotted line in between fields). I tested it against your pcap in tshark and Wireshark. For tshark, run: tshark -r sethwilson.pcap -R "tcp and data"

Note that the file is written in append mode to:

  • UN*X: ~/.wireshark/temp.xml
  • Windows XP: C:\Documents and Settings\your_username\Application Data\Wireshark\temp.xml

-- tap uses dfilter for tcp data and ignores retransmissions
local tap       = Listener.new(nil, "tcp && data && !tcp.analysis.retransmission")
local xml_field = Field.new("data")
local file      = nil

-- #######################################################################
-- # If not already open, this opens a file for writing (append mode)
-- #######################################################################
local function open_file()
    if not file then
        local path = USER_DIR .. "/temp.xml"

        print("opening file:", path)
        file = assert(io.open(path, "a"),
                    "Can't open file for writing")
    end
end

local HTML_REQ = { ["HTTP"] = 1, ["GET "] = 1, ["PUT "] = 1, ["POST"] = 1, }

-- #######################################################################
-- # Extracts the XML field from the buffer and writes the field to file
-- #######################################################################
local function handle_xml(pinfo, tvb)
    if not file then
        print("no file...ignoring packet")
        return
    end

    local fieldinfo = xml_field()
    local xmldata   = tvb(fieldinfo.offset):string()

    -- some of these packets start w/HTTP header...skip to XML
    local starts = xmldata:sub(1,4)
    if HTML_REQ[starts] ~= nil then
        local pos = string.find(xmldata, "<%?xml version")
        if not pos then
            return
        end

        xmldata = xmldata:sub(pos)
    end

    print(xmldata)
    print("\n\n-- #"..pinfo.number.." ---------------------------------------------------\n\n")

    file:write(xmldata)
    file:write("\n\n-- #"..pinfo.number.." ---------------------------------------------------\n\n")
end

-- #######################################################################
-- # tap.packet() is called to notify the Listener of a packet that
-- # matches its filter rule (here: TCP data that is not a
-- # retransmission). This can be called multiple times before tap.draw().
-- #######################################################################
function tap.packet(pinfo, tvb)
    print("tap.packet", "#"..pinfo.number)

    -- XXX: Compensate for no tap.reset() in tshark
    if not gui_enabled() then open_file() end

    -- wrap the handler in a pcall() in case an error occurs
    local ok, msg = pcall(function()
                              handle_xml(pinfo, tvb)
                          end)

    -- print any error and bow out
    if not ok then
        print("wtf!", msg)
    end
end

-- #######################################################################
-- # tap.draw() is called to notify the Listener to "draw" its results
-- # that were accumulated in tap.packet(). This is normally called after
-- # tap.packet(), based on "Preferences > Statistics > Tap update interval".
-- #######################################################################
function tap.draw()
    print("tap.draw")

    -- flush toilet (NOTE: When $file is garbage collected, it's
    -- automatically flushed and closed...that doesn't mean we
    -- can't do it sooner to free resources.)
    if file then
        print("closing file")
        file:close()
        file = nil
    end
end

-- #######################################################################
-- # tap.reset() is called to notify the Listener to reset any variables
-- # or counters in preparation for a packet (passed to tap.packet()).
-- # This can be called multiple times before a packet is even seen.
-- #
-- # XXX: tshark doesn't call this function, but Wireshark does. Bug?
-- #######################################################################
function tap.reset()
    print("tap.reset")
    open_file()
end
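
The `handle_xml()` function skips an HTTP header by searching for `<?xml version`, which fails when a SOAP body carries no XML declaration. A minimal pure-Lua sketch of an alternative, assuming the header ends at the first empty line (CRLF CRLF) as HTTP/1.1 specifies. `strip_http_header` and `HTTP_START` are hypothetical names, not part of the Wireshark Lua API:

```lua
-- Hypothetical table: first four bytes of an HTTP request or response
-- (the same tokens that HTML_REQ checks for in the script).
local HTTP_START = { ["HTTP"] = true, ["GET "] = true, ["PUT "] = true, ["POST"] = true }

-- Returns the body that follows the HTTP header block, the input unchanged
-- if it does not begin with an HTTP start line (e.g. a continuation
-- segment), or nil if the header's terminating blank line is not present.
local function strip_http_header(data)
  if not HTTP_START[data:sub(1, 4)] then
    return data
  end
  local _, header_end = data:find("\r\n\r\n", 1, true)  -- plain (non-pattern) search
  if not header_end then
    return nil
  end
  return data:sub(header_end + 1)
end

print(strip_http_header("POST /svc HTTP/1.1\r\nContent-Type: text/xml\r\n\r\n<a/>"))
```

Inside `handle_xml()`, a call like `xmldata = strip_http_header(xmldata)` followed by a nil check could stand in for the `<%?xml version` search; this has not been tried against the pcap in this thread.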

answered 29 Jun ‘11, 20:10

helloworld

edited 30 Sep ‘11, 12:40

Great piece of code, but unfortunately it fails to extract a SOAP message that is split across several TCP segments. Can anyone help solve this issue?

(29 Jan ‘13, 09:07) masgad

Well, we seem to be getting somewhere now: the complete XML is printed, but in bits and pieces over several packets. Here's the output: files.me.com/sethlwilson/vlrmhm.

answered 30 Jun '11, 11:04

sethlwilson

By the way, the Web service whose traffic I captured (with the filter "tcp port 8280") communicates on port 8280.

(30 Jun '11, 11:10) sethlwilson

Yes, exactly. That's the same result I get. We're bypassing TCP reassembly of the HTTP segments, and writing the segments one at a time. If you commented out the dotted lines (debug), you'd get the original payload in the file.

(30 Jun '11, 11:17) helloworld

Please don't add comments as answers; you should comment on specific posts. Otherwise, this answer list becomes unwieldy and difficult to follow.

(30 Jun '11, 11:20) helloworld

Sorry, no problem and I agree it looks like spaghetti now.

So, yes, this is now giving me exactly what I needed! Does this mean that there is a bug in the TCP reassembly, or is it in the Lua API? Just wondering if I need to open an artifact.

Again, thanks for all your help! I was almost to the point of dropping the use of Wireshark in our solution.

(30 Jun '11, 11:28) sethlwilson

One more question ... when I run this script on a live capture, only the request XML is successfully captured. The response appears mangled.

...

-- #13 ---------------------------------------------------

¨:&óÀ?A.?q¡f¦q¶˜_øç{ÛÔ,¹üz¹£#áõúçÍÞÖ~«·Éø^DFú›?ÅI£Öðî°™çù?× _ðÀª§tÍ pÎP ¡3Ï5r„"‚b™g$¿  úX;Îæüt(U«‰.…8'˜/Yšàiè ­3Ñ!UAåxîý=Ç?ªœv”÷U Ñ(qa9p?Á£á‚+Ÿ¾z„URÒž™í¥¬Tc•” ³3Q;` Å”êë¥-®¤ … ŽD î=Fhr‰²äÑCA<ÚFœBïàïãoóÃ>Æ2ì Üq)Ç+&q‡ëµ‰~Q¨¾,

-- #19 ---------------------------------------------------

(30 Jun '11, 12:47) sethlwilson

OK, I think I understand what's going on here: the data in the variable xmldata is gzip-compressed. I suppose that's because we're bypassing the HTTP dissector.

(30 Jun '11, 13:56) sethlwilson
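
Gzip data always begins with the magic bytes 0x1f 0x8b (RFC 1952), so a script can at least recognize compressed bodies instead of writing them out as garbage text. A small sketch; `looks_gzipped` is a hypothetical helper, and actually decompressing the data would need a zlib binding that plain Lua 5.1 does not include:

```lua
-- Returns true if the string starts with the gzip magic bytes 0x1f 0x8b,
-- i.e. the payload is most likely a gzip-compressed body.
local function looks_gzipped(data)
  return #data >= 2 and data:byte(1) == 0x1f and data:byte(2) == 0x8b
end

print(looks_gzipped("<?xml version=\"1.0\"?>"))
```

A script could call this on each body and, for example, skip it or write it to a separate file rather than mixing binary data into the XML output.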

I doubt the bug is in TCP reassembly because the GUI has no trouble identifying the HTTP/XML packets. My guess is the bug lies in the Lua API (based on the fact that fieldinfo.value and fieldinfo.len do not always correspond to the actual contents of tvb).

Sure, you're welcome! Helping you only helped me better understand how listeners work, which will help me when I start documenting it in the Lua API wiki (an ongoing project).

(30 Jun '11, 16:31) helloworld

If Wireshark can determine that xml.data is gzipped, then I presume you can write it into a .gz file for later decompression. I think you'd have to pay attention to http.content_length_header to determine the full/expected length of the file.

(30 Jun '11, 16:39) helloworld
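
A rough sketch of the Content-Length idea in plain Lua: collect body bytes per conversation until the declared length has arrived, which would also address payloads split over several TCP segments. All names here are hypothetical; the conversation key could be built from values such as `pinfo.src`, `pinfo.src_port`, `pinfo.dst` and `pinfo.dst_port`. This assumes in-order segments without retransmissions and ignores chunked transfer encoding:

```lua
-- Hypothetical per-conversation buffers: key -> { expected, have, parts }
local pending = {}

-- Pull the Content-Length value out of an HTTP header block. Header names
-- are case-insensitive, so match against a lower-cased copy.
local function content_length(header)
  local n = header:lower():match("\ncontent%-length:%s*(%d+)")
  return n and tonumber(n)
end

-- Start collecting a new message body of `expected` bytes for `key`.
local function begin_message(key, expected)
  pending[key] = { expected = expected, have = 0, parts = {} }
end

-- Add one segment's worth of body bytes. Returns the complete body once
-- `expected` bytes have arrived, otherwise nil.
local function add_chunk(key, chunk)
  local msg = pending[key]
  if not msg then
    return nil  -- no header seen yet for this conversation
  end
  msg.parts[#msg.parts + 1] = chunk
  msg.have = msg.have + #chunk
  if msg.have < msg.expected then
    return nil
  end
  pending[key] = nil
  return table.concat(msg.parts):sub(1, msg.expected)
end
```

The idea would be to call `begin_message()` when a segment starts with an HTTP header and `add_chunk()` for the body bytes of that and each following segment, writing the result to the file only when it is non-nil.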

For some reason, I had problems combining the field function call with the tostring() call, so I broke them up and added a check in between.

On your line:
  local s_xml_cdata = tostring(xml_cdata_f())
try changing to
  local l_xml_cdata = xml_cdata_f()
  if l_xml_cdata == nil then return end
  local s_xml_cdata = tostring(l_xml_cdata)

answered 20 Jun '11, 15:56

NewbieBrian

I tried what you suggested, but the print function doesn't display all of what should be contained in xml.cdata. It appears to be truncating the string.

(21 Jun '11, 07:29) sethlwilson

The middle display area of Wireshark (the packet details pane) has a limit of about 255 characters per line, so the truncation is expected. I transform the XML entities back to XML characters, then break up the line at carriage return/line feeds, returning an array. I print out the array and get CDATA files of over 750K pretty-printed inside Wireshark. I can post an example later.

(21 Jun '11, 11:12) NewbieBrian

XML CDATA is often encoded with &lt; (<) &gt; (>) and other codes. There may be a better way to do all that follows with an XML parser, but I spend enough time just getting Wireshark to behave as advertised. I don't have time to dive into Lua nuances. So I manually transfer the CDATA to a string as shown above. Then I use string.gsub() to do searches and replaces:

s_xml_cdata = string.gsub(s_xml_cdata, "&lt;", "<")
s_xml_cdata = string.gsub(s_xml_cdata, "&gt;", ">")
s_xml_cdata = string.gsub(s_xml_cdata, "&#xA;", "")
s_xml_cdata = string.gsub(s_xml_cdata, "&#xD;", "")
s_xml_cdata = string.gsub(s_xml_cdata, "\n", "")
local la_cdata = BreakUpLongLines(s_xml_cdata, ">")

This last line is a call to my function BreakUpLongLines(string_xml_cdata, string split_character). Pass in your string with the reformed XML data, and the character you want to do splits on. In this case, it's basically the greater than symbol. Here is the BreakUpLongLines() function.

--utility to break up long lines into smaller ones
function BreakUpLongLines (str, sep)
    --Array to return each of the elements
    local vals = {}
    local valindex = 1
    local strSize = #str
    local offset = 1
    while strSize >= offset do
        -- plain find (4th argument true) so sep is not treated as a pattern
        local svar, evar = string.find(str, sep, offset, true)
        if evar == nil then
            -- keep any text that follows the last separator
            vals[valindex] = string.sub(str, offset)
            return vals
        end
        vals[valindex] = string.sub(str, offset, evar)
        valindex = valindex + 1
        offset = evar + 1
    end
    return vals
end

So this function will take any long line and break it at the sep character, which for XML happens to be the greater-than symbol. It returns an array of all your lines. Use a simple for loop to output your lines; in my dissector, each line is added to the g2stree subtree:

for kvar, vvar in ipairs(la_cdata) do
    g2stree:add(kvar, vvar)
end
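
`BreakUpLongLines()` walks the string with `string.find`. A shorter alternative sketch uses `string.gmatch`; `split_after` is a hypothetical name, and like the original it splits after each occurrence of a single separator character. `ipairs` is the usual way to iterate the resulting array in order, since `next` makes no ordering guarantee:

```lua
-- Split `str` after every occurrence of the single character `sep`,
-- keeping any trailing text that follows the last separator.
local function split_after(str, sep)
  -- Escape punctuation so characters such as '.' or ']' match literally.
  local esc = (sep:gsub("%p", "%%%0"))
  local pieces = {}
  -- "[^>]*>?" matches a run of non-separators plus an optional separator;
  -- gmatch also yields one empty match at the end, which is skipped.
  for piece in str:gmatch("[^" .. esc .. "]*" .. esc .. "?") do
    if piece ~= "" then
      pieces[#pieces + 1] = piece
    end
  end
  return pieces
end

for i, line in ipairs(split_after("<a><b>text</b></a>", ">")) do
  print(i, line)
end
```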

Comments and improvements welcome!
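
The `string.gsub` chain in this answer does not handle `&amp;`, `&quot;` or `&apos;`, and decoding `&amp;` in a separate step can turn `&amp;lt;` into `<` by mistake. A sketch of a single-pass decoder for the five predefined XML entities and numeric character references; `decode_entities` is a hypothetical name, and unlike the chain in the answer it converts `&#xA;` and `&#xD;` into real newline and carriage-return characters instead of deleting them:

```lua
-- The five entities predefined by XML.
local NAMED = { lt = "<", gt = ">", amp = "&", quot = '"', apos = "'" }

-- Decode named entities and decimal (&#65;) / hex (&#x41;) references in
-- one pass, so "&amp;lt;" becomes "&lt;" rather than "<".
local function decode_entities(s)
  local out = s:gsub("&(#?[xX]?)(%w+);", function(prefix, body)
    if prefix == "" then
      return NAMED[body]  -- nil leaves unknown entities unchanged
    end
    local code
    if prefix == "#" then
      code = tonumber(body, 10)
    elseif prefix == "#x" or prefix == "#X" then
      code = tonumber(body, 16)
    end
    -- Only ASCII is converted; other code points would need UTF-8 encoding.
    if code and code < 128 then
      return string.char(code)
    end
    return nil
  end)
  return out
end

print(decode_entities("&lt;Envelope&gt;&#65;&amp;B&lt;/Envelope&gt;"))
```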

answered 21 Jun '11, 11:28

NewbieBrian

Ok, but this doesn't solve the original problem, which is to grab the entire XML document.

(21 Jun '11, 18:03) helloworld

I do appreciate your help. Here is my latest rendition incorporating your latest suggestion. The file:write(s_xml_cdata) call writes only 4 bytes of the XML, and that data is the value contained in the first set of tags. I'm running this with tshark using the following command: tshark -f "tcp port 8280" -X lua_script:C:\Users\setwil\xml.lua. My service communicates over that port specifically. I have the following Wireshark package installed:

Version 1.6.0 (SVN Rev 37592 from /trunk-1.6)

Copyright 1998-2011 Gerald Combs and contributors. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GTK+ 2.22.1, with GLib 2.26.1, with WinPcap (version unknown), with libz 1.2.5, without POSIX capabilities, without libpcre, without SMI, with c-ares 1.7.1, with Lua 5.1, without Python, with GnuTLS 2.10.3, with Gcrypt 1.4.6, without Kerberos, with GeoIP, with PortAudio V19-devel (built Jun 7 2011), with AirPcap.

Running on 64-bit Windows 7, build 7600, with WinPcap version 4.1.2 (packet.dll version 4.1.0.2001), based on libpcap version 1.0 branch 1_0_rel0b (20091008), GnuTLS 2.10.3, Gcrypt 1.4.6, without AirPcap.

Built using Microsoft Visual C++ 9.0 build 21022

do
  local file = io.open("C:\\Users\\setwil\\xml.out", "w")

f_xml_cdata = Field.new("xml.cdata")

--utility to break up long lines into smaller ones
function BreakUpLongLines (str, sep)
    --Array to return each of the elements
    vals = {}; valindex = 1; word = ""
    local strSize = #str
    local offset =0
    print("\nstrSize=" .. strSize .. "\n")
    while strSize > offset do
        svar, evar=string.find(str, sep, offset)
        if evar == nil then return vals end
        if evar > strSize then return vals end
        vals[valindex] = string.sub(str, offset,evar)
        valindex = valindex+1
        offset=evar+1
    end
    return vals
end

local mytap = Listener.new(nil, "xml")

function mytap.packet(pinfo, tvb, xml)

    local l_xml_cdata = f_xml_cdata()
    local s_xml_cdata = tostring(l_xml_cdata)

    s_xml_cdata = string.gsub(s_xml_cdata, "&lt;", "<")
    s_xml_cdata = string.gsub(s_xml_cdata, "&gt;", ">")
    s_xml_cdata = string.gsub(s_xml_cdata, "&#xA;", "")
    s_xml_cdata = string.gsub(s_xml_cdata, "&#xD;", "")
    s_xml_cdata = string.gsub(s_xml_cdata, "\n", "")

    file:write(s_xml_cdata)

    local la_cdata = BreakUpLongLines(s_xml_cdata, ">")

    file:write("\nBefore loop")
    for i, v in ipairs(la_cdata) do
        file:write(v)
    end
    file:write("\nAfter loop")

end

function mytap.draw() end

function mytap.reset() file:close() end
end

This answer is marked “community wiki”.

answered 21 Jun ‘11, 12:37

sethlwilson

To dump XML documents from a tap/listener:

local tap = Listener.new(nil, "xml")
local xml_field = Field.new("xml")

function tap.packet (pinfo, tvb)

-- Extract the XML field (which contains the full XML document).
--
-- NOTE: `xml_field()` returns a `FieldInfo` object. See this link for more info:
-- http://www.wireshark.org/docs/wsug_html_chunked/lua_module_Field.html#lua_class_FieldInfo

local fieldinfo = xml_field()

-- `print()` refuses to print the field because it "may contain invalid
-- characters". The particular subfield that's causing the problem is
-- `fieldinfo.label` (not sure why). One would think that `fieldinfo.value`
-- would contain the full XML and `fieldinfo.len` would be the document's
-- byte length, but not true (bug?).
-- 
-- No worries. We can parse the string from the `tvb` ourselves. `fieldinfo.offset`
-- tells us where in the `tvb` the XML document begins. So, all we need to do
-- is convert the bytes from the offset to the end of the buffer into a string.
-- (This assumes that the XML document is the last field in the packet.)
--
-- See http://wiki.wireshark.org/LuaAPI/Tvb#tvbrange:string.28.29

print( tvb(fieldinfo.offset):string() )

end
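
Offsets reported by Wireshark, such as `fieldinfo.offset`, are 0-based, while Lua string functions are 1-based. If the bytes have already been copied into a plain Lua string, a bounds-checked slice avoids both an off-by-one error and silently reading past the end (compare the report later in this thread of `fieldinfo.offset=257` and `fieldinfo.len=6068` against `tvb:length=282`). `slice0` is a hypothetical helper:

```lua
-- Return `len` bytes of `buf` starting at the 0-based `offset`, or the rest
-- of the buffer when `len` is omitted. Returns nil if the requested range
-- does not fit inside the buffer.
local function slice0(buf, offset, len)
  len = len or (#buf - offset)
  if offset < 0 or len < 0 or offset + len > #buf then
    return nil
  end
  return buf:sub(offset + 1, offset + len)
end

print(slice0("HEADER<xml/>", 6))
```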

answered 21 Jun ‘11, 18:10

helloworld

Thanks helloworld, now I'm getting somewhere! Funny thing though is that using the code below I only get a portion of the xml and the amount differs between the request and the response. I confirmed though that the offset given by the FieldInfo object is the same as that indicated in the Wireshark GUI. Do you think that there may be some stray non-printable characters in the tvb that are interfering with string conversion? Again I'm running this with tshark from cmd.exe.

local tap = Listener.new(nil, "xml")
local xml_field = Field.new("xml")
local file = io.open("C:\\Users\\setwil\\xml.out", "w")

function tap.packet(pinfo, tvb)
    local fieldinfo = xml_field()
    print("fieldinfo.name=" .. tostring(fieldinfo.name))
    print("fieldinfo.len=" .. tostring(fieldinfo.len))
    print("fieldinfo.offset=" .. tostring(fieldinfo.offset))
    --print("fieldinfo.value=" .. tostring(fieldinfo.value))

    file:write( tvb(fieldinfo.offset):string() )
    file:write("\n--------------------\n\n")

end

function tap.reset() file:close() end

answered 22 Jun ‘11, 09:05

sethlwilson

Something that I've just discovered is that for one packet that I captured, the field runs past the end of the tvb: fieldinfo.offset + fieldinfo.len is far greater than tvb:len(). Here's the program output:

fieldinfo.name=xml

fieldinfo.len=6068

fieldinfo.offset=257

fieldinfo.range=3c736f6170656e763a456e76656c6f706520786d6c6e733a…

fieldinfo.generated=false

tvb:length=282

So it leaves me wondering to what buffer fieldinfo.offset refers.

(22 Jun ‘11, 13:04) sethlwilson

An additional discovery: When comparing these results with those of the GUI I discovered that what’s contained in tvb are the bytes from the frame and that its contents are what’s printing to my file.

(22 Jun ‘11, 14:32) sethlwilson

I guess the non-printable chars theory is plausible. Isn’t that discernible from the hexdump in Wireshark/tshark?

The fieldinfo.offset field is the raw offset from the beginning of the frame. I’m not sure why fieldinfo.len and tvb:length are so different.

Do you have a pcap to share (or a way for me to recreate your symptoms)?

(23 Jun ‘11, 19:46) helloworld

Sure I can share the pcap with you. How can I send it to you?

(24 Jun ‘11, 06:42) sethlwilson

You can share the pcap via min.us, and post a link here to it.

(24 Jun ‘11, 16:18) helloworld

Here is the pcap file in question: http://files.me.com/sethlwilson/cd4tw7

(27 Jun ‘11, 09:02) sethlwilson

Well, I finally managed to obtain the XML payload using the approach below, which probably isn't terribly efficient. During a live capture I capture a single request-response exchange and print their SOAP envelope contents. For the request, I get all of the XML, byte for byte; but for the response, some bytes are missing at the very end of the XML:

... 
</Response>
<SubClientID>MAS01</SubClientID>
<UserID>PSPRPC1</UserID>
</APPX>
</ns:getAPPXResponse>
</SOAP-ENV:Bod

The missing bytes are ...

y>
</SOAP-ENV:Envelope>

At first I thought I was unintentionally lopping off those bytes in my convoluted algorithm, but in fact those bytes are missing even in xml_fieldinfo.value ( xml_fieldinfo.len / 2 == string.len(xml_string) ).

Why would the value member be truncated? Do you think that it's a bug, or is it an imposed limit?

Program source:

local tap = Listener.new(nil, "xml")
local xml_field = Field.new("xml")
local file = io.open("C:\\Users\\setwil\\xml.out", "w")

function tap.packet(pinfo, tvb)
    local xml_fieldinfo = xml_field()
    local xml_hex_string = tostring(xml_fieldinfo.value)
    print("xml_fieldinfo.len" .. xml_fieldinfo.len .. "\n\n")
    file:write("xml_hex_string=" .. "\n" .. xml_hex_string .. "\n\n")
    print("xml_hex_string.length=" .. string.len(xml_hex_string) .. "\n")

    -- start from "" so an empty value doesn't leave xml_string nil
    local xml_string = ""

    -- convert each pair of hex digits back into one byte
    for i = 1, string.len(xml_hex_string), 2 do
        local hex_byte = string.sub(xml_hex_string, i, (i + 1))
        xml_string = xml_string .. string.char(tonumber(hex_byte, 16))
    end

    file:write(xml_string .. "\n\n")
    file:write("---------------------\n\n")
end

function tap.reset() file:close() end
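
The byte-by-byte loop rebuilds `xml_string` on every iteration. A shorter sketch converts the hex string in one `gsub` pass; `hex_to_bytes` is a hypothetical name, it assumes the string is plain hex pairs with no separators (as in the `fieldinfo.range` output shown earlier in the thread), and it cannot restore bytes that are already missing from `xml_fieldinfo.value`:

```lua
-- Convert a string of hex digit pairs (e.g. "3c736f6170") back into the
-- raw bytes they encode, in a single gsub pass.
local function hex_to_bytes(hex)
  return (hex:gsub("%x%x", function(pair)
    return string.char(tonumber(pair, 16))
  end))
end

print(hex_to_bytes("3c736f6170656e763a456e76656c6f7065"))
```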

answered 24 Jun ‘11, 12:54

sethlwilson

Yes, I think it’s a bug that xml_fieldinfo.value is missing bytes (and I had commented that in the example code of my earlier answer). Skimming the internal code, I don’t see an imposed limit, so we shouldn’t be seeing this.

(29 Jun ‘11, 21:16) helloworld

Here is the pcap file in question: http://files.me.com/sethlwilson/cd4tw7

answered 29 Jun '11, 12:22

sethlwilson

Hi helloworld,

I tried your Lua program, but I'm not getting any output (temp.xml is never created) when replaying my pcap. tap.packet() never fires.

C:\Program Files\Wireshark>tshark -r C:\Users\setwil\sethwilson.pcap -R "tcp and data" -X lua_script:C:\Users\setwil\helloworld_xml_sniff_devl.lua
tap.draw

C:\Program Files\Wireshark>

answered 30 Jun ‘11, 07:30

sethlwilson

Something is wrong with your cmdline entry (try quotes around the path of -r). It works for me on OSX and XP.

(30 Jun ‘11, 08:11) helloworld

Another possible problem is that a dissector is parsing out your TCP data (as HTTP?), which could cause the tap’s filter to miss the packets. If so, you’d have to either change the tap filter or disable the custom dissector. If it’s just HTTP, then change both the tap filter and the field name to “xml”.

(30 Jun ‘11, 08:32) helloworld

I just reproduced the symptom. In my case, my HTTP prefs did not include 56013 as an HTTP port, so these packets were falling thru as undissected data (allowing for “tcp and data” filter to catch them). Try the filter/fieldname change I previously mentioned.

(30 Jun ‘11, 08:43) helloworld