This is our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

The following script: http://ask.wireshark.org/questions/4639/extracting-soap-xml-payload/4835 extracts SOAP messages. However, it fails when a message is split across several TCP segments. Any hints on how to solve this problem?

asked 29 Jan '13, 09:50

masgad

edited 29 Jan '13, 10:30

Please upload a sample capture to http://cloudshark.org and post a link here. What version of tshark are you using?

(03 Feb '13, 10:30) helloworld

Thank you for your reply, helloworld. I am using tshark 1.9.0, and here is a sample pcap: http://cloudshark.org/captures/74a6deb7aa4e

(04 Feb '13, 06:29) masgad

The original Lua script pulls out TCP data in raw form. In your case, your payload is encoded, so the raw form looks like gibberish.

To work with your packet capture, change the first 3 lines of the Lua script to the following:

-- tap uses dfilter for HTTP and XML, where SOAP content is found
local tap       = Listener.new(nil, "http && xml")
local xml_field = Field.new("xml")

I must admit this Lua script is a bit "hacky".

I recommend checking out tshark -z follow,tcp,ascii,STREAM combined with a little scripting (bash, awk, python, etc). For instance, the following bash script dumps the SOAP streams from your pcap to individual files (one per stream). Further scripting is necessary to clean out the non-SOAP text from the files.

#!/bin/bash

TSHARK=$HOME/src/wireshark/tshark
PCAP=$HOME/tmp/sample.pcap

# write the streams to individual files
while read -r stream
do
    # non-TCP packets yield an empty tcp.stream field; skip them
    [ -n "$stream" ] || continue
    echo "writing stream $stream --> $stream.txt"
    "$TSHARK" -qz "follow,tcp,ascii,$stream" -r "$PCAP" > "$stream.txt"
done < <("$TSHARK" -T fields -e tcp.stream -r "$PCAP" | sort -u)
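
As a rough sketch of the "further scripting" step, the helper below prints only the text between <soap:Envelope and </soap:Envelope> in one of the stream files. The function name extract_soap is made up for illustration, and the soap: prefix is an assumption (your capture may use a different namespace prefix). Any non-payload lines that tshark writes inside a stream dump would still end up inside the envelope and need separate handling.

```shell
#!/bin/bash

# extract_soap FILE (hypothetical helper): print only the SOAP envelopes
# found in FILE, dropping HTTP headers and other text around them.
# Assumes the envelope tags use the "soap:" prefix.
extract_soap() {
    awk '
    {
        line = $0
        while (length(line) > 0) {
            if (!inside) {
                start = index(line, "<soap:Envelope")
                if (start == 0) break          # no envelope starts on this line
                line = substr(line, start)     # drop text before the envelope
                inside = 1
            }
            stop = index(line, "</soap:Envelope>")
            if (stop == 0) {                   # envelope continues on the next line
                print line
                break
            }
            endpos = stop + length("</soap:Envelope>") - 1
            print substr(line, 1, endpos)      # emit up to the closing tag
            line = substr(line, endpos + 1)    # keep scanning the remainder
            inside = 0
        }
    }' "$1"
}
```

For example, extract_soap 0.txt > 0.soap.xml would clean the file written for stream 0.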

answered 04 Feb '13, 22:13

helloworld

I tried both suggestions above. The first one (using the "http && xml" filter in the Listener) still misses a lot of SOAP XML messages. The second, despite capturing more SOAP messages, still misses some, and its output needs more cleaning to strip the non-SOAP text. I am working on a workaround and will post it later. Thank you.

(05 Feb '13, 13:21) masgad

Which packets in your sample.pcap contain the missing messages from the output?

(05 Feb '13, 22:46) helloworld

The Lua script cannot extract the SOAP parts of a segmented PDU, except for the last segment. I think it cannot identify the earlier segments as XML or HTTP. I noticed that Wireshark's GUI also shows only the last packet as HTTP/XML, whereas the preceding packets are shown as TCP.

(11 Feb '13, 10:34) masgad

I made a dirty workaround based on the code from this post: http://ask.wireshark.org/questions/4639/extracting-soap-xml-payload/4835 The modified code differs from the original in the following ways:

  1. It sets the Listener's filter to tcp rather than http or xml.
  2. It creates two separate Field extractors:

local xml_field = Field.new("xml")

local tcp_segment = Field.new("tcp.data")

  3. It checks whether the analyzed packet contains any segmented PDU, and if so, appends its contents to soap_message.

  4. After every packet, it checks whether soap_message contains both the opening and closing tags. If so, it flushes soap_message before processing the next packet.

The problem with this workaround is that it sometimes reassembles incorrectly: while a SOAP message is still incomplete, it also appends PDU contents from any other packets that happen to arrive in between :(

Here is the code:

-- tap uses filter for tcp and ignores retransmissions
local tap       = Listener.new(nil, "tcp && !tcp.analysis.retransmission")
-- XML field extractor
local xml_field = Field.new("xml")
-- TCP segment data extractor
local tcp_segment = Field.new("tcp.data")
local file      = nil
local soap_message = ''

-- #######################################################################
-- # If not already open, this opens a file for writing (append mode)
-- #######################################################################
local function open_file()
    if not file then
        local path = "." .. "/temp.xml"
        print("opening file:", path)
        file = assert(io.open(path, "a"),
                    "Can't open file for writing")
    end
end

local HTML_REQ = {
    ["HTTP"] = 1,
    ["GET "] = 1,
    ["PUT "] = 1,
    ["POST"] = 1,
}

-- #######################################################################
-- # Extracts the XML field from the buffer and writes the field to file
-- #######################################################################
local function handle_xml(pinfo, tvb)
    if not file then
        print("no file...ignoring packet")
        return
    end
    -- extract the payload: prefer the raw TCP segment data, else the XML field
    local data = ''
    local fieldinfo = xml_field()
    local segmentinfo = tcp_segment()
    -- Check for PDUs
    if segmentinfo then
        data = tvb(segmentinfo.offset):string()
    elseif fieldinfo then
        data = tvb(fieldinfo.offset):string()
    end
    local starts = data:sub(1,4)
    -- some of these packets start w/HTTP header...skip to XML
    if HTML_REQ[starts] ~= nil then
--        local pos = string.find(xmldata, "<%?xml version")
        local pos = string.find(data, "<soap:Envelope")
        if not pos then
            return
        end
        data = data:sub(pos)
    end

    soap_message = soap_message .. data

    print("\n\n-- #"..pinfo.number.." ---------------------------------------------------\n")
    print(data)

    local soap_begin = string.find(soap_message, "<soap:Envelope")
    local soap_end = string.find(soap_message, "</soap:Envelope>")

    -- Check for the completion of the SOAP message
    if  soap_begin and soap_end then
        file:write("\n\n-- #"..pinfo.number.." ---------------------------------------------------\n")
        file:write(soap_message)
        soap_message = ''
    end
end

-- #######################################################################
-- # tap.packet() is called to notify the Listener of a packet that
-- # matches its filter rule ("tcp && !tcp.analysis.retransmission" in
-- # this case). This can be called multiple times before tap.draw().
-- #######################################################################
function tap.packet(pinfo, tvb)
    print("\ntap.packet", "#"..pinfo.number)

    -- XXX: Compensate for no tap.reset() in tshark
    if not gui_enabled() then open_file() end

    -- wrap the handler in a pcall() in case an error occurs
    local ok, msg = pcall(  function()
                                handle_xml(pinfo,tvb)
                            end )

    -- print any error and bow out
    if not ok then
        print("wtf!", msg)
    end

end

-- #######################################################################
-- # tap.draw() is called to notify the Listener to "draw" its results
-- # that were accumulated in tap.packet(). This is normally called after
-- # tap.packet(), based on "Preferences > Statistics > Tap update interval".
-- #######################################################################
function tap.draw()
    print("tap.draw")
    -- flush toilet (NOTE: When $file is garbage collected, it's
    -- automatically flushed and closed...that doesn't mean we
    -- can't do it sooner to free resources.)
    if file then
        print("closing file")
        file:close()
        file = nil
    end
end

-- #######################################################################
-- # tap.reset() is called to notify the Listener to reset any variables
-- # or counters in preparation for a packet (passed to tap.packet()).
-- # This can be called multiple times before a packet is even seen.
-- #
-- # XXX: tshark doesn't call this function, but Wireshark does. Bug?
-- #######################################################################
function tap.reset()
    print("tap.reset")
    open_file()
end
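
One possible way around the interleaving problem described above is to keep a separate buffer per TCP stream (for example, keyed by the tcp.stream field) instead of a single soap_message. The sketch below only illustrates that idea in bash/awk; it is not a drop-in replacement for the Lua tap. It assumes a hypothetical input of one line per packet in capture order, formatted as stream<TAB>fragment, which none of the scripts on this page produce as is. The name reassemble_by_stream is made up for illustration.

```shell
#!/bin/bash

# reassemble_by_stream (hypothetical helper): read "stream<TAB>fragment"
# lines from stdin and print "stream<TAB>envelope" once a complete SOAP
# envelope has been collected for that stream. Fragments belonging to
# other streams never end up in the same buffer.
reassemble_by_stream() {
    awk -F '\t' '
    {
        id = $1
        frag = substr($0, length($1) + 2)    # everything after the first tab
        if (!(id in buf)) {
            start = index(frag, "<soap:Envelope")
            if (start == 0) next             # ignore data until an envelope starts
            frag = substr(frag, start)
            buf[id] = ""
        }
        buf[id] = buf[id] frag               # append to this stream only
        stop = index(buf[id], "</soap:Envelope>")
        if (stop > 0) {
            # complete message: emit it and reset the buffer for this stream
            # (anything after the closing tag is discarded in this sketch)
            print id "\t" substr(buf[id], 1, stop + length("</soap:Envelope>") - 1)
            delete buf[id]
        }
    }'
}
```

In the Lua tap, the equivalent change would be to turn soap_message into a table indexed by the packet's stream number, so that each stream is flushed independently.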

answered 11 Feb '13, 11:33

masgad

edited 11 Feb '13, 12:17

question asked: 29 Jan '13, 09:50

question was seen: 5,729 times

last updated: 11 Feb '13, 12:17
