Thanks for the help.
A colleague of mine with more coding skills than me was kind enough to help.
That resulted in the dissector script below.
The script looks for packets that have all of the following fields identical:
- IP ID
- TCP sequence number
- TCP source port
- TCP destination port
- TCP stream index
The new "TCP Duplicates Protocol" dissector identifies (just as the original) any duplicates by inserting a new boolean field called "Duplicated". If a duplicate exists, the frame number of the duplicate is also inserted in a separate field called "DupFrame". Finally, the first frame of any set of duplicates is identified with another boolean field called "FirstSeen".
For my use case I can use a display filter like this: "tcpdup.duplicate == false || tcpdup.firstseen == true"
This filters out all the duplicates while still showing the frames that were not duplicated, plus the first frame of each set of duplicates.
I'm sure there are more ways to skin this cat, and most likely also more elegant ways, but for now this serves the purpose.
Hopefully someone else can get benefit from this dissector :-)
-- our new Proto object
local tcpdup = Proto("tcpdup","TCP Duplicates Protocol")
-- new fields for our "tcpdup" protocol
-- the purpose for these is so they can be filtered upon
local pf_is_dup = ProtoField.bool("tcpdup.duplicate", "Duplicated")
local pf_dup_frame = ProtoField.framenum("tcpdup.frame", "DupFrame", base.NONE)
local pf_dup_frame_firstseen = ProtoField.bool("tcpdup.firstseen", "FirstSeen")
-- register the ProtoFields above
tcpdup.fields = { pf_is_dup, pf_dup_frame, pf_dup_frame_firstseen }
-- some existing fields we need to extract from TCP packets, to determine duplicates
-- all of these must be the same for us to consider two packets duplicates
local f_ip_id = Field.new("ip.id")
local f_tcp_seq = Field.new("tcp.seq")
local f_tcp_srcport = Field.new("tcp.srcport")
local f_tcp_dstport = Field.new("tcp.dstport")
local f_tcp_stream = Field.new("tcp.stream")
-- the table we use to track seen packet #s and seen field info
-- we'll use this as both an array and map table
-- the array portion is indexed by packet number
-- the map portion is keyed by "ip.id:tcp.seq:tcp.srcport:tcp.dstport:tcp.stream"
-- the result for both is the same instance of a subtable with the
-- packet numbers of the dups in an array list
local packets = {}
local function generateKey(...)
local t = { ... }
return table.concat(t, ':')
end
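For illustration, the key is just the field values joined with colons. A minimal standalone sketch, using made-up field values (the real ones come from the capture):

```lua
-- same helper as above, shown standalone with hypothetical example values
local function generateKey(...)
  local t = { ... }
  return table.concat(t, ':')
end

print(generateKey("0x1a2b", "1001", "443", "52344", "0"))
-- prints: 0x1a2b:1001:443:52344:0
```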
-- adds the packet's number to both the array and map
-- which is done when we see a particular set of fields for the first time
local function addPacketList(pnum, key)
local list = { pnum }
packets[key] = list
packets[pnum] = list
end
-- adds the packet to the array part, using an existing list of dups
-- also adds the packet's number to the list of dups
local function addPacket(pnum, list)
-- add this packet's number to the array portion of the big table
packets[pnum] = list
-- add this packet's number to the list of dups
list[#list + 1] = pnum
end
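Because packets[key] and packets[pnum] hold the same table instance, a number appended through addPacket is immediately visible under every packet number sharing that key. A standalone sketch of that aliasing, outside Wireshark, with hypothetical frame numbers:

```lua
-- standalone sketch of the shared-subtable bookkeeping; frame numbers are made up
local packets = {}

local list = { 10 }       -- fields first seen in frame 10
packets["key"] = list     -- map portion, keyed by the field combination
packets[10] = list        -- array portion, indexed by frame number

-- a later duplicate, frame 42, reuses the same list
packets[42] = packets["key"]
packets["key"][#packets["key"] + 1] = 42

-- all three entries now see the full dup list {10, 42}
print(#packets[10], packets[10][2])  -- prints: 2   42
```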
-- whenever a new capture file is opened, we want to reset our table
-- so we hook into the init() routine to do that
function tcpdup.init()
packets = {}
end
-- some forward "declarations" of helper functions we use in the dissector
local createProtoTree
– our dissector function
function tcpdup.dissector(tvb, pinfo, tree)
-- first, check if this is a TCP packet, by seeing if it has a tcp.seq
local tcp_seq = select(1, f_tcp_seq())
if not tcp_seq then
-- not a TCP packet
return
end
local pnum = pinfo.number
-- see if we've already processed this packet number
local list = packets[pnum]
if not list then
-- haven't processed this packet
-- see if the fields match another packet we've seen before
local ip_id = select(1, f_ip_id())
local tcp_srcport = select(1, f_tcp_srcport())
local tcp_dstport = select(1, f_tcp_dstport())
local tcp_stream = select(1, f_tcp_stream())
local key = generateKey(tostring(ip_id), tostring(tcp_seq), tostring(tcp_srcport), tostring(tcp_dstport), tostring(tcp_stream))
list = packets[key]
if not list then
-- haven't seen these fields before, so add it as a non-dup (so far)
addPacketList(pnum, key)
createProtoTree(pnum, tree)
else
-- we haven't processed this packet, but we have seen the same fields
-- so it's a duplicate. Add its number to the array and entry...
addPacket(pnum, list)
-- and now create its tree
createProtoTree(pnum, tree, list)
end
else
-- we found the packet number already in the table, which means
-- we've processed it before
createProtoTree(pnum, tree, list)
end
end
createProtoTree = function (pnum, root, list)
-- add our "protocol" to the tree
local tree = root:add(tcpdup)
if not list or #list < 2 then
-- it's not a duplicate
tree:add(pf_is_dup, false):set_generated()
else
tree:add(pf_is_dup, true):set_generated()
-- now add the other packet numbers as reference tree item fields
for _, num in ipairs(list) do
if num ~= pnum then
tree:add(pf_dup_frame, num):set_generated()
end
end
-- the first frame of a duplicate set is the one with the lowest
-- frame number, i.e. the first entry in the list
tree:add(pf_dup_frame_firstseen, pnum == list[1]):set_generated()
end
end
-- then we register tcpdup as a postdissector
register_postdissector(tcpdup)
answered 09 Aug ‘16, 05:54
NJL
just a question - have you tried to get rid of the duplicates using Super Deduper?
Or editcap which comes with Wireshark?
I tried editcap and used it initially to remove the vast majority of the duplicates. I should have stated that but forgot :-)
The duplicates that are left are from traffic that had to be routed (source and destination on different subnets on the same switch). They have different source and destination MAC addresses, hence editcap does not see them as duplicates even though everything else is identical.
I have not tried Super Deduper, I'll give that a try; however, I could see this Lua script (if I can get it working the way I want) being really useful for other use cases.
I've given it some more thought and I think what I need is "simply" an order number for each duplicate packet. That way I should be able to use a display filter like "tcpdup.duplicate_packet_number == 1", which would then display only the 1st packet of each set of duplicates identified.
Any suggestions on how to do it or where to find more information is very welcome :-)
The point is that a dissector is not called just once per frame but several times - once when the file is loaded, and then at least each time you click a packet in the packet list. So you would have to extend the contents of the list with the frame.number of each copy of the packet, compare the frame.number of the currently dissected packet with all the stored ones, and only mark as duplicates those whose frame.number is higher than the lowest stored one. Or you could even assign order numbers 1 to N to each copy, assuming that for the first time, the frames would be dissected in the order they are read in, and that you would always calculate the order number to be assigned to the tree in that dissector run by finding its frame.number in the list.
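That last idea can be sketched as a small helper (hypothetical, not part of the script above): since the stored list already holds the frame numbers in the order they were first seen, the order number of a copy is just the position of its frame.number in that list.

```lua
-- hypothetical helper: the position of pnum in the stored dup list
-- is its order number (1 = first copy seen, 2 = second, and so on)
local function orderNumber(pnum, list)
  for i, num in ipairs(list) do
    if num == pnum then
      return i
    end
  end
end

-- with a dup list of frames {10, 42, 77}, frame 42 is copy number 2
print(orderNumber(42, { 10, 42, 77 }))  -- prints: 2
```

Re-dissection is then harmless: the list no longer grows, and looking up the same frame.number always yields the same order number.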