I need to determine, over a 30-day period, every unique TCP/IP conversation (pair of IP addresses) communicating to or from our ERP system (iSeries DB2). To satisfy the curious and/or head off any "you don't actually need or want that" responses: this is our primary ERP database and we want to migrate it to SQL, but we are not positive we know every external system that integrates directly with the database, so we need a way to prove it out. A packet capture seems like the most reliable way to be positive, IMO.

I already have a monitor session configured on our core switches sending a copy of all traffic hitting the iSeries interface (10.0.0.4) to a third-party Windows computer, and I can see the traffic in Wireshark exactly as I want. I am currently doing a ring capture with a simple filter (host 10.0.0.4) so that it's only direct traffic, no broadcasts. If I stop the capture after a few minutes and run a conversation summary report, it is exactly what I want to see.

The trick is: how can I do this effectively over a long period (30 days)? After even 20 minutes there are over 6 GB of capture files, so it is not sustainable to just capture for 30 days and then analyze terabytes of data. I don't need the full capture at all; my only requirement is to know with 100% certainty, after 30 days, that I've recorded every unique IP address that talked with 10.0.0.4. It would be nice to also know the "what" of each conversation, but that's not a requirement.

Is there a way with tshark or something else to capture only this summary data to disk, not full packets? I'm open to any kind of solution that can run on Windows. I can't do anything directly on the iSeries, and there doesn't appear to be a way for it to report this anyway, so I'm limited to what I can see from the SPAN port traffic in Windows.
My current line of thinking is to write some kind of script that executes several times a day against every file in the ring buffer and concatenates the output to an ever-growing text file:
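Roughly something like this sketch (file names and paths are placeholders; `-z conv,ip` is tshark's per-file IP conversation summary):

```shell
# For each ring-buffer file, append the IP conversation summary to a
# running text file. -q suppresses per-packet output; -z conv,ip prints
# the conversation table for the whole capture file.
for f in ring_*.pcap; do
    tshark -r "$f" -q -z conv,ip >> conversations.txt
done
```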
I'll get loads of duplicate entries and will have to do a lot of text processing at the end to remove duplicates and header rows, so I don't feel like this is a great solution. It also takes around 30 to 45 seconds to run that command against a single 1 GB ring-buffer .pcap, so I'm worried about scale if I have 100+ files to check at a time. The whole thing just feels wasteful, when what would really be ideal is something that did this type of process:
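In other words, something that records only the address pairs as packets arrive, never the packets themselves. A live tshark run can get close to this; a sketch (the interface number, filter, and file names are assumptions):

```shell
# Print only the source/destination addresses of traffic involving the
# ERP host, flushing each line immediately (-l), and append to a raw list.
tshark -i 1 -f "host 10.0.0.4" -l -T fields \
       -e ip.src -e ip.dst -E separator=, >> pairs_raw.txt

# Periodically (or at the end of the 30 days) collapse to unique pairs:
sort -u pairs_raw.txt -o pairs_unique.txt
```

The raw list still grows with one short line per packet, but at tens of bytes per packet rather than full frames, and the dedup step reduces it to the handful of unique pairs.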
As long as the table entry for each packet is overwritten whenever the source and destination pair is the same, by the end of 30 days I should have only maybe a couple dozen entries. Any thoughts or advice appreciated! asked 27 Mar '17, 14:21 JSanders4040 |
2 Answers:
Since you are only interested in the source and destination addresses, you could limit the capture to the first 64 bytes of each frame (the snap length). The trace would also reveal UDP/TCP port numbers in case other services are involved. To keep the file sizes manageable I would split them into files of 100 MB each. This can be configured through the capture options (Menu Capture -> Options).
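From the command line, the same setup could be done with dumpcap; a sketch with the values suggested above (the interface number is an assumption, and `-b filesize:` takes kilobytes):

```shell
# Capture only the first 64 bytes of each frame (-s 64) and roll to a
# new file every ~100 MB (filesize is given in kB).
dumpcap -i 1 -f "host 10.0.0.4" -s 64 -b filesize:100000 -w erp_trace.pcapng
```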
If required, you can merge the trace files into one large monster with mergecap and run all types of tshark-foo against it. Personally, I prefer Jasper's TraceWrangler. At the risk of being banned for mentioning another solution in a Wireshark forum: if you want to keep a low profile on the capture engine, a simple Linux system with tcpdump will do.
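With the Wireshark command-line tools, that merge-then-summarize step could look like this (file names assumed):

```shell
# Stitch the ring files together, then pull one conversation summary
# covering the whole period.
mergecap -w merged.pcapng ring_*.pcapng
tshark -r merged.pcapng -q -z conv,ip
```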
Good hunting! answered 28 Mar '17, 14:23 packethunter |
For this kind of thing, gathering and analyzing NetFlow records would be a much better solution, e.g. using ntop. answered 27 Mar '17, 14:26 Jasper ♦♦ |
On Windows you can use dumpcap (part of the Wireshark suite) or WinDump (part of the WinPcap toolset) to run a low-overhead capture.
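WinDump follows tcpdump's option syntax, so a low-overhead variant of the same capture might look like this sketch (interface number and file name are assumptions):

```shell
# -s 64 limits the snap length, -w writes raw packets to a file, and
# -C 100 rolls to a new file every ~100 million bytes.
windump -i 1 -s 64 -w erp.pcap -C 100 host 10.0.0.4
```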
This is very promising so far, thank you! It looks like it's about 2,000 packets a second of relevant traffic during the day, probably tapers way off at night. But if I'm doing the math on that correctly ... each packet will be 64 bytes, so that's about 11 GB per day which is much more manageable. That doesn't even count any compression or anything that might be done. TraceWrangler looks very interesting as well, thank you for the detailed comment! I'll let this go all day and see how it ends up but so far so good.
This worked out really well so I have accepted your answer! I ended up additionally expanding the capture filter to drop the top 10 talkers which were responsible for well over 95% of all the traffic and are already well known / legit. Now I'm getting a very comfortable and manageable stream of only the oddball stuff which is what this exercise was all about. So this should be perfectly reasonable to let run for 30 days now.