This is a static archive of our old Q&A Site. Please post any new questions and answers at

What to look for in malware captures?


For my student project I have been working on botnets, and for this reason I am using Wireshark. Unfortunately, at some points I am completely confused and need different ideas. I have botnet traffic pcap files mixed with normal traffic; the dataset is called the CTU-13 dataset. I need to understand comprehensively “how each machine got infected and became a bot”.

I know the right direction, but I am not yet sure how to implement it. I need to search for the files that were downloaded from the malicious websites accessed, and for payload content, in the Info column of the pcap files. I definitely need ideas for this part, because I have to choose features, extract them with “tshark” or something similar, and then feed them to machine learning software.

Well, I am not completely sure, but if I understood rightly, I need to focus on “Info”. I thought that if I pick a specific word in the packets, such as a website name, and filter on “frame contains -website name-”, I can extract the files. Do you think this is the right approach? If it is, I need to build a normal/threat column as you see in the image, but I am not sure how to classify packets as threat or normal using Info, because I need to write a formula for this, for example for length: length < 1000: normal, length >= 1000: threat.

This is the link to the datasets.

After this process, using tshark, or by converting the capture to a text file and processing it in any programming language, I need to extract features and then feed them to machine learning software. In addition to the general Wireshark features, I also need to create a "normal or threat" column in CSV format so that I can use it in the machine learning software.

If you have any alternative ideas, I would like to read them as well. Thank you very much. All the best.

asked 03 Aug '17, 05:08

alfrego129

accept rate: 0%

edited 03 Aug '17, 05:28

sindy


One Answer:


I am not completely sure but if I understood rightly, need to focus on “Info”

Not really. The Info column of the packet list provides only summary information chosen by the author of the dissector for the topmost-layer protocol. So for TCP packets that transport HTTP payload, Info contains HTTP-related information, while for TCP packets carrying no payload, Info contains TCP-related information.
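As a minimal sketch of pulling individual dissector fields instead of the Info text, the following Python builds a tshark command using `-T fields`. The file name `capture.pcap`, the helper names, and the chosen field list are assumptions for illustration, not a recommended feature set.

```python
import csv
import io
import subprocess

# Assumed feature fields; pick whichever dissector fields suit the project.
FIELDS = ("frame.time_epoch", "ip.src", "ip.dst", "tcp.srcport",
          "tcp.dstport", "frame.len", "tcp.flags.syn")


def build_tshark_cmd(pcap_path, fields=FIELDS, display_filter=None):
    """Return the tshark argument list that prints the given fields as CSV."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "header=y", "-E", "separator=,", "-E", "quote=d"]
    for field in fields:
        cmd += ["-e", field]
    if display_filter:
        cmd += ["-Y", display_filter]
    return cmd


def parse_fields(text):
    """Parse tshark's CSV output into a list of dicts keyed by field name."""
    return list(csv.DictReader(io.StringIO(text)))


def extract(pcap_path, display_filter=None):
    """Run tshark (must be installed) and return one dict per packet."""
    cmd = build_tshark_cmd(pcap_path, display_filter=display_filter)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_fields(out)
```

For example, `extract("capture.pcap", display_filter="tcp")` would return one dict per TCP packet, which can then be written to CSV or handed to a feature-extraction step.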

What else to do depends on your actual task. If your goal is to teach the automated system to recognize the characteristic patterns from that particular trace set in real-life traffic, you need to analyse at different layers, not necessarily only the payload ones. If your goal is to define a set of rules allowing the automated system to detect any traffic similar to malware traffic, that's a much more complex task.

answered 03 Aug '17, 05:25

sindy

accept rate: 24%

It will not be too complex, because it is just a student project. For instance, in my previous attempt I focused on SYN flooding and used the formula "tcp.flags.syn<=0: Normal, tcp.flags.syn>0: Threat" to classify packets as threat or normal. So I need to find a simple way to identify “how each machine got infected and became a bot”, and a way to create a threat or normal column for that. I need about 3-4 features; according to my lecturer, that will be enough.
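The rule quoted above could be turned into a label column roughly like this. This is a minimal sketch, assuming the packets were already exported as CSV with a `tcp.flags.syn` column; the function names and the `label` column name are hypothetical.

```python
import csv
import io


def label_row(row):
    """Apply the quoted rule: SYN flag set means Threat, otherwise Normal."""
    # Assumption: the flag may be exported as 1/0 or True/False, so accept both.
    syn = row.get("tcp.flags.syn", "").strip().lower()
    return "Threat" if syn in ("1", "true") else "Normal"


def add_labels(csv_text):
    """Return the CSV text with an extra 'label' column appended to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + ["label"],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["label"] = label_row(row)
        writer.writerow(row)
    return out.getvalue()
```

Note that ordinary TCP connections also start with a SYN, so a per-packet rule like this marks them as well.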

(03 Aug '17, 05:34) alfrego129

This is quite far from the purpose of this site, but:

Characteristic patterns in packet payload may be used as warning signs. However, any single packet won't tell you much about how the machine became a bot; you need to follow the full exchange (like repeated attempts to log in using ssh, or file downloads from suspicious sources, ...).
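Following the full exchange could be approximated by grouping packets into bidirectional conversations and computing features per conversation rather than per packet. This is a minimal sketch, assuming each packet is a dict with the field names shown (for instance from a tshark `-T fields` export); the function names and the feature choice are illustrative only.

```python
from collections import defaultdict


def flow_key(pkt):
    """Direction-independent key: endpoints are sorted so A->B and B->A match."""
    a = (pkt["ip.src"], pkt["tcp.srcport"])
    b = (pkt["ip.dst"], pkt["tcp.dstport"])
    return tuple(sorted((a, b)))


def flow_features(packets):
    """Aggregate packets into per-conversation packet count, bytes and duration."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)

    features = {}
    for key, pkts in flows.items():
        times = [float(p["frame.time_epoch"]) for p in pkts]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(int(p["frame.len"]) for p in pkts),
            "duration": max(times) - min(times),
        }
    return features
```

Many short conversations from different sources to port 22 of one host, for example, would then show up as a pattern across rows rather than in any single packet.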

(03 Aug '17, 05:45) sindy

Thank you very much, @sindy. I will.

(03 Aug '17, 05:51) alfrego129