This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

How to make dataset such as KDDCup99 via wireshark?

1

I want to build a dataset like KDDCup99 for machine learning purposes, but I don't know how to extract the intrinsic and time-based attributes with Wireshark. KDDCup99 defines 43 attributes (intrinsic, time-based and host-based), and I would like to extract these attributes from Wireshark captures. How can I do it?

asked 02 Oct '12, 23:49

Bluebit
21114
accept rate: 0%

edited 02 Oct '12, 23:51

You might like to consider https://www.itoc.usma.edu/research/dataset/ also. This is a more recent unlabelled IDS dataset with more sophisticated attacks than the (as I look at it now) outdated KDDCup99.

(28 Feb '13, 20:17) pds

2 Answers:

0

Jaap is mostly right.

One option is to:

  • Use tshark to log packet data to CSV format.
  • Post-process that dataset to produce the 'connection' and 'two-second time window' attribute sets.
  • Do some other logging to get the 'root_shell', 'su_attempted', etc. attributes. (On Linux: history, last/lastb and /var/log/secure may help.)
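As a rough illustration of the post-processing step, here is a minimal Python sketch that folds per-packet CSV rows into per-connection records with duration, src_bytes and dst_bytes. It assumes the CSV also contains the frame.time_epoch and tcp.len fields, which would have to be requested from tshark with extra -e arguments. The function name build_connections and the sample data are made up for the example. It also assumes the first packet seen in a connection comes from the originator, and it does not track SYN/FIN, so a reused port pair is merged into one record.

```python
import csv
import io

# Hypothetical sample in the shape tshark -T fields -E header=y would produce.
SAMPLE = """frame.time_epoch,ip.src,ip.dst,tcp.srcport,tcp.dstport,tcp.len
1.0,10.0.0.1,10.0.0.2,40000,80,0
1.5,10.0.0.1,10.0.0.2,40000,80,100
2.0,10.0.0.2,10.0.0.1,80,40000,500
3.0,10.0.0.3,10.0.0.2,40001,22,20
"""


def build_connections(csv_text):
    """Fold per-packet CSV rows into one record per TCP connection."""
    conns = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["tcp.srcport"]:
            continue  # not a TCP packet
        src = (row["ip.src"], int(row["tcp.srcport"]))
        dst = (row["ip.dst"], int(row["tcp.dstport"]))
        key = tuple(sorted([src, dst]))  # same key for both directions
        ts = float(row["frame.time_epoch"])
        size = int(row["tcp.len"] or 0)
        rec = conns.get(key)
        if rec is None:
            # Assumption: the first packet seen comes from the originator.
            rec = conns[key] = {"orig": src, "resp": dst, "start": ts,
                                "end": ts, "src_bytes": 0, "dst_bytes": 0}
        rec["end"] = max(rec["end"], ts)
        if src == rec["orig"]:
            rec["src_bytes"] += size
        else:
            rec["dst_bytes"] += size
    for rec in conns.values():
        rec["duration"] = rec["end"] - rec["start"]
    return list(conns.values())


if __name__ == "__main__":
    for rec in build_connections(SAMPLE):
        print(rec)
```

A real capture would be read from the CSV file instead of the SAMPLE string. Deriving the KDDCup99 flag attribute would additionally need the TCP flag fields.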

A second option, if you need the KDDCup99 data fields collected in real time, is to:

  • download the Wireshark source code: SVN Repo
  • hand-code the collection and processing in real time in C, using *shark's pre-parsed protocol fields;
  • then print to a file in CSV format.

The following should help in producing the CSV output from tshark CLI to 'logfile.csv':

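tshark -i <interface> -c 100 \
  -T fields \
  -E header=y -E separator=, -E quote=d -E occurrence=f \
  -e ip.src -e ip.dst -e ip.proto -e ip.checksum -e tcp.srcport -e tcp.dstport \
  > logfile.csv

Do not combine this with -w: when writing raw packets with -w, tshark does not print the selected fields unless -P is also given, so logfile.csv would end up empty.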

Use Wireshark's packet header browser/details panel to choose which attributes you want to log, then add those attributes to the -e arguments list.
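Once packets have been grouped into connections, the 'two-second time window' attributes mentioned above can be computed with a sliding window. The Python sketch below is one possible reading of the KDDCup99 count and srv_count attributes, not a reference implementation. It assumes connection records are already available as (start_time, dst_host, dst_port) tuples sorted by start time, treats the destination port as the "service", and counts only earlier connections, not the current one. The name window_counts is made up for the example.

```python
from collections import deque

WINDOW = 2.0  # seconds; the time-window length used by KDDCup99


def window_counts(conns):
    """Return (count, srv_count) for each connection.

    conns: iterable of (start_time, dst_host, dst_port), sorted by start_time.
    count:     earlier connections to the same host within the last WINDOW seconds.
    srv_count: earlier connections to the same port within the last WINDOW seconds.
    """
    recent = deque()  # connections that started within the last WINDOW seconds
    out = []
    for start, host, port in conns:
        # Drop connections that have fallen out of the window.
        while recent and start - recent[0][0] > WINDOW:
            recent.popleft()
        count = sum(1 for _, h, _ in recent if h == host)
        srv_count = sum(1 for _, _, p in recent if p == port)
        out.append((count, srv_count))
        recent.append((start, host, port))
    return out


if __name__ == "__main__":
    demo = [(0.0, "A", 80), (0.5, "A", 22), (1.0, "B", 80),
            (2.4, "A", 80), (5.0, "A", 80)]
    print(window_counts(demo))
```

The scan over the window is linear in the number of recent connections, which is fine for small captures. Per-host and per-port counters would be faster on busy links.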

answered 18 Dec '12, 10:03

pds
262
accept rate: 100%

edited 12 Feb '13, 11:43

Hey friends, I am also working on the KDDCup99 dataset. My question is: how do I trim the dataset down to the 10% subset of KDDCup99? What factors need to be considered while trimming the data? Kindly help me with an algorithm or code to cut the dataset to 10% of the original. Thanks.

(28 Feb '13, 19:32) sac

1

Use tshark and post-process the text output?

answered 03 Oct '12, 02:16

Jaap ♦
11.7k16101
accept rate: 14%

Your comment is not clear to me!

(03 Oct '12, 04:02) Bluebit