Preprocessor

The Preprocessor object transforms data from .pcap files into Flow objects.

class preprocessor.Preprocessor(verbose=False)

Preprocessor object for preprocessing flows from pcap files

reader

pcap Reader object for reading .pcap files

Type: reader.Reader

flow_generator

Flow generator object for generating Flow objects

Type: flows.FlowGenerator

Preprocessor.__init__(verbose=False)

Preprocessor object for preprocessing flows from pcap files

Process data

The process method extracts all flows from the given input .pcap files and pairs each flow with a label (currently the file name).

Preprocessor.process(files, labels)

Extract data from files and attach given labels.

Parameters:
  • files (iterable of string) – Paths from which to extract data.
  • labels (iterable of int) – Label corresponding to each path.
Returns:
  • X (np.array of shape=(n_samples, n_features)) – Features extracted from files.
  • y (np.array of shape=(n_samples,)) – Labels for each flow extracted from files.
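
The relation between the inputs and the returned values can be illustrated with a small stand-in. The sketch below does not call the library: expand_labels and the toy flow names are hypothetical. It assumes only that each file yields zero or more flows and that every flow inherits the label of the file it came from, as the shapes of X and y suggest. Plain lists stand in for the NumPy arrays.

```python
def expand_labels(flows_per_file, labels):
    """Pair every flow with the label of the file it came from.

    flows_per_file: one list of flows per input file (hypothetical stand-in
                    for what a pcap reader plus flow generator would yield).
    labels:         one label per input file.
    """
    flows_per_file = list(flows_per_file)
    labels = list(labels)
    if len(flows_per_file) != len(labels):
        raise ValueError("need exactly one label per file")

    X, y = [], []
    for flows, label in zip(flows_per_file, labels):
        for flow in flows:
            X.append(flow)   # one sample per flow
            y.append(label)  # label of the originating file
    return X, y


# Two files: the first yields two flows, the second yields one.
X, y = expand_labels([["flow_a", "flow_b"], ["flow_c"]], [0, 1])
print(X)
print(y)
```

Under this assumption, y has one entry per extracted flow rather than one per input file, so a file that contains many flows contributes its label many times.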

I/O methods

Because processing can take a long time, especially when using the pyshark backend (see Reader), the Preprocessor offers methods to save and load data by pickling.

Preprocessor.save(outfile, X, y)

Save data to given outfile.

Parameters:
  • outfile (string) – Path of file to save data to.
  • X (np.array of shape=(n_samples, n_features)) – Features extracted from files.
  • y (np.array of shape=(n_samples,)) – Labels for each flow extracted from files.

Preprocessor.load(infile)

Load data from given infile.

Parameters:
  • infile (string) – Path of file from which to load data.
Returns:
  • X (np.array of shape=(n_samples, n_features)) – Features extracted from files.
  • y (np.array of shape=(n_samples,)) – Labels for each flow extracted from files.
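
A save and load round trip through pickling can be sketched with the standard library alone. The functions below are illustrative stand-ins with the same argument order as the methods documented above, not the library's implementation. Storing X and y as a single tuple is an assumption about the file layout, and the toy lists stand in for NumPy arrays, which pickle the same way.

```python
import os
import pickle
import tempfile


def save(outfile, X, y):
    """Write features and labels to outfile as one pickled tuple (assumed layout)."""
    with open(outfile, "wb") as handle:
        pickle.dump((X, y), handle)


def load(infile):
    """Read back the (X, y) tuple written by save."""
    with open(infile, "rb") as handle:
        X, y = pickle.load(handle)
    return X, y


# Toy data standing in for the arrays returned by process.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [0, 0, 1]

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flows.p")
    save(path, X, y)
    X_loaded, y_loaded = load(path)

print(X_loaded == X and y_loaded == y)
```

Unpickling executes code embedded in the file, so pickled data should only be loaded from sources you trust.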