Preprocessor

The Preprocessor object transforms data from .pcap files into Flow objects.

class preprocessor.Preprocessor(verbose=False)

Preprocessor object for preprocessing flows from pcap files

reader

pcap Reader object for reading .pcap files

Type: reader.Reader

flow_generator

Flow generator object for generating Flow objects

Type: flows.FlowGenerator

Preprocessor.__init__(verbose=False)

Preprocessor object for preprocessing flows from pcap files

Process data

The process method extracts all flows from the given input .pcap files and pairs each flow with a label (currently the file name).

Preprocessor.process(files, labels)

Extract data from files and attach given labels.

Parameters:
  • files (iterable of string) – Paths from which to extract data.
  • labels (iterable of int) – Label corresponding to each path.
Returns:
  • X (np.array of shape=(n_samples, n_features)) – Features extracted from files.
  • y (np.array of shape=(n_samples,)) – Labels for each flow extracted from files.
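
The relationship between the per-file labels passed in and the per-flow labels returned can be sketched as follows. This is only an illustration, not the library's implementation: process_sketch and the extract_flows callable are hypothetical names, plain lists stand in for the np.array return values, and the fake extractor replaces real .pcap parsing.

```python
def process_sketch(files, labels, extract_flows):
    """Illustrative stand-in for Preprocessor.process (not the library's code).

    extract_flows is a hypothetical callable that maps a path to a list of
    per-flow feature rows.
    """
    X, y = [], []
    for path, label in zip(files, labels):
        flows = extract_flows(path)
        X.extend(flows)
        # Every flow inherits the label of the file it was extracted from.
        y.extend([label] * len(flows))
    return X, y


# Fake extractor: pretend a.pcap holds two flows and b.pcap holds one.
fake_flows = {"a.pcap": [[1, 2], [3, 4]], "b.pcap": [[5, 6]]}
X, y = process_sketch(["a.pcap", "b.pcap"], [0, 1], fake_flows.__getitem__)
print(len(X), y)  # prints: 3 [0, 0, 1]
```

The point of the sketch is that y has one entry per extracted flow, not one per input file, which is why both return values share the n_samples dimension.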

I/O methods

Because processing can take a long time, especially when using the pyshark backend (see Reader), the Preprocessor offers methods to save and load the extracted data by pickling it.

Preprocessor.save(outfile, X, y)

Save data to given outfile.

Parameters:
  • outfile (string) – Path of file to save data to.
  • X (np.array of shape=(n_samples, n_features)) – Features extracted from files.
  • y (np.array of shape=(n_samples,)) – Labels for each flow extracted from files.

Preprocessor.load(infile)

Load data from given infile.

Parameters:
  • infile (string) – Path of file from which to load data.
Returns:
  • X (np.array of shape=(n_samples, n_features)) – Features extracted from files.
  • y (np.array of shape=(n_samples,)) – Labels for each flow extracted from files.
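
A save/load round trip based on pickling might look like the minimal sketch below. It assumes that X and y are stored together as a single pickled tuple; the actual file layout used by Preprocessor.save is not documented here, and save_sketch, load_sketch, and the file name flows.p are hypothetical. Plain lists stand in for the np.array values.

```python
import os
import pickle
import tempfile


def save_sketch(outfile, X, y):
    """Pickle features and labels together (assumed layout, not the library's)."""
    with open(outfile, "wb") as handle:
        pickle.dump((X, y), handle)


def load_sketch(infile):
    """Return the (X, y) pair previously written by save_sketch."""
    with open(infile, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flows.p")
    save_sketch(path, [[1, 2], [3, 4]], [0, 1])
    X_loaded, y_loaded = load_sketch(path)

print(X_loaded, y_loaded)  # prints: [[1, 2], [3, 4]] [0, 1]
```

Since unpickling can execute arbitrary code, files of this kind should only be loaded when they come from a trusted source.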