Cluster

After feature extraction, FlowPrint clusters all Flow’s into NetworkDestination’s: flows are grouped together when they share an equal (destination IP, destination port) tuple or an equal TLS certificate.
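The grouping rule described above can be illustrated with a small, self-contained sketch. It is not FlowPrint's implementation: the `Flow` namedtuple, its field names, and the `group_by_destination` helper are hypothetical stand-ins, and a union-find structure is used here only as one convenient way to merge flows that match on either key.

```python
from collections import namedtuple

# Hypothetical stand-in for a flow; the real Flow object carries more fields.
Flow = namedtuple("Flow", ["dst_ip", "dst_port", "certificate"])


def group_by_destination(flows):
    """Give each flow a group id. Flows share an id when they have an equal
    (dst_ip, dst_port) tuple or an equal, non-None certificate."""
    parent = list(range(len(flows)))  # union-find forest over flow indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smallest index as the representative of the group.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # key -> index of the first flow seen with that key
    for i, flow in enumerate(flows):
        keys = [("dst", flow.dst_ip, flow.dst_port)]
        if flow.certificate is not None:
            keys.append(("cert", flow.certificate))
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    return [find(i) for i in range(len(flows))]


flows = [
    Flow("10.0.0.1", 443, "certA"),
    Flow("10.0.0.2", 443, "certA"),  # same certificate as flow 0
    Flow("10.0.0.2", 443, None),     # same (IP, port) as flow 1
    Flow("10.0.0.3", 80, None),      # unrelated destination
]
groups = group_by_destination(flows)
print(groups)
```

In this sketch the first three flows end up in one group because they are linked through a shared certificate and a shared (IP, port) tuple, while the last flow stays on its own.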

class cluster.Cluster(load=None)[source]

Cluster object for clustering flows by network destination

samples

Samples used to fit Cluster

Type:np.array of shape=(n_samples,)
counter

Counter for total number of NetworkDestinations generated

Type:int
dict_destination

Dictionary of (dst IP, dst port) -> NetworkDestination

Type:dict
dict_certificate

Dictionary of TLS certificate -> NetworkDestination

Type:dict
Cluster.__init__(load=None)[source]

Cluster flows by network destinations

Parameters:load (string, default=None) – If given, load the cluster from the JSON file at the ‘load’ path.

Generating clusters

We can create clusters from Flow’s by fitting the Cluster object with the cluster.Cluster.fit() method. After fitting the cluster, we can use the cluster.Cluster.predict() method to get the cluster label of each sample as a number. The cluster.Cluster.fit_predict() method combines both steps into a single call.

Cluster.fit(X, y=None)[source]

Fit the clustering algorithm with flow samples X.

Parameters:
  • X (array-like of shape=(n_samples, n_features)) – Flow samples to fit cluster object.
  • y (array-like of shape=(n_samples,), optional) – If given, add labels to each cluster.
Returns:

result – Returns self

Return type:

self

Cluster.predict(X)[source]

Predict cluster labels of X.

Parameters:X (array-like of shape=(n_samples, n_features)) – Samples for which to predict NetworkDestination cluster.
Returns:result – Labels of NetworkDestination cluster corresponding to cluster of fitted samples. Has a value of -1 if no cluster could be matched.
Return type:array-like of shape=(n_samples,)
Cluster.fit_predict(X)[source]

Fit and predict cluster with given samples.

Parameters:X (array-like of shape=(n_samples, n_features)) – Samples to fit cluster object.
Returns:result – Labels of cluster corresponding to cluster of fitted samples. Has a value of -1 if no cluster could be matched.
Return type:array-like of shape=(n_samples,)
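The fit/predict contract documented above (fit returns self, predict returns integer labels, and -1 marks samples without a matching cluster) can be sketched with a toy class. `ToyCluster` is a hypothetical stand-in written for this example, not the FlowPrint class: it keys only on (dst IP, dst port) tuples and takes plain lists instead of Flow objects.

```python
class ToyCluster:
    """Stand-in mimicking the documented fit/predict contract.
    This is an illustration, not FlowPrint's code."""

    def __init__(self):
        self.labels = {}  # (dst_ip, dst_port) -> integer label

    def fit(self, X, y=None):
        for key in X:
            # Assign the next free integer to each unseen destination.
            self.labels.setdefault(key, len(self.labels))
        return self  # returning self allows call chaining

    def predict(self, X):
        # -1 marks samples that match no fitted destination.
        return [self.labels.get(key, -1) for key in X]

    def fit_predict(self, X):
        return self.fit(X).predict(X)


train = [("10.0.0.1", 443), ("10.0.0.2", 80), ("10.0.0.1", 443)]
test = [("10.0.0.2", 80), ("192.168.1.1", 22)]

cluster = ToyCluster()
train_labels = cluster.fit_predict(train)  # labels for the fitted samples
test_labels = cluster.predict(test)        # unseen destination gets -1
print(train_labels, test_labels)
```

With the real Cluster object the calls have the same shape: fit on the training flows, then predict on new flows and treat -1 as "no known NetworkDestination".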

Cluster views

We can extract the different NetworkDestination’s generated by the cluster either as a set or as a dictionary of identifier -> NetworkDestination.

Cluster.clusters()[source]

Return a set of NetworkDestinations in the current cluster object.

Returns:result – Set of NetworkDestinations in cluster.
Return type:set
Cluster.cluster_dict()[source]

Return a dictionary of id -> NetworkDestination.

Returns:result – Dict of NetworkDestination.identifier -> NetworkDestination
Return type:dict
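The difference between the two views can be sketched as follows. This assumes, as the attribute descriptions suggest, that several (dst IP, dst port) tuples and certificates may map to the same NetworkDestination; the `Destination` class is a hypothetical stand-in, and the way the views are built here is an illustration rather than the library's actual code.

```python
class Destination:
    """Hypothetical stand-in for NetworkDestination."""

    def __init__(self, identifier):
        self.identifier = identifier


a, b = Destination(0), Destination(1)

# Several keys may point at the same destination object.
dict_destination = {
    ("10.0.0.1", 443): a,
    ("10.0.0.2", 443): a,
    ("10.0.0.3", 80): b,
}
dict_certificate = {"certA": a}

# Set view: each destination appears once, however many keys reach it.
clusters = set(dict_destination.values()) | set(dict_certificate.values())

# Dictionary view: look a destination up by its identifier.
cluster_dict = {dest.identifier: dest for dest in clusters}
print(len(clusters), sorted(cluster_dict))
```

The set view is convenient for iterating over all destinations, while the dictionary view is useful for mapping a predicted label back to its destination.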

I/O methods

A cluster can be saved to and loaded from a JSON file for further analysis. Additionally, you can get a copy of the current Cluster.

Cluster.save(outfile)[source]

Saves cluster object to json file.

Parameters:outfile (string) – Path to json file in which to store the cluster object.
Cluster.load(infile)[source]

Loads cluster object from json file.

Parameters:infile (string) – Path to json file from which to load the cluster object.
Cluster.copy()[source]

Returns a (semi-deep) copy of self. The resulting cluster is a deep copy apart from the samples X. This is considerably faster than copy.deepcopy(self).

Returns:result – Copy of self
Return type:Cluster
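A minimal sketch of what a semi-deep copy means, assuming the samples are shared by reference while everything else is copied deeply. The `Holder` class and its attribute names are hypothetical and only illustrate the idea; the actual Cluster.copy() may differ in detail.

```python
import copy


class Holder:
    """Hypothetical object with bulky samples and a small mapping."""

    def __init__(self, samples, mapping):
        self.samples = samples
        self.mapping = mapping

    def copy(self):
        clone = Holder.__new__(Holder)  # skip __init__, fill fields by hand
        clone.samples = self.samples                 # shared, not copied
        clone.mapping = copy.deepcopy(self.mapping)  # independent copy
        return clone


original = Holder(samples=[1, 2, 3], mapping={"a": [10]})
clone = original.copy()

clone.mapping["a"].append(20)  # does not affect the original mapping
print(original.mapping, clone.mapping, clone.samples is original.samples)
```

Skipping the samples is where the speedup comes from: they are typically the largest part of the object, and a copy that only needs independent bookkeeping has no reason to duplicate them.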

Visualisation

To get a visual representation of the generated clusters, we offer the cluster.Cluster.plot() method.

Cluster.plot(annotate=False)[source]

Plot cluster NetworkDestinations.

Parameters:annotate (boolean, default=False) – If True, annotate each cluster