CrossCorrelationGraph
The CrossCorrelationGraph computes correlations between different cluster.NetworkDestination objects and extracts cliques from the resulting graph.
-
class cross_correlation_graph.CrossCorrelationGraph(window=30, correlation=0.1)
CrossCorrelationGraph for computing correlation between clusters
-
window
Threshold for the window size in seconds
Type: float
-
correlation
Threshold for the minimum required correlation
Type: float
-
graph
Cross correlation graph containing all correlations. Note that each node in the graph represents an ‘activity signature’ to avoid duplicates. The NetworkDestinations corresponding to each signature are stored in the ‘mapping’ attribute.
Note
IMPORTANT: The CrossCorrelationGraph.graph object is an optimised graph. Each node represents an activity fingerprint rather than a single network destination. E.g., when destinations A and B are both active only at time slices 3 and 7, they are represented by a single node. The mapping attribute (self.mapping) is used to recover the network destinations for each graph node. This greatly speeds up clique finding: the number of distinct network destinations can in theory cover the entire IP space, whereas the number of activity fingerprints is bounded by 2^(batch / window), in our work 2^(300/30) = 2^10 = 1024. If these parameters change, the bound changes with them, but since every destination has exactly one fingerprint, the number of graph nodes never exceeds the number of network destinations. Hence, this optimisation never has a worse time complexity.
Type: nx.Graph
-
mapping
NetworkDestinations corresponding to each node in the graph
Type: dict
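The fingerprint optimisation described in the note can be illustrated with a minimal sketch. It does not use the library itself: the helper group_by_fingerprint and the activity dictionary are hypothetical names introduced here, and destinations are plain strings rather than NetworkDestination objects.

```python
from collections import defaultdict


def group_by_fingerprint(activity):
    """Group destinations by the set of time slices in which they were active.

    `activity` maps a destination to an iterable of active time slices.
    Returns a dict from fingerprint (frozenset of slices) to destinations.
    This is an illustrative helper, not part of the documented API.
    """
    mapping = defaultdict(set)
    for destination, slices in activity.items():
        mapping[frozenset(slices)].add(destination)
    return dict(mapping)


# Destinations A and B are both active only at time slices 3 and 7,
# so they collapse into a single fingerprint (a single graph node).
activity = {
    "A": {3, 7},
    "B": {3, 7},
    "C": {1, 3},
}
mapping = group_by_fingerprint(activity)

# Upper bound on the number of fingerprints for batch=300, window=30.
batch, window = 300, 30
max_fingerprints = 2 ** (batch // window)
```

Here three destinations are reduced to two nodes, and the number of nodes can never exceed max_fingerprints, however many destinations are observed.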
-
-
CrossCorrelationGraph.__init__(window=30, correlation=0.1)
CrossCorrelationGraph for computing correlation between clusters
Parameters: - window (float, default=30) – Threshold for the window size in seconds
- correlation (float, default=0.1) – Threshold for the minimum required correlation
Graph creation
We use the cross_correlation_graph.CrossCorrelationGraph.fit() method to create the CrossCorrelationGraph.
Afterwards, we can detect cliques using the cross_correlation_graph.CrossCorrelationGraph.predict() method.
Alternatively, both steps can be performed at once using the cross_correlation_graph.CrossCorrelationGraph.fit_predict() method.
-
CrossCorrelationGraph.fit(cluster, y=None)
Fit Cross Correlation Graph.
Parameters: - cluster (Cluster) – Cluster to fit graph, cluster must be populated with flows
- y (ignored) –
Returns: result – Returns self
Return type: self
-
CrossCorrelationGraph.predict(X=None, y=None)
Return the cliques of the fitted Cross Correlation Graph.
Parameters: - X (ignored) –
- y (ignored) –
Returns: result – Generator of all cliques in the graph
Return type: Generator of cliques
-
CrossCorrelationGraph.fit_predict(cluster, y=None)
Fit cross correlation graph with the given cluster and return cliques.
Parameters: - cluster (Cluster) – Cluster to fit graph, cluster must be populated with flows
- y (ignored) –
Returns: result – Generator of all cliques in the graph
Return type: Generator of cliques
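The two-step idea behind fit() and predict() can be sketched with networkx alone. This is a stand-in under stated assumptions, not the library's implementation: build_graph and find_cliques_in are hypothetical helpers, the correlation score used here (shared time slices divided by all time slices of the two fingerprints) is an assumption, and so is treating the threshold as inclusive.

```python
from itertools import combinations

import networkx as nx


def build_graph(fingerprints, correlation=0.1):
    """Link every pair of fingerprints whose overlap reaches the threshold.

    The overlap score (intersection over union of active time slices) is an
    assumption made for this sketch.
    """
    graph = nx.Graph()
    graph.add_nodes_from(fingerprints)
    for a, b in combinations(fingerprints, 2):
        union = a | b
        score = len(a & b) / len(union) if union else 0.0
        if score >= correlation:
            graph.add_edge(a, b, weight=score)
    return graph


def find_cliques_in(graph):
    """Return an iterator over all maximal cliques (lists of nodes)."""
    return nx.find_cliques(graph)


fingerprints = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({3, 4, 5}),
    frozenset({9}),  # never co-active with the others
]

graph = build_graph(fingerprints, correlation=0.1)
clique_sizes = sorted(len(c) for c in find_cliques_in(graph))

# A stricter threshold removes the weakest edge and splits the triangle.
strict = build_graph(fingerprints, correlation=0.3)
strict_sizes = sorted(len(c) for c in find_cliques_in(strict))
```

With the lower threshold the first three fingerprints form one clique and the isolated fingerprint forms its own; raising the threshold breaks the triangle into two overlapping pairs.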
Graph export
The CrossCorrelationGraph can be exported using the export function. This can be useful for further investigation using graphical tools such as Gephi.
-
CrossCorrelationGraph.export(outfile, dense=True, format='gexf')
Export CrossCorrelationGraph to outfile for further analysis
Parameters: - outfile (string) – File to export CrossCorrelationGraph
- dense (boolean, default=True) –
If True, export the dense graph (see IMPORTANT note at graph). This means that each node is represented by the time slices in which it was active. Each node still has the information of all correlated nodes.
If False, export the complete graph. Note that these graphs can get very large with lots of edges. For manual inspection it is therefore recommended to use dense=True instead.
- format (('gexf'|'gml'), default='gexf') – Format in which to export. Currently only ‘gexf’ and ‘gml’ are supported.
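As a rough illustration of what exporting to these two formats involves, the sketch below writes a small networkx graph with nx.write_gexf or nx.write_gml and reads it back. The helper export_graph is hypothetical and only mirrors the format parameter described above; it is not the library's export method, and the node names are made up.

```python
import os
import tempfile

import networkx as nx


def export_graph(graph, outfile, format="gexf"):
    """Write `graph` to `outfile` as GEXF or GML (illustrative helper)."""
    writers = {"gexf": nx.write_gexf, "gml": nx.write_gml}
    if format not in writers:
        raise ValueError(f"Unsupported format: {format!r}")
    writers[format](graph, outfile)


# Two dense nodes, labelled by the time slices in which they were active.
graph = nx.Graph()
graph.add_edge("slices-3-7", "slices-1-3", weight=0.5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.gexf")
    export_graph(graph, path, format="gexf")

    # Read the file back to check that nodes and edges survived the export.
    restored = nx.read_gexf(path)
    n_nodes = restored.number_of_nodes()
    n_edges = restored.number_of_edges()
```

The resulting .gexf file can be opened directly in Gephi. String node labels are used here because GML output requires node labels of a basic type unless a stringizer is supplied.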
The CrossCorrelationGraph stores its graph object as a dense version of the graph, where each node is represented by its activity window.
See the note at cross_correlation_graph.CrossCorrelationGraph.
To get an unpacked, i.e., non-dense version of the graph, we provide the unpack() method.
This method is called when dense=False in export().
-
CrossCorrelationGraph.unpack()
Unpack an optimized graph. Unpacks a dense graph (see IMPORTANT note at graph) into a graph where every NetworkDestination has its own node. Note that these graphs can get very large with lots of edges. For manual inspection it is therefore recommended to use the regular (dense) graph instead.
Returns: graph – Unpacked graph
Return type: nx.Graph
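Unpacking can be pictured as expanding every fingerprint node into its destinations via the mapping. The sketch below is an assumption about how such an expansion might look, not the library's code: the standalone function unpack(dense, mapping) is hypothetical, destinations are plain strings, and destinations that share a fingerprint are left unlinked because the text does not say how they are connected.

```python
import networkx as nx


def unpack(dense, mapping):
    """Expand a dense fingerprint graph into one node per destination.

    `mapping` maps each dense node to the destinations it stands for.
    Every dense edge becomes an edge between all pairs of destinations
    taken from its two endpoints (illustrative helper).
    """
    result = nx.Graph()
    for node in dense:
        result.add_nodes_from(mapping.get(node, ()))
    for source, target, data in dense.edges(data=True):
        for src in mapping.get(source, ()):
            for tgt in mapping.get(target, ()):
                result.add_edge(src, tgt, **data)
    return result


f1 = frozenset({3, 7})
f2 = frozenset({1, 3})

dense = nx.Graph()
dense.add_edge(f1, f2, weight=0.25)
mapping = {f1: {"A", "B"}, f2: {"C"}}

full = unpack(dense, mapping)
edges = {frozenset(edge) for edge in full.edges()}
```

A single dense edge already becomes two edges here. With many destinations per fingerprint the edge count grows with the product of the group sizes, which is why the unpacked graph can get very large.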