Skip to content

Clustering analysis

This page described how to use the available algorithms in RaGraph's ragraph.analysis.cluster module as well as some in the ragraph.analysis.heuristics module.

In general, clustering a Graph involves the grouping of components that have many mutual or very strong dependencies (edges). One can also compute complete hierarchies by clustering nodes in an iterative fashion (e.g. clusters of clusters).

Single level clustering

You can tweak some parameters later, but at minimal it suffices to supply a Graph. Let's apply the Markov clustering algorithm ragraph.analysis.cluster.markov to the Ford Climate Control System dataset:

1
2
3
4
5
6
from ragraph import datasets
from ragraph.analysis import cluster

g = datasets.get("climate_control")
g, parents = cluster.markov(g, names=True, inplace=True)
assert parents == ['node.node0', 'node.node1', 'node.node2']

Which shows that our graph has been clustered into three new nodes, with some default naming for them. Lets review the hierarchy dictionary to view this in more detail:

from ragraph import datasets
from ragraph.analysis import cluster

g = datasets.get("climate_control")
g, parents = cluster.markov(g, names=True, inplace=True)
h = g.get_hierarchy_dict()

assert h == {
    "node.node0": {
        "Radiator": {},
        "Engine Fan": {},
        "Condenser": {},
        "Compressor": {},
        "Evaporator Core": {},
        "Accumulator": {},
    },
    "node.node1": {
        "Heater Core": {},
        "Heater Hoses": {},
        "Evaporator Case": {},
        "Actuators": {},
        "Blower Controller": {},
        "Blower Motor": {},
    },
    "node.node2": {
        "Refrigeration Controls": {},
        "Air Controls": {},
        "Sensors": {},
        "Command Distribution": {},
    },
}

The markov algorithm is a single-level clustering algorithm, that gives you a single set of parent nodes (if any) for the given leaf nodes as input. If you only supply a Graph it uses all the graph's leaf nodes.

from ragraph import datasets, plot
from ragraph.analysis import cluster

#from ragraph.node import Node

g = datasets.get("climate_control")
g, parents = cluster.markov(g, names=True, inplace=True)

#g.add_node(Node("system", children=g.roots))

fig = plot.mdm(
    leafs=g.leafs,
    edges=g.edges,
)


fig.write_image("./docs/generated/dsm_clustering_climate_control.svg")

Clustered system architecture of the climate control dataset.

Clustered system architecture of the climate control dataset. Within the system, there are 3 identified clusters.

In the example, we can identify three clusters that each has their own root. For many systems, it is intuitive to include an overarching system parent node. We can do so by uncommenting the add_node function, and using the identified roots as the children of the overarching system node. Even better, we can employ a hierarchical clustering algorithm.

Hierarchical clustering

An example of hierarchical clustering using the Hierarchical Markov clustering algorithm ragraph.analysis.cluster.hierarchical_markov would be the following:

from ragraph import datasets
from ragraph.analysis import cluster

g = datasets.get("climate_control")
g, roots = cluster.hierarchical_markov(g, inplace=True)
h = g.get_hierarchy_dict()
assert h == {
    "node.node3": {
        "node.node0": {
            "Radiator": {},
            "Engine Fan": {},
            "Condenser": {},
            "Compressor": {},
            "Evaporator Core": {},
            "Accumulator": {},
        },
        "node.node1": {
            "Heater Core": {},
            "Heater Hoses": {},
            "Evaporator Case": {},
            "Actuators": {},
            "Blower Controller": {},
            "Blower Motor": {},
        },
        "node.node2": {
            "Refrigeration Controls": {},
            "Air Controls": {},
            "Sensors": {},
            "Command Distribution": {},
        },
    }
}
assert len(roots) == 1, "We found only one root."

Where we can see that all three earlier found clusters have now also been recursively grouped together into a single root because of the interactions between their leaf nodes.

Hierarchical clustering with bus detection

We can also combine hierarchical clustering with bus detection. For this we are going to use the Aircraft Engine dataset. An example of using the Hierarchical Markov clustering with bus detection ragraph.analysis.heuristics.markov_gamma is the following:

from ragraph import datasets, plot
from ragraph.analysis import heuristics

g = datasets.get("aircraft_engine")
g,roots = heuristics.markov_gamma(g, gamma=2.0, names=True, inplace=True)
h = g.get_hierarchy_dict()

fig = plot.mdm(
    leafs=g.leafs,
    edges=g.edges,
)

fig.write_image("./docs/generated/markov_gamma_aircraft_engine.svg")

Clustered system architecture of the climate control dataset.

Clustered system architecture of the climate control dataset. Within the system, there are 3 identified clusters.

In the figure, we can find 11 clusters, and 3 bus components. The bus component spanning rows (1-8) is a global bus component, while the bus component at rows 47-54 and rows 55-60 are embedded in node3 and node1.

We can increase the granularity even further by repeating the Hierarchical Markov clustering with bus detection ragraph.analysis.heuristics.markov_gamma for a specific cluster (root). The clustering parameters such as gamma can be different for each implementation. For demonstration purposes, we add node5 and node2 as roots to the markov_gamma. It is the responsibility of the modeller to determine if the granularity and parameters make sense.

An example could be:

from ragraph import datasets, plot
from ragraph.analysis import heuristics

g = datasets.get("aircraft_engine")
g,roots = heuristics.markov_gamma(g, gamma=2.0, names=True, inplace=True)

g, roots = heuristics.markov_gamma(g, gamma=1.5, names=True, inplace=True, root=g["node.node5"])
g, roots = heuristics.markov_gamma(g, gamma=3.0, names=True, inplace=True, root=g["node.node2"])


fig = plot.mdm(
    leafs=g.leafs,
    edges=g.edges,
)

fig.write_image("./docs/generated/markov_gamma_aircraft_engine_granularity.svg")

Clustered system architecture of the climate control dataset with increased granularity.

Clustered system architecture of the climate control dataset with increased granularity.

Now, node5 and node3 are recursively grouped with more clusters, and new local bus components are found.

Warning

The Hierarchical Markov clustering with bus detection ragraph.analysis.heuristics.markov_gamma can be found under the module Heuristics, as it is a combination of algorithms. Single level clustering and bus detection can be found at ragraph.analysis.cluster and ragraph.analysis.bus.