ragraph.analysis.similarity#

Graph similarity is often expressed as a metric, where nodes and edges are scanned for similar patterns, properties, or other aspects. There are three levels of equivalence: structural, automorphic, and regular. Each level implies all the levels that follow it, so structural equivalence implies automorphic equivalence, which in turn implies regular equivalence.

Available analyses#

The following algorithms are directly accessible after importing ragraph.analysis.similarity:

  • jaccard_index: Jaccard Similarity Index of two objects, based on the number of properties they both possess divided by the number of properties at least one of them possesses.

  • jaccard_matrix: Jaccard Similarity Index between each pair in a set of objects, stored in a square matrix.

Note

Both Jaccard methods require a callable that takes an object and returns a list of booleans representing the possession of a property (the on argument). Some examples are included in the ragraph.analysis.similarity.utils module, like ragraph.analysis.similarity.utils.on_hasattrs().
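As an illustration of what such an on callable can look like, the following is a minimal sketch in plain Python. The factory name make_on_hasattrs and the Pump and Valve classes are hypothetical and exist only for this example; this is not the implementation of ragraph.analysis.similarity.utils.on_hasattrs(), whose exact signature is not shown on this page.

```python
from typing import Any, Callable, List


def make_on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Hypothetical helper: build an `on` callable that checks attribute presence."""

    def on(obj: Any) -> List[bool]:
        # One boolean per attribute name: True if the object has that attribute.
        return [hasattr(obj, attr) for attr in attrs]

    return on


# Hypothetical example objects.
class Pump:
    flow = 1.0
    power = 2.0


class Valve:
    flow = 1.0


on = make_on_hasattrs(["flow", "power", "voltage"])
pump_props = on(Pump())    # possession checklist for the pump
valve_props = on(Valve())  # possession checklist for the valve
```

Any callable with this shape (one object in, one fixed-length list of booleans out) should be usable as the on argument, as long as it returns the properties in the same order for every object.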

Submodules#

Package Contents#

Classes#

SimilarityAnalysis

Similarity analysis of nodes based upon mutual mapping relations.

Functions#

jaccard_index(→ float)

Calculate the Jaccard Similarity Index between two objects based on an object description function.

jaccard_matrix(→ numpy.ndarray)

Calculate the Jaccard Similarity Index for a set of objects based on an object description function.

ragraph.analysis.similarity.jaccard_index(obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]) float#

Calculate the Jaccard Similarity Index between two objects based on an object description function.

Parameters:
  • obj1 – First object to compare.

  • obj2 – Second object to compare.

  • on – Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.

Returns:

Jaccard Similarity between the two objects, calculated as the number of properties they both possess divided by the number of properties at least one of them possesses.
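The calculation described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: the names jaccard_index_sketch and on_letters are hypothetical, and returning 0.0 when neither object has any property is an assumption made here to avoid a division by zero.

```python
from typing import Any, Callable, List


def jaccard_index_sketch(
    obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]
) -> float:
    """Sketch: shared properties divided by properties held by at least one object."""
    props1 = on(obj1)
    props2 = on(obj2)
    both = sum(1 for a, b in zip(props1, props2) if a and b)
    either = sum(1 for a, b in zip(props1, props2) if a or b)
    # Assumption: two objects without any properties get a similarity of 0.0.
    return both / either if either else 0.0


def on_letters(word: str) -> List[bool]:
    """Hypothetical description function: which of the letters a-d occur in the word."""
    return [letter in word for letter in "abcd"]


# "abc" and "bcd" share two letters (b, c) out of four distinct ones (a, b, c, d).
score = jaccard_index_sketch("abc", "bcd", on_letters)
```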

ragraph.analysis.similarity.jaccard_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) numpy.ndarray#

Calculate the Jaccard Similarity Index for a set of objects based on an object description function.

Parameters:
  • objects – List of objects to generate a similarity matrix for.

  • on – Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.
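To show the shape of the result, here is a minimal pairwise sketch that uses nested lists instead of a numpy.ndarray so that it runs without third-party packages. The function name jaccard_matrix_sketch and the on_letters description function are hypothetical, and the 0.0 value for objects without any properties is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, List


def jaccard_matrix_sketch(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> List[List[float]]:
    """Sketch: square, symmetric matrix of pairwise Jaccard indices."""
    described = [on(obj) for obj in objects]
    n = len(described)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            pairs = list(zip(described[i], described[j]))
            both = sum(1 for a, b in pairs if a and b)
            either = sum(1 for a, b in pairs if a or b)
            # Assumption: 0.0 when neither object has any property.
            value = both / either if either else 0.0
            matrix[i][j] = value
            matrix[j][i] = value  # the index is symmetric
    return matrix


def on_letters(word: str) -> List[bool]:
    """Hypothetical description function: which of the letters a-c occur in the word."""
    return [letter in word for letter in "abc"]


matrix = jaccard_matrix_sketch(["ab", "bc", "ab"], on_letters)
```

Entry (i, j) holds the index between object i and object j, so the first and third object in this example, which have identical properties, score 1.0 against each other.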

class ragraph.analysis.similarity.SimilarityAnalysis(rows: List[ragraph.node.Node], cols: List[ragraph.node.Node], edges: List[ragraph.edge.Edge], col_sim_threshold: float = 0.0, row_sim_threshold: float = 0.0)#

Similarity analysis of nodes based upon mutual mapping relations.

Parameters:
  • cols – List of column nodes.

  • rows – List of row nodes.

  • edges – List of edges from column nodes to row nodes to be used in similarity analysis.

  • col_sim_threshold – Column similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).

  • row_sim_threshold – Row similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).

Class Attributes:

similarity_kind: Edge kind for similarity edges. Defaults to similarity.

Note

A mapping matrix relating M column nodes to N row nodes is used as input for the similarity analysis.
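The idea of a mapping matrix can be illustrated with plain strings and tuples instead of Node and Edge objects. Everything in this sketch is an assumption made for illustration: the helper build_mapping is hypothetical, edges are written as (column, row) pairs, and the matrix is laid out with one line per row node and one entry per column node.

```python
from typing import List, Set, Tuple

cols = ["c1", "c2", "c3"]  # M = 3 column nodes
rows = ["r1", "r2"]        # N = 2 row nodes

# Edges point from a column node to a row node.
edges: Set[Tuple[str, str]] = {("c1", "r1"), ("c2", "r1"), ("c2", "r2")}


def build_mapping(
    rows: List[str], cols: List[str], edges: Set[Tuple[str, str]]
) -> List[List[bool]]:
    """Hypothetical helper: mapping[i][j] is True when column j maps to row i."""
    return [[(col, row) in edges for col in cols] for row in rows]


mapping = build_mapping(rows, cols, edges)
```

Each line of this matrix is a boolean checklist for one row node, and each vertical slice is a checklist for one column node, which is the kind of input the Jaccard functions expect from their on argument.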

property rows: List[ragraph.node.Node]#

List of row nodes.

property cols: List[ragraph.node.Node]#

List of column nodes.

property edges: List[ragraph.edge.Edge]#

List of edges.

property row_sim_threshold: float#

Similarity threshold. Values below this threshold are pruned from the row similarity matrix and the corresponding edges are removed.

property col_sim_threshold: float#

Similarity threshold. Values below this threshold are pruned from the column similarity matrix and the corresponding edges are removed.
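The effect of a similarity threshold can be sketched as follows. The function prune is hypothetical, and two details are assumptions: values strictly below the threshold are treated as pruned, and pruned values are represented by 0.0. How the class itself stores pruned entries is not specified on this page.

```python
from typing import List


def prune(matrix: List[List[float]], threshold: float) -> List[List[float]]:
    """Sketch: keep values at or above the threshold, zero out the rest."""
    return [[v if v >= threshold else 0.0 for v in row] for row in matrix]


similarity = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]

# With a threshold of 0.3, the 0.2 entries are dropped and 0.5 is kept.
pruned = prune(similarity, 0.3)

# With the default threshold of 0.0, nothing is removed.
unchanged = prune(similarity, 0.0)
```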

property graph: ragraph.graph.Graph#

Graph containing similarity edges.

property row_similarity_matrix: numpy.ndarray#

The row similarity matrix, computed from each row node's mapping row.

property col_similarity_matrix: numpy.ndarray#

The column similarity matrix, computed from each column node's mapping column.
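A rough sketch of how both similarity matrices can follow from a single mapping matrix is given below, using nested lists rather than numpy arrays. The helper names jaccard and similarity are hypothetical, the orientation (one line per row node, one entry per column node) is an assumption, and so is the 0.0 result for two nodes without any mapping.

```python
from typing import List

# Assumed layout: mapping[i][j] is True when column j maps to row i.
mapping = [
    [True, True, False],   # row r1 is mapped to by columns c1 and c2
    [False, True, False],  # row r2 is mapped to by column c2
]


def jaccard(a: List[bool], b: List[bool]) -> float:
    """Shared True entries divided by entries that are True in at least one list."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: 0.0 when neither list contains a True entry.
    return both / either if either else 0.0


def similarity(vectors: List[List[bool]]) -> List[List[float]]:
    """Pairwise Jaccard index between all boolean vectors."""
    return [[jaccard(a, b) for b in vectors] for a in vectors]


row_vectors = mapping                                # one checklist per row node
col_vectors = [list(col) for col in zip(*mapping)]   # one checklist per column node

row_sim = similarity(row_vectors)  # N x N, here 2 x 2
col_sim = similarity(col_vectors)  # M x M, here 3 x 3
```

Rows are compared by the columns that map to them, and columns are compared by the rows they map to, so the two matrices generally differ in both size and content.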

similarity_kind = similarity#

update_graph() None#

Update the internal similarity graph.

update_row_similarity() None#

Update Jaccard Row Similarity Index edges between (clustered) rows.

update_col_similarity() None#

Update Jaccard Column Similarity Index edges between (clustered) columns.

_update_similarity(nodes: List[ragraph.node.Node], mat: numpy.ndarray) None#

Update Jaccard Similarity Index edges between (clustered) nodes.

cluster_rows(algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) None#

Cluster row nodes based on their similarity. Updates Graph in-place.

Parameters:
  • algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.

  • **algo_args – Algorithm arguments. See cluster.markov for sensible defaults.

cluster_cols(algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) None#

Cluster column nodes based on their similarity. Updates Graph in-place.

Parameters:
  • algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.

  • **algo_args – Algorithm arguments. See cluster.markov for sensible defaults.

_cluster(leafs: List[ragraph.node.Node], algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) None#

Cluster the given row or column nodes based on their similarity. Updates Graph in-place.

Parameters:
  • leafs – List of row or column nodes to be clustered.

  • algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.

  • **algo_args – Algorithm arguments. See cluster.markov for sensible defaults.

row_mapping(row: ragraph.node.Node) List[bool]#

Boolean possession checklist for a row node w.r.t. self.cols.

col_mapping(col: ragraph.node.Node) List[bool]#

Boolean possession checklist for a column node w.r.t. self.rows.

check_mapping(col: ragraph.node.Node, row: ragraph.node.Node) bool#

Check whether a column node maps to a row node.