ragraph.analysis.similarity._similarity#

Module Contents#

Classes#

SimilarityAnalysis

Similarity analysis of nodes based upon mutual mapping relations.

class ragraph.analysis.similarity._similarity.SimilarityAnalysis(rows: List[ragraph.node.Node], cols: List[ragraph.node.Node], edges: List[ragraph.edge.Edge], col_sim_threshold: float = 0.0, row_sim_threshold: float = 0.0)#

Similarity analysis of nodes based upon mutual mapping relations.

Parameters:
  • cols – List of column nodes.

  • rows – List of row nodes.

  • edges – List of edges from column nodes to row nodes to be used in similarity analysis.

  • col_sim_threshold – Column similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).

  • row_sim_threshold – Row similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).

Class Attributes:

similarity_kind: Edge kind for similarity edges. Defaults to similarity.

Note

A mapping matrix relating M column nodes to N row nodes is used as input for the similarity analysis.

property rows: List[ragraph.node.Node]#

List of row nodes.

property cols: List[ragraph.node.Node]#

List of column nodes.

property edges: List[ragraph.edge.Edge]#

List of edges.

property row_sim_threshold: float#

Similarity threshold. Values below this threshold are pruned from the row similarity matrix and the corresponding edges are removed.

property col_sim_threshold: float#

Similarity threshold. Values below this threshold are pruned from the column similarity matrix and the corresponding edges are removed.
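The pruning rule described for both thresholds can be illustrated with a minimal sketch in plain Python. The helper name `prune_below` and the convention of replacing pruned values with 0.0 are assumptions made for illustration; this is not the library's implementation.

```python
from typing import List


def prune_below(sim: List[List[float]], threshold: float) -> List[List[float]]:
    """Return a copy of `sim` in which entries below `threshold` are set to 0.0.

    Hypothetical helper: the zero-fill convention is an assumption.
    """
    return [[v if v >= threshold else 0.0 for v in row] for row in sim]


# A small symmetric similarity matrix for three nodes.
sim = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]

# Entries strictly below 0.3 are pruned; the rest are kept unchanged.
pruned = prune_below(sim, threshold=0.3)
```

With a threshold of 0.0 no non-negative value is below the threshold, so nothing is pruned, which matches the "no threshold" default.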

property graph: ragraph.graph.Graph#

Graph containing similarity edges.

property row_similarity_matrix: numpy.ndarray#

The similarity matrix of the row nodes, based on each row's mapping to the column nodes.

property col_similarity_matrix: numpy.ndarray#

The similarity matrix of the column nodes, based on each column's mapping to the row nodes.

similarity_kind = similarity#

update_graph() → None#

Update the internal similarity graph.

update_row_similarity() → None#

Update Jaccard Row Similarity Index edges between (clustered) rows.

update_col_similarity() → None#

Update Jaccard Column Similarity Index edges between (clustered) columns.

_update_similarity(nodes: List[ragraph.node.Node], mat: numpy.ndarray) → None#

Update Jaccard Similarity Index edges between (clustered) nodes.
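As a rough illustration of the Jaccard similarity index referred to above, the following sketch computes it for pairs of boolean mapping lists: the number of positions where both lists are true, divided by the number of positions where at least one is true. The function names and the choice to return 0.0 for two empty mappings are assumptions for this example, not the library's code.

```python
from typing import List


def jaccard(a: List[bool], b: List[bool]) -> float:
    """Jaccard index of two boolean mappings: |a AND b| / |a OR b|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two empty mappings are treated as having zero similarity.
    return both / either if either else 0.0


def similarity_matrix(mappings: List[List[bool]]) -> List[List[float]]:
    """Pairwise Jaccard indices between all mappings (hypothetical helper)."""
    return [[jaccard(a, b) for b in mappings] for a in mappings]


# Three row nodes, each mapped against three column nodes.
row_mappings = [
    [True, True, False],
    [True, False, False],
    [False, False, True],
]
sim = similarity_matrix(row_mappings)
```

Here the first two rows share one of two mapped columns, so their index is 0.5, while the third row shares nothing with the others and scores 0.0 against both.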

cluster_rows(algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) → None#

Cluster row nodes based on their similarity. Updates Graph in-place.

Parameters:
  • algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.

  • **algo_args – Algorithm arguments. See cluster.markov for sensible defaults.

cluster_cols(algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) → None#

Cluster column nodes based on their similarity. Updates Graph in-place.

Parameters:
  • algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.

  • **algo_args – Algorithm arguments. See cluster.markov for sensible defaults.

_cluster(leafs: List[ragraph.node.Node], algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) → None#

Cluster the given row or column nodes based on their similarity. Updates Graph in-place.

Parameters:
  • leafs – List of row or column nodes to be clustered.

  • algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.

  • **algo_args – Algorithm arguments. See cluster.markov for sensible defaults.

row_mapping(row: ragraph.node.Node) → List[bool]#

Boolean possession checklist for a row node w.r.t. self.cols.

col_mapping(col: ragraph.node.Node) → List[bool]#

Boolean possession checklist for a column node w.r.t. self.rows.

check_mapping(col: ragraph.node.Node, row: ragraph.node.Node) → bool#

Check whether a column node maps to a row node.
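The relationship between the three mapping methods can be sketched in plain Python. In this stand-in, nodes are plain strings and edges are `(column, row)` tuples; these simplified types and the module-level functions are assumptions for illustration and do not use the `ragraph` classes.

```python
from typing import List, Set, Tuple

# Hypothetical stand-ins for row nodes, column nodes, and column-to-row edges.
rows: List[str] = ["r1", "r2"]
cols: List[str] = ["c1", "c2", "c3"]
edges: Set[Tuple[str, str]] = {("c1", "r1"), ("c2", "r1"), ("c3", "r2")}


def check_mapping(col: str, row: str) -> bool:
    """True if there is an edge from the column node to the row node."""
    return (col, row) in edges


def row_mapping(row: str) -> List[bool]:
    """Boolean checklist for one row against every column, in `cols` order."""
    return [check_mapping(col, row) for col in cols]


def col_mapping(col: str) -> List[bool]:
    """Boolean checklist for one column against every row, in `rows` order."""
    return [check_mapping(col, row) for row in rows]


r1_checklist = row_mapping("r1")
c3_checklist = col_mapping("c3")
```

Under these assumptions, a row checklist has one entry per column node and a column checklist has one entry per row node, so stacking the row checklists gives the mapping matrix described in the note above.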