ragraph.analysis.similarity._similarity#
Module Contents#
Classes#
SimilarityAnalysis – Similarity analysis of nodes based upon mutual mapping relations.
- class ragraph.analysis.similarity._similarity.SimilarityAnalysis(rows: List[ragraph.node.Node], cols: List[ragraph.node.Node], edges: List[ragraph.edge.Edge], col_sim_threshold: float = 0.0, row_sim_threshold: float = 0.0)#
Similarity analysis of nodes based upon mutual mapping relations.
- Parameters:
cols – List of column nodes.
rows – List of row nodes.
edges – List of edges from column nodes to row nodes to be used in similarity analysis.
col_sim_threshold – Column similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).
row_sim_threshold – Row similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).
- Class Attributes:
similarity_kind: Edge kind for similarity edges. Defaults to similarity.
Note
A mapping matrix relating M column nodes to N row nodes is used as input for the similarity analysis.
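To illustrate the note above, the sketch below builds a boolean mapping matrix from a list of (column, row) pairs. It is a standalone, plain-Python sketch and not ragraph's implementation: the names `build_mapping_matrix` and `pairs` are hypothetical, nodes are stood in for by plain strings, and the N-by-M orientation (one inner list per row node) is an assumption.

```python
from typing import List, Tuple


def build_mapping_matrix(
    cols: List[str], rows: List[str], pairs: List[Tuple[str, str]]
) -> List[List[bool]]:
    """Return an N x M matrix; entry [i][j] is True if cols[j] maps to rows[i]."""
    col_index = {name: j for j, name in enumerate(cols)}
    row_index = {name: i for i, name in enumerate(rows)}
    matrix = [[False] * len(cols) for _ in rows]
    for col, row in pairs:
        matrix[row_index[row]][col_index[col]] = True
    return matrix


cols = ["c1", "c2", "c3"]  # M = 3 column nodes (hypothetical names)
rows = ["r1", "r2"]  # N = 2 row nodes (hypothetical names)
pairs = [("c1", "r1"), ("c2", "r1"), ("c2", "r2"), ("c3", "r2")]
mapping = build_mapping_matrix(cols, rows, pairs)
```

Each inner list of `mapping` then describes one row node, and each position within it describes one column node.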
- property rows: List[ragraph.node.Node]#
List of row nodes.
- property cols: List[ragraph.node.Node]#
List of column nodes.
- property edges: List[ragraph.edge.Edge]#
List of edges.
- property row_sim_threshold: float#
Similarity threshold. Values below this threshold are pruned from the row similarity matrix and the corresponding edges are removed.
- property col_sim_threshold: float#
Similarity threshold. Values below this threshold are pruned from the column similarity matrix and the corresponding edges are removed.
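A minimal sketch of what threshold pruning could look like on a similarity matrix held as nested lists. This is an assumption-laden illustration and not the library's code: the helper `prune` is hypothetical, "pruned" is taken to mean "set to 0.0", and values exactly equal to the threshold are kept.

```python
from typing import List


def prune(sim: List[List[float]], threshold: float = 0.0) -> List[List[float]]:
    """Return a copy of sim with every value below threshold set to 0.0."""
    return [[v if v >= threshold else 0.0 for v in line] for line in sim]


sim = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
pruned = prune(sim, threshold=0.3)  # the 0.2 entries drop to 0.0
```

With the default threshold of 0.0 nothing is pruned, since similarity values in this sketch are never negative.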
- property graph: ragraph.graph.Graph#
Graph containing similarity edges.
- property row_similarity_matrix: numpy.ndarray#
The row similarity matrix, based on the row nodes' mappings to the column nodes.
- property col_similarity_matrix: numpy.ndarray#
The column similarity matrix, based on the column nodes' mappings to the row nodes.
- similarity_kind = similarity#
- update_graph() → None#
Update the internal similarity graph.
- update_row_similarity() → None#
Update Jaccard Row Similarity Index edges between (clustered) rows.
- update_col_similarity() → None#
Update Jaccard Column Similarity Index edges between (clustered) columns.
- _update_similarity(nodes: List[ragraph.node.Node], mat: numpy.ndarray) → None#
Update Jaccard Similarity Index edges between (clustered) nodes.
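The Jaccard index of two boolean vectors is the number of positions where both are True divided by the number of positions where at least one is True. The sketch below computes it pairwise over the lines of a boolean mapping matrix. It is plain Python for illustration only: the helper names `jaccard` and `similarity_matrix` are hypothetical, and treating two all-False vectors as having similarity 0.0 is an assumption.

```python
from typing import List


def jaccard(a: List[bool], b: List[bool]) -> float:
    """Jaccard index |a AND b| / |a OR b|; 0.0 if neither vector has a True."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similarity_matrix(mapping: List[List[bool]]) -> List[List[float]]:
    """Pairwise Jaccard index between the lines of a boolean mapping matrix."""
    return [[jaccard(a, b) for b in mapping] for a in mapping]


# One inner list per row node, one position per column node (assumed layout).
mapping = [
    [True, True, False],
    [False, True, True],
    [False, False, True],
]
row_sim = similarity_matrix(mapping)
# Transpose the mapping to compare column nodes instead of row nodes.
col_sim = similarity_matrix([list(col) for col in zip(*mapping)])
```

Here the first two rows share one of three occupied columns, so `row_sim[0][1]` is 1/3, while the second and third rows share one of two, giving 0.5.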
- cluster_rows(algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) → None#
Cluster row nodes based on their similarity. Updates Graph in-place.
- Parameters:
algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.
**algo_args – Algorithm arguments. See cluster.markov for sensible defaults.
- cluster_cols(algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) → None#
Cluster column nodes based on their similarity. Updates Graph in-place.
- Parameters:
algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.
**algo_args – Algorithm arguments. See cluster.markov for sensible defaults.
- _cluster(leafs: List[ragraph.node.Node], algo: Callable[[ragraph.graph.Graph, Any], Tuple[List[ragraph.node.Node]]] = cluster.markov, **algo_args) → None#
Cluster the given row or column nodes based on their similarity. Updates Graph in-place.
- Parameters:
leafs – List of row or column nodes to be clustered.
algo – Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov.
**algo_args – Algorithm arguments. See cluster.markov for sensible defaults.
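To give a feel for clustering on similarity, the sketch below groups nodes that are linked by non-zero similarity, directly or through other nodes. This is deliberately not Markov clustering and not ragraph's `algo` interface: it swaps in plain connected components as a much simpler stand-in, works on node names and a nested-list similarity matrix, and returns the groups instead of updating a graph in-place. The function name `connected_clusters` is hypothetical.

```python
from typing import List


def connected_clusters(names: List[str], sim: List[List[float]]) -> List[List[str]]:
    """Group names connected by non-zero similarity (connected components)."""
    unvisited = set(range(len(names)))
    clusters = []
    for start in range(len(names)):
        if start not in unvisited:
            continue  # already assigned to an earlier cluster
        unvisited.discard(start)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            # sorted() copies the set, so discarding while looping is safe.
            for j in sorted(unvisited):
                if sim[i][j] > 0.0:
                    unvisited.discard(j)
                    stack.append(j)
        clusters.append([names[i] for i in sorted(members)])
    return clusters


names = ["r1", "r2", "r3", "r4"]  # hypothetical node names
sim = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]
clusters = connected_clusters(names, sim)
```

Raising a similarity threshold before clustering removes weak links, which in this stand-in splits groups that are held together only by low-similarity pairs.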
- row_mapping(row: ragraph.node.Node) → List[bool]#
Boolean possession checklist for a row node w.r.t. self.cols.
- col_mapping(col: ragraph.node.Node) → List[bool]#
Boolean possession checklist for a column node w.r.t. self.rows.
- check_mapping(col: ragraph.node.Node, row: ragraph.node.Node) → bool#
Check whether a column node maps to a row node.
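The three mapping helpers can be pictured with the standalone sketch below. It is an illustration under stated assumptions and not the library's code: the functions are free functions taking explicit arguments where the real ones are methods, nodes are plain strings, and the edges are reduced to a set of hypothetical (column, row) name pairs.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (column name, row name), standing in for an edge


def check_mapping(pairs: Set[Pair], col: str, row: str) -> bool:
    """True if the column maps to the row."""
    return (col, row) in pairs


def row_mapping(pairs: Set[Pair], cols: List[str], row: str) -> List[bool]:
    """Checklist for one row: one boolean per column, in the order of cols."""
    return [check_mapping(pairs, col, row) for col in cols]


def col_mapping(pairs: Set[Pair], rows: List[str], col: str) -> List[bool]:
    """Checklist for one column: one boolean per row, in the order of rows."""
    return [check_mapping(pairs, col, row) for row in rows]


cols = ["c1", "c2", "c3"]
rows = ["r1", "r2"]
pairs = {("c1", "r1"), ("c2", "r1"), ("c2", "r2"), ("c3", "r2")}

r1_checklist = row_mapping(pairs, cols, "r1")
c2_checklist = col_mapping(pairs, rows, "c2")
```

A row checklist has as many entries as there are column nodes, and a column checklist has as many entries as there are row nodes.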