# ragraph.analysis.similarity

## Similarity analyses

Graph similarity is often expressed as a metric, for which nodes and edges are scanned for similar patterns, properties, or other aspects. There are three levels of equivalence: structural, automorphic, and regular. Each level implies the ones listed after it: structural equivalence implies automorphic equivalence, which in turn implies regular equivalence.

### Available analyses

The following algorithms are directly accessible after importing ragraph.analysis.similarity:

• jaccard_index: Jaccard Similarity Index of two objects, calculated as the number of properties they both possess divided by the number of properties either of them has.
• jaccard_matrix: Jaccard Similarity Index between all pairs in a set of objects, stored in a square matrix.

Note

Both Jaccard methods require a callable that takes an object and returns a list of booleans representing the possession of a property (the on argument). Some examples are included in the ragraph.analysis.similarity.utils module, like ragraph.analysis.similarity.utils.on_hasattrs.
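As a minimal sketch of what the on argument looks like in practice, the snippet below pairs a description function with a Jaccard calculation. To stay self-contained it does not import ragraph: on_hasattrs is copied from the utils source listed on this page, while jaccard is a plain-Python stand-in (an assumption for illustration) for the NumPy-based implementation.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    # Same one-liner as the utils listing: one boolean per attribute name.
    return lambda obj: [hasattr(obj, attr) for attr in attrs]


def jaccard(obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]) -> float:
    # Stand-in for jaccard_index: shared properties / properties held by either.
    props1, props2 = on(obj1), on(obj2)
    both = sum(a and b for a, b in zip(props1, props2))
    either = sum(a or b for a, b in zip(props1, props2))
    return both / either if either else 0.0


on = on_hasattrs(["mass", "color", "cost"])
pump = SimpleNamespace(mass=2.0, color="red")
valve = SimpleNamespace(color="blue", cost=10)

description = on(pump)            # one boolean per attribute, in list order
index = jaccard(pump, valve, on)  # 1 shared property out of 3 held by either
```

Only color is present on both objects, while mass, color and cost are each present on at least one of them, so the index comes out at one third.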

## SimilarityAnalysis

```python
SimilarityAnalysis(
    rows: List[Node],
    cols: List[Node],
    edges: List[Edge],
    col_sim_threshold: float = 0.0,
    row_sim_threshold: float = 0.0,
)
```

Similarity analysis of nodes based upon mutual mapping relations.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cols | List[Node] | List of column nodes. | required |
| rows | List[Node] | List of row nodes. | required |
| edges | List[Edge] | List of edges from column nodes to row nodes to be used in similarity analysis. | required |
| col_sim_threshold | float | Column similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold). | 0.0 |
| row_sim_threshold | float | Row similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold). | 0.0 |

Class Attributes

Note

A mapping matrix relating M column nodes to N row nodes is used as input for the similarity analysis.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def __init__(
    self,
    rows: List[Node],
    cols: List[Node],
    edges: List[Edge],
    col_sim_threshold: float = 0.0,
    row_sim_threshold: float = 0.0,
):
    self._cols: List[Node] = []
    self._rows: List[Node] = []
    self._edges: List[Edge] = []
    self._col_sim_threshold = 0.0
    self._row_sim_threshold = 0.0
    self._graph: Graph = None  # type: ignore

    self.cols = cols
    self.rows = rows
    self.edges = edges
    self.col_sim_threshold = col_sim_threshold
    self.row_sim_threshold = row_sim_threshold
```

### col_sim_threshold property writable

```python
col_sim_threshold: float
```

Similarity threshold. Values below this threshold are pruned from the column similarity matrix and the corresponding edges are removed.
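The source on this page does not show how pruning is implemented, so the following is only a sketch of one plausible reading: entries below the threshold are zeroed, and zero entries later produce no similarity edge. The prune helper is hypothetical and not part of ragraph.

```python
from typing import List


def prune(matrix: List[List[float]], threshold: float) -> List[List[float]]:
    # Hypothetical helper: zero every entry strictly below the threshold.
    return [[v if v >= threshold else 0.0 for v in row] for row in matrix]


similarity = [
    [1.0, 0.2, 0.6],
    [0.2, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
pruned = prune(similarity, threshold=0.5)
```

Under this reading, a threshold of 0.0 leaves a matrix of non-negative similarities unchanged, which matches the "no threshold" default.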

### col_similarity_matrix property

```python
col_similarity_matrix: np.ndarray
```

The column similarity matrix, based on the mappings of the column nodes.

### cols property writable

```python
cols: List[Node]
```

List of column nodes.

### edges property writable

```python
edges: List[Edge]
```

List of edges.

### graph property

```python
graph: Graph
```

Graph containing similarity edges.

### row_sim_threshold property writable

```python
row_sim_threshold: float
```

Similarity threshold. Values below this threshold are pruned from the row similarity matrix and the corresponding edges are removed.

### row_similarity_matrix property

```python
row_similarity_matrix: np.ndarray
```

The row similarity matrix, based on the mappings of the row nodes.

### rows property writable

```python
rows: List[Node]
```

List of row nodes.

### _cluster

```python
_cluster(
    leafs: List[Node],
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = cluster.markov,
    **algo_args: Any
) -> None
```

Cluster the given row or column nodes based on their similarity. Updates Graph in-place.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| leafs | List[Node] | List of row or column nodes to be clustered. | required |
| algo | Callable[[Graph, Any], Tuple[List[Node]]] | Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov. | markov |
| \*\*algo_args | Any | Algorithm arguments. See cluster.markov for sensible defaults. | {} |

Source code in ragraph/analysis/similarity/_similarity.py

```python
def _cluster(
    self,
    leafs: List[Node],
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster the given nodes based on their similarity. Updates Graph in-place.

    Arguments:
        leafs: List of row or column nodes to be clustered.
        algo: Clustering algorithm. Should take a graph as first argument and cluster
            it in-place. Defaults to [cluster.markov][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [cluster.markov][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    algo(self.graph, leafs=leafs, **algo_args, inplace=True)  # type: ignore
```
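The final call in _cluster fixes the calling convention for a custom algorithm: it receives the graph positionally, plus leafs and inplace=True as keywords alongside any **algo_args. The sketch below illustrates this with a hypothetical recording algorithm and a stand-in forwarding function; neither is part of ragraph, and strength is an arbitrary example keyword.

```python
from typing import Any, Dict, List

calls: List[Dict[str, Any]] = []


def record_algo(graph: Any, leafs: List[Any], inplace: bool = True, **params: Any) -> None:
    # Hypothetical algorithm: records what it was called with instead of clustering.
    calls.append({"graph": graph, "leafs": leafs, "inplace": inplace, **params})


def forward(graph: Any, leafs: List[Any], algo: Any, **algo_args: Any) -> None:
    # Mirrors the forwarding call in the _cluster listing.
    algo(graph, leafs=leafs, **algo_args, inplace=True)


forward("graph", ["a", "b"], record_algo, strength=2.0)

# Passing inplace yourself collides with the keyword that is always added.
try:
    forward("graph", ["a", "b"], record_algo, inplace=False)
    collided = False
except TypeError:
    collided = True
```

In other words, a replacement for cluster.markov should accept leafs and inplace as keyword arguments, and callers should not put inplace into the algorithm arguments themselves.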

### _update_similarity

```python
_update_similarity(
    nodes: List[Node], mat: np.ndarray
) -> None
```

Update Jaccard Similarity Index edges between (clustered) nodes.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def _update_similarity(self, nodes: List[Node], mat: np.ndarray) -> None:
    """Update Jaccard Similarity Index edges between (clustered) nodes."""
    if not self.graph:
        self.update_graph()

    for e in [
        edge
        for edge in self.graph.edges_between_all(nodes, nodes)
        if edge.kind == self.similarity_kind
    ]:
        self.graph.del_edge(e)

    for row, target in enumerate(nodes):
        for col, source in enumerate(nodes):
            if row == col:
                continue
            sim = mat[row][col]
            if not sim:
                continue
            self.graph.add_edge(
                Edge(
                    source,
                    target,
                    kind=self.similarity_kind,
                    weights=dict(similarity=mat[row][col]),
                )
            )
```
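The second loop of this method can be read as turning every non-zero off-diagonal matrix entry into a directed, weighted edge. A minimal sketch with plain tuples standing in for Edge objects (the similarity_edges helper and the tuple layout are illustrative assumptions, not ragraph API):

```python
from typing import List, Tuple


def similarity_edges(
    names: List[str], mat: List[List[float]]
) -> List[Tuple[str, str, float]]:
    # One (source, target, similarity) tuple per non-zero off-diagonal entry.
    edges = []
    for row, target in enumerate(names):
        for col, source in enumerate(names):
            if row == col or not mat[row][col]:
                continue
            edges.append((source, target, mat[row][col]))
    return edges


mat = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]
edges = similarity_edges(["a", "b", "c"], mat)
```

Because the matrix is symmetric, each similar pair yields two directed edges, one in each direction, and the zero entry between a and c yields none.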

### check_mapping

```python
check_mapping(col: Node, row: Node) -> bool
```

Check whether a column node maps to a row node.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def check_mapping(self, col: Node, row: Node) -> bool:
    """Check whether a column node maps to a row node."""
    if len(self.graph.directed_edges[col.name][row.name]) > 0:
        return True
    else:
        return False
```

### cluster_cols

```python
cluster_cols(
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = cluster.markov,
    **algo_args: Any
) -> None
```

Cluster column nodes based on their similarity. Updates Graph in-place.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| algo | Callable[[Graph, Any], Tuple[List[Node]]] | Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov. | markov |
| \*\*algo_args | Any | Algorithm arguments. See cluster.markov for sensible defaults. | {} |

Source code in ragraph/analysis/similarity/_similarity.py

```python
def cluster_cols(
    self,
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster column nodes based on their similarity. Updates Graph in-place.

    Arguments:
        algo: Clustering algorithm. Should take a graph as first argument and cluster
            it in-place. Defaults to [cluster.markov][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [cluster.markov][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    self._cluster(self.cols, algo, **algo_args)
```

### cluster_rows

```python
cluster_rows(
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = cluster.markov,
    **algo_args: Any
) -> None
```

Cluster row nodes based on their similarity. Updates Graph in-place.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| algo | Callable[[Graph, Any], Tuple[List[Node]]] | Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to cluster.markov. | markov |
| \*\*algo_args | Any | Algorithm arguments. See cluster.markov for sensible defaults. | {} |

Source code in ragraph/analysis/similarity/_similarity.py

```python
def cluster_rows(
    self,
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster row nodes based on their similarity. Updates Graph in-place.

    Arguments:
        algo: Clustering algorithm. Should take a graph as first argument and cluster
            it in-place. Defaults to [cluster.markov][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [cluster.markov][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    self._cluster(self.rows, algo, **algo_args)
```

### col_mapping

```python
col_mapping(col: Node) -> List[bool]
```

Boolean possession checklist for a column node w.r.t. self.rows.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def col_mapping(self, col: Node) -> List[bool]:
    """Boolean possession checklist for a column node w.r.t.
    [self.rows][ragraph.analysis.similarity.SimilarityAnalysis.rows]."""
    return [self.check_mapping(col, row) for row in self.rows]
```

### row_mapping

```python
row_mapping(row: Node) -> List[bool]
```

Boolean possession checklist for a row node w.r.t. self.cols.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def row_mapping(self, row: Node) -> List[bool]:
    """Boolean possession checklist for a row node w.r.t.
    [self.cols][ragraph.analysis.similarity.SimilarityAnalysis.cols]."""
    return [self.check_mapping(col, row) for col in self.cols]
```
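Taken together, check_mapping, col_mapping and row_mapping each read one column or row out of the mapping matrix. The sketch below uses a nested defaultdict as a stand-in for graph.directed_edges and plain strings for nodes; that the real attribute tolerates missing keys in the same way is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

cols = ["c1", "c2"]
rows = ["r1", "r2", "r3"]

# Stand-in for graph.directed_edges: source name -> target name -> list of edges.
directed_edges: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
for source, target in [("c1", "r1"), ("c1", "r2"), ("c2", "r2")]:
    directed_edges[source][target].append(f"{source}->{target}")


def check_mapping(col: str, row: str) -> bool:
    # A column maps to a row when at least one edge runs from col to row.
    return len(directed_edges[col][row]) > 0


def col_mapping(col: str) -> List[bool]:
    return [check_mapping(col, row) for row in rows]


def row_mapping(row: str) -> List[bool]:
    return [check_mapping(col, row) for col in cols]


c1_description = col_mapping("c1")
r2_description = row_mapping("r2")
```

A column description therefore has one entry per row node, and a row description has one entry per column node.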

### update_col_similarity

```python
update_col_similarity() -> None
```

Update Jaccard Column Similarity Index edges between (clustered) columns.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def update_col_similarity(self) -> None:
    """Update Jaccard Column Similarity Index edges between (clustered) columns."""
    self._update_similarity(self.cols, self.col_similarity_matrix)
```

### update_graph

```python
update_graph() -> None
```

Update the internal similarity graph.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def update_graph(self) -> None:
    """Update the internal similarity graph."""
    self._graph = Graph(nodes=self.cols + self.rows, edges=self.edges)
    self.update_col_similarity()
    self.update_row_similarity()
```

### update_row_similarity

```python
update_row_similarity() -> None
```

Update Jaccard Row Similarity Index edges between (clustered) rows.

Source code in ragraph/analysis/similarity/_similarity.py

```python
def update_row_similarity(self) -> None:
    """Update Jaccard Row Similarity Index edges between (clustered) rows."""
    self._update_similarity(self.rows, self.row_similarity_matrix)
```

## jaccard_index

```python
jaccard_index(
    obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]
) -> float
```

Calculate the Jaccard Similarity Index between two objects based on an object description function.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj1 | Any | First object to compare. | required |
| obj2 | Any | Second object to compare. | required |
| on | Callable[[Any], List[bool]] | Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property. | required |

Returns:

| Type | Description |
|---|---|
| float | Jaccard Similarity between two objects, which is calculated as the size of the overlap in properties divided by the total size of properties they possess. |

Source code in ragraph/analysis/similarity/_jaccard.py

```python
def jaccard_index(obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]) -> float:
    """Calculate the Jaccard Similarity Index between two objects based on an object
    description function.

    Arguments:
        obj1: First object to compare.
        obj2: Second object to compare.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.

    Returns:
        Jaccard Similarity between two objects, which is calculated as the size of the
        overlap in properties divided by total size of properties they possess.
    """
    props1 = np.array(on(obj1))
    props2 = np.array(on(obj2))
    return _calculate(props1, props2)
```

## jaccard_matrix

```python
jaccard_matrix(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> np.ndarray
```

Calculate the Jaccard Similarity Index for a set of objects based on an object description function.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| objects | List[Any] | List of objects to generate a similarity matrix for. | required |
| on | Callable[[Any], List[bool]] | Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property. | required |

Source code in ragraph/analysis/similarity/_jaccard.py

```python
def jaccard_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) -> np.ndarray:
    """Calculate the Jaccard Similarity Index for a set of objects based on an object
    description function.

    Arguments:
        objects: List of objects to generate a similarity matrix for.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.
    """
    dim = len(objects)
    mapping = mapping_matrix(objects, on)

    matrix = np.eye(dim, dtype=float)
    for i, obj_i in enumerate(objects):
        for j, obj_j in enumerate(objects):
            if j <= i:
                continue
            value = _calculate(mapping[i, :], mapping[j, :])
            matrix[i, j] = value
            matrix[j, i] = value
    return matrix
```
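The loop only evaluates pairs with j > i and mirrors each value, so the result is symmetric with ones on the diagonal. A plain-Python sketch of the same pattern, using nested lists instead of NumPy arrays (the pairwise_jaccard name is illustrative and not part of ragraph):

```python
from typing import List


def pairwise_jaccard(mapping: List[List[bool]]) -> List[List[float]]:
    dim = len(mapping)
    # Start from the identity matrix, as the listing does with np.eye.
    matrix = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for i in range(dim):
        for j in range(i + 1, dim):
            both = sum(a and b for a, b in zip(mapping[i], mapping[j]))
            either = sum(a or b for a, b in zip(mapping[i], mapping[j]))
            value = both / either if either else 0.0
            # Fill both triangles at once to keep the matrix symmetric.
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


mapping = [
    [True, True, False],
    [True, False, False],
    [False, False, True],
]
matrix = pairwise_jaccard(mapping)
```

Here the first two objects share one of two properties (0.5), while the third shares nothing with either of them (0.0).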

## _jaccard

### Jaccard Similarity Index

The index compares two objects, and is calculated as the size of the overlap in properties divided by total size of properties they possess.

For examples on 'object description functions', please refer to the similarity utilities.
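Written as a formula, with $P_A$ and $P_B$ denoting the sets of properties possessed by objects $A$ and $B$ (notation introduced here for illustration only):

$$
J(A, B) = \frac{|P_A \cap P_B|}{|P_A \cup P_B|}, \qquad 0 \le J(A, B) \le 1.
$$

For example, if $P_A = \{1, 2\}$ and $P_B = \{2, 3\}$, the overlap has size 1 and the union has size 3, so $J(A, B) = 1/3$. When neither object possesses any property the fraction is undefined; the implementation listed on this page returns 0.0 in that case.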

References

• Kosub, S. (2016). A note on the triangle inequality for the Jaccard distance. Retrieved from arXiv.org
• Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de La Société Vaudoise Des Sciences Naturelles. DOI: 10.5169/seals-266450

### _calculate

```python
_calculate(props1: np.array, props2: np.array) -> float
```

Calculate the Jaccard Index by the boolean object description arrays.

Source code in ragraph/analysis/similarity/_jaccard.py

```python
def _calculate(props1: np.array, props2: np.array) -> float:
    """Calculate the Jaccard Index by the boolean object description arrays."""
    both = np.logical_and(props1, props2).sum()
    either = np.logical_or(props1, props2).sum()
    if either:
        return both / either
    return 0.0
```
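Two details of this function are easy to miss: the descriptions are compared position by position, and an empty union returns 0.0 instead of dividing by zero. A plain-Python stand-in (no NumPy, name chosen for illustration) reproduces both:

```python
from typing import List


def calculate(props1: List[bool], props2: List[bool]) -> float:
    # Count positions where both / at least one description is True.
    both = sum(a and b for a, b in zip(props1, props2))
    either = sum(a or b for a, b in zip(props1, props2))
    if either:
        return both / either
    return 0.0


identical = calculate([True, False, True], [True, False, True])
disjoint = calculate([True, False], [False, True])
empty = calculate([False, False], [False, False])
```

Note that jaccard_matrix starts from an identity matrix, so an object without any properties still gets 1.0 on the diagonal there, whereas jaccard_index of that object with itself goes through this function and gives 0.0.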

### mapping_matrix

```python
mapping_matrix(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> np.ndarray
```

Calculate an object-property mapping matrix where each entry (i,j) indicates the possession of property j by object i.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| objects | List[Any] | List of objects to describe. | required |
| on | Callable[[Any], List[bool]] | Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property. | required |

Source code in ragraph/analysis/similarity/_jaccard.py

```python
def mapping_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) -> np.ndarray:
    """Calculate an object-property mapping matrix where each entry (i,j) indicates the
    possession of property j by object i.

    Arguments:
        objects: List of objects to describe.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.
    """
    return np.array([on(obj) for obj in objects])
```
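A small plain-Python sketch of the same construction, with nested lists in place of the NumPy array and a hypothetical description function over tag sets (both names are illustrative, not ragraph API):

```python
from typing import Any, Callable, List, Set


def mapping_rows(objects: List[Any], on: Callable[[Any], List[bool]]) -> List[List[bool]]:
    # Row i is the description of object i; column j is property j.
    return [on(obj) for obj in objects]


def has_tags(obj: Set[str]) -> List[bool]:
    # Hypothetical description function: fixed property order "a", "b", "c".
    return [tag in obj for tag in ("a", "b", "c")]


matrix = mapping_rows([{"a", "b"}, {"c"}], has_tags)
```

Every description needs the same length and property order; otherwise the rows would not line up into a rectangular matrix.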

## utils

### on_checks

```python
on_checks(
    checks: List[Callable[[Any], bool]]
) -> Callable[[Any], List[bool]]
```

Get an object description function that runs a predefined set of checks (which should be in a fixed order) and returns their boolean results.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| checks | List[Callable[[Any], bool]] | Checks to perform. | required |

Returns:

| Type | Description |
|---|---|
| Callable[[Any], List[bool]] | Object description function indicating check passings. |

Source code in ragraph/analysis/similarity/utils.py

```python
def on_checks(checks: List[Callable[[Any], bool]]) -> Callable[[Any], List[bool]]:
    """Get an object description function that runs a predefined set of checks (which
    should be in a fixed order) and returns their boolean results.

    Arguments:
        checks: Checks to perform.

    Returns:
        Object description function indicating check passings.
    """
    return lambda obj: [check(obj) for check in checks]
```
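A short usage sketch; on_checks is copied from the source shown here so the snippet runs without ragraph, and the two check functions are made up for the example:

```python
from typing import Any, Callable, List


def on_checks(checks: List[Callable[[Any], bool]]) -> Callable[[Any], List[bool]]:
    # Copied from the source shown on this page.
    return lambda obj: [check(obj) for check in checks]


def is_even(n: int) -> bool:
    return n % 2 == 0


def is_positive(n: int) -> bool:
    return n > 0


describe = on_checks([is_even, is_positive])
four = describe(4)
minus_three = describe(-3)
```

The results follow the order of the checks list, which is why that order should stay fixed across all objects being compared.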

### on_contains

```python
on_contains(
    contents: List[Any],
) -> Callable[[Any], List[bool]]
```

Get an object description function that checks whether an object contains certain contents.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| contents | List[Any] | Contents to check for with lambda x: x in obj. | required |

Returns:

| Type | Description |
|---|---|
| Callable[[Any], List[bool]] | Object description function indicating content presence. |

Source code in ragraph/analysis/similarity/utils.py

```python
def on_contains(contents: List[Any]) -> Callable[[Any], List[bool]]:
    """Check whether an object contains certain contents.

    Arguments:
        contents: Contents to check for with lambda x: x in obj.

    Returns:
        Object description function indicating content presence.
    """
    return lambda obj: [content in obj for content in contents]
```
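A short usage sketch, again with the function copied from the source shown here so that it runs without ragraph; the material names are arbitrary example data:

```python
from typing import Any, Callable, List


def on_contains(contents: List[Any]) -> Callable[[Any], List[bool]]:
    # Copied from the source shown on this page.
    return lambda obj: [content in obj for content in contents]


describe = on_contains(["steel", "copper", "rubber"])
cable = describe({"copper", "rubber"})     # membership test on a set
label = describe("stainless steel plate")  # substring test on a string
```

Since the check is Python's in operator, the described object can be any container, and for strings it amounts to a substring test.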

### on_hasattrs

```python
on_hasattrs(
    attrs: List[str],
) -> Callable[[Any], List[bool]]
```

Get an object description function that checks whether an instance possesses certain attributes. It does not check the values thereof!

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| attrs | List[str] | List of attributes to check the existence of. | required |

Returns:

| Type | Description |
|---|---|
| Callable[[Any], List[bool]] | Object description function indicating attribute possession. |

Source code in ragraph/analysis/similarity/utils.py

```python
def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Get an object description function that checks whether an instance possesses
    certain attributes. It does not check the values thereof!

    Arguments:
        attrs: List of attributes to check the existence of.

    Returns:
        Object description function indicating attribute possession.
    """
    return lambda obj: [hasattr(obj, attr) for attr in attrs]
```
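The remark about values can be seen in a two-line experiment. The function is copied from the source shown here, and SimpleNamespace serves as a stand-in for any object:

```python
from types import SimpleNamespace
from typing import Any, Callable, List


def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    # Copied from the source shown on this page.
    return lambda obj: [hasattr(obj, attr) for attr in attrs]


describe = on_hasattrs(["length", "width"])
# length exists but is None; width does not exist at all.
unset = describe(SimpleNamespace(length=None))
```

An attribute that exists but holds None still counts as possessed, so a value-sensitive comparison would need on_checks with custom checks instead.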

### on_hasweights

```python
on_hasweights(
    weights: List[str], threshold: float = 0.0
) -> Callable[[Any], List[bool]]
```

Check whether an object has certain weights at or above a threshold in its weights dictionary property.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| weights | List[str] | Keys to the obj.weights dictionary to check. | required |
| threshold | float | Threshold to verify against. | 0.0 |

Returns:

| Type | Description |
|---|---|
| Callable[[Any], List[bool]] | Object description function indicating weights that meet or exceed the threshold. |

Source code in ragraph/analysis/similarity/utils.py

```python
def on_hasweights(weights: List[str], threshold: float = 0.0) -> Callable[[Any], List[bool]]:
    """Check whether an object has certain weights at or above a threshold in its
    weights dictionary property.

    Arguments:
        weights: Keys to the obj.weights dictionary to check.
        threshold: Threshold to verify against.

    Returns:
        Object description function indicating weights that meet or exceed the threshold.
    """
    return lambda obj: [obj.weights.get(w, 0.0) >= threshold for w in weights]
```
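The comparison is inclusive (>=) and missing keys fall back to 0.0, which interacts with the default threshold. The sketch below copies the function from the source shown here and uses a SimpleNamespace with a weights dictionary as a stand-in for an Edge or Node; the weight names are made up:

```python
from types import SimpleNamespace
from typing import Any, Callable, List


def on_hasweights(weights: List[str], threshold: float = 0.0) -> Callable[[Any], List[bool]]:
    # Copied from the source shown on this page.
    return lambda obj: [obj.weights.get(w, 0.0) >= threshold for w in weights]


edge = SimpleNamespace(weights={"force": 5.0, "heat": 0.5})  # stand-in object

strict = on_hasweights(["force", "heat", "data"], threshold=1.0)(edge)
default = on_hasweights(["force", "heat", "data"])(edge)
```

With the default threshold of 0.0, the missing "data" weight also passes, because its fallback value of 0.0 is equal to the threshold. A small positive threshold is needed to tell present weights from absent ones.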