ragraph.analysis.similarity ¶

Similarity analyses¶

Graph similarity is often expressed as a metric, where nodes and edges are scanned for similar patterns, properties, or other aspects. There are three levels of equivalence: structural, automorphic, and regular. Each level implies all the levels that follow it, so structural equivalence implies automorphic equivalence, which in turn implies regular equivalence.

Available analyses¶

The following algorithms are directly accessible after importing ragraph.analysis.similarity:

jaccard_index: Jaccard Similarity Index of two objects, computed as the number of properties they both possess divided by the number of properties either of them has.
jaccard_matrix: Jaccard Similarity Index between each pair in a set of objects, stored in a square matrix.

Note

Both Jaccard methods require a callable that takes an object and returns a list of booleans representing the possession of a property (the on argument). Some examples are included in the ragraph.analysis.similarity.utils module, like ragraph.analysis.similarity.utils.on_hasattrs.
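As a minimal sketch of what such an `on` callable looks like, the snippet below describes plain Python objects by attribute possession and computes the index from the two resulting boolean lists. It is a standalone, pure-Python stand-in so that it runs without RaGraph installed: the `Pump` and `Valve` classes and the `jaccard` helper are hypothetical names for illustration, while `on_hasattrs` mirrors the listing of `ragraph.analysis.similarity.utils.on_hasattrs` on this page.

```python
from typing import Any, Callable, List


def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Describe an object by which of the given attributes it possesses."""
    return lambda obj: [hasattr(obj, attr) for attr in attrs]


def jaccard(props1: List[bool], props2: List[bool]) -> float:
    """Shared properties divided by properties held by either object."""
    both = sum(a and b for a, b in zip(props1, props2))
    either = sum(a or b for a, b in zip(props1, props2))
    return both / either if either else 0.0


class Pump:  # hypothetical example object
    power = 5.0
    flow = 2.0


class Valve:  # hypothetical example object
    flow = 1.0
    diameter = 0.1


on = on_hasattrs(["power", "flow", "diameter"])
pump_props = on(Pump())    # [True, True, False]
valve_props = on(Valve())  # [False, True, True]
similarity = jaccard(pump_props, valve_props)  # 1 shared property out of 3
```

The order of the attribute list fixes the order of the booleans, so both objects must be described with the same `on` callable.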

SimilarityAnalysis ¶

SimilarityAnalysis(
    rows: List[Node],
    cols: List[Node],
    edges: List[Edge],
    col_sim_threshold: float = 0.0,
    row_sim_threshold: float = 0.0,
)

Similarity analysis of nodes based upon mutual mapping relations.

Parameters:

Name	Type	Description	Default
`cols`	`List[Node]`	List of column nodes.	required
`rows`	`List[Node]`	List of row nodes.	required
`edges`	`List[Edge]`	List of edges from column nodes to row nodes to be used in similarity analysis.	required
`col_sim_threshold`	`float`	Column similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).	`0.0`
`row_sim_threshold`	`float`	Row similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. Defaults to 0.0 (no threshold).	`0.0`

Class Attributes

Note

A mapping matrix relating M column nodes to N row nodes is used as input for the similarity analysis.
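To make the shape of that input concrete, here is a small pure-Python sketch that builds an N-by-M boolean mapping matrix from (column, row) name pairs. The node names and the `build_mapping` helper are made up for illustration and are not part of RaGraph; the orientation (one matrix row per row node) is likewise an assumption.

```python
from typing import List, Tuple


def build_mapping(
    cols: List[str], rows: List[str], edges: List[Tuple[str, str]]
) -> List[List[bool]]:
    """Entry [i][j] is True when column node j maps to row node i."""
    pairs = set(edges)
    return [[(col, row) in pairs for col in cols] for row in rows]


cols = ["c1", "c2", "c3"]  # M = 3 column nodes
rows = ["r1", "r2"]        # N = 2 row nodes
edges = [("c1", "r1"), ("c2", "r1"), ("c2", "r2")]  # column -> row

mapping = build_mapping(cols, rows, edges)
# mapping[0] describes r1, mapping[1] describes r2
```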

Source code in src/ragraph/analysis/similarity/_similarity.py

def __init__(
    self,
    rows: List[Node],
    cols: List[Node],
    edges: List[Edge],
    col_sim_threshold: float = 0.0,
    row_sim_threshold: float = 0.0,
):
    self._cols: List[Node] = []
    self._rows: List[Node] = []
    self._edges: List[Edge] = []
    self._col_sim_threshold = 0.0
    self._row_sim_threshold = 0.0
    self._graph: Graph = None  # type: ignore

    self.cols = cols
    self.rows = rows
    self.edges = edges
    self.col_sim_threshold = col_sim_threshold
    self.row_sim_threshold = row_sim_threshold

col_sim_threshold `property` `writable` ¶

col_sim_threshold: float

Similarity threshold. Values below this threshold are pruned from the column similarity matrix and the corresponding edges are removed.
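The pruning described here can be pictured with the following sketch. It assumes that "pruned" means the entry is reset to zero, which this page does not state explicitly; `prune` is a hypothetical helper, not a RaGraph function.

```python
from typing import List


def prune(matrix: List[List[float]], threshold: float) -> List[List[float]]:
    """Return a copy in which every value below the threshold is set to 0.0."""
    return [[v if v >= threshold else 0.0 for v in row] for row in matrix]


sim = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
pruned = prune(sim, threshold=0.3)
# With the default threshold of 0.0 no value lies below it, so nothing changes.
unchanged = prune(sim, threshold=0.0)
```

Under that assumption, zeroed entries produce no similarity edge, which matches `_update_similarity` skipping entries that evaluate to false.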

col_similarity_matrix `property` ¶

col_similarity_matrix: ndarray

The similarity matrix of the column nodes, based on their columns in the mapping matrix.

cols `property` `writable` ¶

cols: List[Node]

List of column nodes.

edges `property` `writable` ¶

edges: List[Edge]

List of edges.

graph `property` ¶

graph: Graph

Graph containing similarity edges.

row_sim_threshold `property` `writable` ¶

row_sim_threshold: float

Similarity threshold. Values below this threshold are pruned from the row similarity matrix and the corresponding edges are removed.

row_similarity_matrix `property` ¶

row_similarity_matrix: ndarray

The similarity matrix of the row nodes, based on their rows in the mapping matrix.

rows `property` `writable` ¶

rows: List[Node]

List of row nodes.

_cluster ¶

_cluster(
    leafs: List[Node],
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = markov,
    **algo_args: Any
) -> None

Cluster the given row or column nodes based on their similarity. Updates Graph in-place.

Parameters:

Name	Type	Description	Default
`leafs`	`List[Node]`	List of row or column nodes to be clustered.	required
`algo`	`Callable[[Graph, Any], Tuple[List[Node]]]`	Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to `cluster.markov`.	`markov`
`**algo_args`	`Any`	Algorithm arguments. See `cluster.markov` for sensible defaults.	`{}`

Source code in src/ragraph/analysis/similarity/_similarity.py

def _cluster(
    self,
    leafs: List[Node],
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster the given nodes based on their similarity. Updates Graph in-place.

    Arguments:
        leafs: List of row or column nodes to be clustered.
        algo: Clustering algorithm. Should take a graph as first argument and cluster it
            in-place. Defaults to [`cluster.markov`][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [`cluster.markov`][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    algo(self.graph, leafs=leafs, **algo_args, inplace=True)  # type: ignore

_update_similarity ¶

_update_similarity(nodes: List[Node], mat: ndarray) -> None

Update Jaccard Similarity Index edges between (clustered) nodes.

Source code in src/ragraph/analysis/similarity/_similarity.py

def _update_similarity(self, nodes: List[Node], mat: np.ndarray) -> None:
    """Update Jaccard Similarity Index edges between (clustered) nodes."""
    if not self.graph:
        self.update_graph()

    for e in [
        edge
        for edge in self.graph.edges_between_all(nodes, nodes)
        if edge.kind == self.similarity_kind
    ]:
        self.graph.del_edge(e)

    for row, target in enumerate(nodes):
        for col, source in enumerate(nodes):
            if row == col:
                continue
            sim = mat[row][col]
            if not sim:
                continue
            self.graph.add_edge(
                Edge(
                    source,
                    target,
                    kind=self.similarity_kind,
                    weights=dict(similarity=mat[row][col]),
                )
            )
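The second loop can be traced on a small matrix with the sketch below, which uses plain tuples in place of RaGraph `Edge` objects. The `similarity_edges` helper and the node names are hypothetical; only the skipping of diagonal and zero entries is taken from the listing above.

```python
from typing import List, Tuple


def similarity_edges(
    nodes: List[str], mat: List[List[float]]
) -> List[Tuple[str, str, float]]:
    """List (source, target, similarity) for every non-zero off-diagonal entry."""
    edges = []
    for row, target in enumerate(nodes):
        for col, source in enumerate(nodes):
            if row == col or not mat[row][col]:
                continue  # no self-loops, no edges for zero similarity
            edges.append((source, target, mat[row][col]))
    return edges


nodes = ["a", "b", "c"]
mat = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
edges = similarity_edges(nodes, mat)
# The matrix is symmetric, so each similar pair yields one edge per direction.
```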

check_mapping ¶

check_mapping(col: Node, row: Node) -> bool

Check whether a column node maps to a row node.

Source code in src/ragraph/analysis/similarity/_similarity.py

def check_mapping(self, col: Node, row: Node) -> bool:
    """Check whether a column node maps to a row node."""
    return len(self.graph.directed_edges[col.name][row.name]) > 0

cluster_cols ¶

cluster_cols(
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = markov,
    **algo_args: Any
) -> None

Cluster column nodes based on their similarity. Updates Graph in-place.

Parameters:

Name	Type	Description	Default
`algo`	`Callable[[Graph, Any], Tuple[List[Node]]]`	Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to `cluster.markov`.	`markov`
`**algo_args`	`Any`	Algorithm arguments. See `cluster.markov` for sensible defaults.	`{}`

Source code in src/ragraph/analysis/similarity/_similarity.py

def cluster_cols(
    self,
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster column nodes based on their similarity. Updates Graph in-place.

    Arguments:
        algo: Clustering algorithm. Should take a graph as first argument and cluster it
            in-place. Defaults to [`cluster.markov`][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [`cluster.markov`][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    self._cluster(self.cols, algo, **algo_args)

cluster_rows ¶

cluster_rows(
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = markov,
    **algo_args: Any
) -> None

Cluster row nodes based on their similarity. Updates Graph in-place.

Parameters:

Name	Type	Description	Default
`algo`	`Callable[[Graph, Any], Tuple[List[Node]]]`	Clustering algorithm. Should take a graph as first argument and cluster it in-place. Defaults to `cluster.markov`.	`markov`
`**algo_args`	`Any`	Algorithm arguments. See `cluster.markov` for sensible defaults.	`{}`

Source code in src/ragraph/analysis/similarity/_similarity.py

def cluster_rows(
    self,
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster row nodes based on their similarity. Updates Graph in-place.

    Arguments:
        algo: Clustering algorithm. Should take a graph as first argument and cluster it
            in-place. Defaults to [`cluster.markov`][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [`cluster.markov`][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    self._cluster(self.rows, algo, **algo_args)

col_mapping ¶

col_mapping(col: Node) -> List[bool]

Boolean possession checklist for a column node w.r.t. self.rows.

Source code in src/ragraph/analysis/similarity/_similarity.py

def col_mapping(self, col: Node) -> List[bool]:
    """Boolean possession checklist for a column node w.r.t.
    [`self.rows`][ragraph.analysis.similarity.SimilarityAnalysis.rows]."""
    return [self.check_mapping(col, row) for row in self.rows]

row_mapping ¶

row_mapping(row: Node) -> List[bool]

Boolean possession checklist for a row node w.r.t. self.cols.

Source code in src/ragraph/analysis/similarity/_similarity.py

def row_mapping(self, row: Node) -> List[bool]:
    """Boolean possession checklist for a row node w.r.t.
    [`self.cols`][ragraph.analysis.similarity.SimilarityAnalysis.cols]."""
    return [self.check_mapping(col, row) for col in self.cols]

update_col_similarity ¶

update_col_similarity() -> None

Update Jaccard Column Similarity Index edges between (clustered) columns.

Source code in src/ragraph/analysis/similarity/_similarity.py

def update_col_similarity(self) -> None:
    """Update Jaccard Column Similarity Index edges between (clustered) columns."""
    self._update_similarity(self.cols, self.col_similarity_matrix)

update_graph ¶

update_graph() -> None

Update the internal similarity graph.

Source code in src/ragraph/analysis/similarity/_similarity.py

def update_graph(self) -> None:
    """Update the internal similarity graph."""
    self._graph = Graph(nodes=self.cols + self.rows, edges=self.edges)
    self.update_col_similarity()
    self.update_row_similarity()

update_row_similarity ¶

update_row_similarity() -> None

Update Jaccard Row Similarity Index edges between (clustered) rows.

Source code in src/ragraph/analysis/similarity/_similarity.py

def update_row_similarity(self) -> None:
    """Update Jaccard Row Similarity Index edges between (clustered) rows."""
    self._update_similarity(self.rows, self.row_similarity_matrix)

jaccard_index ¶

jaccard_index(
    obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]
) -> float

Calculate the Jaccard Similarity Index between two objects based on an object description function.

Parameters:

Name	Type	Description	Default
`obj1`	`Any`	First object to compare.	required
`obj2`	`Any`	Second object to compare.	required
`on`	`Callable[[Any], List[bool]]`	Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.	required

Returns:

Type	Description
`float`	Jaccard Similarity between two objects, which is calculated as the size of the overlap in properties divided by the total size of properties they possess.

Source code in src/ragraph/analysis/similarity/_jaccard.py

def jaccard_index(obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]) -> float:
    """Calculate the Jaccard Similarity Index between two objects based on an object
    description function.

    Arguments:
        obj1: First object to compare.
        obj2: Second object to compare.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.

    Returns:
        Jaccard Similarity between two objects, which is calculated as the size of the
        overlap in properties divided by total size of properties they possess.
    """
    props1 = np.array(on(obj1))
    props2 = np.array(on(obj2))
    return _calculate(props1, props2)

jaccard_matrix ¶

jaccard_matrix(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> ndarray

Calculate the Jaccard Similarity Index for a set of objects based on an object description function.

Parameters:

Name	Type	Description	Default
`objects`	`List[Any]`	List of objects to generate a similarity matrix for.	required
`on`	`Callable[[Any], List[bool]]`	Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.	required

Source code in src/ragraph/analysis/similarity/_jaccard.py

def jaccard_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) -> np.ndarray:
    """Calculate the Jaccard Similarity Index for a set of objects based on an object
    description function.

    Arguments:
        objects: List of objects to generate a similarity matrix for.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.
    """
    dim = len(objects)
    mapping = mapping_matrix(objects, on)

    matrix = np.eye(dim, dtype=float)
    for i, obj_i in enumerate(objects):
        for j, obj_j in enumerate(objects):
            if j <= i:
                continue
            value = _calculate(mapping[i, :], mapping[j, :])
            matrix[i, j] = value
            matrix[j, i] = value
    return matrix
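A worked instance of the loop above, written in pure Python so that it runs without NumPy or RaGraph. The `jaccard_matrix_sketch` name and the sample property lists are illustrative assumptions; the logic follows the listing: ones on the diagonal, and each upper-triangle value computed once and mirrored.

```python
from typing import List


def jaccard(p1: List[bool], p2: List[bool]) -> float:
    """Shared properties divided by properties held by either object."""
    both = sum(a and b for a, b in zip(p1, p2))
    either = sum(a or b for a, b in zip(p1, p2))
    return both / either if either else 0.0


def jaccard_matrix_sketch(mapping: List[List[bool]]) -> List[List[float]]:
    """Symmetric similarity matrix for objects described by boolean rows."""
    dim = len(mapping)
    matrix = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for i in range(dim):
        for j in range(i + 1, dim):  # upper triangle only
            value = jaccard(mapping[i], mapping[j])
            matrix[i][j] = value
            matrix[j][i] = value  # mirror to keep the matrix symmetric
    return matrix


mapping = [
    [True, True, False],   # object 0
    [True, False, False],  # object 1
    [False, False, True],  # object 2
]
matrix = jaccard_matrix_sketch(mapping)
```

Objects 0 and 1 share one of two properties, giving 0.5; object 2 shares nothing with the others, giving 0.0.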

_jaccard ¶

Jaccard Similarity Index¶

The index compares two objects, and is calculated as the size of the overlap in properties divided by total size of properties they possess.
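Written as a formula, with $A$ and $B$ denoting the sets of properties possessed by the two objects (symbols introduced here for illustration):

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
```

When neither object possesses any property, so that $|A \cup B| = 0$, the `_calculate` helper returns 0.0 rather than dividing by zero.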

For examples on 'object description functions', please refer to the similarity utilities.

References

Kosub, S. (2016). A note on the triangle inequality for the Jaccard distance. Retrieved from arXiv.org
Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de La Société Vaudoise Des Sciences Naturelles. DOI: 10.5169/seals-266450

_calculate ¶

_calculate(props1: ndarray, props2: ndarray) -> float

Calculate the Jaccard Index by the boolean object description arrays.

Source code in src/ragraph/analysis/similarity/_jaccard.py

def _calculate(props1: np.ndarray, props2: np.ndarray) -> float:
    """Calculate the Jaccard Index by the boolean object description arrays."""
    both = np.logical_and(props1, props2).sum()
    either = np.logical_or(props1, props2).sum()
    if either:
        return both / either
    return 0.0

mapping_matrix ¶

mapping_matrix(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> ndarray

Calculate an object-property mapping matrix where each entry (i,j) indicates the possession of property j by object i.

Parameters:

Name	Type	Description	Default
`objects`	`List[Any]`	List of objects to describe.	required
`on`	`Callable[[Any], List[bool]]`	Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.	required

Source code in src/ragraph/analysis/similarity/_jaccard.py

def mapping_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) -> np.ndarray:
    """Calculate an object-property mapping matrix where each entry (i,j) indicates the
    possession of property j by object i.

    Arguments:
        objects: List of objects to describe.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.
    """
    return np.array([on(obj) for obj in objects])

utils ¶

Similarity analysis utilities¶

on_checks ¶

on_checks(
    checks: List[Callable[[Any], bool]]
) -> Callable[[Any], List[bool]]

Get an object description function that runs a predefined set of checks (which should be in a fixed order) and returns their boolean results.

Parameters:

Name	Type	Description	Default
`checks`	`List[Callable[[Any], bool]]`	Checks to perform.	required

Returns:

Type	Description
`Callable[[Any], List[bool]]`	Object description function indicating check passings.

Source code in src/ragraph/analysis/similarity/utils.py

def on_checks(checks: List[Callable[[Any], bool]]) -> Callable[[Any], List[bool]]:
    """Get an object description function that runs a predefined set of checks (which should be in a
    fixed order) and returns their boolean results.

    Arguments:
        checks: Checks to perform.

    Returns:
        Object description function indicating check passings.
    """
    return lambda obj: [check(obj) for check in checks]
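A brief usage sketch: the function is copied from the listing above so the snippet runs standalone, and the three integer checks are hypothetical examples chosen only to show that the result follows the order of the checks.

```python
from typing import Any, Callable, List


def on_checks(checks: List[Callable[[Any], bool]]) -> Callable[[Any], List[bool]]:
    """Copied from the listing above so the snippet runs standalone."""
    return lambda obj: [check(obj) for check in checks]


# Hypothetical checks on integers; their order fixes the order of the result.
describe = on_checks([
    lambda n: n > 0,       # is positive
    lambda n: n % 2 == 0,  # is even
    lambda n: n > 100,     # is large
])

four = describe(4)
seven = describe(7)
```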

on_contains ¶

on_contains(
    contents: List[Any],
) -> Callable[[Any], List[bool]]

Check whether an object contains certain contents.

Parameters:

Name	Type	Description	Default
`contents`	`List[Any]`	Contents to check for with `lambda x: x in obj`.	required

Returns:

Type	Description
`Callable[[Any], List[bool]]`	Object description function indicating content presence.

Source code in src/ragraph/analysis/similarity/utils.py

def on_contains(contents: List[Any]) -> Callable[[Any], List[bool]]:
    """Check whether an object contains certain contents.

    Arguments:
        contents: Contents to check for with `lambda x: x in obj`.

    Returns:
        Object description function indicating content presence.
    """
    return lambda obj: [content in obj for content in contents]
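Because the description relies on Python's `in` operator, the same callable works on any container, and what counts as "contained" depends on the container type. The sketch below copies the function from the listing above and applies it to a set, a string, and a dict; the sample values are illustrative only.

```python
from typing import Any, Callable, List


def on_contains(contents: List[Any]) -> Callable[[Any], List[bool]]:
    """Copied from the listing above so the snippet runs standalone."""
    return lambda obj: [content in obj for content in contents]


describe = on_contains(["a", "b", "c"])

from_set = describe({"a", "c"})  # set membership
from_str = describe("cab")       # substring test
from_dict = describe({"a": 1})   # dict membership tests keys, not values
```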

on_hasattrs ¶

on_hasattrs(
    attrs: List[str],
) -> Callable[[Any], List[bool]]

Get an object description function that checks whether an instance possesses certain attributes. It does not check the values thereof!

Parameters:

Name	Type	Description	Default
`attrs`	`List[str]`	List of attributes to check the existence of.	required

Returns:

Type	Description
`Callable[[Any], List[bool]]`	Object description function indicating attribute possession.

Source code in src/ragraph/analysis/similarity/utils.py

def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Get an object description function that checks whether an instance possesses certain
    attributes. It does not check the values thereof!

    Arguments:
        attrs: List of attributes to check the existence of.

    Returns:
        Object description function indicating attribute possession.
    """
    return lambda obj: [hasattr(obj, attr) for attr in attrs]
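The remark about values can be seen in a short sketch: an attribute that exists but holds `None` still counts as possessed. The function is copied from the listing above; the `Part` class and its attribute names are hypothetical.

```python
from typing import Any, Callable, List


def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Copied from the listing above so the snippet runs standalone."""
    return lambda obj: [hasattr(obj, attr) for attr in attrs]


class Part:  # hypothetical example object
    def __init__(self) -> None:
        self.mass = None  # present, but without a meaningful value


describe = on_hasattrs(["mass", "cost"])
result = describe(Part())  # "mass" exists (even though it is None), "cost" does not
```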

on_hasweights ¶

on_hasweights(
    weights: List[str], threshold: float = 0.0
) -> Callable[[Any], List[bool]]

Check whether an object has certain weights at or above a threshold in its weights dictionary property.

Parameters:

Name	Type	Description	Default
`weights`	`List[str]`	Keys to the `obj.weights` dictionary to check.	required
`threshold`	`float`	Threshold to verify against.	`0.0`

Returns:

Type	Description
`Callable[[Any], List[bool]]`	Object description function indicating weights exceeding a threshold.

Source code in src/ragraph/analysis/similarity/utils.py

def on_hasweights(weights: List[str], threshold: float = 0.0) -> Callable[[Any], List[bool]]:
    """Check whether an object has certain weights at or above a threshold in its weights
    dictionary property.

    Arguments:
        weights: Keys to the `obj.weights` dictionary to check.
        threshold: Threshold to verify against.

    Returns:
        Object description function indicating weights exceeding a threshold.
    """
    return lambda obj: [obj.weights.get(w, 0.0) >= threshold for w in weights]
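Two details of the listing above are easy to miss: the comparison is `>=`, and a missing key is read as 0.0. The sketch below copies the function and uses a `SimpleNamespace` as a hypothetical stand-in for any object with a `weights` dictionary; the weight names are made up.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


def on_hasweights(weights: List[str], threshold: float = 0.0) -> Callable[[Any], List[bool]]:
    """Copied from the listing above so the snippet runs standalone."""
    return lambda obj: [obj.weights.get(w, 0.0) >= threshold for w in weights]


# Hypothetical stand-in for an object that exposes a `weights` dict.
edge = SimpleNamespace(weights={"spatial": 0.8, "energy": 0.2})
keys = ["spatial", "energy", "signal"]  # "signal" is absent on purpose

strict = on_hasweights(keys, threshold=0.5)(edge)
default = on_hasweights(keys)(edge)
```

With the default threshold of 0.0, the absent `signal` weight also passes, since 0.0 >= 0.0. A small positive threshold is one way to make missing weights count as not possessed.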
