ragraph.analysis.similarity

Similarity analyses

Graph similarity is often expressed as a metric, where nodes and edges are scanned for similar patterns, properties, or other aspects. There are three levels of equivalence: structural, automorphic, and regular. Each level implies all the levels that follow it, so structural equivalence implies automorphic equivalence, which in turn implies regular equivalence.

Available analyses

The following algorithms are directly accessible after importing ragraph.analysis.similarity:

  • jaccard_index: Jaccard Similarity Index of two objects, based on the number of properties they both possess divided by the number of properties either of them has.
  • jaccard_matrix: Jaccard Similarity Index between each pair of objects in a set, stored in a square matrix.
Note

Both Jaccard methods require a callable that takes an object and returns a list of booleans representing the possession of a property (the on argument). Some examples are included in the ragraph.analysis.similarity.utils module, like ragraph.analysis.similarity.utils.on_hasattrs.
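
As a minimal sketch of what such an `on` callable looks like, the snippet below rebuilds the idea in plain Python rather than calling RaGraph itself. The helper names `describe_by_attrs` and `jaccard` are hypothetical stand-ins chosen for illustration, not the library's API.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


def describe_by_attrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Hypothetical stand-in for an `on` callable: one boolean per attribute."""
    return lambda obj: [hasattr(obj, attr) for attr in attrs]


def jaccard(props1: List[bool], props2: List[bool]) -> float:
    """Shared properties divided by properties held by either object."""
    both = sum(a and b for a, b in zip(props1, props2))
    either = sum(a or b for a, b in zip(props1, props2))
    return both / either if either else 0.0


on = describe_by_attrs(["mass", "cost", "color"])
pump = SimpleNamespace(mass=4.0, cost=120.0)
valve = SimpleNamespace(cost=35.0, color="red")

print(on(pump))   # [True, True, False]
print(on(valve))  # [False, True, True]
print(jaccard(on(pump), on(valve)))  # 1 shared property / 3 held by either
```

The order of the attribute list fixes the order of the booleans, so both objects must be described with the same callable for the comparison to be meaningful.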

SimilarityAnalysis

SimilarityAnalysis(
    rows: List[Node],
    cols: List[Node],
    edges: List[Edge],
    col_sim_threshold: float = 0.0,
    row_sim_threshold: float = 0.0,
)

Similarity analysis of nodes based upon mutual mapping relations.

Parameters:

  • cols (List[Node], required): List of column nodes.
  • rows (List[Node], required): List of row nodes.
  • edges (List[Edge], required): List of edges from column nodes to row nodes to be used in similarity analysis.
  • col_sim_threshold (float, default 0.0): Column similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. A value of 0.0 means no threshold.
  • row_sim_threshold (float, default 0.0): Row similarity threshold. Values below this threshold are pruned from the similarity matrix and the corresponding edges are removed. A value of 0.0 means no threshold.

Class Attributes
Note

A mapping matrix relating M column nodes to N row nodes is used as input for the similarity analysis.

Source code in ragraph/analysis/similarity/_similarity.py
def __init__(
    self,
    rows: List[Node],
    cols: List[Node],
    edges: List[Edge],
    col_sim_threshold: float = 0.0,
    row_sim_threshold: float = 0.0,
):
    self._cols: List[Node] = []
    self._rows: List[Node] = []
    self._edges: List[Edge] = []
    self._col_sim_threshold = 0.0
    self._row_sim_threshold = 0.0
    self._graph: Graph = None  # type: ignore

    self.cols = cols
    self.rows = rows
    self.edges = edges
    self.col_sim_threshold = col_sim_threshold
    self.row_sim_threshold = row_sim_threshold

col_sim_threshold property writable

col_sim_threshold: float

Similarity threshold. Values below this threshold are pruned from the column similarity matrix and the corresponding edges are removed.
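
The pruning step can be pictured with a small sketch in plain Python. It assumes that "pruned" means an entry below the threshold is set to zero; the property's actual implementation is not shown on this page, and the helper name `prune` is hypothetical.

```python
from typing import List


def prune(matrix: List[List[float]], threshold: float) -> List[List[float]]:
    """Zero out every entry strictly below the threshold (assumed behaviour)."""
    return [[value if value >= threshold else 0.0 for value in row] for row in matrix]


sim = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
pruned = prune(sim, threshold=0.3)
print(pruned)  # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

With a threshold of 0.0 nothing is removed, which matches the "no threshold" default described for the constructor.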

col_similarity_matrix property

col_similarity_matrix: np.ndarray

The similarity matrix of the column nodes, computed from each node's column in the mapping matrix.

cols property writable

cols: List[Node]

List of column nodes.

edges property writable

edges: List[Edge]

List of edges.

graph property

graph: Graph

Graph containing similarity edges.

row_sim_threshold property writable

row_sim_threshold: float

Similarity threshold. Values below this threshold are pruned from the row similarity matrix and the corresponding edges are removed.

row_similarity_matrix property

row_similarity_matrix: np.ndarray

The similarity matrix of the row nodes, computed from each node's row in the mapping matrix.

rows property writable

rows: List[Node]

List of row nodes.

_cluster

_cluster(
    leafs: List[Node],
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = cluster.markov,
    **algo_args: Any
) -> None

Cluster row or column nodes based on their similarity. Updates Graph in-place.

Parameters:

  • leafs (List[Node], required): List of row or column nodes to be clustered.
  • algo (Callable[[Graph, Any], Tuple[List[Node]]], default cluster.markov): Clustering algorithm. Should take a graph as first argument and cluster it in-place.
  • **algo_args (Any, default {}): Algorithm arguments. See cluster.markov for sensible defaults.

Source code in ragraph/analysis/similarity/_similarity.py
def _cluster(
    self,
    leafs: List[Node],
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster row or column nodes based on their similarity. Updates Graph in-place.

    Arguments:
        leafs: List of row or column nodes to be clustered.
        algo: Clustering algorithm. Should take a graph as first argument and cluster it
            in-place. Defaults to [`cluster.markov`][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [`cluster.markov`][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    algo(self.graph, leafs=leafs, **algo_args, inplace=True)  # type: ignore
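
The call shape in `_cluster` implies that a custom `algo` has to accept `leafs` and `inplace` as keyword arguments. The sketch below imitates that forwarding with a stand-in algorithm that only records what it receives. `recording_algo`, `cluster_sketch`, and the keyword `my_option` are hypothetical names used for illustration and are not part of RaGraph.

```python
from typing import Any, Callable, Dict, List


def recording_algo(
    graph: Any, leafs: List[str], inplace: bool = True, **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical stand-in for a clustering algorithm; it only records its inputs."""
    return {"graph": graph, "leafs": leafs, "inplace": inplace, "extra": kwargs}


def cluster_sketch(
    graph: Any,
    leafs: List[str],
    algo: Callable[..., Dict[str, Any]] = recording_algo,
    **algo_args: Any,
) -> Dict[str, Any]:
    # Same call shape as `_cluster`: leafs and inplace=True are passed as keywords.
    return algo(graph, leafs=leafs, **algo_args, inplace=True)


call = cluster_sketch("graph-placeholder", ["c1", "c2"], my_option=2.0)
print(call["leafs"])  # ['c1', 'c2']
print(call["extra"])  # {'my_option': 2.0}
```

Because `inplace=True` is always supplied, passing `inplace` again through the extra keyword arguments raises a `TypeError` for a duplicate keyword in this sketch.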

_update_similarity

_update_similarity(
    nodes: List[Node], mat: np.ndarray
) -> None

Update Jaccard Similarity Index edges between (clustered) nodes.

Source code in ragraph/analysis/similarity/_similarity.py
def _update_similarity(self, nodes: List[Node], mat: np.ndarray) -> None:
    """Update Jaccard Similarity Index edges between (clustered) nodes."""
    if not self.graph:
        self.update_graph()

    # Drop the existing similarity edges between these nodes before re-adding them.
    for e in [
        edge
        for edge in self.graph.edges_between_all(nodes, nodes)
        if edge.kind == self.similarity_kind
    ]:
        self.graph.del_edge(e)

    # Add one directed edge per non-zero, off-diagonal similarity value.
    for row, target in enumerate(nodes):
        for col, source in enumerate(nodes):
            if row == col:
                continue
            sim = mat[row][col]
            if not sim:
                continue
            self.graph.add_edge(
                Edge(
                    source,
                    target,
                    kind=self.similarity_kind,
                    weights=dict(similarity=sim),
                )
            )
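
The nested loop above can be tried out without a graph object. The following sketch uses plain tuples in place of RaGraph edges. The helper name `similarity_edges` and the node names are made up for illustration.

```python
from typing import List, Tuple


def similarity_edges(
    names: List[str], mat: List[List[float]]
) -> List[Tuple[str, str, float]]:
    """Return (source, target, similarity) triples for non-zero off-diagonal entries."""
    triples = []
    for row, target in enumerate(names):
        for col, source in enumerate(names):
            if row == col or not mat[row][col]:
                continue  # skip self-similarity and zero (pruned) entries
            triples.append((source, target, mat[row][col]))
    return triples


mat = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
edges = similarity_edges(["a", "b", "c"], mat)
print(edges)  # [('b', 'a', 0.5), ('a', 'b', 0.5)]
```

A symmetric matrix yields one edge in each direction per similar pair, and a node without any non-zero similarity, like "c" here, gets no similarity edges at all.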

check_mapping

check_mapping(col: Node, row: Node) -> bool

Check whether a column node maps to a row node.

Source code in ragraph/analysis/similarity/_similarity.py
def check_mapping(self, col: Node, row: Node) -> bool:
    """Check whether a column node maps to a row node."""
    return len(self.graph.directed_edges[col.name][row.name]) > 0

cluster_cols

cluster_cols(
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = cluster.markov,
    **algo_args: Any
) -> None

Cluster column nodes based on their similarity. Updates Graph in-place.

Parameters:

  • algo (Callable[[Graph, Any], Tuple[List[Node]]], default cluster.markov): Clustering algorithm. Should take a graph as first argument and cluster it in-place.
  • **algo_args (Any, default {}): Algorithm arguments. See cluster.markov for sensible defaults.

Source code in ragraph/analysis/similarity/_similarity.py
def cluster_cols(
    self,
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster column nodes based on their similarity. Updates Graph in-place.

    Arguments:
        algo: Clustering algorithm. Should take a graph as first argument and cluster it
            in-place. Defaults to [`cluster.markov`][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [`cluster.markov`][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    self._cluster(self.cols, algo, **algo_args)

cluster_rows

cluster_rows(
    algo: Callable[
        [Graph, Any], Tuple[List[Node]]
    ] = cluster.markov,
    **algo_args: Any
) -> None

Cluster row nodes based on their similarity. Updates Graph in-place.

Parameters:

  • algo (Callable[[Graph, Any], Tuple[List[Node]]], default cluster.markov): Clustering algorithm. Should take a graph as first argument and cluster it in-place.
  • **algo_args (Any, default {}): Algorithm arguments. See cluster.markov for sensible defaults.

Source code in ragraph/analysis/similarity/_similarity.py
def cluster_rows(
    self,
    algo: Callable[[Graph, Any], Tuple[List[Node]]] = cluster.markov,
    **algo_args: Any,
) -> None:
    """Cluster row nodes based on their similarity. Updates Graph in-place.

    Arguments:
        algo: Clustering algorithm. Should take a graph as first argument and cluster it
            in-place. Defaults to [`cluster.markov`][ragraph.analysis.cluster.markov].
        **algo_args: Algorithm arguments. See
            [`cluster.markov`][ragraph.analysis.cluster.markov] for sensible defaults.
    """
    self._cluster(self.rows, algo, **algo_args)

col_mapping

col_mapping(col: Node) -> List[bool]

Boolean possession checklist for a column node w.r.t. self.rows.

Source code in ragraph/analysis/similarity/_similarity.py
def col_mapping(self, col: Node) -> List[bool]:
    """Boolean possession checklist for a column node w.r.t.
    [`self.rows`][ragraph.analysis.similarity.SimilarityAnalysis.rows]."""
    return [self.check_mapping(col, row) for row in self.rows]

row_mapping

row_mapping(row: Node) -> List[bool]

Boolean possession checklist for a row node w.r.t. self.cols.

Source code in ragraph/analysis/similarity/_similarity.py
def row_mapping(self, row: Node) -> List[bool]:
    """Boolean possession checklist for a row node w.r.t.
    [`self.cols`][ragraph.analysis.similarity.SimilarityAnalysis.cols]."""
    return [self.check_mapping(col, row) for col in self.cols]
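
To make the two checklists concrete, here is a small sketch that replaces the graph lookup with a set of `(column, row)` name pairs. The node names and edges are invented for illustration, and the functions are simplified stand-ins for `check_mapping`, `col_mapping`, and `row_mapping`, not the methods themselves.

```python
from typing import List, Set, Tuple

cols = ["c1", "c2", "c3"]
rows = ["r1", "r2"]
# Hypothetical directed mapping edges as (column name, row name) pairs.
edges: Set[Tuple[str, str]] = {("c1", "r1"), ("c2", "r1"), ("c2", "r2")}


def check_mapping(col: str, row: str) -> bool:
    """True when an edge runs from the column node to the row node."""
    return (col, row) in edges


def col_mapping(col: str) -> List[bool]:
    """One boolean per row node, in the order of `rows`."""
    return [check_mapping(col, row) for row in rows]


def row_mapping(row: str) -> List[bool]:
    """One boolean per column node, in the order of `cols`."""
    return [check_mapping(col, row) for col in cols]


print(col_mapping("c2"))  # [True, True]
print(col_mapping("c3"))  # [False, False]
print(row_mapping("r1"))  # [True, True, False]
```

A column checklist has as many entries as there are rows, and a row checklist has as many entries as there are columns.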

update_col_similarity

update_col_similarity() -> None

Update Jaccard Column Similarity Index edges between (clustered) columns.

Source code in ragraph/analysis/similarity/_similarity.py
def update_col_similarity(self) -> None:
    """Update Jaccard Column Similarity Index edges between (clustered) columns."""
    self._update_similarity(self.cols, self.col_similarity_matrix)

update_graph

update_graph() -> None

Update Internal similarity graph

Source code in ragraph/analysis/similarity/_similarity.py
def update_graph(self) -> None:
    """Update Internal similarity graph"""
    self._graph = Graph(nodes=self.cols + self.rows, edges=self.edges)
    self.update_col_similarity()
    self.update_row_similarity()

update_row_similarity

update_row_similarity() -> None

Update Jaccard Row Similarity Index edges between (clustered) rows.

Source code in ragraph/analysis/similarity/_similarity.py
def update_row_similarity(self) -> None:
    """Update Jaccard Row Similarity Index edges between (clustered) rows."""
    self._update_similarity(self.rows, self.row_similarity_matrix)

jaccard_index

jaccard_index(
    obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]
) -> float

Calculate the Jaccard Similarity Index between two objects based on an object description function.

Parameters:

  • obj1 (Any, required): First object to compare.
  • obj2 (Any, required): Second object to compare.
  • on (Callable[[Any], List[bool]], required): Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.

Returns:

  • float: Jaccard Similarity between two objects, calculated as the size of the overlap in properties divided by the total number of properties they possess.

Source code in ragraph/analysis/similarity/_jaccard.py
def jaccard_index(obj1: Any, obj2: Any, on: Callable[[Any], List[bool]]) -> float:
    """Calculate the Jaccard Similarity Index between two objects based on an object
    description function.

    Arguments:
        obj1: First object to compare.
        obj2: Second object to compare.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.

    Returns:
        Jaccard Similarity between two objects, which is calculated as the size of the
        overlap in properties divided by the total number of properties they possess.
    """
    props1 = np.array(on(obj1))
    props2 = np.array(on(obj2))
    return _calculate(props1, props2)

jaccard_matrix

jaccard_matrix(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> np.ndarray

Calculate the Jaccard Similarity Index for a set of objects based on an object description function.

Parameters:

  • objects (List[Any], required): List of objects to generate a similarity matrix for.
  • on (Callable[[Any], List[bool]], required): Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.

Source code in ragraph/analysis/similarity/_jaccard.py
def jaccard_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) -> np.ndarray:
    """Calculate the Jaccard Similarity Index for a set of objects based on an object
    description function.

    Arguments:
        objects: List of objects to generate a similarity matrix for.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.
    """
    dim = len(objects)
    mapping = mapping_matrix(objects, on)

    matrix = np.eye(dim, dtype=float)
    for i, obj_i in enumerate(objects):
        for j, obj_j in enumerate(objects):
            if j <= i:
                continue
            value = _calculate(mapping[i, :], mapping[j, :])
            matrix[i, j] = value
            matrix[j, i] = value
    return matrix
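
The structure of the loop above (identity diagonal, upper triangle computed once, values mirrored) can be reproduced with plain lists. This is a sketch, not the library function: `jaccard` and `jaccard_matrix_sketch` are hypothetical names, and the input is an already computed boolean mapping rather than objects plus an `on` callable.

```python
from typing import List


def jaccard(p: List[bool], q: List[bool]) -> float:
    """Shared properties divided by properties held by either object."""
    both = sum(a and b for a, b in zip(p, q))
    either = sum(a or b for a, b in zip(p, q))
    return both / either if either else 0.0


def jaccard_matrix_sketch(mapping: List[List[bool]]) -> List[List[float]]:
    dim = len(mapping)
    # Start from the identity: every object is fully similar to itself.
    matrix = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for i in range(dim):
        for j in range(i + 1, dim):  # upper triangle only
            value = jaccard(mapping[i], mapping[j])
            matrix[i][j] = value
            matrix[j][i] = value  # mirror to keep the matrix symmetric
    return matrix


mapping = [
    [True, True, False],
    [False, True, True],
    [False, False, False],  # an object without any property
]
result = jaccard_matrix_sketch(mapping)
for line in result:
    print(line)
```

Because the diagonal comes from the identity matrix rather than from the pairwise calculation, an object without any properties still has a self-similarity of 1.0, while its similarity to every other object is 0.0.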

_jaccard

Jaccard Similarity Index

The index compares two objects, and is calculated as the size of the overlap in properties divided by total size of properties they possess.
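
Written as a formula, with A and B denoting the sets of properties held by each object (symbols chosen here for illustration), the index reads as follows. The zero branch mirrors how `_calculate` treats two objects that have no properties at all.

```latex
J(A, B) =
\begin{cases}
  \dfrac{|A \cap B|}{|A \cup B|} & \text{if } |A \cup B| > 0, \\[1ex]
  0 & \text{otherwise.}
\end{cases}
```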

For examples on 'object description functions', please refer to the similarity utilities.

References

  • Kosub, S. (2016). A note on the triangle inequality for the Jaccard distance. Retrieved from arXiv.org.
  • Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de La Société Vaudoise Des Sciences Naturelles. DOI: 10.5169/seals-266450

_calculate

_calculate(props1: np.ndarray, props2: np.ndarray) -> float

Calculate the Jaccard Index from two boolean object description arrays.

Source code in ragraph/analysis/similarity/_jaccard.py
def _calculate(props1: np.ndarray, props2: np.ndarray) -> float:
    """Calculate the Jaccard Index from two boolean object description arrays."""
    both = np.logical_and(props1, props2).sum()  # properties held by both objects
    either = np.logical_or(props1, props2).sum()  # properties held by at least one
    if either:
        return both / either
    return 0.0  # neither object has any property

mapping_matrix

mapping_matrix(
    objects: List[Any], on: Callable[[Any], List[bool]]
) -> np.ndarray

Calculate an object-property mapping matrix where each entry (i,j) indicates the possession of property j by object i.

Parameters:

  • objects (List[Any], required): List of objects to describe.
  • on (Callable[[Any], List[bool]], required): Callable that takes an object and describes it with a list of booleans. Each entry indicates the possession of a property.

Source code in ragraph/analysis/similarity/_jaccard.py
def mapping_matrix(objects: List[Any], on: Callable[[Any], List[bool]]) -> np.ndarray:
    """Calculate an object-property mapping matrix where each entry (i,j) indicates the
    possession of property j by object i.

    Arguments:
        objects: List of objects to describe.
        on: Callable that takes an object and describes it with a list of booleans.
            Each entry indicates the possession of a property.
    """
    return np.array([on(obj) for obj in objects])

_similarity

Similarity analysis

utils

Similarity analysis utilities

on_checks

on_checks(
    checks: List[Callable[[Any], bool]]
) -> Callable[[Any], List[bool]]

Get an object description function that runs a predefined set of checks (which should be in a fixed order) and returns their boolean results.

Parameters:

  • checks (List[Callable[[Any], bool]], required): Checks to perform.

Returns:

  • Callable[[Any], List[bool]]: Object description function indicating check passings.

Source code in ragraph/analysis/similarity/utils.py
def on_checks(checks: List[Callable[[Any], bool]]) -> Callable[[Any], List[bool]]:
    """Get an object description function that runs a predefined set of checks (which should be in a
    fixed order) and returns their boolean results.

    Arguments:
        checks: Checks to perform.

    Returns:
        Object description function indicating check passings.
    """
    return lambda obj: [check(obj) for check in checks]
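
A short usage sketch may help here. It copies the one-line body above into a local function, named `on_checks_sketch` so that it is not mistaken for the library import, and applies it to integers with three made-up checks.

```python
from typing import Any, Callable, List


def on_checks_sketch(
    checks: List[Callable[[Any], bool]]
) -> Callable[[Any], List[bool]]:
    """Local copy of the one-liner above: run each check in its listed order."""
    return lambda obj: [check(obj) for check in checks]


describe = on_checks_sketch(
    [
        lambda n: n > 0,       # positive?
        lambda n: n % 2 == 0,  # even?
        lambda n: n >= 100,    # large?
    ]
)

print(describe(4))    # [True, True, False]
print(describe(101))  # [True, False, True]
```

The position of each boolean follows the position of its check in the list, which is why the checks should stay in a fixed order across all objects being compared.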

on_contains

on_contains(
    contents: List[Any],
) -> Callable[[Any], List[bool]]

Check whether an object contains certain contents.

Parameters:

  • contents (List[Any], required): Contents to check for with lambda x: x in obj.

Returns:

  • Callable[[Any], List[bool]]: Object description function indicating content presence.

Source code in ragraph/analysis/similarity/utils.py
def on_contains(contents: List[Any]) -> Callable[[Any], List[bool]]:
    """Check whether an object contains certain contents.

    Arguments:
        contents: Contents to check for with `lambda x: x in obj`.

    Returns:
        Object description function indicating content presence.
    """
    return lambda obj: [content in obj for content in contents]
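
Since the description relies on Python's `in` operator, what counts as "contains" depends on the container type. The sketch below copies the one-line body into a local function, named `on_contains_sketch` for illustration, and applies it to a set and to a string. The material names are made up.

```python
from typing import Any, Callable, List


def on_contains_sketch(contents: List[Any]) -> Callable[[Any], List[bool]]:
    """Local copy of the one-liner above: membership test per content item."""
    return lambda obj: [content in obj for content in contents]


describe = on_contains_sketch(["steel", "copper", "rubber"])

bracket = {"steel"}              # set: exact membership
cable = {"copper", "rubber"}
label = "stainless steel hinge"  # str: substring match

print(describe(bracket))  # [True, False, False]
print(describe(cable))    # [False, True, True]
print(describe(label))    # [True, False, False]
```

For strings, `in` performs a substring test, so "steel" is found inside "stainless steel hinge"; for sets and lists it requires an equal element.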

on_hasattrs

on_hasattrs(
    attrs: List[str],
) -> Callable[[Any], List[bool]]

Get an object description function that checks whether an instance possesses certain attributes. It does not check the values thereof!

Parameters:

  • attrs (List[str], required): List of attributes to check the existence of.

Returns:

  • Callable[[Any], List[bool]]: Object description function indicating attribute possession.

Source code in ragraph/analysis/similarity/utils.py
def on_hasattrs(attrs: List[str]) -> Callable[[Any], List[bool]]:
    """Get an object description function that checks whether an instance possesses certain
    attributes. It does not check the values thereof!

    Arguments:
        attrs: List of attributes to check the existence of.

    Returns:
        Object description function indicating attribute possession.
    """
    return lambda obj: [hasattr(obj, attr) for attr in attrs]

on_hasweights

on_hasweights(
    weights: List[str], threshold: float = 0.0
) -> Callable[[Any], List[bool]]

Check whether an object has certain weights at or above a threshold in its weights dictionary property.

Parameters:

  • weights (List[str], required): Keys to the obj.weights dictionary to check.
  • threshold (float, default 0.0): Threshold to verify against.

Returns:

  • Callable[[Any], List[bool]]: Object description function indicating which weights meet or exceed the threshold.

Source code in ragraph/analysis/similarity/utils.py
def on_hasweights(weights: List[str], threshold: float = 0.0) -> Callable[[Any], List[bool]]:
    """Check whether an object has certain weights at or above a threshold in its weights
    dictionary property.

    Arguments:
        weights: Keys to the `obj.weights` dictionary to check.
        threshold: Threshold to verify against.

    Returns:
        Object description function indicating which weights meet or exceed the threshold.
    """
    # Missing keys fall back to 0.0 before the comparison.
    return lambda obj: [obj.weights.get(w, 0.0) >= threshold for w in weights]
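
The comparison uses `>=` with a fallback of 0.0 for missing keys, which has a consequence worth seeing in a small sketch. The function below is a local copy of the body above under the name `on_hasweights_sketch`. The `edge` object and its weight names are made up for illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


def on_hasweights_sketch(
    weights: List[str], threshold: float = 0.0
) -> Callable[[Any], List[bool]]:
    """Local copy of the one-liner above."""
    return lambda obj: [obj.weights.get(w, 0.0) >= threshold for w in weights]


edge = SimpleNamespace(weights={"force": 2.5, "heat": 0.1})
keys = ["force", "heat", "signal"]  # "signal" is absent from edge.weights

print(on_hasweights_sketch(keys, threshold=0.5)(edge))  # [True, False, False]
print(on_hasweights_sketch(keys)(edge))                 # [True, True, True]
```

With the default threshold of 0.0, the missing "signal" key also passes, because its fallback value of 0.0 satisfies `>= 0.0`. Choosing a small positive threshold is one way to tell present weights from absent ones.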