# Similarity analysis

The design of a product family requires the creation of a basic product architecture from which all product family members can be derived. By analyzing the similarity of products within an existing product portfolio, one can get an indication of which and how many product variants have to be supported by the architecture.

## Input

The input for a similarity analysis is a domain mapping matrix (or bipartite graph) that maps individual products from the portfolio to attributes that characterize the functionality and design of the products. The `datasets` module contains an example data set for performing a similarity analysis. So let's get started and load the `"similarity"` dataset:

```
>>> from ragraph import datasets
>>> graph = datasets.get("similarity")
```

and subsequently visualize it using `dmm`:

```
>>> import ragraph.plot
>>> fig = ragraph.plot.dmm(
...     rows=[n for n in graph.nodes if n.kind == "attribute"],
...     cols=[n for n in graph.nodes if n.kind == "product"],
...     edges=graph.edges,
...     sort=False,
... )
>>> fig.write_image("dmm.svg")
```

This results in the domain mapping matrix (DMM) shown in Figure 15, in which twelve products (columns) are mapped to ten attributes (rows). A mark at position \(i,j\) indicates that product \(j\) is characterized by attribute \(i\).
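To make this convention concrete, here is a minimal plain-Python sketch that builds a binary DMM from a mapping of products to attribute sets. The product and attribute names and the `build_dmm` helper are hypothetical and not part of ragraph; in ragraph the same information is carried by the graph's edges.

```python
# Hypothetical portfolio: each product maps to the set of attributes it has.
portfolio = {
    "P1": {"A1", "A2"},
    "P2": {"A2", "A3"},
    "P3": {"A1"},
}
attributes = ["A1", "A2", "A3"]  # DMM rows
products = ["P1", "P2", "P3"]  # DMM columns


def build_dmm(portfolio, attributes, products):
    """Return a binary DMM where dmm[i][j] == 1 if product j has attribute i."""
    return [[1 if a in portfolio[p] else 0 for p in products] for a in attributes]


dmm = build_dmm(portfolio, attributes, products)
for attribute, row in zip(attributes, dmm):
    print(attribute, row)
# A1 [1, 0, 1]
# A2 [1, 1, 0]
# A3 [0, 1, 0]
```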

When selecting attributes for a similarity analysis, you should balance the attribute granularity level: very fine-grained (detailed) attributes yield a very sparse mapping matrix, while very coarse (high-level) attributes yield a very dense one. Moreover, you should ensure that the attributes do not overlap. In general, it is advised to use attributes that define the functionality, working principles, and embodiments of products (or modules).
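One rough way to check this trade-off is to compute the density of the DMM, that is, the fraction of cells that contain a mark. The sketch below uses hypothetical helpers `density` and `coarsen` (not ragraph functions) and made-up data to show how merging fine-grained attributes into coarser ones raises the density.

```python
def density(dmm):
    """Fraction of cells in a binary DMM (list of rows) that contain a mark."""
    return sum(map(sum, dmm)) / (len(dmm) * len(dmm[0]))


def coarsen(dmm, groups):
    """Merge each group of attribute rows into one coarser attribute (logical OR)."""
    n_cols = len(dmm[0])
    return [[max(dmm[i][j] for i in group) for j in range(n_cols)] for group in groups]


# Made-up example: four fine-grained attributes (rows) for three products (columns).
fine = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
coarse = coarsen(fine, groups=[[0, 1], [2, 3]])

print(round(density(fine), 2))    # 0.33
print(coarse)                     # [[1, 1, 0], [1, 0, 1]]
print(round(density(coarse), 2))  # 0.67
```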

Once you have defined your graph mapping products to attributes, you can use the `SimilarityAnalysis` object to perform the similarity analysis. First, you have to instantiate the `SimilarityAnalysis` object, for which a minimal example is shown below:

```
>>> from ragraph.analysis.similarity import SimilarityAnalysis
>>> sa = SimilarityAnalysis(
...     rows=[n for n in graph.nodes if n.kind == "attribute"],
...     cols=[n for n in graph.nodes if n.kind == "product"],
...     edges=graph.edges,
... )
```

The `SimilarityAnalysis` object requires three parameters: `rows`, a list of `Node` objects that are the row elements of the DMM (in this case attributes); `cols`, a list of `Node` objects that are the column elements of the DMM (in this case products); and `edges`, a list of `Edge` objects that map column elements to row elements.

Internally, the `jaccard_matrix` is calculated for both the column elements and the row elements. The results are stored in the `row_similarity_matrix` and `col_similarity_matrix` attributes:

```
>>> print(sa.row_similarity_matrix)
[[1. 0. 0. 0.25 0. 0.25 0.17 0.2 0. 0. ]
[0. 1. 0. 0. 0. 0. 0.2 0. 0. 0.33]
[0. 0. 1. 0. 0.5 0. 0.17 0. 0.25 0. ]
[0.25 0. 0. 1. 0. 0. 0. 0.25 0. 0. ]
[0. 0. 0.5 0. 1. 0. 0.17 0. 0.25 0. ]
[0.25 0. 0. 0. 0. 1. 0.2 0.25 0. 0. ]
[0.17 0.2 0.17 0. 0.17 0.2 1. 0. 0. 0.2 ]
[0.2 0. 0. 0.25 0. 0.25 0. 1. 0. 0. ]
[0. 0. 0.25 0. 0.25 0. 0. 0. 1. 0. ]
[0. 0.33 0. 0. 0. 0. 0.2 0. 0. 1. ]]
>>> print(sa.col_similarity_matrix)
[[1. 0.25 0.33 0. 0.33 0. 0. 0. 0. 0.25 0. 0. ]
[0.25 1. 0. 0.25 0.25 0.25 0.25 0. 0. 0.2 0. 0. ]
[0.33 0. 1. 0. 0.33 0. 0. 0. 0. 0. 0. 0. ]
[0. 0.25 0. 1. 0. 0. 0.33 0. 0. 0. 0.33 0. ]
[0.33 0.25 0.33 0. 1. 0. 0. 0. 0. 0.25 0. 0. ]
[0. 0.25 0. 0. 0. 1. 0.33 0. 0. 0. 0.33 0. ]
[0. 0.25 0. 0.33 0. 0.33 1. 0. 0. 0. 0.33 0. ]
[0. 0. 0. 0. 0. 0. 0. 1. 0.33 0.25 0. 0.33]
[0. 0. 0. 0. 0. 0. 0. 0.33 1. 0.67 0. 0.33]
[0.25 0.2 0. 0. 0.25 0. 0. 0.25 0.67 1. 0. 0.25]
[0. 0. 0. 0.33 0. 0.33 0.33 0. 0. 0. 1. 0. ]
[0. 0. 0. 0. 0. 0. 0. 0.33 0.33 0.25 0. 1. ]]
```
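The Jaccard index of two sets is the size of their intersection divided by the size of their union, \(|A \cap B| / |A \cup B|\). Below is a minimal plain-Python sketch that computes it for the columns (products) and rows (attributes) of a small, made-up binary DMM, assuming `jaccard_matrix` computes the standard index. The helper names are hypothetical, and returning 0.0 for two empty sets is a choice of this sketch.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; 0.0 when both sets are empty (sketch's choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(sets):
    """Pairwise Jaccard indices for a list of sets."""
    return [[jaccard(a, b) for b in sets] for a in sets]


# Made-up binary DMM: rows are attributes, columns are products.
dmm = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
n_rows, n_cols = len(dmm), len(dmm[0])

# Each product (column) is described by the set of attribute indices it has...
col_sets = [{i for i in range(n_rows) if dmm[i][j]} for j in range(n_cols)]
# ...and each attribute (row) by the set of product indices that have it.
row_sets = [{j for j in range(n_cols) if dmm[i][j]} for i in range(n_rows)]

col_sim = [[round(v, 2) for v in row] for row in jaccard_matrix(col_sets)]
row_sim = [[round(v, 2) for v in row] for row in jaccard_matrix(row_sets)]
print(col_sim)  # [[1.0, 0.33, 0.5], [0.33, 1.0, 0.0], [0.5, 0.0, 1.0]]
print(row_sim)  # [[1.0, 0.33, 0.0], [0.33, 1.0, 0.5], [0.0, 0.5, 1.0]]
```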

The data from these matrices are stored within the internal `graph` property. This graph is a regular `Graph` object, which can be visualized using `mdm`:

```
>>> import ragraph.plot
>>> fig = ragraph.plot.mdm(
...     leafs=sa.graph.leafs,
...     edges=sa.graph.edges,
...     style=ragraph.plot.Style(
...         piemap=dict(
...             display="weights",
...             fields=["similarity"],
...         )
...     ),
... )
>>> fig.write_image("smdm1.svg")
```

Figure 16 shows the resulting product-attribute multi-domain matrix (MDM), in which the similarity weights are displayed.

## Pruning

If a similarity matrix is very dense, you may want to prune it by removing all values below a certain threshold. You can prune the matrices by setting the values of the `col_sim_threshold` and `row_sim_threshold` attributes:

```
>>> sa.col_sim_threshold = 0.30
>>> sa.row_sim_threshold = 0.20
```

Changing the value of these attributes automatically updates the `graph` attribute, as shown in Figure 17, in which all edges with a similarity weight below the respective threshold have been removed:

```
>>> import ragraph.plot
>>> fig = ragraph.plot.mdm(
...     leafs=sa.graph.leafs,
...     edges=sa.graph.edges,
...     style=ragraph.plot.Style(
...         piemap=dict(
...             display="weights",
...             fields=["similarity"],
...         )
...     ),
... )
>>> fig.write_image("smdm2.svg")
```
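Conceptually, pruning drops the similarity edges whose weight falls below the threshold. The sketch below illustrates this on a made-up similarity matrix using a hypothetical `similarity_edges` helper; it is not ragraph's implementation, and keeping a weight exactly equal to the threshold is an assumption of this sketch.

```python
def similarity_edges(sim, threshold=0.0):
    """List (i, j, weight) for off-diagonal pairs with a positive weight >= threshold."""
    return [
        (i, j, w)
        for i, row in enumerate(sim)
        for j, w in enumerate(row)
        # Assumption: a weight equal to the threshold is kept.
        if i != j and w > 0 and w >= threshold
    ]


# Made-up symmetric similarity matrix for three elements.
sim = [
    [1.0, 0.25, 0.33],
    [0.25, 1.0, 0.67],
    [0.33, 0.67, 1.0],
]

print(len(similarity_edges(sim)))  # 6 (every off-diagonal pair, both directions)
print(similarity_edges(sim, threshold=0.30))
# [(0, 2, 0.33), (1, 2, 0.67), (2, 0, 0.33), (2, 1, 0.67)]
```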

## Clustering

The aim of a similarity analysis is to find groups (clusters) of products that are similar and could therefore be standardized. Similarly, one could search for groups (clusters) of attributes that have high similarity, which implies that they are often found together within products. To highlight these clusters you can use the `cluster_rows` and `cluster_cols` methods:

```
>>> sa.cluster_rows(alpha=2.0, beta=2.0, mu=2.0)
>>> sa.cluster_cols(alpha=2.0, beta=2.0, mu=2.0)
```

By default, these methods use the `markov` clustering algorithm. You can, however, provide a different algorithm by setting the `algo` argument if desired.
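For intuition about what a Markov-based clustering does, here is a minimal plain-Python sketch of the classic Markov Cluster (MCL) algorithm. Random-walk flow is alternately spread out (expansion, a matrix power) and sharpened (inflation, an element-wise power followed by renormalization) until clusters separate. This is not ragraph's `markov` implementation. Reading `alpha` and `beta` as the expansion and inflation powers is our assumption, and `mu` has no counterpart here. The helper names and the similarity matrix are made up; like the Jaccard matrices above, the matrix has ones on the diagonal.

```python
def _matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def _normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(sim, expansion=2, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Classic MCL on a similarity matrix with a positive diagonal."""
    n = len(sim)
    m = _normalize_columns(sim)
    for _ in range(max_iter):
        prev = m
        # Expansion: raise the matrix to an integer power so flow spreads out.
        expanded = prev
        for _power in range(expansion - 1):
            expanded = _matmul(expanded, prev)
        # Inflation: element-wise power, then renormalize so strong flows win.
        m = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Rows that keep mass on their diagonal are attractors; the columns with
    # non-negligible flow into an attractor form one cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > eps) for i in range(n) if m[i][i] > eps}
    return sorted(clusters)


# Made-up similarity matrix with two groups: {0, 1, 2} and {3, 4}.
sim = [
    [1.0, 0.5, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.5, 0.0, 0.0],
    [0.5, 0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]
print(mcl(sim))  # [(0, 1, 2), (3, 4)]
```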

After clustering, you can re-visualize the product-attribute MDM:

```
>>> import ragraph.plot
>>> sa.row_sim_threshold = 0.0
>>> sa.col_sim_threshold = 0.0
>>> fig = ragraph.plot.mdm(
...     leafs=sa.graph.leafs,
...     edges=sa.graph.edges,
...     style=ragraph.plot.Style(
...         piemap=dict(
...             display="weights",
...             fields=["similarity"],
...         )
...     ),
... )
>>> fig.write_image("smdm3.svg")
```

Note that we reset the similarity thresholds to 0.0 to ensure that all data is displayed. Figure 18 shows the resulting MDM, in which one can observe three product clusters and three attribute clusters. The first product cluster maps to the first attribute cluster, the second product cluster primarily maps to the third attribute cluster, and the third product cluster primarily maps to the second attribute cluster. This indicates the presence of three product families within the product portfolio, each with a distinct set of attributes that characterize its functionality and design.