.. _sequencing:

###################
Sequencing analysis
###################

This page describes what a sequencing analysis is and how to perform one using RaGraph.
Sequencing is a form of analysis where one attempts to find an optimal ordering of nodes
according to some objective (function). If nodes represent dependent tasks with deliverables,
you can imagine that you would want to start a task only when (most of) the deliverables it
depends on are done first. Or, if you are modeling a system of components by their input and
output dependencies, you might want to view them more or less chronologically, too.

The sequencing algorithms are included in the :obj:`ragraph.analysis.sequence` module, which is
usually imported as follows:

.. doctest::

   >>> from ragraph.analysis import sequence

With that out of the way, let's import the :obj:`ragraph.datasets.tarjans8` dataset, which is an
example from literature.

.. doctest::

   >>> from ragraph import datasets
   >>> g = datasets.get("tarjans8")
   >>> g.get_ascii_art(show=True)
    ┌───┬───┬───┬───┬───┬───┬───┬───┐
   1┥ ■ │   │   │   │   │   │   │ X │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   2┥ X │ ■ │   │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   3┥   │ X │ ■ │   │ X │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   4┥   │   │ X │ ■ │   │   │ X │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   5┥   │   │   │ X │ ■ │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   6┥   │   │   │   │ X │ ■ │ X │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   7┥   │   │ X │   │   │   │ ■ │ X │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   8┥   │ X │   │   │   │   │   │ ■ │
    └───┴───┴───┴───┴───┴───┴───┴───┘

Here, the marks above the diagonal represent **feedback marks**, meaning that you are missing a
deliverable if you do the activities in the order given on the left-hand side (the same labels
are always assumed to be in identical order on the X-axis). If you were to make an assumption
regarding these deliverables, you could still start your tasks in this order, but you might find
that this assumption was wrong in the end.
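To make the notion of a feedback mark concrete, here is a small plain-Python sketch (this is
*not* a RaGraph API; the ``marks`` set below is a hand-transcription of the matrix above, where
``(i, j)`` means task ``i`` needs a deliverable from task ``j``):

```python
# Hand-transcribed dependency marks of the tarjans8 matrix: (i, j) means
# task i depends on a deliverable from task j. For illustration only.
marks = {
    (1, 8), (2, 1), (3, 2), (3, 5), (4, 3), (4, 7),
    (5, 4), (6, 5), (6, 7), (7, 3), (7, 8), (8, 2),
}


def feedback_marks(order, marks):
    """Return the marks that end up above the diagonal for a given ordering."""
    pos = {label: index for index, label in enumerate(order)}
    # A mark (i, j) is a feedback mark when j is scheduled *after* i.
    return {(i, j) for (i, j) in marks if pos[j] > pos[i]}


# In the default order 1..8, five marks are feedback marks:
print(sorted(feedback_marks([1, 2, 3, 4, 5, 6, 7, 8], marks)))
# [(1, 8), (3, 5), (4, 7), (6, 7), (7, 8)]
```

Note how these five positions are exactly the marks above the diagonal in the matrix above.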
The later you find this mistake, the higher the probability that it cascades through your design
process. In other words: the further the feedback mark is from the diagonal (e.g. towards the
top right), the higher the risk.

Plenty of metrics have been introduced to quantify the **badness** or **penalty** of a given
sequence of nodes. Some of these have been included in :obj:`ragraph.analysis.metrics`. Let's
quantify the baseline of the graph we just imported. An often used metric is the **feedback
distance with respect to the diagonal**, i.e. the sum of distances to the diagonal of each
feedback mark that is present:

.. doctest::

   >>> score, contrib = sequence.metrics.feedback_distance(g.get_adjacency_matrix())
   >>> print(score)
   14.0
   >>> print(contrib)
   [[ 0.  0.  0.  0.  0.  0.  0.  7.]
    [-0.  0.  0.  0.  0.  0.  0.  0.]
    [-0. -0.  0.  0.  2.  0.  0.  0.]
    [-0. -0. -0.  0.  0.  0.  3.  0.]
    [-0. -0. -0. -0.  0.  0.  0.  0.]
    [-0. -0. -0. -0. -0.  0.  1.  0.]
    [-0. -0. -0. -0. -0. -0.  0.  1.]
    [-0. -0. -0. -0. -0. -0. -0.  0.]]

Here you can see both the penalty score (14.0) and the contribution of each cell in the matrix.
Let's see if we can improve things |:chart_with_downwards_trend:|!

.. doctest::

   >>> g, seq = sequence.markov(g, inf=1.0, dep=1.0, mu=2.0, scale=False, names=False)
   >>> g.get_ascii_art(nodes=seq, show=True)
    ┌───┬───┬───┬───┬───┬───┬───┬───┐
   1┥ ■ │ X │   │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   8┥   │ ■ │ X │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   2┥ X │   │ ■ │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   7┥   │ X │   │ ■ │ X │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   3┥   │   │ X │   │ ■ │   │ X │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   4┥   │   │   │ X │ X │ ■ │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   5┥   │   │   │   │   │ X │ ■ │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   6┥   │   │   │ X │   │   │ X │ ■ │
    └───┴───┴───┴───┴───┴───┴───┴───┘
   >>> score, contrib = sequence.metrics.feedback_distance(g.get_adjacency_matrix(nodes=seq))
   >>> print(score)
   5.0
   >>> print(contrib)
   [[ 0.  1.  0.  0.  0.  0.  0.  0.]
    [-0.  0.  1.  0.  0.  0.  0.  0.]
    [-0. -0.  0.  0.  0.  0.  0.  0.]
    [-0. -0. -0.  0.  1.  0.  0.  0.]
    [-0. -0. -0. -0.  0.  0.  2.  0.]
    [-0. -0. -0. -0. -0.  0.  0.  0.]
    [-0. -0. -0. -0. -0. -0.  0.  0.]
    [-0. -0. -0. -0. -0. -0. -0.  0.]]

Much better! There are now fewer feedback marks and they are quite close to the diagonal. This
also shows from the overall penalty (5.0) and the contribution matrix. By the way, it's
impossible to sequence this :obj:`~ragraph.graph.Graph` without any feedback marks, as there are
cycles present |:recycle:|. To check whether there are any cycles at all, take a look at the
:obj:`ragraph.analysis.heuristics.johnson` heuristic:

.. doctest::

   >>> from ragraph.analysis import heuristics
   >>> cycles = heuristics.johnson(g, names=True)  # Returns a generator.
   >>> next(cycles)
   ['1', '2', '8']

Nice! It found a cycle |:recycle:|. Or, to view all cycles at once:

.. doctest::

   >>> list(heuristics.johnson(g, names=True))
   [['1', '2', '8'], ['3', '7', '4', '5'], ['3', '4', '5']]

Can you spot the cycles, too? This algorithm provides a nice insight: if, and only if, there are
**no cycles present**, you can use the :obj:`ragraph.analysis.sequence.tarjans_dfs` method to
quickly find a sequence **without any feedback marks** at all. Other algorithms might find such a
sequence, too, but this one in particular is among the more efficient algorithms for this use
case.

****************************
Breaking cycles with tearing
****************************

The computational advantage of not having cycles is huge, especially for large datasets. This has
led to sequencing strategies that incorporate **tearing**. Tearing is a two-fold procedure: you
pick a node from the first cycle and fix its position as the next in the sequence. You then check
whether the remainder of the problem is (partially) free of cycles and continue to tear any
subsequent cycle by fixing a node there, too. If you tear often enough, you can eventually
sequence the remainder of the graph as a directed acyclic graph.
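As a rough illustration of this procedure (and explicitly *not* RaGraph's implementation), the
sketch below schedules any node whose dependencies are all met and, when it gets stuck in a
cycle, tears the node with the fewest unscheduled dependencies. Both the dependency table and
that tie-breaking rule are assumptions made for this example:

```python
def tear_sequence(deps):
    """Greedy tearing sketch: emit nodes whose dependencies are all scheduled;
    when every remaining node sits in a cycle, 'tear' the one with the fewest
    unscheduled dependencies (a hypothetical decision rule)."""
    remaining = set(deps)
    order = []
    while remaining:
        # Nodes whose dependencies have all been scheduled already.
        ready = [n for n in remaining if not (deps[n] & remaining)]
        if ready:
            node = min(ready)  # Deterministic pick among ready nodes.
        else:
            # Stuck in a cycle: tear the node with the fewest unmet deps.
            node = min(remaining, key=lambda n: (len(deps[n] & remaining), n))
        order.append(node)
        remaining.remove(node)
    return order


# tarjans8-style dependencies: node -> set of nodes whose output it needs.
deps = {
    1: {8}, 2: {1}, 3: {2, 5}, 4: {3, 7},
    5: {4}, 6: {5, 7}, 7: {3, 8}, 8: {2},
}
print(tear_sequence(deps))  # [1, 2, 8, 3, 7, 4, 5, 6]
```

Every node emitted via the ``ready`` branch causes no feedback marks at all; only the torn
nodes can, which is why a good tearing decision matters so much.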
This leaves one problem: how do you decide which node to tear?

.. _scc_tearing:

SCC tearing
===========

This is where :obj:`ragraph.analysis.sequence.scc_tearing` comes into play. It's a sequencing
algorithm (or heuristic, really) that uses Tarjan's Strongly Connected Components (SCC)
algorithm to identify the largest possible cycles. These cycles themselves form a directed
acyclic graph by definition. A nice byproduct of the SCC algorithm is that it outputs the cycles
in the (reversed) ideal ordering for a topological sort (the ideal sequence, so to speak). The
SCCs are then torn using a decision function. By default, it utilizes the same calculations as
the Markov sequencing algorithm to pick the node with the lowest penalty score
(:obj:`~ragraph.analysis.sequence.utils.markov_decision`). An example:

.. doctest::

   >>> g, seq = sequence.scc_tearing(g, names=False)
   >>> g.get_ascii_art(nodes=seq, show=True)
    ┌───┬───┬───┬───┬───┬───┬───┬───┐
   1┥ ■ │   │ X │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   2┥ X │ ■ │   │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   8┥   │ X │ ■ │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   3┥   │ X │   │ ■ │   │   │ X │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   7┥   │   │ X │ X │ ■ │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   4┥   │   │   │ X │ X │ ■ │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   5┥   │   │   │   │   │ X │ ■ │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   6┥   │   │   │   │ X │   │ X │ ■ │
    └───┴───┴───┴───┴───┴───┴───┴───┘
   >>> score, contrib = sequence.metrics.feedback_distance(g.get_adjacency_matrix(nodes=seq))
   >>> print(score)
   5.0
   >>> print(contrib)
   [[ 0.  0.  2.  0.  0.  0.  0.  0.]
    [-0.  0.  0.  0.  0.  0.  0.  0.]
    [-0. -0.  0.  0.  0.  0.  0.  0.]
    [-0. -0. -0.  0.  0.  0.  3.  0.]
    [-0. -0. -0. -0.  0.  0.  0.  0.]
    [-0. -0. -0. -0. -0.  0.  0.  0.]
    [-0. -0. -0. -0. -0. -0.  0.  0.]
    [-0. -0. -0. -0. -0. -0. -0.  0.]]

SCC tearing with options
========================

That result is not bad, although it features one slightly larger feedback loop than the regular
Markov sequencing example.
However, we can tweak the Markov decision function to always take the complete context of nodes
into account when calculating the flow matrices behind the scenes. To do this, we can explicitly
provide arguments that should be passed to the (this time explicitly set) decision function:

.. doctest::

   >>> g, seq = sequence.scc_tearing(
   ...     g,
   ...     names=False,
   ...     decision=sequence.utils.markov_decision,
   ...     decision_args=dict(context=g.nodes),
   ... )
   >>> g.get_ascii_art(nodes=seq, show=True)
    ┌───┬───┬───┬───┬───┬───┬───┬───┐
   8┥ ■ │   │ X │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   1┥ X │ ■ │   │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   2┥   │ X │ ■ │   │   │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   7┥ X │   │   │ ■ │ X │   │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   3┥   │   │ X │   │ ■ │   │ X │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   4┥   │   │   │ X │ X │ ■ │   │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   5┥   │   │   │   │   │ X │ ■ │   │
    ├───┼───┼───┼───┼───┼───┼───┼───┤
   6┥   │   │   │ X │   │   │ X │ ■ │
    └───┴───┴───┴───┴───┴───┴───┴───┘
   >>> score, contrib = sequence.metrics.feedback_distance(g.get_adjacency_matrix(nodes=seq))
   >>> print(score)
   5.0
   >>> print(contrib)
   [[ 0.  0.  2.  0.  0.  0.  0.  0.]
    [-0.  0.  0.  0.  0.  0.  0.  0.]
    [-0. -0.  0.  0.  0.  0.  0.  0.]
    [-0. -0. -0.  0.  1.  0.  0.  0.]
    [-0. -0. -0. -0.  0.  0.  2.  0.]
    [-0. -0. -0. -0. -0.  0.  0.  0.]
    [-0. -0. -0. -0. -0. -0.  0.  0.]
    [-0. -0. -0. -0. -0. -0. -0.  0.]]

Et voilà! Back to a score of 5. Calculating the complete context over and over comes at a slight
computational cost, though.
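To close off, the feedback distance metric used throughout this page can also be sketched in a
few lines of plain Python and used to double-check the scores of the orderings above. This is an
illustrative re-implementation under the assumption that each feedback mark contributes its
distance to the diagonal; the ``marks`` set is a hand-transcription of the original tarjans8
matrix, not a RaGraph API:

```python
def feedback_distance(order, marks):
    """Sum of distances to the diagonal of all feedback marks for an ordering
    (plain-Python sketch of the metric, for illustration only)."""
    pos = {label: index for index, label in enumerate(order)}
    # A mark (i, j) above the diagonal contributes its distance pos[j] - pos[i].
    return float(sum(pos[j] - pos[i] for (i, j) in marks if pos[j] > pos[i]))


# Dependency marks of the tarjans8 example: (i, j) means i needs j's output.
marks = {
    (1, 8), (2, 1), (3, 2), (3, 5), (4, 3), (4, 7),
    (5, 4), (6, 5), (6, 7), (7, 3), (7, 8), (8, 2),
}

for order in (
    [1, 2, 3, 4, 5, 6, 7, 8],  # Baseline ordering.
    [1, 2, 8, 3, 7, 4, 5, 6],  # Default SCC tearing result.
    [8, 1, 2, 7, 3, 4, 5, 6],  # SCC tearing with full-context decisions.
):
    print(order, feedback_distance(order, marks))
# [1, 2, 3, 4, 5, 6, 7, 8] 14.0
# [1, 2, 8, 3, 7, 4, 5, 6] 5.0
# [8, 1, 2, 7, 3, 4, 5, 6] 5.0
```

The scores reproduce the 14.0 baseline and the two 5.0 results found above, which makes it easy
to experiment with orderings of your own.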