This page describes what a sequencing analysis is and how to perform one using RaGraph. Sequencing
is an analysis form where one usually attempts to find an optimal ordering of nodes according to
some objective (function). If nodes represent dependent tasks with deliverables, you can imagine
that you would want to start a task only when (most) of the deliverables it depends on are done
first. Or, if you are modeling a system of components by their input and output dependencies, you
might want to view them more or less chronologically, too.
fromragraphimportdatasetsg=datasets.get("tarjans8")print(g.get_ascii_art())""" ┌───┬───┬───┬───┬───┬───┬───┬───┐1┥ ■ │ │ │ │ │ │ │ X │ ├───┼───┼───┼───┼───┼───┼───┼───┤2┥ X │ ■ │ │ │ │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤3┥ │ X │ ■ │ │ X │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤4┥ │ │ X │ ■ │ │ │ X │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤5┥ │ │ │ X │ ■ │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤6┥ │ │ │ │ X │ ■ │ X │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤7┥ │ │ X │ │ │ │ ■ │ X │ ├───┼───┼───┼───┼───┼───┼───┼───┤8┥ │ X │ │ │ │ │ │ ■ │ └───┴───┴───┴───┴───┴───┴───┴───┘"""
Here, the marks above the diagonal represent feedback marks, meaning that you are missing a
deliverable if you would do the activities in the order given on the left hand side (the same labels
are always assumed to be in identical order on the X-axis). If you were to make an assumption
regarding these deliverables, you could still start your tasks in this order, but might find that
this assumption is wrong in the end. The later you find this mistake, the higher the probability
that it cascades through your design process. Or in other words, the further the feedback mark is
from the diagonal (e.g. the top right), the higher the risk.
Plenty of metrics have been introduced to quantify the badness or penalty of a given
sequence of nodes. Some of these have been included in
ragraph.analysis.sequence.metrics. Let's quantify the
baseline of the graph we just imported. An often used metric is the feedback distance with respect
to the diagonal. E.g. the sum of distances to the diagonal of each feedback mark that is present:
fromragraphimportdatasetsfromragraph.analysisimportsequenceg=datasets.get("tarjans8")g,seq=sequence.markov(g,inf=1.0,dep=1.0,mu=2.0,scale=False,names=False)print(g.get_ascii_art(nodes=seq))""" ┌───┬───┬───┬───┬───┬───┬───┬───┐1┥ ■ │ X │ │ │ │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤8┥ │ ■ │ X │ │ │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤2┥ X │ │ ■ │ │ │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤7┥ │ X │ │ ■ │ X │ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤3┥ │ │ X │ │ ■ │ │ X │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤4┥ │ │ │ X │ X │ ■ │ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤5┥ │ │ │ │ │ X │ ■ │ │ ├───┼───┼───┼───┼───┼───┼───┼───┤6┥ │ │ │ X │ │ │ X │ ■ │ └───┴───┴───┴───┴───┴───┴───┴───┘"""score,contrib=sequence.metrics.feedback_distance(g.get_adjacency_matrix(nodes=seq))assertscore==5.0,"Significant improvement!"assertcontrib.tolist()==[[0,1,0,0,0,0,0,0],[-0,0,1,0,0,0,0,0],[-0,-0,0,0,0,0,0,0],[-0,-0,-0,0,1,0,0,0],[-0,-0,-0,-0,0,0,2,0],[-0,-0,-0,-0,-0,0,0,0],[-0,-0,-0,-0,-0,-0,0,0],[-0,-0,-0,-0,-0,-0,-0,0],]
Much better! There are now less feedback marks and the distance of the remaining ones is quite close
to the diagonal. This also shows from the overall penalty score that now sits at 5.0 and the
contribution matrix.
By the way, it's impossible to sequence this Graph without any feedback
marks, as there are cycles present! It's not a Directed Acyclic Graph (DAG) so there is no trivial
solution.
fromragraphimportdatasetsfromragraph.analysisimportheuristicsg=datasets.get("tarjans8")cycles=heuristics.johnson(g,names=True)# Returns a generator.first=next(cycles)assertfirstisnotNone,"We should have a cycle here!"assertfirst==["1","2","8",],"The node names, in case you were wondering."
Yup, we found our first cycle!
To list all cycles at once, you can use the following snippet. Be careful however, for large graphs
the amount of cycles can be very long and keep your CPU occupied for a while.
Can you spot the cycles in the graph now, too? This heuristics provides a nice insight. If, and only
if, there are no cycles present, you can use the
ragraph.analysis.sequence.tarjans_dfs method to quickly
find a sequence without any feedback marks at all. Other algorithms might find this, too, but
this one in particular is one of the more efficient algorithms for this use case.
The computational advantage of not having cycles is huge, especially for large datasets. This has
led to sequencing strategies that incorporate tearing. Tearing is a two-fold procedure. You pick
a node from the first cycle and fix its position as the next in the sequence. You then try to see if
the remainder of the problem is (partially) free of cycles and continue to tear any subsequent cycle
by fixing a node there, too.
If you tear often enough, you can eventually sequence the remainder of the graph as a directed
acyclic graph. This leaves one problem:
This is where ragraph.analysis.sequence.scc_tearing comes
into play. It's a sequencing algorithm (or heuristic, really) that uses Tarjans Strongly Connected
Components (SCC) to identify the largest possible cycles. These cycles themselves form a directed
acyclic graph by definition. A nice byproduct of the SCC algorithm is that it outputs the cycles in
the (reversed) ideal ordering to put them in for a topological sort (ideal sequence so to speak).
These SCCs are then torn using a decision function. By default it utilizes the same calculations as
the Markov sequencing algorithm to pick the node with the lowest penalty score
(ragraph.analysis.sequence.utils.markov_decision).
An example:
The last result wasn't bad, although it features one slightly larger feedback loop than the regular
Markov sequencing example. However, we can tweak the Markov decision function to always take into
account the complete context of nodes when calculating the "flow matrices" behind the scenes. To do
this, we can explicitly provide arguments that should be passed to the (this time explicitly set)
decision function: