Communicating model architecture with Python Diagrams

Model diagrams are ubiquitous in science and engineering research and practice. A model diagram is any visualization of the various components of a model and their interactions. Depending on what you are trying to convey, the components of such a diagram could be: (1) broad concepts or systems and their modeled interactions (e.g., interconnected food-energy-water systems); (2) individual steps within a multi-part sequential experiment; (3) software modules and their interactions and dependencies (e.g., UML diagrams – see this previous blog post by Tom Wild); or (4) interacting nodes within a network simulation model. Regardless of the type of component, model diagrams are critical for communicating model architecture for presentations, papers, and collaborative work. They can also be very helpful internally for model developers when brainstorming and developing new models and workflows.

The development of a model diagram typically begins with pen and paper or marker and whiteboard, while the polished final version is typically created in PowerPoint or Adobe Illustrator. Illustrator makes it possible to create highly professional diagrams (see this blog post by Jazmin Zatarain Salazar for some tips), but the process is tedious and time-consuming. More recently, I’ve begun experimenting with automated model diagramming workflows as an intermediate step. Software tools are available that allow the user to create model diagrams from simple scripts. Although the resulting diagram typically doesn’t look as polished as what you could create in Illustrator, these tools can be valuable for “quick and dirty” mockups at the brainstorming stage. They also make it possible to automatically update the model diagram each time the underlying model is updated, which can be helpful throughout the model development process.

In this blog post, I will demonstrate how to use the Diagrams package for Python to create a model diagram. Diagrams is an open-source Python package which is built on top of the Graphviz visualization software. Both tools are primarily used for prototyping software system architecture diagrams.

Example model diagram, from Diagrams package documentation.

However, they are flexible enough to be applied to any type of model diagram. In this post, I will demonstrate the use of the Diagrams package to visualize the nodes and links in a water resources simulation model. Interested readers can find the full source code associated with this blog post in this GitHub repository.

Water resources simulation model context

I am currently in the process of writing a paper that will introduce Pywr-DRB, our new open-source water resources simulation model of the Delaware River Basin. This model includes 17 reservoir nodes plus a similar number of river junction nodes and interbasin transfer nodes. Each of these “primary nodes” is associated with a variety of “secondary nodes” with functions such as applying catchment inflows, water withdrawals/consumptions, and time lags.

Given this size and complexity, it was important to develop a model diagram which could clearly convey the relationships between different nodes. There are several pieces of information I wanted to convey with this diagram: (1) the labeled set of primary nodes and the links between them corresponding to the river network; (2) the “type” of primary node: NYC reservoirs vs non-NYC reservoirs vs interbasin transfers vs river nodes; (3) the types of secondary nodes associated with each primary node (e.g., catchment inflows, reservoir, observation gage); and (4) the time delay between each pair of connected nodes (i.e., the time in days that it takes for water to flow from an upstream node to its downstream node).

Here is the model diagram I created with the Diagrams Python package to meet these needs. Although I have since turned to Illustrator to make a more polished final version of this diagram, I found the Diagrams package to be a very useful intermediate step when brainstorming alternative diagram structures and different ways of conveying information. In the following section, I will walk through the code used to make it.

Model diagram for a water resources simulation model, created with Diagrams package.

Creating a model diagram with Python

I would highly suggest that interested readers first check out the Diagrams documentation, which has concise and helpful examples for getting started with the package. What follows is not an exhaustive introduction, but rather a targeted introduction to Diagrams based on the

Here is the code I used for creating my diagram. I have broken it down into Sections which will be explained in sequence in the text following the code block.

##### Section 1
from diagrams import Diagram, Cluster, Edge
from diagrams.custom import Custom


##### Section 2
### filename for output png file
filename='diagrams/Pywr-DRB_model_diagram'

### location of icons, downloaded from https://thenounproject.com/
### Note: file paths need to be relative to filename above (e.g., use '../' to back out of diagrams/ directory)
reservoir_icon = '../icons/reservoir.png'
river_icon = '../icons/river.png'
gage_icon = '../icons/measurement.png'
diversion_icon = '../icons/demand.png'


##### Section 3
### customize graphviz attributes
graph_attr = {
    'fontsize': '40',
    'splines': 'spline',
}

##### Section 4
### create a diagram to depict the node network
with Diagram("", filename=filename, show=False, graph_attr=graph_attr, direction='LR'):

    ##### Section 5
    ### diversion nodes
    graph_attr['bgcolor'] = 'mediumseagreen'

    with Cluster('NYC Diversion', graph_attr=graph_attr):
        NYCDiversion = Custom('', diversion_icon)

    with Cluster('NJ Diversion', graph_attr=graph_attr):
        NJDiversion = Custom('', diversion_icon)


    ##### Section 6
    ### function for creating edge with linestyle based on time delay between nodes (days)
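    def create_edge(lag_days):
        penwidth = '4'
        if lag_days == 0:
            return Edge(color='black', style='solid', penwidth=penwidth)
        elif lag_days == 1:
            return Edge(color='black', style='dashed', penwidth=penwidth)
        elif lag_days == 2:
            return Edge(color='black', style='dotted', penwidth=penwidth)
        else:
            ### fail loudly rather than silently returning None for an unsupported lag
            raise ValueError(f'No edge style defined for lag_days={lag_days}')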


    ##### Section 7
    ### cluster of minor nodes within major node
    def create_node_cluster(label, has_reservoir, has_gage):
        if has_reservoir and label in ['Cannonsville', 'Pepacton', 'Neversink']:
            bgcolor='firebrick'
        elif has_reservoir:
            bgcolor='lightcoral'
        else:
            bgcolor='cornflowerblue'
        graph_attr['bgcolor'] = bgcolor

        with Cluster(label, graph_attr=graph_attr):
            cluster_river = Custom('', river_icon)

            if has_reservoir:
                cluster_reservoir = Custom('', reservoir_icon)
                cluster_river >> create_edge(0) >> cluster_reservoir
                if has_gage:
                    cluster_gage = Custom('', gage_icon)
                    cluster_reservoir >> create_edge(0) >> cluster_gage
                    return {'river': cluster_river, 'reservoir': cluster_reservoir, 'out': cluster_gage}
                else:
                    return {'river': cluster_river, 'reservoir': cluster_reservoir, 'out': cluster_reservoir}
            else:
                if has_gage:
                    cluster_gage = Custom('', gage_icon)
                    cluster_river >> create_edge(0) >> cluster_gage
                    return {'river': cluster_river, 'reservoir': None, 'out': cluster_gage}
                else:
                    return {'river': cluster_river, 'reservoir': None, 'out': cluster_river}


    ##### Section 8
    ### river nodes
    Lordville = create_node_cluster('Lordville', has_reservoir=False, has_gage=True)
    Montague = create_node_cluster('Montague', has_reservoir=False, has_gage=True)
    Trenton1 = create_node_cluster('Trenton 1', has_reservoir=False, has_gage=False)
    Trenton2  = create_node_cluster('Trenton 2', has_reservoir=False, has_gage=True)
    DelawareBay  = create_node_cluster('Delaware Bay', has_reservoir=False, has_gage=True)

    ### reservoir nodes
    Cannonsville = create_node_cluster('Cannonsville', has_reservoir=True, has_gage=True)
    Pepacton = create_node_cluster('Pepacton', has_reservoir=True, has_gage=True)
    Neversink = create_node_cluster('Neversink', has_reservoir=True, has_gage=True)
    Prompton = create_node_cluster('Prompton', has_reservoir=True, has_gage=False)
    Wallenpaupack = create_node_cluster('Wallenpaupack', has_reservoir=True, has_gage=False)
    ShoholaMarsh = create_node_cluster('Shohola Marsh', has_reservoir=True, has_gage=True)
    Mongaup = create_node_cluster('Mongaup', has_reservoir=True, has_gage=True)
    Beltzville = create_node_cluster('Beltzville', has_reservoir=True, has_gage=True)
    FEWalter = create_node_cluster('F.E. Walter', has_reservoir=True, has_gage=True)
    MerrillCreek = create_node_cluster('Merrill Creek', has_reservoir=True, has_gage=False)
    Hopatcong = create_node_cluster('Hopatcong', has_reservoir=True, has_gage=False)
    Nockamixon = create_node_cluster('Nockamixon', has_reservoir=True, has_gage=False)
    Assunpink = create_node_cluster('Assunpink', has_reservoir=True, has_gage=True)
    StillCreek = create_node_cluster('Still Creek', has_reservoir=True, has_gage=False)
    Ontelaunee = create_node_cluster('Ontelaunee', has_reservoir=True, has_gage=False)
    BlueMarsh = create_node_cluster('Blue Marsh', has_reservoir=True, has_gage=True)
    GreenLane = create_node_cluster('Green Lane', has_reservoir=True, has_gage=False)


    ##### Section 9
    ### tie them all together, with edge linestyles designating time delay between nodes.
    Cannonsville['reservoir'] >> create_edge(0) >> NYCDiversion
    Pepacton['reservoir'] >> create_edge(0) >> NYCDiversion
    Neversink['reservoir'] >> create_edge(0) >> NYCDiversion
    Cannonsville['out'] >> create_edge(0) >> Lordville['river']
    Pepacton['out'] >> create_edge(0) >> Lordville['river']
    Lordville['out'] >> create_edge(2) >> Montague['river']
    Neversink['out'] >> create_edge(1) >> Montague['river']
    Prompton['out'] >> create_edge(1) >> Montague['river']
    Wallenpaupack['out'] >> create_edge(1) >> Montague['river']
    ShoholaMarsh['out'] >> create_edge(1) >> Montague['river']
    Mongaup['out'] >> create_edge(0) >> Montague['river']
    Montague['out'] >> create_edge(2) >> Trenton1['river']
    Beltzville['out'] >> create_edge(2) >> Trenton1['river']
    FEWalter['out'] >> create_edge(2) >> Trenton1['river']
    MerrillCreek['out'] >> create_edge(1) >> Trenton1['river']
    Hopatcong['out'] >> create_edge(1) >> Trenton1['river']
    Nockamixon['out'] >> create_edge(0) >> Trenton1['river']
    Trenton1['out'] >> create_edge(0) >> Trenton2['river']
    Trenton1['out'] >> create_edge(0) >> NJDiversion
    Trenton2['out'] >> create_edge(0) >> DelawareBay['river']
    Assunpink['out'] >> create_edge(0) >> DelawareBay['river']
    Ontelaunee['out'] >> create_edge(2) >> DelawareBay['river']
    StillCreek['out'] >> create_edge(2) >> DelawareBay['river']
    BlueMarsh['out'] >> create_edge(2) >> DelawareBay['river']
    GreenLane['out'] >> create_edge(1) >> DelawareBay['river']

In Section 1, I import the Diagram, Cluster, Edge, and Custom classes from the Diagrams package. The Diagrams package is written in an object-oriented way that makes it very easy to create and link together different visual objects such as nodes, edges, and clusters. The meaning of these classes will become clear in the following paragraphs.

In Section 2, I define the filename for the output graphic that will be created, as well as the filenames for input graphics that are used as icons within the diagram to represent inflows, reservoirs, observation gages, and diversions. I downloaded these graphics from The Noun Project, a great resource for finding icons. I have not included the icons in the GitHub repository because I don’t own the rights to distribute them; if you are trying to replicate this code, you will need to download icons of your own, name them consistently with the code above, and place them in the “icons” directory. It is also worth noting that the Diagrams package uses the directory of your output filename (in this case “diagrams/”) as the working directory, which means that the file paths for the icons must be relative to this directory.
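To avoid hand-writing the '../' prefixes, the relative icon paths can be derived from the output filename. The following is a minimal sketch using only the standard library. The helper name `icon_path_for` is hypothetical, and the sketch assumes the icons live in an `icons/` directory next to `diagrams/`, as in the script above.

```python
import posixpath

def icon_path_for(icon_file, output_filename, icon_dir='icons'):
    """Return the path to an icon relative to the directory of the output file.

    Hypothetical helper for illustration; it is not part of the Diagrams package.
    """
    # Directory that will hold the output graphic ('.' if the filename has no directory)
    output_dir = posixpath.dirname(output_filename) or '.'
    # Express the icon location relative to that directory
    return posixpath.relpath(posixpath.join(icon_dir, icon_file), start=output_dir)

filename = 'diagrams/Pywr-DRB_model_diagram'
reservoir_icon = icon_path_for('reservoir.png', filename)
print(reservoir_icon)  # ../icons/reservoir.png
```

If the output file were later moved to a different directory, only `filename` would need to change.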

In Section 3, I overwrite some default parameters for the diagram: the font size and the arrow drawing methodology. These parameters are stored in a dictionary called “graph_attr”. More details on the available parameters can be found in the Diagrams and Graphviz documentation pages.
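For reference, here is a sketch of a few other Graphviz graph attributes that could go into the same kind of dictionary. The attribute names are standard Graphviz ones, but the dictionary name `graph_attr_alt` and the specific values are illustrative assumptions, not the settings used for the diagram in this post.

```python
# Illustrative alternative settings; values are assumptions, tune by trial and error.
graph_attr_alt = {
    'fontsize': '40',    # label font size, as in the original script
    'splines': 'ortho',  # right-angle edges instead of curved splines
    'nodesep': '0.8',    # minimum space between adjacent nodes in the same rank (inches)
    'ranksep': '1.5',    # desired separation between ranks (inches)
    'pad': '0.5',        # extra padding around the whole drawing (inches)
}
```

Passing the values as strings mirrors the style of the original dictionary.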

In Section 4, I create the model diagram itself with a call to the Diagram class. This call is made within a context manager (“with … :”), so all of the indented code that follows is added to this diagram. The “direction” is given as “LR”, meaning that the major flow of components will be from left to right; other options include “TB” for top to bottom, as well as “RL” and “BT”. Trial and error is the best way to find out which direction is most efficient and intuitive for a given model diagram.

In Section 5, I create the first two primary nodes, which represent the interbasin transfer diversions to NYC and NJ. Each node is created by calling the Custom class. The Custom class in Diagrams is used to represent a node using a custom graphic (in this case, the “diversion” icon). Diagrams also has many built-in node types that use standard icons from major software companies (e.g., AWS) or shapes common in UML-type software architecture diagrams (“programming”). However, for our purposes the Custom class is best as it allows us to provide our own icons that improve the clarity of the diagram.

The other important thing to note from Section 5 is the use of the Cluster class, again within a context manager. This creates the rectangular background around each primary node. Although the diversion nodes here only have a single secondary node inside of them, we will see later that it is possible to group multiple secondary nodes inside of a single primary node “cluster”, which is helpful for organizing the diagram. The primary node is given a label using the first argument of the Cluster class (note that the Custom class also allows labeling, but I decided for clarity to only label at the Cluster level, which is why the first argument of the Custom call is an empty string). Specific attributes of the Cluster, such as the background color, can be specified by passing a graph_attr dictionary directly to the Cluster call; here I update the “bgcolor” entry of the existing dictionary before each call.

In Section 6, I define a function used to create customized arrows between nodes using the Edge class. The style of the arrow is customized based on the time lag between two nodes (with solid, dashed, or dotted lines representing 0, 1, or 2 days lag, respectively).
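An alternative to the if/elif chain is a dictionary lookup, which keeps the lag-to-linestyle mapping in one place and makes it easy to reject unsupported lags. This is a sketch rather than the code used for the diagram; the names `LAG_STYLES` and `edge_kwargs` are hypothetical. It returns plain keyword arguments so that it runs without Diagrams installed.

```python
# Mapping from time lag (days) to line style, matching the convention in this post
LAG_STYLES = {0: 'solid', 1: 'dashed', 2: 'dotted'}

def edge_kwargs(lag_days, color='black', penwidth='4'):
    """Return keyword arguments describing an edge for the given lag (hypothetical helper)."""
    try:
        style = LAG_STYLES[lag_days]
    except KeyError:
        raise ValueError(f'No edge style defined for lag_days={lag_days}') from None
    return {'color': color, 'style': style, 'penwidth': penwidth}

print(edge_kwargs(1))  # {'color': 'black', 'style': 'dashed', 'penwidth': '4'}

# Inside the diagram script, this could be used as:
#     Edge(**edge_kwargs(lag_days))
```

Supporting a three-day lag would then only require adding one entry to `LAG_STYLES`.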

In Section 7, I define a function used to automatically create custom Clusters for any primary node type other than a diversion. The color is based on the primary node type: NYC reservoir (firebrick), non-NYC reservoir (lightcoral), or river junction (cornflowerblue). Each cluster is automatically populated with a catchment inflows icon and/or a reservoir icon and/or an observation gage icon based on the secondary nodes included in that primary node, as specified in the function arguments. Each secondary node is created with a call to the Custom class, and the secondary nodes within each primary node are linked with solid arrows (zero-day lag) via a call to the create_edge function. We return a dictionary of secondary nodes so that they can be linked to other nodes across the broader network.
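The color rule can also be separated from the drawing code, which makes it easy to check on its own. The sketch below builds a fresh attribute dictionary for each cluster instead of modifying the shared one. The names `NYC_RESERVOIRS`, `cluster_color`, and `cluster_attr` are hypothetical and do not appear in the script above.

```python
NYC_RESERVOIRS = ('Cannonsville', 'Pepacton', 'Neversink')

def cluster_color(label, has_reservoir):
    """Background color for a primary node, following the scheme in this post."""
    if has_reservoir and label in NYC_RESERVOIRS:
        return 'firebrick'       # NYC reservoir
    if has_reservoir:
        return 'lightcoral'      # non-NYC reservoir
    return 'cornflowerblue'      # river junction

def cluster_attr(base_attr, label, has_reservoir):
    """Copy the base attributes and set bgcolor, leaving base_attr unchanged."""
    return {**base_attr, 'bgcolor': cluster_color(label, has_reservoir)}

base = {'fontsize': '40', 'splines': 'spline'}
print(cluster_attr(base, 'Pepacton', True)['bgcolor'])    # firebrick
print(cluster_attr(base, 'Montague', False)['bgcolor'])   # cornflowerblue

# Inside the diagram script, this could be used as:
#     with Cluster(label, graph_attr=cluster_attr(graph_attr, label, has_reservoir)):
```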

Section 8 is where all of the primary nodes associated with river junctions and reservoirs in the model are actually created via calls to the create_node_cluster function. And finally, Section 9 is where the cross-node links are created via calls to the create_edge function, with arrows formatted according to lag times within the river network.
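For larger networks, the links could instead be stored as data and checked before drawing. The sketch below uses a subset of the links from Section 9. The names `LINKS`, `NODE_NAMES`, and `check_links` are hypothetical, and the commented loop at the end assumes a dictionary `nodes` that maps each label to the result of create_node_cluster, which the script above does not define.

```python
# (upstream, downstream, lag_days): a subset of the links from Section 9
LINKS = [
    ('Cannonsville', 'Lordville', 0),
    ('Pepacton', 'Lordville', 0),
    ('Lordville', 'Montague', 2),
    ('Neversink', 'Montague', 1),
    ('Montague', 'Trenton 1', 2),
    ('Trenton 1', 'Trenton 2', 0),
    ('Trenton 2', 'Delaware Bay', 0),
]

NODE_NAMES = {'Cannonsville', 'Pepacton', 'Neversink', 'Lordville',
              'Montague', 'Trenton 1', 'Trenton 2', 'Delaware Bay'}

def check_links(links, node_names, allowed_lags=(0, 1, 2)):
    """Return a list of human-readable problems; an empty list means the links look consistent."""
    problems = []
    for upstream, downstream, lag in links:
        for name in (upstream, downstream):
            if name not in node_names:
                problems.append(f'unknown node: {name}')
        if lag not in allowed_lags:
            problems.append(f'unsupported lag {lag} for {upstream} -> {downstream}')
    return problems

print(check_links(LINKS, NODE_NAMES))  # []

# Inside the diagram script, the links could then be drawn with a loop such as:
#     for upstream, downstream, lag in LINKS:
#         nodes[upstream]['out'] >> create_edge(lag) >> nodes[downstream]['river']
```

Keeping the network in one table also makes it simpler to regenerate the diagram when the model's node list changes.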

And that’s it! The diagram can be created by running this script in Python. The Diagrams package will automatically assign the layout of the different components in a way that tries to maximize flow and minimize overlap. It is also possible to choose between a few alternative methods for assigning the layout, or even to dictate the location of some or all components manually if you desire a greater degree of control. The interested reader is referred to the documentation for more details.
