Recovering Nix derivation attributes of runtime dependencies

October 8, 2019


A coworker recently asked me to compile a list of licenses used at runtime by some of our build products. Most nixpkgs derivations have a bunch of metadata associated with them, so why not use that? It turns out it’s not that trivial.

It’s nothing groundbreaking, but it was a pretty nice puzzle to crack. Let’s get started!

You can find the full code as a gist.

Tonight’s Program

The idea here is that we want to use the original derivation object (for instance the meta field) of packages that are depended on at runtime. That means you can generate a “runtime report” like this:

let
    pkgs = import <nixpkgs> {};
    runtimeReport = ... ;
in runtimeReport pkgs.hello

and that’d give us:

  ---------------------------------
  |        OFFICIAL REPORT        |
  |   requested by: the lawyers   |
  |    written by: yours truly    |
  |    TOP SECRET - TOP SECRET    |
  ---------------------------------

runtime dependencies of hello-2.10:

- glibc-2.27 (lgpl2Plus) maintained by Eelco Dolstra
- hello-2.10 (gpl3Plus) maintained by Eelco Dolstra

This is the “runtime report”, i.e. what you’ll be sending to your lawyers, your boss, your mom, anybody who can enjoy a good runtime report. It consists of a few “dependency reports” which (in this example) contain information about runtime dependencies:

- <name> (<license>) maintained by Eelco Dolstra <maintainer>

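Now let’s talk about how this works. The “runtime report” is created in two steps:

- A “buildtime report” is generated for every buildtime dependency.
- The “buildtime reports” of dependencies that are not needed at runtime are discarded.

We’re using the following terminology:

- buildtime dependency: anything needed to build the derivation.
- runtime dependency: a buildtime dependency that is still referenced at runtime.
- buildtime-only dependency: a buildtime dependency that is not a runtime dependency.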

Most of the “buildtime” reports won’t even be used, because most buildtime dependencies are buildtime-only dependencies. Why generate them? Nix does not give us a way of retrieving the derivation attributes of runtime dependencies, but we can twist its arm to give us two other things: the derivation attributes of buildtime dependencies, and the store paths of runtime dependencies.

Here’s the hack: when going through the buildtime dependencies, we tag them with their (expected) store paths, which we then cross check against the list of runtime store paths. If it’s a match, we keep it. Otherwise, we discard it.

Let’s look at some code. We’ll go top-down, starting with the function creating the final string.

The "Function Tree". Today we'll follow the red arrow.
Not sure why "tree". It looks upside down.
Or maybe a weeping willow, from afar.
This is terribly small, you shouldn't be reading this.

It’s Showtime

I hope you like jq:

runtimeReport = drv:
  runCommandNoCC "${drv.name}-report" { buildInputs = [ jq ]; }
  ''
    (
      echo "  ---------------------------------"
      echo "  |        OFFICIAL REPORT        |"
      echo "  |   requested by: the lawyers   |"
      echo "  |    written by: yours truly    |"
      echo "  |    TOP SECRET - TOP SECRET    |"
      echo "  ---------------------------------"
      echo
      echo "runtime dependencies of ${drv.name}:"
      cat ${buildtimeReports drv} |\
        jq -r --slurpfile runtime ${cinfo drv} \
          ' # First, we strip away (path-)duplicates.
            unique_by(.path)
            # Then we map over each build-time derivation and use `select()`
            # to keep only the ones that show up in $runtime

          | map(    # this little beauty checks if "obj.path" is in "runtime"
                select(. as $obj | $runtime | any(.[] | . == $obj.path))
              | .report)
          | .[]'
    ) > $out
  '';

The most important thing to notice here is the absolutely delicious argument name slurpfile. Once you get over that, and if you look at it long enough, you can convince yourself that this function doesn’t do much: it prints the report field of all the objects in buildtimeReports, as long as those objects have a path that is found in cinfo. You guessed right: cinfo are the paths of the runtime dependencies, and buildtimeReports are… the buildtimeReports.

We’re not going to discuss cinfo much. It’s a wrapper around closureInfo, which itself is a helper for using exportReferencesGraph, which … basically, cinfo gives you the list of store paths used (or referenced) at runtime.
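For the curious, here is a minimal sketch of what such a wrapper could look like. It is not the code from the gist: the name cinfoSketch is made up, and the only assumptions are that closureInfo writes a newline-separated store-paths file into its output, and that the jq filter above wants a JSON array of paths.

```nix
let
  pkgs = import <nixpkgs> { };

  # Hypothetical stand-in for cinfo: builds a file containing a JSON array
  # with the store paths in the runtime closure of `drv`.
  cinfoSketch = drv:
    pkgs.runCommand "${drv.name}-cinfo" { nativeBuildInputs = [ pkgs.jq ]; } ''
      # -R reads raw text, -s slurps the whole file into a single string,
      # which is then split into lines (dropping the trailing empty one).
      jq -R -s 'split("\n") | map(select(length > 0))' \
        ${pkgs.closureInfo { rootPaths = [ drv ]; }}/store-paths > $out
    '';
in
cinfoSketch pkgs.hello
```

Building this should produce a single JSON file, which is the shape that --slurpfile then wraps in one more array.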

Instead let’s look at buildtimeReports, which creates reports for all of drv’s buildtime dependencies. Each element in the resulting list has two fields, path and report.

The path is the store path of the dependency, and report is where you can go crazy:

buildtimeReports = drv: writeText "${drv.name}-runtime" ( toJSON (
  map (obj:
    { path = unsafeDiscardStringContext obj.key;
      report = mkReport obj.drv;
    }
  )
  (buildtimeDerivations drv)
  ));

mkReport = drv: "something interesting with ${drv.meta}, although it will most likely mention Eelco or peti";
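The mkReport above is a placeholder (interpolating drv.meta directly would not evaluate, since an attribute set can’t be coerced to a string). Below is one possible sketch that produces lines shaped like the ones in the report at the top. The helpers toList and licenseName are made up for this example, and it assumes that meta.license is either a string, a license attribute set with a shortName, or a list of those, and that maintainers are attribute sets with a name.

```nix
let
  # Normalise a value to a list, since meta.license may be one license or many.
  toList = x: if builtins.isList x then x else [ x ];

  # A license is either an attribute set (with shortName/fullName) or a string.
  licenseName = l:
    if builtins.isAttrs l then l.shortName or l.fullName or "unknown"
    else if builtins.isString l then l
    else "unknown";

  mkReport = drv:
    let
      meta = drv.meta or { };
      licenses = map licenseName (toList (meta.license or "unknown"));
      maintainers = map (m: m.name or "unknown") (meta.maintainers or [ ]);
    in
    "- ${drv.name} (${builtins.concatStringsSep ", " licenses}) maintained by "
    + (if maintainers == [ ] then "nobody"
       else builtins.concatStringsSep ", " maintainers);
in
# A hand-written stand-in for pkgs.hello, so this evaluates without nixpkgs.
mkReport {
  name = "hello-2.10";
  meta = {
    license = { shortName = "gpl3Plus"; };
    maintainers = [ { name = "Eelco Dolstra"; } ];
  };
}
```

Swapping the hand-written attribute set for a real derivation such as pkgs.hello should work the same way, as long as its meta follows the assumptions above.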

> blah blah blah _something about `unsafeDiscardStringContext`_ blah blah blah

I know, right!?! My cat jumped on my keyboard and somehow typed that in! I thought it was gonna break everything but it didn’t, instead it just super sped up the build when downloading from the binary cache! I really love my cat: I can blame anything on it, and it cares about the size of my /nix/store!

This function grabs the buildtimeDerivations and creates a report for each. It also tags those derivations with their path which, as we’ll see in a second, is stored in the object’s .key (whereas the derivation itself is in .drv).

Build-time derivation objects

Getting the paths of runtime dependencies — that we got for free from cinfo. As we walk down the tree of functions, we get to the other big piece of the puzzle: laying our hands on the original derivations of all the buildtime dependencies.

My cat is responsible for this as well.

This one’s called buildtimeDerivations and does something very simple. It takes the original derivation object (e.g. pkgs.hello) and goes through all the attributes. Whenever you hit another derivation (for instance from buildInputs), put it in your backpack, and continue. Once you’ve found them all, open your backpack and recurse into all your newly found derivations, ad nauseam.

There are no guarantees that this will find all inputs, but it works well enough in practice:

buildtimeDerivations = drv0:
  let
    drvDeps = attrs: ...; # recurses into the attrs to find other derivations
  in
    let wrap = drv: { key = drv.outPath ; inherit drv; }; in genericClosure
    { startSet = map wrap (drvOutputs drv0) ;
      operator = obj: map wrap
        ( concatLists (drvDeps obj.drv.drvAttrs) ) ;
    };
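The body of drvDeps is elided above, and the sketch below does not try to reproduce it. It only illustrates the “put it in your backpack” idea on toy data: collectDrvs, isDrv and fakeDrv are hypothetical names, and a value counts as a derivation here if it is an attribute set with type = "derivation".

```nix
let
  # Derivations are attribute sets tagged with type = "derivation".
  isDrv = x: builtins.isAttrs x && (x.type or null) == "derivation";

  # Collect every derivation reachable through lists and plain attribute sets.
  # It stops at each derivation it finds; recursing into those is left to the
  # graph traversal. It would loop forever on cyclic attribute sets.
  collectDrvs = x:
    if isDrv x then [ x ]
    else if builtins.isList x then builtins.concatMap collectDrvs x
    else if builtins.isAttrs x then
      builtins.concatMap collectDrvs (builtins.attrValues x)
    else [ ];

  # Toy derivation, so the example evaluates without nixpkgs.
  fakeDrv = name: { type = "derivation"; inherit name; };

  attrs = {
    name = "hello-2.10";
    buildInputs = [ (fakeDrv "glibc-2.27") ];
    nativeBuildInputs = [ (fakeDrv "gcc-7.4.0") (fakeDrv "perl-5.28.2") ];
    passthru = { tests = fakeDrv "hello-tests"; };
  };
in
map (d: d.name) (collectDrvs attrs)
```

A walk like this only sees dependencies that appear as derivation values. Anything that is only referenced through a string is missed, which is one reason there are no guarantees.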

The safest, and probably fastest way to traverse a graph in Nix is genericClosure. Give it a startSet, an operator to generate new nodes from the current node, and you’ve got yourself a full traversal. What’s a “node”? It’s anything that has a key field. Here we use drv.outPath as the key, and it should now be clear where the key came from in buildtimeReports.
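If genericClosure is new to you, here is a toy run on a hand-written graph. The graph and the wrap helper are made up for illustration; the node keys are plain strings instead of store paths.

```nix
let
  # A toy dependency graph: name -> direct dependencies.
  graph = {
    hello = [ "glibc" "stdenv" ];
    stdenv = [ "gcc" "glibc" ];
    gcc = [ "glibc" ];
    glibc = [ ];
  };

  # A node is anything with a `key`; genericClosure deduplicates on it.
  wrap = name: { key = name; };
in
map (node: node.key) (builtins.genericClosure {
  startSet = [ (wrap "hello") ];
  # Generate the neighbours of the current node.
  operator = node: map wrap graph.${node.key};
})
```

Even though glibc is reachable through three different paths, it shows up only once in the result, because nodes with an already visited key are skipped.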

And just like that, we’ve reached the bottom of the function tree. Let’s grab our ladder, go back up and look at the bigger picture.

IFD vs jq-galore

By far the most frightening piece of code today was:

unique_by(.path) | map(select(. as $obj | $runtime | any(.[] | . == $obj.path))| .report)| .[]

Don’t remember seeing it? That’s “selective memory”. Probably took me a full hour to come up with it.

Given that Nix has readFile and fromJSON, we could have lifted cinfo’s output to a Nix value, then performed all the matching on that value, and generated the report as a Nix string directly! Added bonus, the “buildtime-only” reports would never have been generated, thanks to laziness!
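Here is a rough sketch of what that matching could look like in plain Nix. The store paths and reports are fake, hand-written values, and runtimePaths is parsed from an inline string, whereas the real thing would be builtins.fromJSON (builtins.readFile (cinfo drv)).

```nix
let
  # Stand-in for the parsed output of cinfo (fake store paths).
  runtimePaths = builtins.fromJSON ''
    [ "/nix/store/aaaa-glibc-2.27", "/nix/store/bbbb-hello-2.10" ]
  '';

  # Stand-in for the tagged buildtime derivations. The `report` fields are
  # lazy, so the ones that get filtered out are never evaluated.
  tagged = [
    { path = "/nix/store/aaaa-glibc-2.27"; report = "- glibc-2.27 (lgpl2Plus)"; }
    { path = "/nix/store/cccc-gcc-7.4.0"; report = throw "never forced"; }
    { path = "/nix/store/bbbb-hello-2.10"; report = "- hello-2.10 (gpl3Plus)"; }
  ];

  # Keep only the objects whose path is a runtime store path.
  isRuntime = obj: builtins.elem obj.path runtimePaths;
in
builtins.concatStringsSep "\n"
  (map (obj: obj.report) (builtins.filter isRuntime tagged))
```

The throw in the middle element is there to show the laziness: the expression evaluates fine because that report is discarded before anybody looks at it. Unlike the jq version, this sketch does not deduplicate by path.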

The problem with readFile-ing cinfo is … import from derivation. I stay away from it, but YMMV, try it! Leave your thoughts in the comment section below.

                   ..--""|
                   |     |
                   | .---'
             (\-.--| |-----------.
            / \) \ | |            \
            |:.  | | |             |
            |:.  | |o|  >/dev/null |
            |:.  | `"`             |
            |:.  |_  __   __ _  __ /
            `""""`""""|=`|"""""""`
                      |=_|
                      |= |

I lied, there's no comment section. Artwork by Joan G. Stark.

