Friday, September 27, 2019

A look into building C++ modules with a scanner

At CppCon there was a presentation on building C++ modules using a standalone dependency scanner executable provided by the compiler toolchain. The integration (as I understand it) would go something like this:

  1. The build system creates a Ninja file as usual
  2. It sets up a dependency so that every compilation job depends on a prescan step.
  3. The scanner goes through all source files (using compile_commands.json), determines module interdependencies and writes this out to a file in a format that Ninja understands.
  4. After the scan step, Ninja will load the file and use it to execute commands in the correct order.
This seems like an interesting approach for experimentation, but unfortunately it depends on functionality that is not yet in Ninja. It is unclear if and when this functionality will be added, as Ninja's current maintainers are extremely conservative about adding any new code. It is quite difficult to run experiments on approaches that have neither usable code nor all the required features in various parts of the toolchain.
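
To make the scan step more concrete, here is a minimal sketch in Python of what such a scanner might compute. It is not the scanner from the presentation: the regexes, the function names (scan_source, file_dependencies, load_sources) and the in-memory result format are all assumptions made for illustration. It only relies on the "directory" and "file" keys of a compilation database entry.

```python
import json
import os
import re

# Deliberately naive patterns: no preprocessing, no comment or string handling.
PROVIDES_RE = re.compile(r'^\s*export\s+module\s+([\w.:]+)\s*;', re.MULTILINE)
IMPORTS_RE = re.compile(r'^\s*(?:export\s+)?import\s+([\w.:]+)\s*;', re.MULTILINE)


def scan_source(text):
    """Return (module provided or None, sorted list of imported modules)."""
    provided = PROVIDES_RE.search(text)
    imports = sorted(set(IMPORTS_RE.findall(text)))
    return (provided.group(1) if provided else None, imports)


def file_dependencies(sources):
    """Map each source file to the source files that provide its imports.

    `sources` maps file name -> file contents.
    """
    scanned = {name: scan_source(text) for name, text in sources.items()}
    providers = {prov: name for name, (prov, _) in scanned.items() if prov}
    return {name: sorted(providers[m] for m in imports if m in providers)
            for name, (_, imports) in scanned.items()}


def load_sources(compdb_path):
    """Read every file listed in a compilation database into memory."""
    with open(compdb_path, encoding='utf-8') as f:
        entries = json.load(f)
    sources = {}
    for entry in entries:
        path = os.path.join(entry['directory'], entry['file'])
        with open(path, encoding='utf-8') as src:
            sources[entry['file']] = src.read()
    return sources
```

Calling file_dependencies(load_sources('compile_commands.json')) would give a file-to-file dependency map. Turning that map into something Ninja can load is the part that needs the missing Ninja functionality.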

Can we do it regardless? Yes we can!

Enter self-modifying build system code

The basic approach is simple:
  1. Write a Ninja file as usual, but make all the top level targets (or, for this test, only the "all" target) run a secret internal command.
  2. The command will do the scanning, and change the Ninja file on the fly, rewriting it to have the module dependency information.
  3. Invoke Ninja on the new file, giving it a secret target name that runs the actual build.
  4. Build proceeds as usual.
The code that does this can be found in the vsmodtest branch in the main Meson repository. To run it you need to use Visual Studio's module implementation; the test project is in the modtest directory. It actually does work, but there are a ton of disclaimers:
  • incremental builds probably won't work
  • the resulting binary never finishes running (it runs a job with exponential complexity)
  • it does not work on any other project than the demo one (but it should be fixable)
  • the dependencies are on object files rather than module BMI files due to a Ninja limitation
  • module dep info is not cached, all files are fully rescanned every time
  • the scanner is not reliable, it does the equivalent of dumb regex parsing
  • any and all things may break at any time and if they do you get to keep both pieces
All in all, this is nothing even close to production ready, but it is a fairly nice experiment for ~100 lines of Python. It is of course a hack, but assuming Ninja gets all the required extra functionality, it probably could be made to work reliably.
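
For readers who want a feel for the rewriting step, the following is a rough sketch of the idea, not the code from the vsmodtest branch. The function names and the target name passed to rerun_ninja are hypothetical. It assumes every build statement sits on one line, has no existing implicit or order-only dependencies and contains no escaped colons. Dependencies are added on object files using Ninja's implicit dependency syntax (the part after "|").

```python
import subprocess


def add_module_deps(ninja_text, obj_deps):
    """Append implicit dependencies to matching build statements.

    `obj_deps` maps an output object file to the object files it must wait for.
    Assumes single-line build statements without existing `|` or `||` parts.
    """
    out = []
    for line in ninja_text.splitlines():
        if line.startswith('build ') and ':' in line:
            target = line[len('build '):].split(':', 1)[0].strip()
            deps = obj_deps.get(target, [])
            if deps:
                line = line + ' | ' + ' '.join(deps)
        out.append(line)
    return '\n'.join(out) + '\n'


def rerun_ninja(build_file, target):
    """Invoke Ninja on the rewritten file; `target` is a hypothetical name."""
    return subprocess.run(['ninja', '-f', build_file, target]).returncode
```

The internal command would read the Ninja file, pass its text through add_module_deps, write the result back and then call rerun_ninja with the name of the target that performs the real build.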

Is this the way C++ module building will work?

Probably not, because there is one major use case that this approach (or indeed any content scanning approach) does not support: code generation. Scanning assumes that all source code is available at the same time, but if you generate source code on the fly, this is not the case. There would need to be some mechanism for making Ninja invoke the scanner anew every time new source files appear, and such a mechanism does not exist as far as I know. Even if it did, there would be a lot of state to transfer between Ninja and the scanner to ensure both reliable and minimal dependency scans.
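
The problem can be illustrated with a toy example. This is a sketch under the same naive regex assumption as the experiment above, and all names in it (unresolved_imports, the Gen module, gen.ixx) are made up. A scan that runs before the generator has produced its output cannot tell which file will provide the imported module.

```python
import re

IMPORT_RE = re.compile(r'^\s*import\s+([\w.:]+)\s*;', re.MULTILINE)
MODULE_RE = re.compile(r'^\s*export\s+module\s+([\w.:]+)\s*;', re.MULTILINE)


def unresolved_imports(sources):
    """Return {file: [imported modules that no scanned file provides]}."""
    provided = {m for text in sources.values() for m in MODULE_RE.findall(text)}
    missing = {}
    for name, text in sources.items():
        absent = sorted(set(IMPORT_RE.findall(text)) - provided)
        if absent:
            missing[name] = absent
    return missing


# Before the generator has run, only the hand-written file exists.
before = {'main.cpp': 'import Gen;\nint main() { return 0; }\n'}
# After it has run, the generated module source appears as well.
after = dict(before, **{'gen.ixx': 'export module Gen;\n'})

print(unresolved_imports(before))
print(unresolved_imports(after))
```

The first scan reports an import it cannot resolve, the second does not. A build tool would have to rerun the scan between those two points, which is exactly the mechanism that is missing.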

There are alternative approaches one can take to avoid the need for scanning completely, but they have their own downsides.
