In the recent Remembering Buildtool post, I described how setting up a cache of configuration checks was an important step in Buildtool’s installation process. The goal was to avoid pointless repetitive work on every build by performing such common checks once.

Episode 457 of BSD Now featured my post, and Allan Jude wondered how much time would be saved in a bulk build of all FreeBSD packages if we could just do that same kind of caching with GNU Autoconf. And, you know what? It is indeed possible to do so. I had mentioned it in passing in my post, but I guess I wasn’t clear enough, so let’s elaborate!


The problem: Autoconf’s slowness

The configure scripts generated by GNU Autoconf are slow, very slow, to the point where sometimes their execution time is longer than the time it takes to build the package they configure. This is especially true on multi-core systems where these scripts make builds drag along.

Here, take a look at some package build times on an 8-core machine from 2011:

Package     Type                 configure   make -j8
bmake       Small C package      8s          7s
coreutils   Medium C package     62s         96s
m4          Small C package      36s         9s
pkgconf     Small C package      3s          2s
kyua        Small C++ package    6s          91s
tmux        Small C package      7s          8s

For comparison, here are two of the builds above—I did not have the patience to run them all—on an even older single-core PowerBook G4 from 2005:

Package     Type                 configure   make -j1
bmake       Small C package      44s         60s
tmux        Small C package      46s         217s

Note the huge cost of the configure run times relative to make.

You might think that slow configure scripts aren’t a big deal, but pause for a second to realize that these scripts plague the entire Unix ecosystem. Almost every package in your standard Linux distribution or BSD system has a configure script of its own, and this script has to run before the package can be built. Considering that this ecosystem favors small source packages, each with its own configure script, the costs add up quickly.

But wait, it gets even worse. All BSD systems and some Linux distributions have some form of bulk build: a long-running process where they rebuild their entire collection of binary packages at once from source. These binary packages are the ones you can later trivially install via, say, pkg on FreeBSD or dnf on Fedora. These bulk builds take several hours at best on the most powerful machines and several weeks (or is it months?) at worst on legacy platforms. I don’t have hard numbers, but based on the simple data presented above, I think it’s fair to assume that a large percentage of the total build time is wasted on configure scripts—and most of this time is stupidly spent doing the same work over and over and over again.

Can we do anything to make these runs faster? Yes, it turns out we can. But before getting into that, let’s explore why these scripts are so slow and why they are still a big problem even on modern multi-core machines.

Why are configure scripts slow?

The reasons configure scripts are slow are varied:

  1. They are huge shell scripts. bmake’s configure, to take just one example, is about 210 kB and 7,500 lines long, and the shell is not a language that wins any speed tests.

  2. They create, compile, and run¹ small programs to verify features of the host system (see the sketch below).

  3. They are sequential and mostly I/O-bound (again, they are shell scripts), which is the worst kind of sequential.

’nuf said.
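To give a sense of what each of those checks involves, here is a rough, hand-written sketch of what a single header test boils down to. The real generated code is far more convoluted; the file names and messages below are just illustrative:

# Roughly what "checking for stdlib.h..." amounts to: write a tiny
# program, try to compile it, and record the outcome.
cat > conftest.c <<'EOF'
#include <stdlib.h>
int main(void) { return 0; }
EOF
if cc -c conftest.c -o conftest.o >/dev/null 2>&1; then
  echo "checking for stdlib.h... yes"
else
  echo "checking for stdlib.h... no"
fi
rm -f conftest.c conftest.o

Now multiply that pattern by hundreds of checks, each one forking a shell, a compiler, and sometimes the resulting test binary.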

Parallel builds to the rescue?

“Ah, but it doesn’t matter”, you say. “While the configure script of one package may be slow, we are building thousands of packages in a bulk build. Therefore, we can make use of parallelism to hide the costs!” Yeah… not really.

You see, the end-to-end build of a package tends to be bimodal: the configure script is slow, I/O-bound, and sequential, while the build itself is typically reasonably parallel and CPU-bound. (Actually, the end-to-end process is trimodal if we account for the I/O-bound installation step, but let’s ignore that in this post.)

These different kinds of resource consumption at different stages pose problems when trying to parallelize the build of independent packages.

Suppose we have a machine with 8 CPUs and that every package’s build stage is parallel enough to consume up to 4 CPUs at any given time. If we try to build 8 of these packages in parallel to paper over the fact that configure is sequential, we’ll have good cases where we are running the 8 scripts at once and making OK use of resources. Unfortunately, we’ll also have bad cases where the 8 packages are in their build stage trying to use 4 CPUs each, which means we’ll have 32 CPU-hungry processes contending for 8 CPUs. The latter scenario is more likely than the former, so this is not great.

To “fix” this under bulk build scenarios, we could say that we don’t want to allow parallel builds within a package (i.e. we restrict each build to make -j1) to keep every package limited to one CPU at most. But if we do that, we’ll introduce major choke points in the bulk build because some packages, like clang, are depended on by almost everything and take forever to build without parallelism.

Repeated work

The worst part of all is that a lot of the work that configure scripts do is pure waste. How many times do you really need to check during the build of multiple packages that your system has a C compiler? And stdlib.h? And uint8_t? And a Fortran compiler? FFS. Most of these checks are useless in most packages (configure scripts are cargo-culted and almost nobody understands them), and for those that are useful, their answers aren’t going to change for the duration of the build. Heck, the answers are likely not going to change for the lifetime of the entire system either.

This is particularly frustrating when you want to revive an old machine—like the PowerBook G4 I mentioned above—where the only option to get modern software is to build it yourself. Doing so is exasperating because you spend most of your time witnessing configure scripts doing repeated work and very little time building the code you want to run.

Not all hope is lost, though. I’m sure you have occasionally noticed this: when you run configure in a large project that has recursive configure invocations, you’ll often see (cached) next to individual checks. In other words, these scripts do know how to reuse results from previous invocations.

So, wouldn’t it be great if we convinced configure to avoid doing repeated work across packages? Couldn’t we check for these details just once and reuse cached results later? Well, yes, yes we can!

Say hello to GNU Autoconf caching

GNU Autoconf does have first-class caching features. Using them within a single package is trivial. All we have to do is pass the --config-cache flag to the script as described in the Cache Files section of the manual and it will maintain a config.cache file with the results of the invocation. You can see the impact of a perfect cache here:

.../m4-1.4.19$ time ./configure --config-cache >/dev/null
./configure > /dev/null  16.32s user 12.18s system 90% cpu 31.660 total

.../m4-1.4.19$ time ./configure --config-cache >/dev/null
./configure --config-cache > /dev/null  1.45s user 0.77s system 108% cpu 2.045 total

In other words, configure’s time went from 31 seconds to just 2 by saving and reusing the previous results. (Note that this is different from running config.status, which just recreates output files, but let’s leave that aside.)

This is nice for a single package but, as it turns out, configure also accepts a --cache-file flag to specify the path to the cache file (--config-cache is just an alias for --cache-file=config.cache). There is nothing preventing us from passing a path to a central location and reusing the same cache for various packages!
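For illustration, here is what sharing one cache between two unrelated source trees could look like. The /var/cache/autoconf location is just a made-up example; any writable path works:

mkdir -p /var/cache/autoconf
(cd m4-1.4.19 && ./configure --cache-file=/var/cache/autoconf/config.cache)
# The second run reuses whichever answers the first run already computed.
(cd bmake && ./configure --cache-file=/var/cache/autoconf/config.cache)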

In fact, the GNU Autoconf developers have thought about this problem. On the one hand, the tool supports setting up a system-wide configuration file (known as config.site) as described in the Setting Site Defaults section of the manual. And on the other hand, the default code snippet that they show in the manual has an explicit mention of using a system-wide cache:

# Give Autoconf 2.x generated configure scripts a shared default
# cache file for feature test results, architecture-specific.
if test "$cache_file" = /dev/null; then
  cache_file="$prefix/var/config.cache"
  # A cache file is only valid for one C compiler.
  CC=gcc
fi
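As a reminder of how configure finds this file: it honors the CONFIG_SITE environment variable if set, and otherwise loads $prefix/share/config.site and $prefix/etc/config.site when they exist. A minimal sketch of wiring the snippet above into place, with /usr/local as an example prefix:

# Install the snippet where configure looks for it by default...
cp config.site /usr/local/share/config.site
# ...or point a single run at it explicitly:
CONFIG_SITE=/usr/local/share/config.site ./configure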

It couldn’t be easier, really, to cache results, right? But then… why aren’t we collectively using this feature more widely? Well, caching configure results willy-nilly can cause random build failures because the checks performed by one package aren’t necessarily equivalent to similar-looking checks in another. An obvious case is when the result of a check depends on the result of a previous check: for cache correctness, any two scripts need to run these two checks in the same order, but there is no guarantee that they’ll do so.
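To make the hazard concrete, consider this hypothetical pair of configure.ac fragments (foo.h and the /opt/foo path are made up). Both checks store their answer under the same cache variable, ac_cv_header_foo_h, yet they are not equivalent:

# Package A: looks for foo.h with an extra include path in effect.
CPPFLAGS="$CPPFLAGS -I/opt/foo/include"
AC_CHECK_HEADERS([foo.h])

# Package B: runs the "same" check with the default CPPFLAGS.
AC_CHECK_HEADERS([foo.h])

# With a shared cache, whichever package runs first poisons the
# cached ac_cv_header_foo_h answer for the other.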

If we want system-wide caching that is reasonably safe, we need to do better than simply pointing all configure runs at a central cache file. And this is where autoswc enters the picture.

Enter autoswc

autoswc, whose name stands for Automatic System-Wide Configuration and which was brought to you by yours truly in 2004, is a little tool that exposes GNU Autoconf’s system-wide caching facilities in a safe manner.

The key idea behind autoswc is that you (the administrator) create a system-wide configure script with the list of checks you want to cache and then run autoswc to refresh the cache at specific points in time (say, before performing a bulk build). Then, any build you perform from within pkgsrc (the tool is specific to this packaging system) will reuse those checks, but these arbitrary builds won’t contaminate the central cache.

Put another way: autoswc helps define a cache of safe checks and automates the process of using them during bulk builds, minimizing the risk of bad things happening due to cache contamination.
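To give a flavor of what such a system-wide script might contain, here is a minimal, hypothetical configure.ac limited to checks whose answers should not change between packages. The sample file shipped with autoswc differs; these macros are just common, typically-safe candidates:

AC_INIT([autoswc-cache], [1.0])
AC_PROG_CC
AC_PROG_CXX
AC_CHECK_HEADERS([stdlib.h string.h unistd.h])
AC_CHECK_TYPES([uint8_t])
AC_TYPE_SIZE_T
AC_OUTPUT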

Using this tool is easy. I had not used it in years, but installing it from pkgsrc and setting it up only required the steps below (a command-level sketch follows the list). Just like with Buildtool, I’m surprised such old code of mine still works:

  1. Install the pkgtools/autoswc package.

  2. Optionally copy /usr/pkg/share/autoswc/configure.ac to /usr/pkg/etc/autoswc/configure.ac and extend the sample script with the checks you want to cache.

  3. Append .sinclude "/usr/pkg/share/autoswc/autoswc.mk" to /etc/mk.conf.

  4. Occasionally run autoswc from the command line to update the cache.
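For reference, the steps above roughly translate to the following commands, assuming a pkgsrc tree under /usr/pkgsrc and the default /usr/pkg prefix:

cd /usr/pkgsrc/pkgtools/autoswc && make install clean
cp /usr/pkg/share/autoswc/configure.ac /usr/pkg/etc/autoswc/configure.ac
echo '.sinclude "/usr/pkg/share/autoswc/autoswc.mk"' >> /etc/mk.conf
autoswc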

Voila. All package builds done through pkgsrc now benefit from the cached configuration results generated by the files in step 2.

Unfortunately, as good as this may seem, autoswc’s results aren’t impressive. The main problem is that it’s on you (the administrator) to curate the list of checks to cache. This is a very difficult task because it requires looking at what configure scripts are doing throughout a bulk build and determining which checks are safe to cache and which aren’t, and ain’t nobody got time for that.

I think my hope when I created this tool was that we’d get a swarm of people with pkgsrc expertise to curate the predefined list of checks in the sample configure.ac file and then we’d all benefit from the results on our own machines and on the bulk build clusters… but this obviously did not happen. Still, the feature in GNU Autoconf exists, autoswc remains functional and trivially configurable, and with some effort it could bring tangible speed improvements to builds—especially on old hardware.

Anyhow, now you know about one more “hidden” feature of GNU Autoconf that can potentially speed up repeated package builds massively.

Thanks for reading and enjoy the weekend!


  1. Avid readers will note that another consequence of running the test programs that configure creates is that configure scripts are often terrible when trying to cross-build software for other platforms. The test programs must be built for the target system in order to provide correct results, but that means that they cannot be run on the host. ↩︎