My bad opinions

2021/05/24

You've got to upgrade Rebar3

Bad news. You have to upgrade Rebar3. Like right now. We just noticed that SSL validation had been partially disabled for years.

This post covers:

  1. What's the Problem
  2. How To Fix It
  3. Why We Think it Hasn't Been Exploited
  4. What Happened

You should definitely read the first two sections to remediate this; the rest is intended to be informative.

What's the Problem

We accidentally disabled all TLS validation when communicating with https://hex.pm in Rebar3 itself, meaning Hex packages you download may have seen only partial validation and could have been subjected to attack. While we do not think any such exploit has happened in the wild, we still treat this as urgent. Git or Mercurial dependencies, and any other communications (such as rebar3 local upgrade commands), are unaffected.

All versions starting with Rebar3 3.7.0 released in November 2018 are affected. The specific versions, given OTP compatibility schedules, are:

You can call rebar3 version if you want to know which one you're running for sure. If you are a mix user (with Elixir), you are not at risk: Rebar3 is used by mix only to build code, not to fetch dependencies.

How To Fix It

Upgrade and Update.

The following versions have been tagged, to provide the quickest path to safety for any project on any supported versions in that time period:

You will want to install the newest one your project can tolerate (possibly with rebar3 local upgrade), and then call rebar3 update to bring your local registry up to date.

I'm hoping to soon be able to patch 3.13.x and 3.10.x to work, but the patch that's been tested to work for newer versions won't port cleanly to them, and the reality of OSS maintainership on a skeleton crew is that I can't currently commit the time to patch these up in a hurry (and get 8-year-old copies of Erlang to build cleanly for it all).

The patch is not that complex on new versions and should be relatively easy to port to older versions if anyone ends up wanting to help.

Why We Think it Hasn't Been Exploited

Even though a build tool like Rebar3 is fundamentally about running code from random places online on your machine and can never be considered safe software, we still try to defend in depth where we can. The Hex.pm maintainers do the same and have taken previous reports seriously.

If you're on a moderately up-to-date Rebar3, the following things should all take place

These measures exist so that our layered package index mechanism can safely be used with partial mirrors. For example, someone in a corporate setting could have a "blessed" package index with only valid repositories that have been audited in there, and fetch from there as a top priority. Packages that can't be seen could go to a public mirror, and if the mirror is out of date, then they could go to the root hex package.

The locking mechanism we use, and the way fetching and validating are done, are such that if someone were to compromise the public mirror and try to change a package's definition, any locked package version would allow us to detect that and warn about it. In practice, people see this warning pop up if they fetched the new version of a package within the first hour of its publication, when the maintainer is still allowed to mutate it briefly (for bug fixing and correcting metadata).
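As a rough illustration of the idea (not Rebar3's actual code), the check amounts to comparing a digest of what was just fetched against the digest recorded when the dependency was first locked. The module name, function names, and the choice of an uppercase hex-encoded SHA-256 digest are assumptions made for this sketch.

```erlang
-module(lock_check_sketch).
-export([checksum/1, verify/2]).

%% Hex-encoded SHA-256 digest of a package tarball held in memory.
%% (Assumption: the lock stores a digest in this form.)
-spec checksum(binary()) -> binary().
checksum(Tarball) when is_binary(Tarball) ->
    Digest = crypto:hash(sha256, Tarball),
    iolist_to_binary([io_lib:format("~2.16.0B", [Byte]) || <<Byte>> <= Digest]).

%% Compare a freshly fetched tarball against the checksum recorded when
%% the dependency was first locked. A mismatch is what triggers a warning.
-spec verify(binary(), binary()) ->
          ok | {error, {checksum_mismatch, binary(), binary()}}.
verify(Tarball, LockedChecksum) ->
    case checksum(Tarball) of
        LockedChecksum -> ok;
        Other -> {error, {checksum_mismatch, LockedChecksum, Other}}
    end.
```

With a check of this shape, a mirror that swaps the contents of an already-locked version produces a mismatch on every machine that still holds the old lock entry, which is why a silent substitution is hard to pull off.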

Additionally, Rebar3 does not fetch updated copies of the hex index unless it is specifically told to (rebar3 update), and otherwise only does partial index fetches (per-package) when it is asked to download a version of a library it hasn't seen locally before.

So for someone to successfully exploit this without anyone noticing, they likely would need to:

  1. Man-in-the-middle (MITM) attack the connection between your dev machine and hex.pm
  2. Notice which package you're getting and live-inject a bad one that also matches the hex signatures bundled into the library
  3. MITM attack the connection of some if not all of your contributors in a similar way so they never get a warning about broken packages
  4. MITM attack the connection with your continuous integration or build servers in a similar way so there's never a warning either
  5. Keep it going for as long as you're using the bad package on new devices that might fetch it.

Now there are arguably two weaker points in the supply chain. The first is if you 'bootstrap' Rebar3 rather than using a pre-built copy, since the first fetch is done without validation (we need to download the CA certs bundle package without having one). The second is if this whole attack chain takes place while someone happens to be publishing a package.

Package publishing is the riskiest one here, but it requires an interesting amount of sophistication:

  1. The attacker needs to MITM the connection between you and hex.pm
  2. They need to intercept the update you're making
  3. Malicious code must be injected; this can either be done live (hard) or at a later point within the hour by re-using credentials they snooped
  4. You need to not be aware of, or to ignore, the email hex sends you every time a package you maintain is updated on the registry.

In short, since we've never seen any sign of this, and since it would be quite a convoluted attack, we're not very concerned that it has happened in practice. If you have seen any of the above symptoms and just didn't know what to make of them, then you might be at risk.

What Happened

The issue was first introduced in Rebar3 3.7.0, a big release with work funded by the IEUG, the precursor of the EEF, which aimed to do major rework of the tool in order to support using Elixir dependencies in Erlang. This came with an overhaul of the plugin system, how the compiler works, and also bundled changes to how we fetched packages from hex, deferring work to the hex_core library. This latter change came with extra features around how we built indexes and fetched updates, which were made as lazy as possible (only hit the network when you must).

Erlang's SSL/TLS libraries were never safe by default and never checked certificates unless told to. Our advice as OSS contributors has always been to use hackney as an HTTP client since it handles validation of certificate chains out of the box. Unfortunately, Rebar3 cannot rely on hackney because Rebar3 needs to exist to fetch the hackney package, and early on we wanted to use as few dependencies as possible to prevent unlucky clashes with plugins and dependencies (which incidentally was made less annoying in 3.7.0). The Erlang libraries that come out of the box just don't validate certificates, have no way of getting the OS's root CA bundle, and before last week's release of OTP-24, wouldn't warn when unsafe calls were made. Instead, we bundle the certifi package along with TLS validation functions that we pass in each of our calls.

In 3.7.0, the switch to the new hex_core library subtly replaced a call from:

request(Url, ETag) ->
    HttpOptions = [{ssl, ssl_opts(Url)},
                   {relaxed, true} | rebar_utils:get_proxy_auth()],
    case httpc:request(get, {Url, [{"if-none-match", "\"" ++ ETag ++ "\""}
                                   || ETag =/= false] ++
                             [{"User-Agent", rebar_utils:user_agent()}]},
                       HttpOptions, [{body_format, binary}], rebar) of
    % ...

to instead be:

request(Config, Name, Version, ETag) ->
    Config1 = Config#{http_etag => ETag},
    try hex_repo:get_tarball(Config1, Name, Version) of
    % ...

At this point in time, Rebar3 already supported PROXY environment variables, which were used to set the rebar profile in the built-in httpc client library. As long as that profile is used, the basic set-up around auth and redirection is in place.

This all happened quite a while ago, but as a reviewer on these commits I was still operating under the impression that this is where TLS validation was taking place. I was absolutely confident that we were passing the right profile and everything was good. Last week, when someone asked how to get rid of the OTP-24 TLS warnings in an unrelated project, I pointed them to our SSL options code as the way we worked around things and never got the warning.

Now that bit is really interesting because:

  1. The warning appeared only last week on OTP-24 (a very good thing)
  2. I had only seen the warnings in the bootstrap script, when we have to first generate the client to go fetch the CA bundle (which is validated against previously-signed hex index caches) when testing new rebar3 builds from scratch on OTP-24
  3. I had never seen it otherwise on all my computers on OTP-24

But here's where the impression breaks down:

As such, I thought we got none of the warnings because we had solid code in place. In reality, I never got them because Rebar3 is good at not hitting the network when it already knows the packages; I was mostly working with known packages, and only getting the warnings in places where I expected them.

At some point yesterday I just decided to go take a look at the code to see how it was wired in, just because something felt funny (had I only seen the warning where I thought I did?):

<MononcQc> what the hell where are ssl opts configured in our stack
<MononcQc> fuck I think we might have broken ssl but I'll need to double-check
<MononcQc> shit we do. God fucking damn it

One of the core parts of the validation we do is that it requires the hostname of the current TLS request. When I looked at that, it became apparent that it could not be set on the profile, since it needs to be set on each call. And hex_core does not come with any of the required dependencies to do it. So for a few years we had just silently been using unvalidated TLS.
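To make the per-call requirement concrete, here is a minimal sketch of host-specific validation options attached to a single httpc request. This is not the actual Rebar3 patch: the module name, the function names, the CACertFile argument (a path to a PEM bundle of trusted roots) and the depth value are all assumptions for illustration, and it relies on ssl options available in recent OTP releases.

```erlang
-module(tls_per_call_sketch).
-export([ssl_opts/2, fetch/2]).

%% Build TLS options for one specific request. The hostname comes from the
%% URL, so the options differ from host to host and have to be supplied
%% with the request they belong to.
%% CACertFile is an assumed path to a PEM bundle of trusted root certificates.
-spec ssl_opts(string(), file:filename()) -> [{atom(), term()}].
ssl_opts(Url, CACertFile) ->
    #{host := Host} = uri_string:parse(Url),
    [{verify, verify_peer},            % fail the handshake on an invalid chain
     {cacertfile, CACertFile},         % roots the chain must lead back to
     {depth, 10},                      % arbitrary limit on chain length
     {server_name_indication, Host},   % hostname the certificate must match
     {customize_hostname_check,
      [{match_fun, public_key:pkix_verify_hostname_match_fun(https)}]}].

%% Issue a GET with the validation options attached to this call.
%% Requires the inets and ssl applications to be started.
fetch(Url, CACertFile) ->
    httpc:request(get, {Url, []},
                  [{ssl, ssl_opts(Url, CACertFile)}],
                  [{body_format, binary}]).
```

The point of the sketch is the shape of the call: the ssl options travel in the HTTP options of each request, so a code path that makes requests without building them gets no validation at all.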

The patch turned out to be quite simple. I made it sort of subtle and non-panicky, in order to let it be reviewed and merged without causing a panic before we had time to rebuild all sorts of packages, get these instructions ready and ship it out. Now you can use it.

Obviously, it feels like we failed people here. Sorry about that one.