Scala 3.3.2 post-mortem analysis

Wednesday 6 March 2024

Sébastien Doeraene

Last week, we announced the release of Scala 3.4.0 and 3.3.3. The announcement mentioned that we had to skip 3.3.2 due to a bug breaking our forward binary compatibility guarantees in a patch release.

In this post, we take a closer look at what happened, why it slipped through our existing testing processes, and what steps we are taking so that it does not happen again in the future.

Symptoms of the bug

First, let us look at the scenario in which the bug surfaces:

  1. Compile a library A with Scala 3.3.2
  2. Compile a project B, depending on A, with Scala 3.3.1 (or 3.3.0)
  3. The older compiler crashes with the following message:
  [error] error while loading SomeClass,
  [error] class file .../SomeClass.class is broken, reading aborted with class java.util.NoSuchElementException
  [error] contextual$
  [error] one error found

The error does not surface in all situations. It happens when SomeClass, coming from library A and required to compile B, contains a method whose result type is a context function type. Nevertheless, it is not rare.

Since the above scenario is supposed to be accepted according to our forward compatibility guarantees, we had to abort the release of Scala 3.3.2. However, the artifacts had already been published to Maven Central, which is an immutable repository. We had no choice but to abandon the 3.3.2 version number, and publish 3.3.3 instead.

The bug in detail

Let us dig a bit deeper on the bug. It appears while the compiler is reading the TASTy files of a dependency. In particular, it cannot read the name of a method parameter.

TASTy is the name given to the intermediate representation of Scala Code. Starting from Scala 3, it is bundled with libraries and read by the compiler to understand the definitions available. Read more here

Names in TASTy and in the compiler are not simple Strings. They have some amount of structure. For example, the name of the class behind an object Foo definition has a structured name ModuleClass("Foo"), instead of the string "Foo$".

Likewise, generated names for anonymous entites, such as implicit parameters, have the structured name UniqueName("evidence$", 1). The string "evidence$" is used in the equality definition of names, so that UniqueName("evidence$", 1) is not the same name as UniqueName("foo$", 1), but it does not otherwise have any semantic meaning in TASTy.

However, for code readability and performance reasons, the compiler maintains a fixed list of the prefixes it generates, and uses a single instance of the associated NameKind, one of its internal data structures. When it reads the string "evidence$", it reuses the instance of NameKind that it uses for that particular string.

This leads us to the cause of our bug. Between 3.3.1 and 3.3.2, we merged PR #18280. That PR starts with what appeared to be a refactoring commit that renamed some NameKinds and their associated strings. Unfortunately, that means the 3.3.2 compiler started emitting UniqueNames with the string "contextual$". When the 3.3.1 compiler reads that structured name, it tries to find the cached instance of NameKind that it should use for "contextual$", and does not find any, because it did not use to generate that string.

In order to fix this in 3.3.3, we reverted to generating UniqueName("evidence$", x) where 3.3.2 would generate UniqueName("contextual$", x).

Why the bug passed our review and testing processing

The root cause is that the compiler assumed that only a fixed set of known strings would ever appear in TASTy UniqueNames, despite the fact that the TASTy format says that any string is valid in that position. From that root cause, an unfortunate combination of events happened.

First, when we reviewed PR #18280, taking the TASTy format into account, we assumed that using new strings would be a perfectly compatible change. We therefore did not mark the PR as “needs-minor-release” as we merged it into the main branch for release in 3.4.0.

Second, the PR was backported to the 3.3.x LTS branch. Normally, such a refactoring would not have met the bar for backporting on its own. We backported the commit because the PR that contained it was improving tooling, and in particular performance of incremental compilation. This is one of the areas for which backports are highly desirable.

Third and last, the backport failed to trigger any Continuous Integration test:

  • the initial set of unit tests for the compiler never mix different compiler versions, and
  • the community builds compile projects using the new compiler with dependencies compiled by the old compiler.

However, our issue only manifests when the old compiler is used with dependencies compiled by the new compiler.

Concretely, we did not have adequate testing scenarios for forward compatibility.

The RC cycles failed to surface the issue as well, for a similar reason. When trying out an RC, users rarely publish the product, even locally, then use it with an older compiler.

Fortunately, as a last line of defense, the issue was caught by the Play Framework, in-between the release of the 3.3.2 artifacts on Maven Central and the official announcement of the release.

Preventing this kind of bug in the future

As we mentioned in the previous section, the main point of failure was a lack of tests for forward compatibility scenarios. Our most important fix to the process will therefore be to add adequate forward compatibility tests in the compiler’s test suite.

In addition, this particular bug highlights the importance of accurately following the specification of the TASTy format. Making additional assumptions on the origin of TASTy files—namely that only our own compiler can produce them—leads to problems like this issue. We will therefore check whether there are other similar assumptions lurking in the compiler’s unpickler, and address them.

Conclusion

In this post, we examined in detail the bug and events that led to Scala 3.3.2 being abandoned. We first dug into the bug’s root cause, then looked into the series of events that allowed it to go unnoticed, and finally mentioned the steps we are going to take to prevent similar bugs from happening in the future.