No Cause for Concern — RxJava and Retrofit Throwing a Tantrum

Last week, we found an interesting API design issue in the Throwable class of the JDK that led to bugs in RxJava and Retrofit. This is a…

Last week, we found an interesting API design issue in the Throwable class of the JDK that led to bugs in RxJava and Retrofit. This is a write-up of how we found those bugs.

Assembly tracking

Monday morning, Nelson Osacky opens a pull request to enable RxJava Assembly Tracking in the debug build of Square Register Android.

Assembly Tracking makes Rx debugging easier by reporting where failing observables were created.

public class RegisterDevelopmentApp extends RegisterApp {
  @Override public void onCreate() {
    RxJavaHooks.enableAssemblyTracking();
    super.onCreate();
  }
}

According to the RxJavaHooks.enableAssemblyTracking() Javadoc:

Sets up hooks that capture the current stacktrace when a source or an operator is instantiated, keeping it in a field for debugging purposes and alters exceptions passing along to hold onto this stacktrace.

Here’s an example (thanks David):

Observable.just(1, 2, 3)
  .single()
  .subscribe();

This Rx chain fails:

rx.exceptions.OnErrorNotImplementedException: Sequence contains too many elements
  at ... lots of Rx internals
  at rx.Observable.subscribe(Observable.java:9989)
  at com.squareup.Example.test(Example.java:116)
Caused by: java.lang.IllegalArgumentException: Sequence contains too many elements
  ... 38 more

An exception is raised when we call subscribe, so the stacktrace shows that something went wrong when subscribing. We’re left trying to understand the error message and figuring out what all these Rx internal classes are about.

Let’s enable assembly tracking:

RxJavaHooks.enableAssemblyTracking();
Observable.just(1, 2, 3)
  .single()
  .subscribe();

We get an extra cause at the end:

rx.exceptions.OnErrorNotImplementedException: Sequence contains too many elements
  at ... lots of Rx internals
  at rx.Observable.subscribe(Observable.java:9989)
  at com.squareup.Example.test(Example.java:116)
  at ...
Caused by: java.lang.IllegalArgumentException: Sequence contains too many elements
  ... 38 more
Caused by: rx.exceptions.AssemblyStackTraceException: Assembly trace:
  at rx.Observable.create(Observable.java:98)
  at rx.Observable.lift(Observable.java:251)
  at rx.Observable.single(Observable.java:9337)
  at com.squareup.Example.test(Example.java:115)

It’s now clear that the failure was called by the creation of the single() observable. single() expects the source observable to emit only a single item.

Let’s enable Assembly Tracking!

Build failed

public class RegisterDevelopmentApp extends RegisterApp {
  @Override public void onCreate() {
    RxJavaHooks.enableAssemblyTracking();
    super.onCreate();
  }
}

Uh-oh, this one-line pull request is breaking a UI test my team wrote. I look at the failing code:

connectService.getClientSettings(clientId)
  .subscribe(
    (response) -> { ... },
    (throwable) -> {
      if (throwable instanceof RetrofitError) {
        ...
      } else {
        Timber.d(throwable);
        throw new AssertionError("Should not happen", throwable);
      }
    }
  );

Weird, a Retrofit (1.x) call should only ever raise a RetrofitError, and we’re getting something else. Here’s the log:

java.lang.IllegalStateException: Cause already initialized
    at rx.internal.operators.OnSubscribeOnAssembly$OnAssemblySubscriber
.onError(OnSubscribeOnAssembly.java:118)
    at ...
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
Caused by: rx.exceptions.AssemblyStackTraceException: Assembly trace:
    at ...
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)

The actual stacktrace of the IllegalStateException is missing.

Digging into RxJava

I look at OnSubscribeOnAssembly.java line 118:

@Override public void onError(Throwable e) {
  new AssemblyStackTraceException(stacktrace).attachTo(e);
  ...
}

Ugh. What is this AssemblyStackTraceException?

A RuntimeException that is stackless but holds onto a textual stacktrace from tracking the assembly location of operators.

Interesting. What does AssemblyStackTraceException.attachTo() do?

public void attachTo(Throwable exception) {
  for (;;) {
    if (exception.getCause() == null) {
      exception.initCause(this);
      return;
    }
    exception = exception.getCause();
  }
}

Whenever an Observable is created, OnSubscribeOnAssembly captures a stacktrace and holds on to it. Then, when an exception is thrown in the corresponding Rx chain, an AssemblyStackTraceException* *is added as a cause at the bottom of the exception chain, with a message string that contains the stacktrace of where the Observable was created. Neat!

We saw the result of this in our initial example:

rx.exceptions.OnErrorNotImplementedException: Sequence contains too many elements
  at ... lots of Rx internals
  at rx.Observable.subscribe(Observable.java:9989)
  at com.squareup.Example.test(Example.java:116)
  at ...
Caused by: java.lang.IllegalArgumentException: Sequence contains too many elements
  ... 38 more
Caused by: rx.exceptions.AssemblyStackTraceException: Assembly trace:
  at rx.Observable.create(Observable.java:98)
  at rx.Observable.lift(Observable.java:251)
  at rx.Observable.single(Observable.java:9337)
  at com.squareup.Example.test(Example.java:115)

Cause already initialized

Now that I understand how assembly tracking works under the hood, I look at the logs again:

java.lang.IllegalStateException: Cause already initialized
    at rx.internal.operators.OnSubscribeOnAssembly$OnAssemblySubscriber
.onError(OnSubscribeOnAssembly.java:118)
    at ...
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
Caused by: rx.exceptions.AssemblyStackTraceException: Assembly trace:
    at ...
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)

I still don’t know where this IllegalStateException is coming from, since I don’t have a stacktrace for it. The message says Cause already initialized, and we just saw a piece of code trying to do just that:

public void attachTo(Throwable exception) {
  for (;;) {
    if (exception.getCause() == null) {
      exception.initCause(this);
      return;
    }
    exception = exception.getCause();
  }
}

I open Throwable.initCause():

public void initCause(Throwable cause) {
 if (this.cause != this) {
   throw new IllegalStateException("Cause already initialized");
 }
 this.cause = cause;
}

Tada! I just found our mystery exception. What exactly is going on with cause and this? Let’s look at how a Throwable is constructed:

public class Throwable {
  /*
   * The throwable that caused this throwable to get thrown, or null
   * if this throwable was not caused by another throwable, or if 
   * the causative throwable is unknown. **If this field is equal to
   * this throwable itself, it indicates that the cause of this
   * throwable has not yet been initialized**.
   */
  private Throwable **cause = this**;

  public Throwable() {
  }

  public Throwable(Throwable cause) {
    this.cause = cause;
  }
}

A Throwable can be constructed with no cause, in which case the cause field is set to this to mark that the cause hasn’t been initialized. You can then call initCause() later… but only if the cause has not yet been initialized.

The cause of a Throwable can only be set once.

We’re crashing because we’re calling initCause() but the cause has already been set. Let’s look at the code again:

public void attachTo(Throwable exception) {
  for (;;) {
    if (exception.getCause() == null) {
      exception.initCause(this);
      return;
    }
    exception = exception.getCause();
  }
}

We’re going down the exception chain until we find an exception that has a null cause. Then we call initCause() on it. When is getCause() returning null?

public class Throwable {
  /*
   * Returns the cause of this throwable or null if the
   * cause is nonexistent or unknown.
   */
  public Throwable getCause() {
    return (cause == this ? null : cause);
  }
}

Wait a minute. The cause can be nonexistent or it can be unknown. Those are two different things!

When the cause field is set to itself, the cause is nonexistent. When the cause field is set to null, the cause is unknown.

In terms of Java code, here’s the difference:

new Throwable(); // nonexistent cause (not initialized)
Throwable cause = null;
new Throwable(cause); // unknown cause (initialized to null)

In both cases, getCause() returns null. However, if the cause is unknown (initialized to null), then initCause() will throw an IllegalStateException.

Lost Cause

Can we fix RxJava so that it doesn’t call initCause() when the cause is unknown? Well, it turns out, there is no way to check for this.

The JDK provides no API to determine whether the cause of a Throwable is nonexistent or unknown.

The only option is to try and report a failure. I open a pull request in RxJava.

public void attachTo(Throwable exception) {
  for (;;) {
    if (exception.getCause() == null) {
      try {
        exception.initCause(this);
      } **catch (IllegalStateException ignored)** {
        onError(new RuntimeException("Unknown cause", exception);
      }
      return;
    }
    exception = exception.getCause();
  }
}

After reading the Javadoc, I realize that initCause() exists for backward compatibility issues where a legacy exception class lacks a constructor that takes a cause.

initCause() should only be called right after constructing an exception, it’s wasn’t designed to add metadata to an exception chain.

While assembly tracking seems to be built on a hack, the benefits clearly outweigh the downsides of this unknown cause edge case.

Root cause analysis

Now that RxJava has proper error reporting when assembly tracking fails, I can figure out what happened in the UI test failure that triggered this investigation:

connectService.getClientSettings(clientId)
    .subscribe()

We are testing an HTTP error scenario, so Retrofit creates the raised exception with RetrofitError.httpError():

public class RetrofitError extends RuntimeException {
  public static RetrofitError httpError(Response response) {
    return new RetrofitError(response, **null**);
  }

  RetrofitError(Response response, Throwable exception) {
    super(exception);
    this.response = response;
  }
}

Bingo! RetrofitError.httpError()* *creates an exception with an unknown cause, which is why assembly tracking is failing.

RetrofitError only exists in Retrofit 1.x, so I submit a pull request to fix it on the 1.x branch. There won’t be a new public release, since Retrofix 2.x has been out for two years now.

Square Register is our last app not migrated to Retrofit 2. We will soon complete the migration work. In the meantime, we’ll use an internal release of Retrofit 1.x.

Epilogue

Looking into a UI test failure led us on an interesting journey! Today, we learned that:

  • A Throwable can have a nonexistent or an unknown cause; those are two different things, and there is no API to figure out which is which.

  • Assembly Tracking makes Rx debugging easier, and uses the Throwable API in a non-intended way to append metadata to a stracktrace.

  • Passing null as a cause to a Throwable constructor seems harmless, yet leads to subtle bugs years later.

Feel free to provide more insights or ask questions!

Table Of Contents