Coroutines and Java Synchronization Don't Mix

Suppose you've got a critical section of code, meaning only one thread should execute it at a time.

In the Java world, one solution is to slap synchronized on the method and call it a day. In Kotlin, we use @Synchronized to get the same effect:

import kotlin.concurrent.thread

fun main() {
  repeat(2) {
    thread { criticalSection() }
  }
}

@Synchronized
fun criticalSection() {
  println("Starting!")
  Thread.sleep(10)
  println("Ending!")
}

The above outputs the following (note the lack of interleaving):

Starting!
Ending!
Starting!
Ending!

Suppose we're using coroutines. What happens then?

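import kotlinx.coroutines.*

val scope = CoroutineScope(Job())

fun main() {
  repeat(2) {
    scope.launch { criticalSectionSuspending() }
  }
  Thread.sleep(100) // keep the JVM alive until both coroutines finish
}

@Synchronized
suspend fun criticalSectionSuspending() {
  println("Starting!")
  delay(10)
  println("Ending!")
}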

In this case, the critical section has interleaved calls:

Starting!
Starting!
Ending!
Ending!

That's no good!


Understanding what's going on requires knowing how Kotlin coroutines work under the hood: they're implemented using continuation-passing style (CPS). (I briefly explain why in my talk on coroutines, but if you want a full understanding, I suggest watching this deep dive on coroutines.)

For the purposes of this post, all you really need to understand is that a suspending function doesn't necessarily run its body from start to finish in one uninterrupted call. When it reaches a suspension point, it stops executing and hands control back to its caller, with the understanding that it will be resumed later from where it left off.
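To make "stop now, resume later" concrete, here's a minimal sketch using only the standard library's low-level kotlin.coroutines API (no kotlinx.coroutines). It is an illustration, not how delay() is implemented: pausePoint(), saved, and events are hypothetical names, and the coroutine is resumed by hand rather than by a dispatcher.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical names: `saved` holds the continuation captured at the
// suspension point; `events` records the order in which things happen.
var saved: Continuation<Unit>? = null
val events = mutableListOf<String>()

// A hand-made suspension point: store the continuation and don't resume it,
// so the coroutine suspends and control goes back to whoever started it.
suspend fun pausePoint() = suspendCoroutine<Unit> { cont -> saved = cont }

fun main() {
    val body: suspend () -> Unit = {
        events += "Starting!"
        pausePoint()        // suspends here: control returns to main()
        events += "Ending!" // runs only once the continuation is resumed
    }
    // Start the coroutine; the completion continuation just rethrows any failure.
    body.startCoroutine(Continuation(EmptyCoroutineContext) { result -> result.getOrThrow() })
    events += "main() runs while the coroutine is suspended"
    saved!!.resume(Unit)    // pick up right after pausePoint()
    events.forEach(::println)
}
```

The body stops at pausePoint(), main() carries on with its own work, and the saved continuation later finishes the body from just after the suspension point.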

As such, what's really happening in the second code sample is:

  1. criticalSectionSuspending() starts, acquires the lock, and prints "Starting!".
  2. It reaches delay() (a suspension point), so it exits the function and releases the lock.
  3. With the lock released, the second execution of criticalSectionSuspending() begins: it acquires the lock, prints "Starting!", suspends, and releases the lock.
  4. When each delay finishes, criticalSectionSuspending() is invoked again, resuming just after delay(), and prints "Ending!".

That's hard to follow, so here's a timeline of what's happening (for one function call):

[Chart: timeline of a single criticalSectionSuspending() call]

As you can see, the time spent delaying is not actually spent inside criticalSectionSuspending(), so no lock is held during it. That's why two coroutines can interleave inside the same synchronized suspending function: neither one is ever executing inside it at the same moment as the other, so the lock never has anything to block.
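The same mechanics can be modeled by hand, without any coroutine library. The sketch below is a simplified, hypothetical model (Suspended, EventLoop, CriticalSection, lock, and log are made-up names, not compiler or kotlinx.coroutines internals): the suspending function becomes a small state machine, each invocation holds a lock only for its own duration, as a @Synchronized method would, and "delay" schedules the next invocation on a single-threaded event loop.

```kotlin
// A hand-rolled model of a suspend function's state machine. Every name here
// is hypothetical; this is not what the compiler or kotlinx.coroutines generate.

object Suspended // stands in for the "I suspended" marker a suspend function returns

// Single-threaded stand-in for a dispatcher: runs queued tasks in FIFO order.
class EventLoop {
    private val queue = ArrayDeque<() -> Unit>()
    fun post(task: () -> Unit) = queue.addLast(task)
    fun run() {
        while (queue.isNotEmpty()) queue.removeFirst()()
    }
}

val lock = Any()
val log = mutableListOf<String>()

class CriticalSection(private val loop: EventLoop) {
    private var label = 0 // which point to resume from, like a compiler state label

    // Each invocation holds the lock only while it runs, as a @Synchronized method would.
    fun resume(): Any = synchronized(lock) {
        when (label) {
            0 -> {
                log += "Starting!"
                label = 1
                loop.post { resume() } // "delay": schedule the rest for later
                Suspended              // return early; the lock is released here
            }
            else -> {
                log += "Ending!"
                Unit
            }
        }
    }
}

fun main() {
    val loop = EventLoop()
    repeat(2) { loop.post { CriticalSection(loop).resume() } }
    loop.run()
    log.forEach(::println)
}
```

Because the lock is released every time resume() returns Suspended, the second state machine gets in between the two halves of the first, and log ends up as "Starting!", "Starting!", "Ending!", "Ending!", matching the interleaved output above.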

Kotlin does know about this problem; in fact, if you wrap a suspension point in synchronized() {}, the compiler gets mad at you:

import kotlinx.coroutines.delay

val obj = Any() // the object whose monitor guards the critical section

suspend fun criticalSectionSuspendingWontCompile() {
  synchronized(obj) {
    println("Starting!")
    // This code won't compile, with this message:
    // "The 'delay' suspension point is inside a critical section"
    delay(10)
    println("Ending!")
  }
}

I believe that @Synchronized on a suspending function ought to cause the same compiler error. There's an issue on YouTrack for it, but it hasn't gotten much attention.


How do we solve the problem?

First, we should recognize that the problem isn't "we can't use synchronized." synchronized is just a means to an end: making critical sections work. And we only need a critical section because of shared mutable state. Therefore, the real problem we're solving is "we need a way to control shared mutable state."

What luck: there is an official Kotlin guide to shared mutable state which presents a few good solutions! In particular, the section on mutexes is most relevant here, as it most closely resembles synchronized.

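import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

val mutex = Mutex()
val scope = CoroutineScope(Job())

fun main() {
  repeat(2) {
    scope.launch { criticalSectionSuspendingLocked() }
  }
  Thread.sleep(100) // keep the JVM alive until both coroutines finish
}

suspend fun criticalSectionSuspendingLocked() {
  mutex.withLock {
    println("Starting!")
    delay(10)
    println("Ending!")
  }
}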

The above prints what we want again: non-interleaving messages.
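Since the underlying problem is shared mutable state, here's a sketch of that case directly: a counter whose read-modify-write straddles a suspension point. counter, counterMutex, and incrementSlowly() are hypothetical names; Mutex, withLock, and yield() are real kotlinx.coroutines APIs. Holding the Mutex across yield() keeps other coroutines from reading a stale value in the middle of an update.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical shared mutable state, guarded by a Mutex.
val counterMutex = Mutex()
var counter = 0

// Read-modify-write with a suspension point in the middle. Without the Mutex,
// another coroutine could read the same value during yield(), and one of the
// two updates would be lost.
suspend fun incrementSlowly() {
    counterMutex.withLock {
        val current = counter
        yield() // suspends while still holding the Mutex
        counter = current + 1
    }
}

fun main() = runBlocking {
    // coroutineScope waits for all 100 children before continuing.
    coroutineScope {
        repeat(100) {
            launch(Dispatchers.Default) { incrementSlowly() }
        }
    }
    println(counter) // 100
}
```

If you drop the withLock, updates can overwrite each other and the final count can come out below 100; with it, the Mutex stays held across the suspension, which is exactly what @Synchronized could not do.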

I don't want people to conclude that Mutex is always the answer to shared mutable state; the guide gives multiple solutions and you should evaluate which works best in your situation. But in the case where you were using a synchronized function but now want to make it suspending, Mutex is the most natural way to make that conversion.