Distributed Designs Part 3 — Taking lock in MVCC for transactional outbox pattern

This is the third part of the Distributed Designs series. For your convenience you can find other parts in the table of contents in Part 1 — Outbox without idempotency nor synchronous commit

Last time we saw how to use transactional outbox pattern with multiple databases and no synchronous commit. We also learned how to take the lock on the database end to make sure that we can scale out the Relay. However, what if we use Multi-version Concurrency Control (MVCC) or can’t enforce pessimistic locking in general? Can we implement the lock in that case?

Solution

We need to implement leases. The idea is:

  • Create a table with leases
  • Take the lease if it’s not taken yet
  • Process rows
  • Release the lease

The only tricky part is how to release the lease automatically in case of a crash. However, this we can do with timestamps. The idea is:

Now, we want to take the lease if and only if it’s not taken yet. Let’s do this:

We take the lease row. If it’s missing, then it means that the lease is not available (someone else holds it). We wait and try again.

If the lease is available, then we try to take it. We try updating the row and book it for one minute. We then try to commit. If this fails, then it means that someone else modified the lease. We need to back off and try again.

However, if we succeed to take the lease, then we can get the messages from the outbox table. We don’t need to do anything special now because nobody else will fiddle with the outbox. We need to finish in one minute, and then we can release the lease.

Why does it work? What isolation level should I use?

Let’s stop for a moment and analyze why it even works and what isolation level we should use. We tried updating the lease with the following:

What we should in fact do is this:

If we run this query with REPEATABLE READ, then the database engine simply can’t let the UPDATE to complete if someone else modified the row. That’s because the UPDATE needs to read the row again (to check the filtering criteria). Since we run on REPEATABLE READ, then the second read must return exactly the same data. Therefore, either nobody else modified the row, or the UPDATE fails and the transaction is rolled back.

Therefore, after the first commit we can be sure that we have the lease. I can’t really imagine a database that would claim that it meets the ACID requirements and would still let this commit to succeed just to break the data later on. Such a database would be a very interesting (not necessarily useful) case. Obviously, with distributed databases we may get some different isolation levels, some the actual optimizations may be different and break this mechanism. However, according to the SQL standard, this solution should work, as it simply implements the Compare-and-swap operation on the database level.

In short, this is your checklist:

  • Use REPEATABLE READ
  • If you can enforce eager locks in the database, just use them without and leases and rely on the locks maintained by the database engine
  • If you can’t enforce eager locks (so the database engine uses optimistic locking), then implement the lease protocol defined above

Now you can scale your Relay to multiple instances and still get the correct results.

Summary

We implemented a generic lock for MVCC. You can use it for the transactional outbox pattern or for whatever else.