You're reading the Ruby/Rails performance newsletter by Speedshop.

Looking for a performance audit of your Rails app? I've partnered with Ombu Labs to do just that.

When should you not add a new background job to a controller?

This tweet got a bit of attention:
 
I got two very good replies from Avi Flombaum (founder of the WeWork-acquired Flatiron School) and Ryan Bates (of Railscasts fame):

In particular, Avi's question got me thinking about checkout flows in e-commerce apps, which is where this problem is extremely apparent. One of my first jobs was working at an e-commerce company running on an old version of Spree. Rendering the "checkout success!" page was extremely slow: hundreds of ActiveRecord callbacks would fire, leading to tons and tons of database updates, not to mention that we were waiting inline on the response from the credit card processor.

I have a couple of thoughts regarding Avi and Ryan's questions:

Services And Transactions Are Just Jobs You Run Synchronously

It's often popular to wrap up a set of steps in a "service". Maybe after you complete a checkout, a bunch of services have to be notified and a lot of state needs to be updated. You put all of this logic inside of a class called CheckoutCompleter and call CheckoutCompleter.call(checkout_object) in a controller, or something like that.

Here's the thing: this is almost identical to a background job. Call it CheckoutCompleter.perform() instead and you start to see the similarity. Consider the most basic possible ActiveJob:

class GuestsCleanupJob < ApplicationJob
  queue_as :default

  def perform(*guests)
    # Do something later
  end
end


If you remove the inheritance from ApplicationJob and the queue_as call, it's just a "service object". Jobs are just "service objects" that can optionally be run asynchronously. So, anywhere you would insert a "service object", you can insert a background job.

I'm using the "service object" terminology here because it's easier to see the similarity with a background job, but this is just as applicable to the Vanilla Rails style.
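To make the similarity concrete, here's a minimal sketch of that service object (CheckoutCompleter is the hypothetical name from above, and the body is a stand-in):

```ruby
# A minimal "service object": one class, one public entry point.
class CheckoutCompleter
  def self.call(order_id)
    new(order_id).perform
  end

  def initialize(order_id)
    @order_id = order_id
  end

  # Rename `call` to `perform` and the resemblance to an
  # ActiveJob subclass becomes hard to miss.
  def perform
    "completed checkout #{@order_id}"
  end
end
```

In a Rails app, inherit from ApplicationJob instead and you get both modes for free: CheckoutCompleter.perform_now(id) runs it inline, exactly like the service, while CheckoutCompleter.perform_later(id) queues it.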

There is a Minimum Viable Async Unit

When shouldn't you async things in a background job? 

1. You need the result of the job to render the response (my original tweet)
2. The code chunk you're looking at takes less than ~10 milliseconds to run.
3. Asyncing this would put too much pressure on your background job infrastructure, either memory or load.

If the chunk of code you're considering making async always takes less than 10 milliseconds to run, you're adding cost (complexity, memory usage in Redis or whatever your background job store is, additional workers required to run these jobs) without much benefit.
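The cheapest way to apply that rule of thumb is to actually time the chunk before extracting it. A sketch, using Ruby's built-in Benchmark module (the 10ms threshold is the rule of thumb from above, not a hard limit):

```ruby
require "benchmark"

# Time a candidate chunk of work and compare it to a threshold.
# If it's under the threshold, a job probably isn't worth the cost.
def worth_asyncing?(threshold_ms: 10.0, &work)
  elapsed_ms = Benchmark.realtime(&work) * 1000.0
  elapsed_ms > threshold_ms
end

worth_asyncing? { sleep 0.05 }  # slow work: async it
worth_asyncing? { 2 + 2 }       # trivial work: leave it inline
```

Run this a few times under realistic data, since the first call often pays one-time costs like cache warming.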

There are also some chunks of work that are just too difficult to turn into a job. For example, if a chunk of logic needs a lot of data (say, more than 128kb worth), then the time and memory cost of serializing all that data into the background job store is probably not worth it, and may simply be too slow.
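Since most job backends serialize arguments (commonly as JSON) into the job store, that payload size is easy to estimate before you commit. A sketch (the helper name is my own):

```ruby
require "json"

# Estimate how big a job's arguments would be once serialized
# into the job store as JSON.
def serialized_job_size(args)
  args.to_json.bytesize
end

# Passing an ID is tiny...
serialized_job_size(order_id: 123)

# ...but passing thousands of rows of data inline blows past 128kb.
serialized_job_size(rows: Array.new(5_000) { |i| { id: i, note: "x" * 40 } })
```

This is also why the standard advice is to pass record IDs to jobs and re-fetch the data inside perform, rather than serializing the data itself.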


You Can Async Almost Anything If You Try Hard Enough


Now I want to return to Avi's thoughts about asyncing inventory during a checkout flow. I want to use this as an illustration that almost anything can be done asynchronously if you frame it differently. 

In a checkout flow, we wouldn't want to asynchronously check whether there is enough inventory to complete the transaction. If we did, we could end up showing a user that their checkout was successful, only for the background job that checks inventory to raise an error because inventory was insufficient. What do you do in that case? Yikes.

Instead, flip the problem around. Reserve inventory pessimistically when a user enters the checkout flow, and release that lock when they either abandon the checkout or check out successfully. That way, when you're completing someone's checkout, you're releasing the lock on a piece of inventory rather than trying to acquire it, which is much, much safer to do.

At this point, we can move the "unlocking inventory" into a background job as well, because we don't need to unlock the inventory before we return a response to the user.
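Here's an in-memory sketch of that reserve-on-entry, release-on-exit shape (the class name is my own; in a real app this state would live in the database or Redis, and the release step is exactly what you'd move into a background job):

```ruby
require "monitor"

# Pessimistic inventory reservation, simulated in memory.
class InventoryReservations
  def initialize(stock)
    @stock = stock
    @lock = Monitor.new
  end

  # User enters checkout: claim a unit up front. This is the only
  # step that can fail, and it fails *before* payment, not after.
  def reserve
    @lock.synchronize do
      return false if @stock.zero?
      @stock -= 1
      true
    end
  end

  # Checkout abandoned (or completed, where you'd instead decrement
  # the permanent stock count): give the claimed unit back. This
  # step cannot fail, so it's safe to run asynchronously.
  def release
    @lock.synchronize { @stock += 1 }
  end
end
```

The key property: the operation left in the request path (reserve) fails fast and early, while the operation that can't fail (release) is free to be deferred.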
 

Enter the Danish Scaling Master, Simon:

This particular problem of checkout process scaling is a great example of this issue at work.

I thought this might be a good thing to run by my friend Simon Eskildsen, former Principal Engineer at Shopify and current freelancer and author of the excellent newsletter Napkin Math. Specifically, I asked him about how he would build a scalable checkout process. Here's what he had to say:
 
 
"It's a devilishly complicated problem that looks deceptively simple. It's been a while since I thought about checkout, and fortunately never had to dive too deep into the inventory code at Shopify—so this isn't particularly inspired by how that works. Some loose thoughts!

(1) Failed payments are a problem. You also have to decide if you want reservations, i.e. preventing people from typing in their address to get an inventory error later. I think you should in your flash sale scenario.

Ideally, to implement reservations, you wrap the entire checkout flow in a serializable transaction (or `SELECT FOR UPDATE` at lower isolation levels), then decrement the inventory at the end. This doesn't scale. It means throughput is the average length for a human to complete checkout.

Instead, you could claim inventory at the beginning of checkout. On a relational database, the first thing I can then think of would be to have an inventory table where you do `SELECT FOR UPDATE COUNT(*) FROM inventory_claims WHERE created ≥ NOW() - interval 10 minute`, and if it's below the product's stock level (which is updated when the payment goes through), then you `INSERT` an inventory claim for your session in the same transaction you did the serializable COUNT(*), and go through checkout. I'm probably missing some detail here, but this is the first thing that comes to mind.

After checkout in a job when the payment has been authorized, you remove your inventory claim and update the inventory level of the product. If you need to minimize lock contention further, you could shard the inventory claims table giving each product e.g. 50 claims per table, or move the lock to Redis (which, due to its single-threaded nature and Lua, would probably be more straightforward). Your claim expires after 10 minutes. A background job repeatedly cleans out this table asynchronously. It could even be that job that updates the inventory directly on the product.

This should scale pretty well as long as you aggressively prune the table and keep indexes to a minimum; this should be able to do 1,000s of claims per second which is most likely sufficient in this case. You likely want at least to put the actual processing of the checkout behind a semaphore to limit concurrency slightly so that the database can prioritize the inventory claims. These transactions are generally heavy as they usually create several objects like a customer, order, fulfillment objects, etc. immediately after a payment, which takes a while too.

(2) I think the best customer experience is to push as much of this as possible into a background job. For legitimate customers, their credit cards generally wouldn't get declined. If the credit card gets declined, you could increase their inventory claim time by, e.g. 30 min to allow them to fix it (unless the fraud rate is suspected to be high)."

Well put, Simon.
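His claim-counting idea can be simulated in plain Ruby to see the moving parts (class and field names are my own illustration, not Shopify's code; in a real database, the count-then-insert would run inside one serializable transaction or under SELECT ... FOR UPDATE, and a background job would prune expired claims):

```ruby
# Count live claims, compare against stock, and record a new
# claim only if there's room. Claims expire after a TTL.
class InventoryClaims
  Claim = Struct.new(:session_id, :created_at)

  def initialize(stock:, ttl_seconds: 600)
    @stock = stock
    @ttl = ttl_seconds
    @claims = []
  end

  # Returns true if the claim was recorded, false if sold out.
  # `now` is injectable so expiry is easy to exercise.
  def claim(session_id, now: Time.now)
    prune(now)
    return false if @claims.size >= @stock
    @claims << Claim.new(session_id, now)
    true
  end

  private

  # Stand-in for the background job that repeatedly cleans
  # out expired claims.
  def prune(now)
    @claims.reject! { |c| now - c.created_at > @ttl }
  end
end
```

Note that abandonment needs no explicit handling at all here: an abandoned checkout's claim simply ages out, which is what makes the cleanup safe to run as a lazy, repeated background job.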

I hope this has given you something to think about. Until next week,

-Nate

Copyright © 2022 Nate Berkopec, All rights reserved.

