You're reading the Ruby/Rails performance newsletter by Speedshop.

I'll be speaking at Alt::BrightonRuby on June 1.

Scaling is mostly about just one metric: queue time.

Imagine standing in line at a grocery store. You've got a cart full of groceries, and there's just one counter open. There are five people in line ahead of you. It looks like you're going to be waiting a while.

Don't you wish they would add one more checkout counter? Can't they get someone to come back from break?

This example may seem quaint, but change the timescale from minutes to milliseconds and it's exactly what happens in production when scaling a web service.

Adding more "checkout counters" (called "servers" in queueing theory) decreases the average amount of time a customer spends waiting in line. This is an intuitive result.

However, the experience of checking out one's groceries is actually made up of two operations: waiting for a checkout counter and then actually being checked out by the clerk. In queueing theory, we call the former "wait time" or "queue time", and the latter "service time". Add them together to get the total time spent by a customer getting their groceries checked out.
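To make the arithmetic concrete (the numbers here are invented for illustration):

    queue_time   = 150  # ms spent waiting in line for a free server
    service_time =  50  # ms spent actually processing the request
    total_time   = queue_time + service_time  # => 200 ms total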

Scaling server counts up and down is not about reducing service time. It's about reducing queue time. In a properly configured system, service time is completely independent of queue time.
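Here's a minimal simulation sketch of the grocery-store model (my own construction, with invented numbers, not anything from a real app): one FIFO line feeding a configurable number of counters, with service time held fixed at one second, so any change in the output is purely queue time.

    # One line, many counters. Service time is constant at 1 second;
    # only the time spent waiting in line varies with the counter count.
    def average_queue_time(servers:, customers: 20_000, arrival_rate: 9.0, service_time: 1.0)
      rng = Random.new(42)
      clock = 0.0
      free_at = Array.new(servers, 0.0)  # when each counter next opens up
      total_wait = 0.0

      customers.times do
        clock += -Math.log(1.0 - rng.rand) / arrival_rate  # Poisson arrivals
        counter = free_at.index(free_at.min)               # earliest-free counter
        start = [clock, free_at[counter]].max
        total_wait += start - clock
        free_at[counter] = start + service_time
      end

      total_wait / customers
    end

    [10, 12, 15].each do |c|
      puts "#{c} servers: avg queue time = #{(average_queue_time(servers: c) * 1000).round}ms"
    end

Adding counters drives the queue-time term toward zero; the one second of service time never budges.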
 
This is critical to understand, because misunderstanding it leads to bad scaling decisions and misconfigured autoscalers.

Heroku's autoscaler scales based on total response time: queue time plus service time. In theory, service times are constant, so the only variation in response times should come from queue times. In practice, this is not the case.

Why do service times in a Ruby web application vary over time? Think about why this might be before reading on.

One common pattern I see in my consulting work: response times get slower during periods of high traffic and faster during periods of low traffic. You might think the system is poorly configured and running into CPU contention under load. But if you look at these apps' response times for any individual endpoint (say, the UsersController#index action), each endpoint's response times are constant over the course of the day.

What you're actually seeing is a change in the traffic mix moving the average. At night, a site is visited by more bots and crawlers than during the day. Say crawlers and bots make up 20% of traffic during the day and 80% at night. If the pages they hit are mostly fast ones, your average response time will fall at night and rise during the day, even though no individual endpoint changed speed.
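Worked out with invented numbers, assuming bots mostly hit ~50ms pages and humans mostly hit ~300ms pages:

    # Only the mix changes between day and night, not any endpoint's speed.
    def avg_response_ms(bot_share)
      bot_share * 50 + (1 - bot_share) * 300
    end

    avg_response_ms(0.2)  # daytime, 20% bots   => 250ms
    avg_response_ms(0.8)  # nighttime, 80% bots => 100ms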

This causes Heroku's autoscaler to scale up in the middle of the day even when it doesn't need to: average response times rose, but queue times may be unchanged.

Background jobs (e.g. Sidekiq) have the same queue time/service time distinction. For a background job, queue time is the time between enqueueing the job and a worker picking it up; service time is how long the job actually takes to run. Scaling your worker count up and down doesn't make any individual job faster, but it does make your queues shorter and reduces the time jobs spend waiting in them.
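Sidekiq exposes this number directly: Sidekiq::Queue#latency returns the age, in seconds, of the oldest job still waiting in a queue. A quick check looks like this:

    require "sidekiq/api"

    # Latency here is pure queue time: how long the oldest job in each
    # queue has been waiting for a worker to pick it up.
    Sidekiq::Queue.all.each do |q|
      puts "#{q.name}: #{q.latency.round(1)}s behind"
    end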

You wouldn't scale your background jobs based on job execution times. So don't scale your web servers based on response times.

Instead, you should be scaling based on request queue times. Currently on most cloud providers, this is only possible via add-on services or external providers.
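The underlying measurement is something you can approximate yourself. Heroku's router stamps each request with an X-Request-Start header holding the time it hit the router, in milliseconds since the epoch; subtract that from the current time and you have queue time. A sketch (the logging destination is my assumption, and other routers format this header differently, so check yours):

    require "logger"

    # Rack middleware sketch. Assumes Heroku's X-Request-Start format
    # (milliseconds since epoch); nginx and others use different formats.
    class QueueTimeMiddleware
      def initialize(app, logger: Logger.new($stdout))
        @app = app
        @logger = logger
      end

      def call(env)
        if (stamp = env["HTTP_X_REQUEST_START"])
          queue_ms = Time.now.to_f * 1000 - stamp.to_f
          # Guard against clock skew producing negative values.
          @logger.info("queue_time=#{queue_ms.round}ms") if queue_ms > 0
        end
        @app.call(env)
      end
    end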

How much queue latency is too much? It's hard to say for any individual app. Think about it this way - say scaling your app up by one additional server decreases average queue latency by 50 milliseconds. If that server costs you $50/month, you're paying $50/month to shave 50 milliseconds off your average total response time. Is that a good tradeoff for you?

Background jobs will have considerably different requirements - you might be OK with some jobs waiting in line for 30 minutes or more, while others, like transactional emails, may need to start executing within 30 seconds. You should organize your queues based on desired queue latency.
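In Sidekiq terms, that can be as simple as naming queues after their latency targets (these names are invented to illustrate the convention):

    # Each queue's name states the longest a job should wait in it.
    class WelcomeEmailJob
      include Sidekiq::Worker
      sidekiq_options queue: "within_30_seconds"

      def perform(user_id)
        # send the welcome email...
      end
    end

    class NightlyExportJob
      include Sidekiq::Worker
      sidekiq_options queue: "within_8_hours"

      def perform
        # heavy, non-urgent work...
      end
    end

When a queue's latency exceeds the target its name promises, you know it's time to scale up (or investigate).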

Hope you found this interesting. If you're having trouble scaling your Ruby web application, reply to this email: I'm working on something new that I want you to see.

-Nate