You're reading the Ruby/Rails performance newsletter by Speedshop.

Should anyone ever have to wait?

In response to last week's newsletter, I received a few questions about my comments around queue latency and scaling.

When should we stop scaling? Shouldn't we scale up until queue latency is zero? What if we want zero queue latency?

We can think through this by analogy: every call center operator wants queue time - that is, hold time spent listening to that wonderful hold music - to be as close to zero as possible.

As we discussed last week, utilization and queue latency have a nonlinear relationship. As utilization decreases, queue latency approaches zero. So, as you add more operators to your call center, hold time decreases to near-zero.

The only call center that would be guaranteed to serve all calls with zero hold time would be one that always had one more available operator - that is to say, infinite operators. For any finite quantity of operators, there is a call center load such that the available capacity is exceeded, and you will have to hold.

In a system where work is queued, queue time approaches zero as you add capacity, but it only reaches zero at infinite capacity.
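
If you want to see that asymptote with numbers, here's a quick sketch of mine using the simplest single-server queueing model (M/M/1), where average queue time is utilization divided by the spare service capacity. The service rate below is made up; the shape of the curve is the point.

```ruby
# A back-of-the-envelope sketch (mine, with made-up numbers) of the
# single-server M/M/1 model: average queue time = utilization / (mu - lambda),
# where mu is the service rate and lambda is the arrival rate.

SERVICE_RATE = 10.0 # requests per second one server can work through (assumed)

def average_queue_time(utilization, service_rate = SERVICE_RATE)
  arrival_rate = utilization * service_rate
  utilization / (service_rate - arrival_rate) # seconds
end

[0.95, 0.90, 0.75, 0.50, 0.25, 0.10].each do |utilization|
  printf("utilization %3d%% -> avg queue time %7.1f ms\n",
         (utilization * 100).round, average_queue_time(utilization) * 1000)
end
```

Run it and hold time falls off a cliff as utilization drops - from about two seconds at 95% utilization to about ten milliseconds at 10%, with these made-up rates - but the only way to get exactly zero is 0% utilization, which means infinite spare capacity.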

So, should a call center always add operators until queue time is zero?

Of course not. You'd run out of money first. The answer to the question of "why not scale up to zero latency" is "you don't have infinite budget."

This also establishes a relationship between queue latency, utilization, and cost. As queue latencies approach zero, cost increases because lower utilization is required to maintain that low queue latency.

Because you don't have an infinite budget, you instead need to establish the tradeoff: how much queue latency will you remove for how much money per month?
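
To put rough numbers on that tradeoff, here's a sketch using the textbook Erlang C formula for a multi-server queue. Every figure in it - the traffic, the per-server throughput, the $500/month price - is an assumption for illustration; swap in your own.

```ruby
# A sketch of the latency/utilization/cost tradeoff using the textbook
# Erlang C formula (M/M/c queue). The traffic figures and the $500/month
# server price below are assumptions for illustration only.

ARRIVAL_RATE    = 50.0  # requests per second hitting the app (assumed)
SERVICE_RATE    = 10.0  # requests per second one server can handle (assumed)
COST_PER_SERVER = 500   # dollars per month per server, e.g. one large dyno (assumed)

# Erlang C: probability that an arriving request has to wait in the queue.
def wait_probability(servers, offered_load)
  term = 1.0 # offered_load**k / k!, built up iteratively
  sum  = 1.0 # running sum for k = 0..servers-1
  (1...servers).each do |k|
    term *= offered_load / k
    sum  += term
  end
  # (offered_load**servers / servers!) * servers / (servers - offered_load)
  top = term * offered_load / (servers - offered_load)
  top / (sum + top)
end

# Average time a request spends waiting in the queue, in seconds.
def average_queue_time(servers)
  offered_load = ARRIVAL_RATE / SERVICE_RATE
  return Float::INFINITY if offered_load >= servers
  wait_probability(servers, offered_load) / (servers * SERVICE_RATE - ARRIVAL_RATE)
end

[0.100, 0.050, 0.010, 0.001].each do |target|
  servers = (ARRIVAL_RATE / SERVICE_RATE).ceil + 1
  servers += 1 while average_queue_time(servers) > target
  printf("target %5.1fms -> %2d servers, $%5d/month\n",
         target * 1000, servers, servers * COST_PER_SERVER)
end
```

With these particular numbers, each tightening of the latency target costs another server or two while the absolute improvement shrinks; chase the target toward zero and the bill keeps climbing.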

For instance, consider a web service with an average queue latency of 100 milliseconds. You experiment for a day and determine that adding 1 additional Heroku Perf-L dyno reduces queue latency by 50 milliseconds.

So, that's $500/month for a 50 millisecond latency reduction. Should you scale up?
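
Put another way, that experiment gives you a unit price for latency:

```ruby
# The experiment above, reduced to a unit price.
dyno_cost_per_month  = 500 # one Perf-L dyno
latency_reduction_ms = 50  # what the (hypothetical) day of experimenting showed
puts dyno_cost_per_month.fdiv(latency_reduction_ms) # => 10.0 dollars per month, per millisecond removed
```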

This is where I think many of you may get lost. What I can tell you is this: every major call center operator understands _exactly_ what the cost of 1 minute of average hold time is. Delta, United, etc. all know, almost to the dollar, what the impact on their revenue will be if call center hold times are 30 seconds longer than they are today.

You already know, implicitly, that there is a relationship between latency and customer satisfaction (and therefore revenue). Consider that today your business makes $X/year with a browser load time of Y seconds. That's one point on our graph. We can all agree that if your browser load time were 100 * Y instead, you would probably be making $0. That's a second point. Drawing a line between those two points gives us a slope. Of course, the real relationship is not a straight line (it's probably closer to an exponential falloff), but it gives us a place to start.

The slope of that line, between those two points, is the tradeoff between revenue and latency, and we can use it to decide when to stop scaling. Extrapolate the line the other way: eliminating all of today's latency would recover roughly $X/100 per year in revenue. So scale your servers up only while the next server costs less than the revenue its latency reduction recovers along that slope - keeping in mind that roughly $X/100 per year is the most that's on the table.
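
Here's that break-even calculation with invented figures, just to show the shape of it:

```ruby
# The same back-of-the-envelope, with invented figures: $1,000,000/year of
# revenue at a 1-second browser load time, plus the 50ms/$500 dyno from above.
# Swap in your own numbers; only the shape of the calculation matters.

revenue_per_year = 1_000_000.0 # $X (assumed)
load_time_s      = 1.0         # Y seconds (assumed)

# Dollars of annual revenue per second of latency, along the line between
# (Y, $X) and (100Y, $0). Eliminating all Y seconds would recover ~$X/100.
slope = revenue_per_year / (100 * load_time_s - load_time_s)

# Value of the 50ms reduction the extra dyno bought in the experiment above:
reduction_s   = 0.050
annual_value  = slope * reduction_s
monthly_value = annual_value / 12

puts format("that 50ms is worth about $%.0f/year ($%.0f/month)", annual_value, monthly_value)
verdict = monthly_value > 500 ? "pays for itself" : "costs more than it returns"
puts "at these numbers, the $500/month dyno #{verdict}"
```

With these invented numbers the extra dyno doesn't pay for itself; with a bigger business or a worse starting latency, it easily could. The point is that the decision becomes arithmetic rather than instinct.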

Now, a simple way to achieve zero queue latency without an infinite budget is to stop queueing. A system that rejects load beyond its capacity is sometimes called an Erlang-B system (systems that queue excess load, like the ones we've been discussing, are Erlang-C systems). If you reject connections instead of queueing them, there is no queue time - but of course, now you're turning requests away instead.

This isn't possible for a call center, but there may be operations in your business that you want to carry out "now or not at all". I'll leave thinking about those as an exercise for you. You're essentially trading availability for latency. This is all a queue does: it creates the illusion of 100% availability by creating a buffer when the system is not actually currently available. If the system need not always be available, queues may add unnecessary latency.
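
If you want to see what "now or not at all" looks like in a Rails app, here's a minimal load-shedding sketch of my own - an illustration, not something to deploy as-is - written as Rack middleware that returns a 503 instead of letting a request wait inside the process:

```ruby
# A minimal load-shedding sketch - an illustration, not production code.
# Instead of letting requests pile up inside this process, reject them with
# a 503 once we're at capacity. The Rack middleware shape is standard;
# CAPACITY is a made-up tunable you'd size to your own app.

class ShedLoad
  CAPACITY = 16 # maximum requests we're willing to work on at once (assumed)

  def initialize(app)
    @app       = app
    @mutex     = Mutex.new
    @in_flight = 0
  end

  def call(env)
    if acquire_slot
      begin
        @app.call(env)
      ensure
        release_slot
      end
    else
      # Erlang-B behavior: no queue, no hold music - just a busy signal.
      [503, { "Retry-After" => "1" }, ["Over capacity, try again shortly"]]
    end
  end

  private

  def acquire_slot
    @mutex.synchronize do
      return false if @in_flight >= CAPACITY
      @in_flight += 1
      true
    end
  end

  def release_slot
    @mutex.synchronize { @in_flight -= 1 }
  end
end

# In a Rails app, you'd insert it near the front of the middleware stack:
#   config.middleware.insert_before 0, ShedLoad
```

In a real deployment most of the queueing happens in front of the app (at the router, load balancer, or socket backlog), so you'd usually shed load at that layer instead - but the principle is the same: bound the queue and return a busy signal instead of hold music.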

Until next week, Rubyists!

-Nate



 
You can share this email with this permalink: https://mailchi.mp/railsspeed/when-should-you-stop-scaling-why-cant-queue-latencies-be-zero?e=[UNIQID]

Copyright © 2020 Nate Berkopec, All rights reserved.

