You're reading the Ruby/Rails performance newsletter by Speedshop.

Looking for an audit of the perf of your Rails app? I've partnered with Ombu Labs to do just that.

In benchmarking, more is always better, right?

If you've been reading my stuff for a while, you'll know that I'm an advocate of performance engineering. Performance engineering means that, when doing perf work, you always lay out your requirements and constraints up front, and then design a system that meets those conditions. We don't engineer our systems to be better (or worse) than required but engineer them precisely to spec. If you design a box that is 2x bigger than required (ooh! ahh!), but it doesn't fit on the customer's shelves, costs twice as much as they want to pay, and takes twice as long to make as stipulated, then you didn't design a "better" box, you designed the wrong box.

Every computing system has a performance requirement. If it didn't, then, implicitly, a system that finished its work a half-millisecond before the heat death of the universe would also meet the requirements. I have yet to meet a customer who would pay for such a system (and if you have, please refer them to me).

On the other hand, is exceeding a latency or throughput specification always good? Is it always better to serve 10,000 requests per second instead of 1,000? Is it always better if the page loads in 0.5 seconds instead of 1 second?

The answer is that, most often, exceeding requirements comes at the cost of violating constraints. We have many constraints in programming, but the most important are time and money. Of course, if we can meet our constraints, then exceeding the requirement is usually fine (though we may want to check with the customer if, for example, they would rather have a system 2x as fast or one that is ready 2x sooner).

The best trick I know for gauging whether exceeding a requirement is actually useful is to flip the denominator and numerator. For example, instead of requests per second, think in terms of seconds per request.
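
If it helps to see it in code, here's the flip as a couple of lines of Ruby - a trivial sketch with made-up numbers, just to show that the two framings are the same measurement turned upside down:

  # Throughput and latency are reciprocals of each other; flipping the
  # ratio is just 1/x (the 1000 converts between milliseconds and seconds).
  def requests_per_second(ms_per_request)
    1000.0 / ms_per_request
  end

  def ms_per_request(requests_per_second)
    1000.0 / requests_per_second
  end

  ms_per_request(10_000)    # => 0.1    (10,000 req/sec is 0.1 ms per request)
  ms_per_request(1_000)     # => 1.0    (1,000 req/sec is 1 ms per request)
  requests_per_second(0.5)  # => 2000.0 (a 0.5 ms response is 2,000 req/sec)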
 

A real-world example: Puma


I am one of the maintainers of the Puma project, a Rack web application server. I've set an informal requirement that Puma must not add more than 1 millisecond of overhead to every request. This requirement was drawn from what I know about HTTP web applications: most of them take at least 10 milliseconds to return a response and usually closer to 100 milliseconds or more. So, for most applications, Puma is less than 10% (but usually less than 1%) of the total response time. That feels good to me. If we shipped a feature that reduced that overhead from 1 millisecond to 0.1 milliseconds, it really wouldn't make that much of a difference to our users, who are using Puma to serve HTTP applications with latency of 10 milliseconds or more.

Benchmarks for web servers, however, are usually done with the denominator and numerator flipped, so that we're expressing throughput instead of latency. This is how TechEmpower benchmarks are scored, for example: in terms of "requests per second." Today, Puma scores something like 10,000 to 15,000 requests per second, depending on the setup.

This benchmark doesn't really make sense for an application server. The job of an application server is to serve applications. Does it matter how many "hello world" responses a server can serve per second? That kind of "application" is so far removed from what our users are actually doing that the comparison is completely useless.

For example, let's say we're considering adding a feature to Puma that will add 0.5 milliseconds of overhead to every request. If you looked at our current benchmarking numbers, Puma does something like 10,000 requests per second, or 0.1 milliseconds per request. Merging this feature would absolutely destroy our throughput benchmark - our new result would be about 1,666 requests per second, an 83% decrease. Should we merge the feature?

Think about it this way. At present, 99% of actual, real-world Puma applications spend 10 milliseconds or more per request doing work in the application itself. That means, for 99% of Puma users, we make their app at most about 6% slower. Now, is this feature worth it? Maybe, maybe not. But I can think of a lot of features and changes I'd merge at the cost of a 6% performance penalty!
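
If you want to check that arithmetic yourself, here's the back-of-the-envelope version in Ruby, using the same rough, assumed numbers as the last two paragraphs (not real Puma benchmark results):

  added_overhead_ms = 0.5

  # Benchmark frame: a "hello world" response at roughly 0.1 ms per request.
  before_rps = 1000.0 / 0.1                        # => 10000.0
  after_rps  = 1000.0 / (0.1 + added_overhead_ms)  # => ~1666.7
  drop       = 100 * (1 - after_rps / before_rps)  # => ~83% throughput decrease

  # Real-app frame: an application that does 10 ms of real work per request.
  app_ms         = 10.0
  slowdown       = 100 * added_overhead_ms / app_ms          # => 5.0 (~5% slower)
  overhead_share = 100 * (0.1 + added_overhead_ms) / app_ms  # => 6.0 (Puma is ~6% of the response)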

The frame makes all the difference here. By merging this feature, I'm essentially saying that you shouldn't use Puma if your application needs to return responses in less than about 2 milliseconds. But the number of HTTP apps that actually need to do that could probably be counted on one hand. 

However, there are certainly other scenarios or use cases where half a millisecond matters. For example, if Puma wanted to improve WebSocket performance, we would need to be cautious about the overhead of idle connections. An extra 0.5 milliseconds of overhead per idle connection would greatly reduce the number of those connections we could handle, and having 1,000+ idle connections per Puma process is certainly not out of the question.
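
As a purely hypothetical back-of-the-envelope (assuming, just for illustration, that the half-millisecond is paid every time the server has to touch each idle connection):

  # Hypothetical numbers, not measured from Puma.
  overhead_ms_per_connection = 0.5
  idle_connections           = 1_000

  ms_to_touch_them_all = overhead_ms_per_connection * idle_connections  # => 500.0

  # Half a second of CPU just to visit every idle socket once. At that
  # cost, a single thread could only sweep the connection list about
  # twice per second, which puts a hard ceiling on how many idle
  # connections one process can realistically hold.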

As an application author, you know better than anyone else (hopefully!) what is expected of that app's performance. But don't get lured into thinking that more is better: flip the ratio, and think about how much performance is enough.

- Nate
You can share this email with this permalink: https://mailchi.mp/railsspeed/the-denominator-matters-in-benchmarking?e=[UNIQID]

Copyright © 2023 Nate Berkopec, All rights reserved.

