You're reading the Ruby/Rails performance newsletter by Speedshop.

Is Puma the fastest Ruby app server?

Puma is not the fastest Ruby application server.

There! I said it. I'm a maintainer of the project, and yet I don't think it's all that fast. Actually, I think Puma is probably a little bit slower than other app servers.

But, what does the word "fast" mean in this context?

Usually when we say "fast" we mean latency: how long it takes to do a particular action. So, in the context of an app server, we mean "how long does it take this app server to serve a trivial response?" A trivial response would be a "hello world" app. In Rack terms, something like:

lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }

Now, I've done this benchmark with Puma, and it turns out it takes Puma about 100 microseconds to serve that response. That means that 1 Puma process can serve 1/0.0001 requests per second in this scenario, or about 10,000 requests per second. Great!
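Here's that arithmetic as a quick Ruby sketch. The method and constant names are made up for illustration, and it assumes a single process handling requests back to back, one at a time.

```ruby
# Converts a per-request latency (in seconds) into requests per second,
# assuming one process handles requests back to back, one at a time.
def requests_per_second(latency_seconds)
  (1.0 / latency_seconds).round
end

# The roughly 100 microseconds of overhead quoted above.
HELLO_WORLD_LATENCY = 100e-6

puts requests_per_second(HELLO_WORLD_LATENCY)
```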

Now, let's say some other app server comes along and claims they can serve 50,000 requests per second. That's 5 times better! That means that server is 5 times faster than Puma, right? Shit, better switch all of your apps immediately!

Well, it does mean, factually, that this other server adds 20 microseconds of overhead to each request rather than Puma's 100 microseconds. This much is very, very true. But app server overhead doesn't really change too much between a hello world response and a bigger, more realistic response. So if you switch your app server from Puma to this other server, you're only removing about 80 microseconds of response time. I'm not sure that's something you could brag about to your boss.
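To put those 80 microseconds in perspective, here's a rough sketch. The 100 millisecond application time is a made-up stand-in for a more realistic response, not a measurement, so plug in your own number. It also assumes server overhead is simply added on top of the time spent in your app.

```ruby
# Percentage of total response time saved by swapping app servers,
# assuming overhead is simply added on top of the time spent in the app.
def percent_saved(app_seconds, old_overhead, new_overhead)
  old_total = app_seconds + old_overhead
  new_total = app_seconds + new_overhead
  (old_total - new_total) / old_total * 100
end

# Hypothetical app that spends 100 ms per request in its own code,
# moving from 100 microseconds of server overhead to 20.
puts format("%.2f%% of response time saved", percent_saved(0.100, 100e-6, 20e-6))
```

With a hello world app the same swap looks enormous; with anything that does real work, it's a rounding error.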

So, latency is a bad metric when comparing application servers. One area where app servers do differentiate themselves, however, is their concurrency model. Concurrency models impact throughput, which is a metric we do very much care about.

Puma uses a thread pool to fulfill requests. Even with the GVL, that means Puma can process about 25-40% more requests per second for the typical application than a single-threaded application server, like Unicorn. Handling more requests per process also means you need fewer processes for the same traffic, which shows up as memory savings: GitLab recently noticed 30% lower memory usage across their rather massive fleet when they switched to Puma.
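A toy illustration of why threads help even with the GVL: Ruby releases the lock while a thread is waiting on I/O, so those waits can overlap. In this sketch `sleep` stands in for a database or network call. That's an assumption for the demo, not how Puma itself is implemented, and real requests also include CPU work that does not overlap.

```ruby
IO_WAIT = 0.05 # seconds; a stand-in for waiting on a database or HTTP call
REQUESTS = 5

# Returns how many seconds the given block took, using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Pretend to handle a request that does nothing but wait on I/O.
def handle_request
  sleep IO_WAIT
end

# One request at a time, like a single-threaded server process.
def serial_seconds
  elapsed { REQUESTS.times { handle_request } }
end

# One thread per request, so the waits overlap.
def threaded_seconds
  elapsed do
    REQUESTS.times.map { Thread.new { handle_request } }.each(&:join)
  end
end

puts format("serial:   %.2fs", serial_seconds)
puts format("threaded: %.2fs", threaded_seconds)
```

The serial run pays for every wait in full, while the threaded run pays for roughly one. Real apps land somewhere in between, because only the waiting part of each request overlaps.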

I blogged a while ago about the practical effects of the Global VM Lock (GVL) on scaling in Puma and Sidekiq, which can give you some insight into _why_ web apps see a 25-40% capacity increase with Puma.
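One rough way to reason about a number like that is Amdahl's law: only the part of a request spent waiting on I/O, where the GVL is released, can overlap across threads. This is a back-of-the-envelope model, not a benchmark, and the I/O fractions and thread count below are illustrative assumptions rather than measurements of any real app.

```ruby
# Amdahl's law: the best-case speedup when only a fraction of the work
# can run in parallel across a given number of threads.
def amdahl_speedup(parallel_fraction, threads)
  1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)
end

# Hypothetical apps spending 25% and 35% of each request waiting on I/O,
# each run with 5 threads per process.
[0.25, 0.35].each do |io_fraction|
  gain = (amdahl_speedup(io_fraction, 5) - 1) * 100
  puts format("%d%% I/O wait: about %d%% more throughput",
              (io_fraction * 100).round, gain.round)
end
```

Notice that adding threads only helps with the I/O slice of the request, so the gain flattens out quickly as the thread count goes up.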

So, when choosing app servers: don't look at benchmarks, look at concurrency models and other features that you need. Benchmark games are just that: games.

Until next week,

-Nate

Copyright © 2020 Nate Berkopec, All rights reserved.