How to read Flamegraphs and Profiling Results

Howdy Rubyists! Time for another installment of the Speedshop newsletter. This time, I want to talk about reading the flames.

When you're looking at the profiling output of a Ruby web application, one thing you have to understand is the concept of the "stack frame", and how all of these frames put together constitute the "stack". You also should be generally aware of what the different components of the typical stack are, so that you understand how to find the important parts of a profile and what parts you can generally ignore. Today, I'd like to talk about reading flamegraphs and profiling results in general.

All sampling profilers (stackprof being the most popular one in Ruby) work by stopping processes at regular intervals and recording the current stack. It does this about once per millisecond, and the recorded stacks are put next to each other to create a flamegraph. For example, if we have stack frames A, B, C, D, E, and F, our flamegraph might look like:

EE
DDDD
CCCCCC
BBBBBBF
AAAAAAAA

A is the "lowest" part of our stack. It's the entry point to the entire program. All other methods in this profile/flamegraph had A somewhere in their stack (example: E was called by D, which was called by C, which was called by B, which was called by A). Sometimes methods call multiple child methods (in the case of A, it called methods B and F).

This profile had 8 samples. So, we can infer, that at a rate of 1 sample per millisecond, it took about 8 milliseconds to run this code.

We'll make a distinction in an individual sample between the "top level" frame and the "stack" itself. For example, in the second sample, method B is the top of the stack (the "top level") and method A is its parent.

Profiles make various statements about how often a particular frame (a particular _method_) appeared in the complete profile. For example, method A appeared in 100% of our samples. Method E appears in 25%. Method F appeared in 1 (12.5%).

Unfortunately, these percentages often tell us very little by themselves. For example, consider method B. It appears in 6 samples, or 75% of the total. So does that mean it's an interesting method? Maybe not. Note how B is never the "top level" of our profile - it appears only as the parent to other methods. So, we don't actually spend a lot of time in method B, but we do spend time in the things it calls out to.

When you see methods like that in a profile, they're only interesting if you have any control over whether or not B is ever called. For example, if method B is something internal to your application, like a business/domain operation (`UserCheckout.run!`), then it is very interesting - you spent 75% of your time in that operation! However, these methods are often internal methods from other frameworks, like Rails. It isn't really saying much if you're spending 80% of your total time inside of an ActionDispatch Rack middleware, because that middleware wraps all requests and doesn't appear as a "top level" frame anywhere.

On the contrary, top level frames are usually very interesting. Note how D and E appeared as top level frames for 25% of the profile, respectively. If I commented out the call to method E we would expect the entire program to get 25% faster!

I hope this gives you a bit more of an idea of what you're looking at when you're looking at flamegraphs or other profiling output. It's a bit initimidating at first, because it's so much information! But understanding the relationships between the top level and parent frames really can help you to undestand where your application is spending its time.

Until next time,

Nate

You can share this email with this permalink: https://mailchi.mp/railsspeed/how-to-read-flamegraphs-and-profiling-results?e=[UNIQID]

Copyright © 2019 Nate Berkopec, All rights reserved.

Want to change how you receive these emails?
You can update your preferences or unsubscribe from this list.