CRuby Memory Slots: See Them, Tweak Them, Make Them Fast

You've probably read a lot about how Ruby handles memory over the years. If you haven't: there's a lot. Ruby is a dynamic language, and managing memory in dynamic languages is complicated. Managing memory well and fast in dynamic languages is usually very complicated. For instance, here's a very simple summary of how garbage collection works in Java. It's similar -- and complex.

Mostly you don't have to care. Most Java developers don't know that whole summary and most Ruby developers understand only a small fraction of how Ruby memory management works. Yay! If you had to know all that to write a program, it would be terrible.

You care about the specifics of how Ruby manages memory if you're optimizing: if you're trying to make your program faster, or have it run in less memory. Let's talk about how Ruby memory works, and how you can tweak it.

We'll talk about the Slots that Ruby uses - what they are, how you check them and how you optimize them.

 

Ruby Objects: Cheap, Cheaper, Expensive

Ruby has references. And mostly those references point to objects. The Ruby source code calls the references VALUEs, and they need to fit into 64 bits.

Tiny Ruby objects like "nil" and small integers don't get any allocation besides their reference - the number seven, for instance, doesn't get an object allocated to it (sorry, seven!) It gets stored inside the VALUE using bitwise trickery. Ruby doesn't keep an extra copy of seven, or a single frozen copy of seven or something. So tiny objects don't really count in terms of allocated memory or garbage collection. An Array of them can, though -- the Array holds a lot of references (VALUEs.) And while each reference doesn't get a Slot, the Array itself does.

What counts as "tiny?" Integers that Ruby can store in 31 bits or less, between around negative one billion and positive one billion. True, false, undef and nil. Floating point numbers. Symbols. There's a bunch of C code showing how they get encoded into VALUEs.

Every Ruby object that isn't encoded into its VALUE gets a 40-byte "slot". The Ruby structure in the slot is called an RVALUE. Some objects are small enough to fit entirely into a single Slot, such as an Array with up to 3 elements, a smallish Bignum or an Object with only a few instance variables. Ruby says these values are embedded in their RVALUEs if they fit there completely. Since every slot is 40 bytes, Ruby allocates them in big chunks called "pages" for efficiency. So instead of one allocation per Slot, Ruby does one allocation per page of 408 Slots. 408 is how many 40-bytes objects fit into a 16kb memory page after a bit of header overhead.

Objects too big to fit inside a Slot get both a Slot and a chunk of Heap. Heap is "extra" space for bigger Objects. Heap is allocated with normal C malloc unless you build a custom Ruby to do something different. Chunks of Heap take a lot more work to allocate and free. The require more memory and a lot more CPU. A page of Slots still gets allocated via malloc. But it takes one malloc per 408 Slots instead of one malloc per single object. So objects that fit inside a Slot are much cheaper. (Curious about more details? Pat Shaughnessy's Ruby Under a Microscope covers this extensively in chapter 5.)

Here's something interesting: Ruby can't easily reassign a slot. If you use a Slot and later free the object, Ruby waits until all the Slots on that page are free and then frees the page. Ruby doesn't reassign that Slot to a new object, just in case somebody (I'm looking at you, C extensions!) is holding onto a pointer and messes with the new object, thinking it's the old object. Aaron Patterson is trying to fix this, but that's all experimental. Right now, Ruby doesn't free a page unless all the Slots in it are free.

 

Becoming a Slot Whisperer

Okay, so how do you track Ruby Slots? The first thing you do is call GC.stat. (Disclaimer: GC.stat changes between Ruby versions and depending how you compile your Ruby, so you may see a slightly different set of keys in the hash table!)

If I pop into irb in Ruby 2.3.1, here are the GC stats I see:

hostname:ruby noah.gibbs$ irb
2.3.1 :001 > GC.stat
 => {
   :count=>11,
   :heap_allocated_pages=>132,
   :heap_sorted_length=>133,
   :heap_allocatable_pages=>0,
   :heap_available_slots=>53802,
   :heap_live_slots=>48358,
   :heap_free_slots=>5444,
   :heap_final_slots=>0,
   :heap_marked_slots=>20752,
   :heap_swept_slots=>20813,
   :heap_eden_pages=>119,
   :heap_tomb_pages=>13,
   :total_allocated_pages=>132,
   :total_freed_pages=>0,
   :total_allocated_objects=>184094,
   :total_freed_objects=>135736,
   :malloc_increase_bytes=>13040,
   :malloc_increase_bytes_limit=>16777216,
   :minor_gc_count=>8,
   :major_gc_count=>3,
   :remembered_wb_unprotected_objects=>201,
   :remembered_wb_unprotected_objects_limit=>364,
   :old_objects=>20081,
   :old_objects_limit=>33674,
   :oldmalloc_increase_bytes=>1786560,
   :oldmalloc_increase_bytes_limit=>16777216
 }
2.3.1 :002 >

That's fairly imposing. But you know enough to interpret some of it. Let's talk about that.

"Major GC count" and "minor GC count" are just counting how many times Ruby has garbage collected. These are just how many times they've happened. And "count" is just major plus minor.

"Heap live slots," "heap free slots" and "heap final slots" are about Slots - that's what we're looking for. "Live" slots have objects in them, "free" slots don't, and "final" slots are waiting to be cleaned up and garbage collected.

Pages are also important. "Tomb" pages have no live objects and can be handed back to the operating system. "Eden" pages have live objects. Remember how there are 408 slots per page? We can find out how many of those pages are hanging out in memory.

(Want more detail on all the bits of GC.stat? Here's Nate Berkopec's blog post for that, and you can expect more posts from me as I talk more about Ruby memory.)

 

The Hideous Secret of Fragmentation

Ruby can't free a page until all the Slots are free. What does it do with a page of 408 Slots when three last Slots never get un-referenced?

Short answer: they stick around forever, if you never unreference those last few.

This results in fragmentation - all your pages have a combination of used and unused slots. If you have a lot of pages with only three real objects each, that results in very bad fragmentation of Slots. You're allocating space for 408 and using 3. Not so good.

The word fragmentation can mean several different things. This "Slot fragmentation" doesn't happen in a language that only uses malloc and free, because there are no Slots. In languages that only use malloc/free there are two kinds of fragmentation. "Internal fragmentation" is extra space added to each block of allocated memory. "External fragmentation" is extra space in between chunks of allocated memory. So: keep in mind that there are at least three different ways to measure fragmentation that might apply to Ruby. "Slot fragmentation" is one.

So how do you measure Slot fragmentation? Like this:

stat = GC.stat
used_ratio = stat[:heap_live_slots].to_f / (stat[:heap_eden_pages] * 408)
fragmentation_ratio = 1.0 - used_ratio

You take the number of eden (live) pages from GC.stat, multiply by 408, and then see how many objects you have inside those pages. You don't expect the fragmentation ratio to be exactly zero - that's only true if you exactly fill every Slot and you happen to have a multiple of 408 objects and no waste at all. But if you see your fragmentation ratio get around 0.2 or 0.3, you're wasting a lot of space - 70%-80% of your total Slots. A freshly-booted irb session has about a 0.006 fragmentation ratio, or a 0.994 "used" ratio.

You also expect the fragmentation to increase over time for a running process, because you'll have the occasional stray page with a few objects you can't free. But if Slot fragmentation keeps increasing, you're probably doing something wrong.

Nate Berkopec says that if your process size goes up asymptotically -- approaches a line, slowly getting nearer and nearer -- then that's fragmentation, and mostly it's fine for a long-running process. But if your process size goes up linearly -- the same amount per hour, every hour -- then that's a memory leak and you need to hunt it down.

Using Your Slot Savvy

Okay, so now you know how to measure fragmentation. What do you do if it's too high?

  • See if you can do your allocations in a big block. Ruby makes it easy to autoload, but it's often better to load everything up front, where all your "keep them forever" classes and data will wind up on the same pages. That can make your fragmentation ratio significantly better.
  • If you have big chunks of data that you allocate on demand, try to do it right at the beginning. Not only will it not wind up in a page of freed data from a later request, it's more likely to wind up on the same pages with your early classes and code from last paragraph.
  • Any time you can avoid keeping a reference around, don't keep it around. The fastest memory management is always no memory management.
  • Do you have a structure that slowly grows or changes? That can be hard. But try to touch it in batches, where lots of allocations will wind up on the same page instead of scattered across many different pages.
  • Do you have a cache that's small and never cleared? If so, maybe you want to make it a permanent allocation up front, or get rid of it entirely.

And finally, don't sweat fragmentation too much. Each page is about 16 kilobytes (technically kibibytes, if you're a pedant.) If you're wasting 100,000 of them, that's worth a look. If you're wasting ten of them... It's 160kb. 

I'll be back next week or so with another lesson about Ruby memory!