A Rubyist's Walk Along the C-side (Part 11): Debugging

This is an article in a multi-part series called “A Rubyist’s Walk Along the C-side”

In the previous article, we looked at ways to benchmark Ruby code and saw how a pure Ruby implementation could actually be faster than a poorly written C extension. In this article, we’ll look at the various ways to deal with the inevitable scenario: our C extension crashes.

Understanding Ruby’s crash report

In “A Rubyist’s Walk Along the C-side (Part 5): Variables”, we saw our first crash report. The crash report looks like this (it may vary a bit depending on your system and Ruby version):

test.rb:8: [BUG] Segmentation fault at 0x0000000000000033
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Control frame information -----------------------------------------------
c:0002 p:0037 s:0010 e:000005 EVAL   test.rb:8 [FINISH]
c:0001 p:0000 s:0003 E:000a60 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
test.rb:8:in `<main>'

-- Machine register context ------------------------------------------------
  x0: 0x0000000000000000  x1: 0x0000000000000d01  x2: 0x000000016ce7eb20
  x3: 0x0000000103113f74  x4: 0x0000000000000000  x5: 0x0000000000001a80
  x6: 0x0000600000321a80  x7: 0x0000000000000001 x18: 0x0000000000000000
 x19: 0x0000000000000d01 x20: 0x00000001031232d0 x21: 0x0000000000000000
 x22: 0x0000000000000000 x23: 0x000000000000001b x24: 0x0000000103bd8000
 x25: 0x0000000103bd8000 x26: 0x0000000000000000 x27: 0x000000012ce5ccd8
 x28: 0x000000012ce052c0  lr: 0x0000000103960908  fp: 0x000000016ce7eb60
  sp: 0x000000016ce7eb10

-- C level backtrace information -------------------------------------------
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_bugreport+0x9a0) [0x103983084]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_bug_for_fatal_signal+0x160) [0x1037a721c]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(sig_do_nothing+0x0) [0x1038e1940]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x38) [0x18248aa84]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(callable_method_entry_or_negative+0xc0) [0x103960908]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_search_method_slowpath+0x10c) [0x103957024]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(vm_exec_core+0x2698) [0x10395afac]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_exec+0xad4) [0x10396e1d8]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_ec_exec_node+0x12c) [0x1037b28b4]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(ruby_run_node+0x60) [0x1037b2720]
/Users/peter/.rbenv/versions/3.2.2/bin/ruby(main+0x68) [0x102f83f34]

-- Other runtime information -----------------------------------------------

* Loaded script: test.rb

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 fiber.so
    3 rational.so
    4 complex.so
    5 ruby2_keywords.rb
    6 /Users/peter/.rbenv/versions/3.2.2/lib/ruby/3.2.0/arm64-darwin22/enc/encdb.bundle
    7 /Users/peter/.rbenv/versions/3.2.2/lib/ruby/3.2.0/arm64-darwin22/enc/trans/transdb.bundle
    8 /Users/peter/src/ruby-c-ext-code/part5/gv_bug/ext/gv_bug.bundle

* Process memory map:

102f80000-102f84000 r-x /Users/peter/.rbenv/versions/3.2.2/bin/ruby
102f84000-102f88000 r-- /Users/peter/.rbenv/versions/3.2.2/bin/ruby
102f88000-102f8c000 r-- /Users/peter/.rbenv/versions/3.2.2/bin/ruby
...

[IMPORTANT]
Don't forget to include the Crash Report log file under
DiagnosticReports directory in bug reports.

Let’s decipher this crash report section by section.

Headers

test.rb:8: [BUG] Segmentation fault at 0x0000000000000033
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]

The headers give information like which file (test.rb) and line (8) in Ruby that crashed. It then tells us what kind of crash occurred (segmentation fault). The next line shows the Ruby version (3.2.2) and architecture (ARM64 on macOS).

Crash report log

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

A full crash report is written to one of these locations. If you think you’ve found a bug in Ruby, please open a ticket upstream and include the file generated in one of these directories.

Control frame information

-- Control frame information -----------------------------------------------
c:0002 p:0037 s:0010 e:000005 EVAL   test.rb:8 [FINISH]
c:0001 p:0000 s:0003 E:000a60 DUMMY  [FINISH]

The information here is generally not useful for us but more useful for debugging Ruby. But if you’re curious: c is the index of the control frame (which tells us which instruction sequence, or “block of code”, is executing.), p is the program counter (used to keep track of which instruction in the control frame we’re on), and e is the environment pointer (used to keep track of local variables). It’s then followed by a stack frame type (e.g. EVAL for a block of code, RESCUE for a rescue block). And then the file and line number.

Ruby level backtrace

-- Ruby level backtrace information ----------------------------------------
test.rb:8:in `<main>'

This is (one of) the most useful pieces of information, it gives us the full Ruby level backtrace of every file, line, and method in Ruby when the crash occurred.

Machine register context

-- Machine register context ------------------------------------------------
  x0: 0x0000000000000000  x1: 0x0000000000000d01  x2: 0x000000016ce7eb20
  x3: 0x0000000103113f74  x4: 0x0000000000000000  x5: 0x0000000000001a80
  x6: 0x0000600000321a80  x7: 0x0000000000000001 x18: 0x0000000000000000
 x19: 0x0000000000000d01 x20: 0x00000001031232d0 x21: 0x0000000000000000
 x22: 0x0000000000000000 x23: 0x000000000000001b x24: 0x0000000103bd8000
 x25: 0x0000000103bd8000 x26: 0x0000000000000000 x27: 0x000000012ce5ccd8
 x28: 0x000000012ce052c0  lr: 0x0000000103960908  fp: 0x000000016ce7eb60
  sp: 0x000000016ce7eb10

This information is also not very important for us. It dumps the values stored in every CPU register.

C level backtrace information

-- C level backtrace information -------------------------------------------
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_bugreport+0x9a0) [0x103983084]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_bug_for_fatal_signal+0x160) [0x1037a721c]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(sig_do_nothing+0x0) [0x1038e1940]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x38) [0x18248aa84]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(callable_method_entry_or_negative+0xc0) [0x103960908]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_search_method_slowpath+0x10c) [0x103957024]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(vm_exec_core+0x2698) [0x10395afac]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_exec+0xad4) [0x10396e1d8]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_ec_exec_node+0x12c) [0x1037b28b4]
/Users/peter/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(ruby_run_node+0x60) [0x1037b2720]
/Users/peter/.rbenv/versions/3.2.2/bin/ruby(main+0x68) [0x102f83f34]

This is also one of the most useful pieces of information for us; it gives us the C level backtrace of the crash, with each line containing the binary path, function name, line number, and address of the instruction.

Other runtime information

-- Other runtime information -----------------------------------------------

* Loaded script: test.rb

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 fiber.so
    3 rational.so
    4 complex.so
    5 ruby2_keywords.rb
    6 /Users/peter/.rbenv/versions/3.2.2/lib/ruby/3.2.0/arm64-darwin22/enc/encdb.bundle
    7 /Users/peter/.rbenv/versions/3.2.2/lib/ruby/3.2.0/arm64-darwin22/enc/trans/transdb.bundle
    8 /Users/peter/src/ruby-c-ext-code/part5/gv_bug/ext/gv_bug.bundle

* Process memory map:

102f80000-102f84000 r-x /Users/peter/.rbenv/versions/3.2.2/bin/ruby
102f84000-102f88000 r-- /Users/peter/.rbenv/versions/3.2.2/bin/ruby
102f88000-102f8c000 r-- /Users/peter/.rbenv/versions/3.2.2/bin/ruby
...

The information in this section is generally not important, but I’ll go over it anyway.

The “loaded script” is the script that’s currently running.

The “loaded features” is a list of files that are loaded by Ruby. These may be Ruby files (e.g. thread.rb) or native binaries (e.g. the gv_bug.bundle that we built).

The “process memory map” is very long and I’ve truncated it to save space. It contains a list of every region of memory allocated for our Ruby program, the memory permissions (read, write, execute) and the binary that allocated the memory.

Debuggers

When our program crashes, it’s useful to pull out a C debugger. The debugger will allow us to inspect the stack, variables, and memory addresses. If you’re on macOS, your only option is LLDB. But if you’re on Linux, you can choose between GDB, LLDB, or, my personal favorite, rr. GDB and LLDB work very similarly on Linux, however, rr is special in that it allows us to run a program backwards! For example, rr allows us to set a watchpoint on the memory address of a crash and run the program backwards to see where that memory was changed.

Read the documentation on how to use these debuggers.

printf debugging

Of course, it’s always valid to put printf statements in the code to output state for debugging :)

Core dumps

We might not always be able to debug inside of a debugger due to various reasons such as a crash not reproducing in a debugger (the debugger changes the memory layout, so it may cause certain bugs to not trigger) or the crash may be happening on another machine (e.g in a production server). In these cases, we can configure the system to generate core dumps, which are files that contain state of the process at the time of a crash, including all of the memory contents.

You can enable core dumps in Linux or macOS by running the following command, which enables unlimited sized core dumps:

ulimit -c unlimited

The location where your core dumps are saved depends on your system.

Inspecting Ruby objects

Inspecting and understanding Ruby objects from a C debugger is tricky and requires understanding of how memory is structured in Ruby. We’ll first look at how to determine the type of the object and then cover how to inspect a few common types.

Determining the type of Ruby object

Let’s try determining the type of a Ruby object. Let’s try it on a simple hello world Ruby script:

$ lldb -- ruby -e "puts 'hello world'"

This will start LLDB, we can then set a breakpoint on the puts method:

(lldb) b rb_io_puts
Breakpoint 1: where = ruby`rb_io_puts + 44 at io.c:8921:9, address = 0x00000001000c9948

And then run the program:

(lldb) r
Process 35484 launched: '/Users/peter/src/ruby/build/ruby' (arm64)
Process 35484 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 2.1
    frame #0: 0x00000001000c9948 ruby`rb_io_puts(argc=1, argv=0x0000000108028038, out=4302084560) at io.c:8921:9
   8918     VALUE line, args[2];
   8919
   8920     /* if no argument given, print newline. */
-> 8921     if (argc == 0) {
   8922         rb_io_write(out, rb_default_rs);
   8923         return Qnil;
   8924     }
Target 0: (ruby) stopped.

Which will pause when we reach the puts method call. We can see that this call to puts only has a single argument (since argc is 1). Let’s inspect the first argument:

(lldb) p argv[0]
(const VALUE) $0 = 4317976800

This is the address of our "hello world" string, but pretend we don’t know that. Let’s find out what type this object is:

(lldb) p (enum ruby_value_type)((((struct RBasic *)argv[0])->flags) & ruby_value_type::RUBY_T_MASK)
(enum ruby_value_type) $1 = RUBY_T_STRING

We can see that it’s a RUBY_T_STRING type. Ok, there’s a lot of unpack here, so let’s talk about each part one by one.

  1. (struct RBasic *)argv[0]: we first cast the VALUE into a struct RBasic pointer. The RBasic struct is a header in all Ruby objects that contains metadata.
  2. (...)->flags: we read the flags field of the RBasic.
  3. (...) & ruby_value_type::RUBY_T_MASK: we then use bitwise AND to extract the type information from the flags.
  4. (enum ruby_value_type)(...): we then cast the type information to the enumerable ruby_value_type that contains all Ruby types. The definition of ruby_value_type with all of all Ruby types can be found here.

Inspecting common Ruby object types

The data structures and implementation of the data types are internal to Ruby, so they are subject to change and will vary between Ruby versions. You may need to read part of the Ruby source code to to help you debug. A few data types are shown here as examples and are accurate specifically for Ruby 3.2.

Arrays

There are two possible locations for the contents of arrays: embedded or heap allocated. To determine whether an array is embedded or heap allocated, check for the RARRAY_EMBED_FLAG flag. For example:

(lldb) p ((struct RBasic *)ary)->flags & ruby_rarray_flags::RARRAY_EMBED_FLAG
(unsigned long) $7 = 8192

Since the resulting value is non-zero, this array is embedded.

To get the length of an array, we can call the rb_array_len function. However, we can’t call functions when we’re inspecting core dumps since calling a function requires a live process. To manually get the length of an embedded array, we can use this expression: (((struct RBasic *)ary)->flags & ruby_rarray_flags::RARRAY_EMBED_LEN_MASK) >> ruby_rarray_flags::RARRAY_EMBED_LEN_SHIFT. To get the length of a heap allocated array, we can simply read the as.heap.len field: ((struct RArray *)ary)->as.heap.len.

To get the buffer that contains the contents of the array, we can call the rb_array_const_ptr function. This returns a VALUE * buffer that contains all the elements of the array. Again, if we’re debugging a core dump then we can’t call the function. To manually get the buffer for an embedded array, we can use this: (VALUE *)((struct RArray *)ary)->as.ary. To get the buffer for a heap allocated array, we can read the as.heap.ptr field: ((struct RArray *)ary)->as.heap.ptr.

Strings

Just like arrays, strings can be embedded or heap allocated. To determine whether a string is embedded or not requires the STR_NOEMBED macro. Unlike arrays, this is a macro and not a enum, which means that it’s gone in the compiled binary. This macro evaluates to the number 8192, so we’ll just have to use this magic number. To determine if a string is embedded or heap allocated, check for whether the bit 8192 exists in the flags. For example:

(lldb) p ((struct RBasic *)str)->flags & 8192
(unsigned long) $20 = 8192

Since the resulting value is non-zero, this string is embedded (note that this flag is used in the exact opposite way of the one for arrays, when this bit is set the string is embedded).

To get the length of the string, we can call the rb_str_length function.

TypedData

For TypedData objects, there are two main components: the rb_data_type_t struct that contains the configuration, and the pointer to the buffer that holds the data.

To get the rb_data_type_t, we can do the following:

(lldb) p *((struct RTypedData *)typeddata)->type
(const rb_data_type_t) {
  wrap_struct_name = 0x00000001000a3f10 "circular_buffer"
  function = {
    dmark = 0x00000001000a3b28 (circular_buffer_typeddata.bundle`circular_buffer_mark at circular_buffer_typeddata.c:21)
    dfree = 0x00000001000a3b88 (circular_buffer_typeddata.bundle`circular_buffer_free at circular_buffer_typeddata.c:34)
    dsize = 0x00000001000a3bb0 (circular_buffer_typeddata.bundle`circular_buffer_memsize at circular_buffer_typeddata.c:46:79)
    dcompact = 0x00000001000a3bc0 (circular_buffer_typeddata.bundle`circular_buffer_compact at circular_buffer_typeddata.c:50)
    reserved = ([0] = 0x0000000000000000)
  }
  parent = NULL
  data = 0x0000000000000000
  flags = 0
}

We can see the name of the TypedData object, and the mark, free, size, and compaction functions.

We can also get the pointer to the buffer that holds the data. However, note that the type is a void * since Ruby does not know the type, so we need to cast it to the type that it was allocated as:

(lldb) p *(struct circular_buffer *)((struct RTypedData *)typeddata)->data
(struct circular_buffer) {
  capacity = 1
  length = 0
  read_cursor = 0
  write_cursor = 0
  buffer = 0x000000013971abb0
}

Conclusion

In this article, we looked at how to read Ruby’s crash reports and ways to debug our native extension. Debugging is probably one of the most complex and time-consuming things to do in software development, as there are so many components that you need to understand in order to debug the issue. This article barely scratches the surface on how to debug crashes in Ruby. However, you need not be afraid. You’re well equipped with knowledge, and you’ll learn so much more through debugging issues. Good luck!