2023-09-29

Ruby enumerables considered helpful

Ruby's Enumerable methods help you make powerful code simple — by filtering, transforming, and processing data like the best engineers do. These methods are available on Arrays, Hashes, and many (many) other objects, and similarly-named methods are available on even more. If you don't already know these methods well, then the most valuable time you can spend in Ruby is on mastering them.

But not all data can take advantage of Enumerable methods, at least not directly. What if you don't have an Array or Hash? Can you still use the Enumerable methods? With all the time you spent becoming an Enumerable master, wouldn't it be nice if you could treat more things like Enumerables — like lists of items across multiple pages, or even items that slowly stream in from an API over time?

Ruby's Enumerator class is exactly what you need to do this. It's not the easiest thing to understand, but a few examples can show you how to use it in your own applications.

Automatic pagination

Sometimes your data seems like a natural fit for an Enumerable, but it doesn't quite work. What if you're using an API that paginates? The best you can get is an Enumerable per page, which makes it annoying to work with multiple pages at once. You could fetch all the pages and then flatten the list of lists… But now you're always fetching all the data (even if you don't need it) and you have to wait for every page to load before you can do anything.

Instead, wrap your API calls in an Enumerator:

def paginated_list(initial_url, params = {})
  Enumerator.new do |yielder|
    # Start from the first page each time the Enumerator is enumerated
    url = initial_url
    # ...
  end
end

This returns an Enumerator, an object that can be used like an Enumerable, responding to all of the core Enumerable methods you know and love: map, filter, first, and all the rest.
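
As a minimal sketch of what that means, here is a tiny Enumerator that has nothing to do with pagination. The name numbers is only for illustration; Enumerator.new, the yielder, and the Enumerable methods are standard Ruby.

```ruby
# A tiny Enumerator that yields three values, one at a time.
numbers = Enumerator.new do |yielder|
  yielder.yield 1
  yielder.yield 2
  yielder.yield 3
end

# Enumerator includes Enumerable, so the usual methods are available.
doubled   = numbers.map { |n| n * 2 }   # [2, 4, 6]
evens     = numbers.filter(&:even?)     # [2]
first_two = numbers.first(2)            # [1, 2]
```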

When you call a method like map on that Enumerator, that method will ask for the next object as it needs it. When asked, the body of the Enumerator yields the next object using the yielder:

def paginated_list(initial_url, params = {})
  Enumerator.new do |yielder|
    # Start from the first page each time the Enumerator is enumerated
    url = initial_url

    loop do
      response = get(url, params)

      body = response.body

      # Yield each result to the caller
      body["results"].each { |result| yielder.yield(result) }

      break if last_page?(body)

      # get ready to fetch the next page
      url = next_page_url(body)
    end
  end
end

Nothing is requested until a caller asks for the first record. At that point, the Enumerator fetches the first page and yields its records one at a time. When the first page runs out, it fetches the next page, and so on until the last page. If the caller stops before reaching the next page, that page is never fetched.
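
One rough way to see this behavior is to swap the HTTP call for an in-memory stand-in and record which pages get requested. Everything here is hypothetical scaffolding for the sketch: PAGES, FETCHED, fetch_page, and the "next" key are assumptions, not part of any real API.

```ruby
# Hypothetical in-memory "API": three pages of two records each.
PAGES = {
  "/things?page=1" => { "results" => [{ "name" => "a" }, { "name" => "b" }], "next" => "/things?page=2" },
  "/things?page=2" => { "results" => [{ "name" => "c" }, { "name" => "d" }], "next" => "/things?page=3" },
  "/things?page=3" => { "results" => [{ "name" => "e" }, { "name" => "f" }], "next" => nil }
}.freeze

FETCHED = [] # records every URL that was actually requested

# Stand-in for a real HTTP call; returns the parsed page body directly.
def fetch_page(url)
  FETCHED << url
  PAGES.fetch(url)
end

def paginated_list(initial_url)
  Enumerator.new do |yielder|
    url = initial_url
    loop do
      body = fetch_page(url)
      body["results"].each { |result| yielder.yield(result) }
      break if body["next"].nil?
      url = body["next"]
    end
  end
end

list = paginated_list("/things?page=1")
fetched_before = FETCHED.dup # creating the Enumerator requests nothing

first_three = list.first(3).map { |t| t["name"] }
fetched_after = FETCHED.dup
```

Asking for three records touches only the first two pages. The third page is never requested.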

How does this look for the caller?

paginated_list("/things").filter { |t| t["selected"] == "true" }.sort_by { |t| t["name"].downcase }

Easy — you don't even have to care that the data spans multiple pages across multiple API requests. From the caller's point of view, it's just an ordinary list.
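
One caveat: filter and sort_by need every record, so a chain like the one above reads every page. When only the first few matches matter, Enumerator#lazy can stop early. The sketch below assumes a made-up in-memory source (PAGE_DATA and LOADED are hypothetical) in place of real API requests.

```ruby
# Hypothetical stand-in for a paginated source: each "page" is an array,
# and LOADED records which pages were actually read.
PAGE_DATA = [
  [{ "name" => "Delta", "selected" => "true" }, { "name" => "alpha", "selected" => "false" }],
  [{ "name" => "charlie", "selected" => "true" }, { "name" => "Bravo", "selected" => "true" }],
  [{ "name" => "echo", "selected" => "true" }]
].freeze
LOADED = []

things = Enumerator.new do |yielder|
  PAGE_DATA.each_with_index do |page, index|
    LOADED << index
    page.each { |thing| yielder.yield(thing) }
  end
end

# Eager: filter and sort_by consume every record, so every page is read.
sorted_names = things.filter { |t| t["selected"] == "true" }
                     .sort_by { |t| t["name"].downcase }
                     .map { |t| t["name"] }
loaded_eager = LOADED.dup

# Lazy: stop as soon as two matches have been found.
LOADED.clear
first_two = things.lazy
                  .filter { |t| t["selected"] == "true" }
                  .first(2)
                  .map { |t| t["name"] }
loaded_lazy = LOADED.dup
```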

Save a block for later

While working on the Aha! AI writing assistant, we had two different ways of working with AI responses. For development and testing, the response should be returned all at once because it is easier to work with. For a user, though, it feels better for the response to arrive incrementally — streaming in so you can know early on if you're getting the result you wanted.

Everything else was exactly the same, except for how the response was dealt with. In one case, a string was returned. In another, bits of a string were yielded to the caller as they came in:

client = Client.new(stream: stream?)

# Burst response
response = client.get(query: params[:query], context: params[:context] ...)
render json: { text: response }

# Stream response

# all kinds of streaming setup work
client.get(query: params[:query], context: params[:context] ...) do |chunk|
  # stream chunk to browser
end
# all kinds of streaming teardown work, stream error handling, etc.

This is fine. But the API method calls look exactly the same, except one takes a block and the other doesn't. And for streaming, you have extra work to do before and after the method call.

It's hard to untangle this. You could pass around lambdas, have get ignore its block when it isn't streaming, or wrap the setup and teardown in its own block:

client = Client.new(stream: stream?)

response = streaming_setup(stream?) do
  client.get(query: params[:query], context: params[:context]) do |chunk|
    # stream chunk to browser
  end
end

render json: { text: response } unless stream?

Maybe this will work. Again, it's fine. But this is Ruby, so we can do better than fine.

Object#to_enum is a method available on any object. You give it a method name and arguments and it returns an Enumerator. Whenever that Enumerator is enumerated, it will call the method you gave it, on the object you called it on, with the arguments you gave it, and it will pass along anything your method yields as the next value of the Enumerator.

At first, it's hard to see how this is useful. In practice, it means you can capture the arguments for a method call now and supply the block that the method yields to later, at the point where your data is actually processed.
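
To make that concrete, here is a small sketch of to_enum on its own. The Countdown class, its each_tick method, and the runs counter are invented for illustration; only to_enum and the Enumerable methods are standard Ruby.

```ruby
class Countdown
  attr_reader :runs

  def initialize
    @runs = 0
  end

  # Yields from, from - step, ... while the value stays positive.
  def each_tick(from, step)
    @runs += 1
    current = from
    while current > 0
      yield current
      current -= step
    end
  end
end

countdown = Countdown.new

# Capture the receiver, method name, and arguments. Nothing runs yet.
ticks = countdown.to_enum(:each_tick, 10, 3)
runs_before = countdown.runs            # 0

# Supply the block later, through any Enumerable method.
values = ticks.to_a                     # [10, 7, 4, 1]
labels = ticks.map { |n| "T-#{n}" }     # ["T-10", "T-7", "T-4", "T-1"]
runs_after = countdown.runs             # 2, one call per enumeration
```

Note that each enumeration calls the method again with the captured arguments, so an Enumerator built this way over an API request would repeat the request if it were enumerated twice.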

Using to_enum, the caller can now look like this:

def show
  client = Client.new(stream: stream?)

  response = client.get(query: params[:query], context: params[:context])

  if stream?
    stream_response response
  else
    render json: { text: response }
  end
end

def stream_response(response)
  # all kinds of setup work
  response.each do |chunk|
    # stream each chunk to browser as it arrives
  end
  # all kinds of teardown work, stream error handling, etc.
end

client.get now returns either a string or an Enumerator. If it's an Enumerator, it will yield each part of the response as it's received. Because this code no longer needs the block at the same time as the arguments, all of the logic around streaming a response can go somewhere else. This is much less tangled up than the last version. How is this possible? By using to_enum in the client when it's not given a block:

class Client

  def initialize(stream:)
    @stream = stream
  end

  def get(query:, context:, &block)
    if @stream
      get_stream(query, context, &block)
    else
      get_burst(query, context)
    end
  end

  def get_stream(query, context, &block)
    return to_enum(:get_stream, query, context) unless block_given?

    some_api_request do |chunk|
      # ...
      yield chunk
    end
  end

  def get_burst(query, context)
    # nothing interesting here, just a basic request and response
    response
  end
end

If the client is in streaming mode and isn't given a block, get_stream returns an Enumerator that calls get_stream with a block. The yielded chunk is passed back through to the Enumerator when the caller, or any other code that the caller hands the Enumerator to, uses it.
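
For a version you can run end to end, the sketch below replaces the API request with a fixed list of chunks. CHUNKS and the sample query and context values are assumptions made for the example; the shape of get and get_stream follows the client above.

```ruby
class Client
  # Hypothetical canned chunks standing in for a real streaming API.
  CHUNKS = ["Enumerators ", "are ", "helpful."].freeze

  def initialize(stream:)
    @stream = stream
  end

  def get(query:, context:, &block)
    if @stream
      get_stream(query, context, &block)
    else
      get_burst(query, context)
    end
  end

  def get_stream(query, context)
    # No block yet: hand back an Enumerator that remembers the arguments.
    return to_enum(:get_stream, query, context) unless block_given?

    CHUNKS.each { |chunk| yield chunk }
  end

  def get_burst(query, context)
    CHUNKS.join
  end
end

burst  = Client.new(stream: false).get(query: "q", context: "c")
stream = Client.new(stream: true).get(query: "q", context: "c")

# The block arrives later, possibly in a completely different method.
received = []
stream.each { |chunk| received << chunk }
```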

This pattern, decoupling the place where you have the arguments from the place where you have the block, becomes more powerful the further apart those places are. It's a nice pattern to understand, and it's possible because of Enumerators.

Enumerators can be tricky at first. It's hard to see the use when you read about them in the docs. But they have an amazing ability to wrap Ruby code into objects that take advantage of the full power of Enumerable methods. It's valuable to use the knowledge you already have in more places, and Enumerators will help you do exactly that.

Justin Weiss