Rake Does What?: A Debugging Story

The Mystery

While working on upgrading one of our apps to Rails 5 I noticed that suddenly migrations were failing with the following error:

StandardError: An error has occurred, all later migrations canceled:
wrong number of arguments (given 2, expected 1)

The migration it was failing on just had something like this:

class LeakyMigration < ActiveRecord::Migration[4.2]
  results = select_one <<~SQL
    SELECT 1 FROM table_name
  SQL

  if results
    raise "Panic!"
  end
end

Digging into the ActiveRecord 4.2 and 5.0 documentation it became pretty clear that there wasn't a change to the method signature for select_one, so what gives? Looking at the stack trace I noticed that the method call was going through one of our private gems - specifically through a rake task we have defined there. Huh?

The Investigation

I opened the rake task and found something like the following:

namespace :db do
  task :leaky_task do
    min_db_schema_version
    ...Other things
  end

  def select_one
    ActiveRecord::Base.connection.select_one(ActiveRecord::Base.send(:sanitize_sql, sql, "NONE"))
  end
  
  def min_db_schema_version
    select_one('SELECT min(version) AS version FROM schema_migrations')['version']
  end
end

A quick check found that the method signature for sanitize_sql_for_conditions had changed. An optional second parameter was no longer supported, so it was understandably upset when we tried to give it a second parameter.

But why is this select_one being called during the migration anyways? Do methods defined in a namespace get shared across the namespace? While perhaps a bit inconvenient, that wouldn't be completely unreasonable. A quick check eliminated that possibility. Even with a different namespace the exception was getting raised during the migration.

It couldn't possibly be defining the method on Object, could it?

The Experiment

Well, a bit of quick investigation showed me two things:

You can experiment with this by creating a new empty rake task, defining a method foo and then starting a Rails console. Try something like User.send(:foo) and you’ll get a NoMethodError - so far so good, but now load your application’s rake tasks (which aren’t loaded by default) by calling AppName::Application.load_tasks and call User.send(:foo) again.

It runs! Madness!

These two things alone are worth digging into more, but first I think we should answer the question - why is our migration calling our select_one rather than the one provided by ActiveRecord?

In order to find out I commented out the offending rake task and added a raise statement to the ActiveRecord definition of select one. Re-running the migration yielded a stack trace that highlighted something very interesting...

The Finale

All those nice little sql methods like select_one() and update() aren't defined in ActiveRecord::Migration or any of its parents, they are all defined on the connection object. ActiveRecord::Migration defines a custom method_missing implementation that forwards the call to the connection object.

And as you remember, Rake is defining all its tasks as methods on Ruby's main object, which defines them on Object, which defines them nearly everywhere.

Unfortunately these two things combined lead to trouble. When rake tasks are loaded (which they are when they are running) these methods defined on main get defined on every object. Then, when we call select_one in the migration it sends the call to itself which it now knows how to respond to since it inherits from Object.

You might reasonably ask, "how does defining a method on 'main' define it everywhere?" You can check that it does do that for yourself. But next week, we'll dig down into how that happens and why it works.