Taking control of YAML loading — Or what happened to my ActionController::Parameters YAML

In Rails 5 ActionController::Parameters no longer inherits from Ruby’s Hash. This broke loading YAML serialized from older Rails versions. Let’s find out why!

First, here’s what the serialization of some params looks like in Rails 5. We’ll be using YAML.dump ActionController::Parameters.new(key: :value) throughout.

--- !ruby/object:ActionController::Parameters
parameters: !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  key: :value
permitted: false

Here parameters and permitted are simply the instance variables of the params object, each dumped under its own name.
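A minimal sketch of why the dump has that shape. PlainParams is a hypothetical stand-in for ActionController::Parameters so the example runs without Rails. Psych dumps a plain object under a !ruby/object tag and writes each instance variable as a key.

```ruby
require "yaml"

# Hypothetical stand-in for the Rails 5 params class: a plain object
# that wraps a hash instead of inheriting from Hash.
class PlainParams
  def initialize(params = {})
    @parameters = params
    @permitted  = false
  end
end

plain_dump = YAML.dump(PlainParams.new({ "key" => :value }))
puts plain_dump
```

The output should start with a !ruby/object:PlainParams tag followed by the parameters and permitted keys.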

This format isn’t the same in Rails 4.2 because the params itself was an ActiveSupport::HashWithIndifferentAccess, which is a Hash subclass. Look:

--- !ruby/hash-with-ivars:ActionController::Parameters
elements:
  key: :value
ivars:
  :@permitted: false

Looks pretty similar!

The YAML dumper has correctly noticed that the params is a hash subclass with ivars and used its standard format for hash subclasses that carry ivars.
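The same behavior can be seen without Rails. In this sketch LegacyStyleParams is a hypothetical Hash subclass with one ivar, standing in for the 4.2 params class. The exact layout of the output depends on the Psych version, so treat it as an illustration rather than a byte-for-byte match.

```ruby
require "yaml"

# Hypothetical stand-in for the Rails 4.2 params class: a Hash subclass
# that also keeps state in an instance variable.
class LegacyStyleParams < Hash
  def initialize(attributes = {})
    super()
    update(attributes)
    @permitted = false
  end
end

legacy_dump = YAML.dump(LegacyStyleParams.new({ "key" => :value }))
puts legacy_dump
```

On recent Psych versions the dump carries a !ruby/hash-with-ivars:LegacyStyleParams tag with separate elements and ivars sections.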

But if we try to load the 4.2 YAML on Rails 5 it blows up. When YAML sees hash-with-ivars it tries to revive the params as it would any other hash subclass. The YAML loader will allocate an instance of ActionController::Parameters and then use []= to assign values. Which would be fine, except the params’ initialize has never been called. Guess what happens here:

# actionpack/lib/action_controller/metal/strong_parameters.rb
# Not the actual implementation, but paraphrased to make it easier to gulp down.
def initialize(params = {})
  @parameters = params
  @permitted  = false
end

def []=(key, value)
  @parameters[key] = value # BOOM. If only `initialize` had been called!
end

We’ll get a nice exception for that. To fix this, we will need to know more about how YAML works under the hood.
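The failure can be reproduced outside Rails. This is a sketch that assumes a hypothetical FakeParams class shaped like the paraphrased code above. It uses YAML.unsafe_load where available, because newer Psych versions refuse to load arbitrary classes through YAML.load.

```ruby
require "yaml"

# Hypothetical stand-in shaped like the paraphrased Rails 5 params class.
class FakeParams
  def initialize(params = {})
    @parameters = params
    @permitted  = false
  end

  def []=(key, value)
    @parameters[key] = value # @parameters is nil when initialize was skipped
  end
end

legacy_yaml = <<~YAML
  --- !ruby/hash-with-ivars:FakeParams
  elements:
    key: :value
  ivars:
    :@permitted: false
YAML

# Psych 4 made YAML.load safe by default, so prefer unsafe_load when it exists.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load

error = begin
  YAML.public_send(loader, legacy_yaml)
  nil
rescue NoMethodError => e
  e
end

puts error.class
puts error.message
```

The loader allocates a FakeParams without calling initialize, so the first []= call hits a nil @parameters and raises.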

In Ruby, YAML is implemented by the Psych library, which ships with Ruby. The YAML constant is just an alias for Psych, so whenever you call YAML.load or any other YAML method, Psych steps in and does the work.

When loading, Psych will first parse the YAML syntax into a tree of nodes it can work with. If you haven’t heard of a tree before, it’s just objects that have references to each other. There’s a root, it can have many children, and its children can have many children and so on.

Once YAML has its tree structure, it will visit each node (the word for an object in the tree) and revive it.
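To peek at that tree, YAML.parse stops after the parsing step and returns the nodes without reviving anything. A small sketch using the 4.2 YAML from earlier (the variable names are only for illustration):

```ruby
require "yaml"

legacy_yaml = <<~YAML
  --- !ruby/hash-with-ivars:ActionController::Parameters
  elements:
    key: :value
  ivars:
    :@permitted: false
YAML

# YAML.parse builds the node tree but does not turn it into Ruby objects,
# so ActionController::Parameters does not need to be defined here.
document = YAML.parse(legacy_yaml)
root     = document.root

puts root.class # the root node is a mapping
puts root.tag   # the tag YAML will later use to pick a class

# Nodes are enumerable, so we can walk the tree and collect the leaf values.
scalar_values = document.select { |node| node.is_a?(Psych::Nodes::Scalar) }.map(&:value)
puts scalar_values.inspect
```

Note that the scalar values are still raw strings at this point; turning ":value" into a Symbol happens during the revive step.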

The good news is YAML lets us hook into it whenever we load the tree (same goes for dumping). We just need to give it two pieces of information.

  1. Which class should I use for this node?
  2. What’s the implementation for that?

To satisfy 1. we need to insert an entry into YAML.load_tags. Each node is identified by a tag. You’ve seen them already, and !ruby/hash-with-ivars:ActionController::Parameters is the one we need. So we tell YAML:

YAML.load_tags['!ruby/hash-with-ivars:ActionController::Parameters'] = 'ActionController::Parameters'

Then the parser will allocate an ActionController::Parameters when it sees that tag and will let us override the initialization routine if we implement an init_with method.

That method receives a coder, which carries the tag we’re currently initializing and a Hash map of what the YAML data was. For the hash-with-ivars example it would look like this:

def init_with(coder)
  coder.tag # => '!ruby/hash-with-ivars:ActionController::Parameters'
  coder.map['elements'] # => { 'key' => :value }
  coder.map['ivars']    # => { :@permitted => false }
end

That gives us everything we need to replicate the missing setup from initialize, and when done correctly ActionController::Parameters YAML from Rails 4.x can be loaded without errors on Rails 5.
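Putting the two pieces together, here is a minimal sketch of the idea. It is not the code from the Rails pull request: SketchParams is a hypothetical stand-in for ActionController::Parameters, and only the hash-with-ivars format is handled.

```ruby
require "yaml"

# Hypothetical stand-in for ActionController::Parameters.
class SketchParams
  attr_reader :parameters

  def initialize(params = {})
    @parameters = params
    @permitted  = false
  end

  def permitted?
    @permitted
  end

  # Called by Psych on a freshly allocated instance instead of initialize.
  def init_with(coder)
    case coder.tag
    when "!ruby/hash-with-ivars:SketchParams"
      # Legacy format: contents under "elements", ivars under "ivars".
      @parameters = coder.map["elements"]
      @permitted  = coder.map["ivars"][:@permitted]
    else
      # Current format: each ivar is stored under its own name.
      @parameters = coder.map["parameters"]
      @permitted  = coder.map["permitted"]
    end
  end
end

# 1. Tell YAML which class to use for the legacy tag.
YAML.load_tags["!ruby/hash-with-ivars:SketchParams"] = "SketchParams"

legacy_yaml = <<~YAML
  --- !ruby/hash-with-ivars:SketchParams
  elements:
    key: :value
  ivars:
    :@permitted: false
YAML

# Psych 4 made YAML.load safe by default, so prefer unsafe_load when it exists.
loader  = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
revived = YAML.public_send(loader, legacy_yaml)

puts revived.parameters.inspect
puts revived.permitted?
```

Because the tag is registered in load_tags, Psych hands the whole mapping to init_with rather than calling []= on an uninitialized object, which is what sidesteps the earlier exception.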

There were several complications to this, including a second format depending on the Psych version you used. To learn more, here’s the original pull request, https://github.com/rails/rails/pull/26017, and here are all the commits for details: https://github.com/rails/rails/compare/6b44155^…70b995a

This will ship in Rails 5.0.1, coming somewhat soon!