Background

It’s no secret that Ruby on Rails is a great web framework for building something fast. However, mature mid-size to large web applications bring more responsibilities and concerns.

Prior to Rails 5.1, both primary and foreign keys defaulted to a 32-bit integer type. As an application grows, the database stores more and more data, and sooner or later every developer starts wondering what to do when a table hits that magical limit. Luckily, starting with Rails 5.1, primary keys default to big integers.

On the other hand, there is always a chance that some popular gems in the Rails community have not caught up with this change. One such gem we encountered was the audited gem, which we use to log changes to some of our models in our Postgres database.

The Challenge

The audited gem stores data in a polymorphic audits table, where the auditable_id column holds the primary key of the model we want to track and the auditable_type column holds the class name of that model.

At the time of this writing, the auditable_id column is an int type. There is an open PR that makes the necessary changes to the gem to switch the column from int to bigint, but it hasn’t been merged yet.

We had to make this change ourselves, as we store and track more and more data. The challenge, however, was that our audits table had grown large enough that we had to think about uninterrupted, zero-downtime deployment.

The Solution

If we followed the standard approach for changing a column type, we could use the change_column method and go on with our day.

change_column :audits, :auditable_id, :bigint

But this approach is risky to deploy to production.

We strive to follow the best practices and have uninterrupted deployments. To achieve that we also use the strong_migrations gem.

According to the strong_migrations gem, the change_column method blocks reads and writes on the entire table while it rewrites the whole table. As the author of the gem suggests, the better approach is to split the change into multiple steps.
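With strong_migrations in place, the risky migration does not silently lock the table; it fails a safety check at migration time and asks for a safer plan. A sketch of what the unsafe version would look like (the migration class name and Rails version here are illustrative):

```ruby
class ChangeAuditableIdToBigint < ActiveRecord::Migration[6.1]
  def change
    # strong_migrations raises StrongMigrations::UnsafeMigration here:
    # change_column rewrites the entire audits table while holding a lock
    # that blocks reads and writes for the duration.
    change_column :audits, :auditable_id, :bigint
  end
end
```

Wrapping the call in safety_assured would silence the check, but that only hides the lock; a multi-step plan avoids it.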

After doing our research, we decided on a five-step approach to avoid production outages during our deployments.

First deployment

  1. Create a temporary column temp_auditable_id.
  2. Write data to both columns: auditable_id and temp_auditable_id.

Second deployment

  1. Route the auditable_id attribute to read/write through the temp_auditable_id column, with a read fallback on the _auditable_id column (introduced in the third deployment) and, ultimately, the original auditable_id column.
  2. Remove the before_save callback that populates the temp_auditable_id column from auditable_id.

Third deployment

  1. Rename the auditable_id column to _auditable_id.
  2. Rename the temp_auditable_id column to auditable_id. After the third deployment, verify in the production database that the _auditable_id column has exactly the same values as auditable_id.

Fourth deployment

  1. Ignore the _auditable_id column with the ignored_columns method.

Fifth deployment

  1. Remove the remaining monkey-patches from Audited::Audit.
  2. Remove the _auditable_id column.

Now that we have a plan, let’s discuss each part in detail.

Step 1

The first step would be to create a new temporary column and write data to both auditable_id and temp_auditable_id columns.

To achieve this, we added a temp_auditable_id column to the audits table.

disable_ddl_transaction!

def change
  add_column :audits, :temp_auditable_id, :bigint
  add_index :audits, [:auditable_type, :temp_auditable_id, :version], name: "temp_auditable_index", algorithm: :concurrently
end

We also had to monkey-patch the gem with a before_save callback so that new data is stored in the temporary column as well.

# frozen_string_literal: true

module AuditTempColumns

  extend ActiveSupport::Concern

  included do
    before_save :save_to_temp_columns
  end

private

  def save_to_temp_columns
    self.temp_auditable_id = auditable_id
  end

end

Audited::Audit.include(AuditTempColumns)
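A note on wiring this up: if the patch lives in an initializer, a plain include can be lost when Rails reloads application code in development. A common pattern (the file path here is illustrative) is to re-apply it inside to_prepare, which runs on every reload:

```ruby
# config/initializers/audited_patches.rb (illustrative path)
Rails.application.config.to_prepare do
  # Re-include the patch on each code reload so the callback survives
  # class reloading in development.
  Audited::Audit.include(AuditTempColumns)
end
```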

Last but not least, we had to backfill the existing data into the temporary column. That could be achieved in multiple ways; we chose to do it via a job called from the migration.

class MaintenanceTasks::AuditsBackfillingJob < ApplicationJob

  queue_as :default

  def perform
    ::Audited::Audit.where(temp_auditable_id: nil).in_batches do |records|
      records.update_all("temp_auditable_id = auditable_id")
      sleep(0.01) # throttle
    end
  end

end
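The core of the backfill can be sketched without a database: select the rows whose temporary column is still empty, walk them in fixed-size batches, copy the value over, and pause between batches. Here is a plain-Ruby stand-in for that logic (hashes play the role of Audited::Audit rows; the batch size and throttle are illustrative):

```ruby
# Plain-Ruby sketch of the batched backfill; no ActiveRecord involved.
def backfill_in_batches(records, batch_size: 2, throttle: 0.0)
  records
    .select { |r| r[:temp_auditable_id].nil? }    # where(temp_auditable_id: nil)
    .each_slice(batch_size) do |batch|            # in_batches
      batch.each { |r| r[:temp_auditable_id] = r[:auditable_id] }  # update_all
      sleep(throttle)                             # throttle between batches
    end
  records
end

rows = [
  { auditable_id: 1, temp_auditable_id: nil },
  { auditable_id: 2, temp_auditable_id: 2 },  # already backfilled, untouched
  { auditable_id: 3, temp_auditable_id: nil },
]
backfill_in_batches(rows)
# rows now carry temp_auditable_id values 1, 2, 3
```

Filtering on nil also makes the job safe to re-run: rows that were already backfilled are skipped.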

This is how our migration changed.

disable_ddl_transaction!

def up
  add_column :audits, :temp_auditable_id, :bigint
  add_index :audits, [:auditable_type, :temp_auditable_id, :version], name: "temp_auditable_index", algorithm: :concurrently

  MaintenanceTasks::AuditsBackfillingJob.set(wait: 2.hours).perform_later
end

def down
  remove_index :audits, name: "temp_auditable_index"
  remove_column :audits, :temp_auditable_id, if_exists: true
end

It is a good idea to check that values in both columns are the same.

We can run the following command in the Rails console after the deployment is complete and the data is backfilled.

Audited::Audit.where("temp_auditable_id <> auditable_id").count

Once we deployed Step 1 and made sure that the data was backfilled properly, it was time to move on to Step 2 of our plan.

Step 2

For the second deployment we had to do the following:

  1. Route auditable_id to read from temp_auditable_id if it exists. If the temp_auditable_id column does not exist, fall back to either the auditable_id or the _auditable_id column.
  2. Remove the before_save callback that populates the temp_auditable_id column from auditable_id. Instead, we introduce custom getters and setters to ensure the integrity of the stored data.

# frozen_string_literal: true

module AuditTempColumns

  extend ActiveSupport::Concern

  # Reading data
  def auditable_id
    self[:temp_auditable_id] || self[:_auditable_id] || super
  end

  # Writing data
  def auditable_id=(value)
    self[:temp_auditable_id] = value if temp_auditable_id_column_exists?
    self[:_auditable_id]     = value if _auditable_id_column_exists?

    super if auditable_id_column_exists?
  end

  def temp_auditable_id=(value)
    self.auditable_id = value
  end

  def _auditable_id=(value)
    self.auditable_id = value
  end

private

  def temp_auditable_id_column_exists?
    ActiveRecord::Base.connection.column_exists?(:audits, :temp_auditable_id)
  end

  def _auditable_id_column_exists?
    ActiveRecord::Base.connection.column_exists?(:audits, :_auditable_id)
  end

  def auditable_id_column_exists?
    ActiveRecord::Base.connection.column_exists?(:audits, :auditable_id)
  end

end

Audited::Audit.include(AuditTempColumns)
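To make the precedence concrete: the reader in the patch always prefers temp_auditable_id, then _auditable_id, and only then the original column. A plain-Ruby stand-in (no ActiveRecord; a Hash plays the attribute store, and a missing key simply yields nil) shows the same chain:

```ruby
# Plain-Ruby stand-in for the patched reader: the same precedence chain
# as the monkey-patch above, with a Hash as the attribute store.
class FakeAudit
  def initialize(attrs)
    @attrs = attrs
  end

  # Mirrors the patched #auditable_id: the temp column wins, then the
  # renamed _auditable_id, then the original auditable_id ("super").
  def auditable_id
    @attrs[:temp_auditable_id] || @attrs[:_auditable_id] || @attrs[:auditable_id]
  end
end

FakeAudit.new(temp_auditable_id: 7, auditable_id: 1).auditable_id  # => 7
FakeAudit.new(_auditable_id: 5, auditable_id: 1).auditable_id      # => 5
FakeAudit.new(auditable_id: 1).auditable_id                        # => 1
```

During the second deployment only auditable_id and temp_auditable_id exist, so the middle branch falls through until the renames in the third deployment happen.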

We added the ability to read and write _auditable_id in this deployment in advance, so that the code changes reach production before the migrations in the next deployment rename the columns.

Step 3

Once Step 2 is deployed, we follow up by renaming the columns. Here we have to make the following changes:

  1. Rename the auditable_id column to _auditable_id.
  2. Rename the temp_auditable_id column to auditable_id.

We broke this task down into two subtasks. In the first, we renamed the columns.

def change
  safety_assured do
    rename_column :audits, :auditable_id, :_auditable_id
    rename_column :audits, :temp_auditable_id, :auditable_id
  end
end

In the second, we renamed the indexes.

def change
  rename_index :audits, "auditable_index", "_auditable_index"
  rename_index :audits, "temp_auditable_index", "auditable_index"
end

We can run this command to verify that both the auditable_id and _auditable_id columns have the same values:

Audited::Audit.where("_auditable_id <> auditable_id").count

Step 4

Next, we need to prepare the code for removing the _auditable_id column. It is safer to ignore the column before removing it completely. At this point we also no longer need the custom getters and setters.

# frozen_string_literal: true

module AuditTempColumns

  extend ActiveSupport::Concern

  included do
    self.ignored_columns = ["_auditable_id"]
  end

end

Audited::Audit.include(AuditTempColumns)

Step 5

After deploying all four parts, we made it to the end! This is the last step of the journey, and it includes the following actions:

  1. Removing the _auditable_id column.
  2. Removing the monkey-patches from Audited::Audit.

We need to completely delete the AuditTempColumns file and add one last migration.

def change
  # This column is ignored in the app
  safety_assured { remove_column :audits, :_auditable_id, :integer }
end

Conclusion

Applying any change to an existing large table requires additional attention and a carefully thought-out deployment plan.

I try to follow the best practices described in the strong_migrations gem when I write new migrations. Having multiple deployments for a small task such as changing a column type can seem like a lot of work.

At the same time, it’s worth keeping in the back of our minds that one small migration can cause an application outage in production and create an unpleasant experience for the end users.