DEV Community

Cover image for Rails: migration: Adding a unique Index and deleting Duplicates
chowderhead
chowderhead

Posted on

Rails: migration: Adding a unique Index and deleting Duplicates

Cover image Credit: @thesollers

Assumptions

  • you understand rails framework
  • you have duplicates in your database already
  • you have existing records in your database that are duplicates and in violation of ActiveRecord::UniqueRecord

Problem:

We found that a method in one of our controllers was using find_or_create_by , and according to this article this method does not use locking, so what can happen if two writes are happening simultaneously, then duplicates are created in the database.

This was conflicting with our business logic, but were harmless, but nonetheless bloating our tables


# for brevity this table will be called table_1

# Indexes
#
#  index_on_specific_table_1  (something_id)
#
# Foreign Keys
#
#  fk_rails_...  (something_id => random_record.id)
#

As you can see in this contrived example we have a index already on the column something_id , as well as an index.

Solution

The solution as laid out by my boss was two fold.

a. create a migration that would delete all duplicate instances
b. add a UNIQUE index field to the record
c. rescue the exception that was thrown for ActiveRecord::RecordNotUnique, and retry

Method

Inside of the migration the first task was to find a way to destroy all duplicates before running applying the UNIQUE index key to the column in the table, because if we tried to remove the index , the existing duplicates would throw an error and fail the migration.

so in order to clean up duplicates in the tables , i found this nifty sQL command that took care of all the heavy lifting, from a Wiki Page of all places!




    ActiveRecord::Base.connection.execute('
      DELETE FROM table_1
      WHERE id IN (
          SELECT
              id
          FROM (
              SELECT
                  id,
                  ROW_NUMBER() OVER w AS rnum
              FROM table_1
              WINDOW w AS (
                  PARTITION BY something_id
                  ORDER BY id
              )

          ) t
      WHERE t.rnum > 1);
      ')

This SQL statement will delete all rows in violation of not having a unique something_id field. you can also add more columns to this by comma seperation, if you have multiple fields, it would be :

PARTITION BY something_id, something2_id, something3_id

Now the next step is to add a unique index, but we have some issues , because episode_id already has an index on place on it. Not only that but it also has a foreign key assignment.

You cannot modify an index on a table if a foreign key is on that column , and if you currently have an index on a column, you have to remove it before adding a new UNIQUE index.

so in order to add the unique index we must first remove the foreign key assignment from the column, which we will add back later:

      remove_foreign_key :table_1, column: :something_id

then after the foreign key is removed we can now remove the existing (non-unique) index that is on episode_

But instead of just removing it, we will rename it to be extra safe.

      rename_index :table_1, :index_on_specific_table_1, :temp_non_unique_index

so in the above code you can see that we are renaming :index_on_specific_table_1 to :temp_non_unique_index.

the reason why we want to rename, instead of straight removing at this point is because the add_index will take a long time, and it could sometimes fail depending on the size of the table.

if the table size is huge it is possible it could add_index could fail during migration, and if we had used remove_index before it , then we would be left with a column with no index on it at all, and this could change a request that takes only milliseconds, to possible taking minutes.

then, after we rename the table, we add an index to the table:

      add_index :table_1, :something_id, unique: true, name: 'unque_index_table_1`

this will add a unique index to the specific column in the specific table and then it will give it . a specific name: unique_index_table_

After this is all said and done, we will then remove our temporary index: unique_index_table_

remove_index(:table_1, :name => 'temp_non_uniq_index')
      add_index :table_1, :episode_id, unique: true

now all togther it should looks something like this:


ActiveRecord::Base.connection.execute('
      DELETE FROM table_1
      WHERE id IN (
          SELECT
              id
          FROM (
              SELECT
                  id,
                  ROW_NUMBER() OVER w AS rnum
              FROM table_1
              WINDOW w AS (
                  PARTITION BY something_id
                  ORDER BY id
              )

          ) t
      WHERE t.rnum > 1);
      ')



      remove_foreign_key :table_i1, column: :something_id
      rename_index :table_1, :index_on_episode_id, :temp_non_uniq_index
      add_index :table_1, :something_id, unique: true, name: 'unique_index_on_table_1_episode_id'
      remove_index(:table_1, :name => 'temp_non_uniq_index')
      add_foreign_key :table_1, :episodes
      add_index :table_1, :something_id, unique: true




Sources: (all my open tabs)

https://medium.com/@josh_works/rails-migration-when-you-cant-add-a-uniqueness-constraint-because-you-already-have-duplicates-352a370e4b54

https://thoughtbot.com/blog/the-perils-of-uniqueness-validations

https://wiki.postgresql.org/wiki/Deleting_duplicates

https://stackoverflow.com/questions/19520298/cannot-remove-an-index-from-table-in-rails-4-and-psql-9-3

https://thoughtbot.com/blog/how-to-create-postgres-indexes-concurrently-in

https://stackoverflow.com/questions/20467165/how-to-add-unique-constraint-to-already-existing-index-by-migration

https://medium.com/@craigsheen/rails-migrations-can-do-more-than-you-think-bf1e40404714

Top comments (1)

Collapse
 
michalbryxi profile image
Michal Bryxí

Thank you! The snippet for finding duplicate rows is super helpful 🙇