
Simple partitioning of live data with Postgres

10.2020

Partitioning is a well-known way to split a big table into smaller ones based on a given attribute. And while creating the partitioned table structure itself is not complicated, migrating a table on a live system can be tricky. Our solution offers a simple way to solve this problem.

 

The usual way can be summarized as follows:

  1. create a table and necessary triggers for the future partitioned data
  2. create a trigger on the existing live table that performs the same operation on the partitioned table on every insert/update/delete, to maintain data integrity (a minimal sketch of such a trigger is shown after this list)
  3. copy the data from the live table to the partitioned table - likely in batches and in a transaction
  4. switch in the application code to the partitioned table or rename the tables
  5. delete the original live table
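
For illustration only, a minimal sketch of the synchronization trigger from step 2 might look like the following. The table names old_livedata and new_livedata and the id primary key are assumptions for this sketch, not part of the setup discussed later:

CREATE OR REPLACE FUNCTION sync_to_partitioned() RETURNS trigger AS $$
BEGIN
    -- mirror every change on the old live table into the new partitioned table
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        DELETE FROM new_livedata WHERE id = OLD.id;
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO new_livedata VALUES (NEW.*);
    END IF;
    RETURN NULL;   -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER old_livedata_sync_trg
    AFTER INSERT OR UPDATE OR DELETE ON old_livedata
    FOR EACH ROW EXECUTE FUNCTION sync_to_partitioned();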

 

Disadvantages of this method

The disadvantage of the method above is that it is relatively complex: the triggers that secure data integrity must be tested. It also creates data redundancy, because the data must be duplicated during migration. Finally, it requires a migration plan and a change in the application when switching from the old table to the new one - which may or may not be an issue.

The particular difficulty during data migration is how to be certain that, if entries of a batch that is currently being migrated are updated or deleted, the correct values end up in the new, partitioned table. It may mean some freeze of inserts/updates during migration (see answer 13), or some logic that re-scans the batch after migration for any changes and gives priority to the original table. At think project! we needed to be sure both that we have no data loss - which may not be an issue in every scenario - and that the downtime/service time is minimal or, better, non-existent.

Another solution

Below, we would like to present a simple and robust solution that requires neither duplication of the data nor downtime or any write freeze during migration, and that is overall a minimal intrusion into the system. The solution uses the original table as the mother table of the partitioning. The data is moved out of the mother table by a BEFORE UPDATE trigger: some migration process updates an extra column of each row, and this column is added for this purpose only. If a row is still in the mother table when queried, Postgres will simply deliver it from the mother table.

Below we demonstrate how to set up basic partitioning; after that, our proposal for the migration process is discussed.

Preparation for partitioning

Let's take a source table that acts as our live table; the column part_value will be the basis of our partitioning.
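
The original table definition is not reproduced here; a minimal stand-in might look like this (the id and payload columns are assumptions - only part_value is actually required by the example):

CREATE TABLE livedata (
    id          bigserial PRIMARY KEY,   -- assumed surrogate key, used by the triggers below
    part_value  integer NOT NULL,        -- the partitioning attribute
    payload     text                     -- stand-in for the real application columns
);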

Let's create the child tables for partitioning:
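(A plausible version using inheritance partitioning - the child table names and the ranges 0-100 / 100-200 are arbitrary choices for this demonstration:)

CREATE TABLE livedata_0_100   (CHECK (part_value >= 0   AND part_value < 100)) INHERITS (livedata);
CREATE TABLE livedata_100_200 (CHECK (part_value >= 100 AND part_value < 200)) INHERITS (livedata);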

Now, create the trigger function that will forward new inserts on the livedata table to the proper child tables:
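(The original listing is not shown here; a plausible version of insert_trigger(), matching the example ranges above:)

CREATE OR REPLACE FUNCTION insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.part_value BETWEEN 0 AND 99 THEN
        INSERT INTO livedata_0_100 VALUES (NEW.*);
    ELSIF NEW.part_value BETWEEN 100 AND 199 THEN
        INSERT INTO livedata_100_200 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'part_value % is out of range', NEW.part_value;
    END IF;
    RETURN NULL;   -- the row is stored in a child table, not in livedata itself
END;
$$ LANGUAGE plpgsql;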

Bind the function to a trigger:
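(A plausible statement - the trigger name is an assumption; EXECUTE FUNCTION requires PostgreSQL 11 or later, older versions use EXECUTE PROCEDURE:)

CREATE TRIGGER livedata_insert_trg
    BEFORE INSERT ON livedata
    FOR EACH ROW EXECUTE FUNCTION insert_trigger();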

Up to this point we have prepared the partitions and the insert trigger, following the standard way. As soon as the function is bound to the trigger, inserts into the mother table will land in the child tables and will be delivered from there. The existing data will still be delivered from the mother table.
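
A quick test, using the assumed columns from above:

INSERT INTO livedata (part_value, payload) VALUES (42, 'row created after partitioning');

SELECT * FROM livedata_0_100;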

The SELECT should show the new entry in the partition table, as expected.

Migration through UPDATE

Now, let's migrate the data from the mother table. Our logic uses an UPDATE trigger. For the controlled migration we need a value to update, and for this we choose to add an extra BOOL column without a default value. This column can also be handy for monitoring the migration process. The application/client can be unaware of its existence. Without default values and indices this modification runs in 10-100 ms even if the table is huge.
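
The column addition might simply be the following (the column name in_partition is used throughout the rest of the article):

ALTER TABLE livedata ADD COLUMN in_partition boolean;   -- no DEFAULT, no index; inherited by all child tables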

BEFORE UPDATE Trigger

Now let's add a BEFORE UPDATE trigger that copies the row from the mother table into the proper partition and applies the update to the copied row, setting the in_partition flag on it. This will be our migration function. The migration itself is then simply an UPDATE on the rows of the mother table that sets the in_partition flag to true.
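
A sketch of such a trigger could look like the following; the id column and the trigger name are assumptions, and the INSERT deliberately goes through livedata so that the existing insert_trigger() routes the row to the correct partition:

CREATE OR REPLACE FUNCTION migrate_data() RETURNS trigger AS $$
BEGIN
    -- flag the row and let insert_trigger() place it into the proper child table
    NEW.in_partition := true;
    INSERT INTO livedata VALUES (NEW.*);

    -- remove the original row from the mother table only
    DELETE FROM ONLY livedata WHERE id = OLD.id;

    -- suppress the original UPDATE; the row now lives in a partition
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER livedata_migrate_trg
    BEFORE UPDATE ON livedata
    FOR EACH ROW EXECUTE FUNCTION migrate_data();

Note that with inheritance, row triggers defined on the mother table fire only for rows physically stored there, so rows that already live in the child tables are not affected. Because the trigger returns NULL, the reported row count of such an UPDATE is 0 - the same behavior the redirected INSERTs already show with trigger-based partitioning.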

The migration happens when there is any update on a not yet migrated row.

To migrate the existing rows through some migration process, we use the in_partition BOOL flag. E.g. to do a batch migration by part_value, simply execute the following:
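(One way to phrase such a batch; ONLY restricts the statement to rows still physically stored in the mother table:)

UPDATE ONLY livedata
SET    in_partition = true
WHERE  part_value BETWEEN 0 AND 99;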

Race conditions are handled by the database itself. The application code requires no change; all of the migration happens inside the database. The data is not duplicated during the process, integrity is maintained by the database, and very little extra code is required. To check whether the migration is finished, either the size of the source table can be checked using the PostgreSQL internals, or we can set the in_partition attribute to true on each insert by adding the following to insert_trigger(), directly after the IF ... BETWEEN statement:
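(A single assignment, which has to take effect before the row is inserted into its child table:)

NEW.in_partition := true;   -- mark rows that are born directly in a partition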

In this case there is no need to set this attribute to true in migrate_data(). Using this logic, if the following query returns no rows, the source table is fully migrated:
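(One possible formulation of that check, again using the assumed id column:)

-- Assumes insert_trigger() has set the flag from the beginning, so every row
-- in a partition carries in_partition = true; any row without the flag is
-- therefore a not yet migrated row in the mother table.
SELECT id
FROM   livedata
WHERE  in_partition IS DISTINCT FROM true
LIMIT  1;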

 

 

This blog post was written by László Forró (LinkedIn) and André Plöger (LinkedIn & Xing).

 

Comments

Cat Cat (10 months ago):

From the small amount I have understood and tried so far, I believe you can't have livedata be a non-partitioned table originally and then have it act as a partitioned table for the purpose of the migration and final use. When it is created it either needs to be a partitioned or a non-partitioned table; you can't ALTER TABLE ... PARTITION BY (column) on an existing table.
Am I missing something? Am I wrong? Can you point me to how to use your livedata table as a non-partitioned table and then have the child tables act as partitions for it?

laurenz (10 months ago), in reply to Cat Cat:

I would say that the article is talking about inheritance partitioning, which is a possible, but outdated solution these days.

Cat Cat (10 months ago):
I see. Thanks for the clarification.
