In database management, job scheduling has always played an important role. And for PostgreSQL, PgAgent and pg_cron are examples of job schedulers that are already popular. However, there is another job scheduler called pg_timetable, which is completely database-driven and provides a couple of advanced concepts. In this blog, we are going to highlight some of the main features, installation, and some use cases of pg_timetable.
Main features of pg_timetable
- Fully database-driven configuration
- Cron-style scheduling at the PostgreSQL server time zone
- Built-in tasks such as sending emails, etc.
- Full support for database-driven logging
- Tasks can be arranged in chains
- A chain can consist of built-in commands, SQL, and executables
- Parameters can be passed to chains/tasks
Installation
There are currently two options on how you can install and run pg_timetable.
- Container installation. For this, please refer to this page.
Note: The image mentioned on the above page is based on Alpine and not from Percona. - Local installation, i.e., build from sources. Here, we will discuss this approach.
2.1. Download and install GO on your system.
2.2. Clone pg_timetable repo
1 2 | $ git clone https://github.com/cybertec-postgresql/pg_timetable.git $ cd pg_timetable |
2.3. Run pg_timetable
1 | $ go run main.go --dbname=dbname --clientname=worker001 --user=scheduler --password=strongpassword |
2.4. Alternatively, build a binary and run it:
1 2 | $ go build $ ./pg_timetable --dbname=dbname --clientname=worker001 --user=scheduler --password=strongpassword |
Demonstration of job scheduling with pg_timetable
As a use case here, it will be shown how to use pg_timetable as a scheduler to schedule a job, which will refresh the materialized view every day at 12 midnight.
1. Download pg_timetable executable (Follow step 2 mentioned above in the installation section).
2. Make sure the PostgreSQL server is up and running and has a role with CREATE privilege for a target database, e.g.:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | postgres=# CREATE ROLE scheduler PASSWORD '***********'; postgres=# GRANT CREATE ON DATABASE postgres TO scheduler; postgres=# CREATE TABLE t_demo (grp int, data numeric); CREATE TABLE postgres=# INSERT INTO t_demo SELECT 1, random() FROM generate_series(1, 5000000); INSERT 0 5000000 postgres=#CREATE MATERIALIZED VIEW mat_view AS SELECT grp, avg(data), count(*) FROM t_demo GROUP BY 1; SELECT 1 postgres=# ALTER MATERIALIZED VIEW mat_view OWNER TO scheduler; postgres=# GRANT SELECT ON mat_view TO scheduler; -bash-4.2$ psql psql (12.16) Type "help" for help. postgres=# SELECT * FROM mat_view; grp | avg | count -----+--------------------------+--------- 1 | 0.5001807958659610956005 | 5000000 (1 row) postgres=# INSERT INTO t_demo SELECT 2, random() postgres-# FROM generate_series(1, 5000000); INSERT 0 5000000 |
3. Create a new job to refresh the materialized view each night at 12:00 Postgres server time zone.
1 2 3 4 5 | postgres=# SELECT timetable.add_job('refresh-matview', '0 12 * * *', 'REFRESH MATERIALIZED VIEW public.mat_view'); add_job --------- 1 (1 row) |
4. Run the pg_timetable
1 2 3 4 5 6 7 8 9 10 11 12 13 | [centos@ip-172-31-32-10 pg_timetable]$ ./pg_timetable --dbname=postgres --clientname=worker001 --user=scheduler --password=******** 2023-09-09 11:59:20.929 [INFO] [sid:697146069] Starting new session... 2023-09-09 11:59:20.941 [INFO] Database connection established 2023-09-10 12:00:00.961 [INFO] Accepting asynchronous chains execution requests... 2023-09-10 12:00:00.970 [INFO] [count:0] Retrieve scheduled chains to run @reboot 2023-09-10 12:00:00.991 [INFO] [count:3] Retrieve scheduled chains to run 2023-09-10 12:00:00.994 [INFO] [count:0] Retrieve interval chains to run 2023-09-10 12:00:00.019 [INFO] [chain:1] Starting chain 2023-09-10 12:00:00.722 [INFO] [chain:1] [task:1] [txid:2613] Starting task 2023-09-10 12:00:00.074 [INFO] [chain:1] [task:1] [txid:2613] Starting task 2023-09-10 12:00:00.141 [INFO] [chain:1] [task:1] [txid:2613] Closing remote session 2023-09-10 12:00:00.141 [INFO] [chain:1] [task:1] [txid:2613] Task executed successfully 2023-09-10 12:00:00.185 [INFO] [chain:1] [txid:2613] Chain executed successfully |
During the first start of pg_timetable, the necessary schema timetable gets created. For reference, below is the catalog structure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | postgres=# dn List of schemas Name | Owner -----------+----------- public | postgres timetable | scheduler (2 rows) postgres=# set search_path to timetable ; SET postgres=# dt List of relations Schema | Name | Type | Owner -----------+----------------+-------+----------- timetable | active_chain | table | scheduler timetable | active_session | table | scheduler timetable | chain | table | scheduler timetable | execution_log | table | scheduler timetable | log | table | scheduler timetable | migration | table | scheduler timetable | parameter | table | scheduler timetable | task | table | scheduler (8 rows) |
5. From database logs, it can be observed that MATERIALIZED VIEW gets refreshed as per the schedule.
Output from DB logs:
1 | 2023-09-10 12:00:00 UTC [14334] LOG: statement: REFRESH MATERIALIZED VIEW public.mat_view |
From psql prompt:
1 2 3 4 5 6 7 8 9 10 | -bash-4.2$ psql psql (12.16) Type "help" for help. postgres=# SELECT * FROM mat_view; grp | avg | count -----+---------------------------+--------- 1 | 0.5001807958659610956005 | 5000000 2 | 0.50000009110202547559215 | 5000000 (2 rows) |
Below is the output from the pg_timetable catalog tables.
1 2 3 4 5 | postgres=# select * from active_session ; client_pid | server_pid | client_name | started_at ------------+------------+-------------+------------------------------ 697146069 | 20137 | worker001 | 2023-09-10 11:59:205672+00 (1 row) |
1 2 3 4 5 6 7 8 9 10 11 12 | postgres=# select * from chain; -[ RECORD 1 ]-------+-------------------- chain_id | 1 chain_name | refresh-matview run_at | 0 12 * * * max_instances | timeout | 0 live | t self_destruct | f exclusive_execution | f client_name | on_error | |
1 2 3 4 5 6 7 8 9 10 11 12 13 | postgres=# select * from execution_log where chain_id=1; -[ RECORD 01 ]------------------------------------------------------------------- chain_id | 1 task_id | 1 txid | 2613 last_run | 2023-09-10 12:00:00.137404+00 finished | 2023-09-10 12:00:00.586543+00 pid | 697146069 returncode | 0 kind | SQL command | REFRESH MATERIALIZED VIEW public.mat_view output | REFRESH MATERIALIZED VIEW client_name | worker001 |
In conclusion, we can say that pg_timetable is open source and can be used freely by everyone. The main advantages include that pg_timetable is an independent process written in GO, which connects to PostgreSQL just like any other client program. So, if the scheduler crashes, it will not harm your server. Pg_timetable provides a variety of built-in tasks that help you to flexibly combine those operations in an easy way. Further, pg_timetable has been implemented in GO and, therefore, comes as ONE executable that can be started directly. So, there is no need to worry about libraries or dependencies during the installation.
Please refer to the links below to learn more about pg_timetable.
https://github.com/cybertec-postgresql/pg_timetable
https://www.cybertec-postgresql.com/en/products/pg_timetable/
https://pg-timetable.readthedocs.io/en/master/README.html