Implementing ASOF Joins in PostgreSQL and Timescale

What Is an `ASOF` Join?

An ASOF (or "as of") join is a type of join operation used when analyzing two sets of time-series data. It essentially matches each record from one table with the nearest—but not necessarily equal—value from another table based on a chosen column. Oracle supports this out of the box using a non-standard SQL syntax, but unfortunately, PostgreSQL does not provide a built-in ASOF keyword.

The chosen column needs to have some concept of range for the ASOF operation to work. You may think of it as being the "closest value," but not exceeding the comparison. It works for string (alphabetical), integer (ordinal), float (decimal), and any other data type that has an idea of ORDER. Because timestamps are near and dear to our hearts at Timescale, we will demonstrate with time and date columns.

✨

Want to understand how the PostgreSQL parser picks a join method or join types? Check out this article!

Performing this operation in PostgreSQL takes a bit of effort. This article aims to delve deeper into ASOF-style joins and how to implement similar functionality in PostgreSQL by subselecting data or other join types.

Understanding `ASOF` Joins

ASOF joins are a powerful tool when dealing with time-series data. In simple terms, an ASOF join will, for each row in the left table, find a corresponding single row in the right table where the key value is less than or equal to the key in the left table.

This is a common operation when dealing with financial data, sensor readings, or other types of time-series data where readings might not align perfectly by timestamp.

For a simple example, consider the real-world question, "What was the temperature yesterday at this time?" It is very unlikely that a temperature reading was taken yesterday at exactly the millisecond that the question is asked today. What we really want is "What was the temperature taken yesterday up to today's time stamp?"

This simple example becomes a lot more complex when we start comparing temperatures day over day, week over week, etc.

Implementing `ASOF` Joins in Timescale

Even though PostgreSQL does not directly support ASOF joins, you can achieve similar functionality using a combination of SQL operations. Here's a simplified step-by-step guide:

Step 1: Prepare your data

Ensure your data is in the correct format for the ASOF join. You'll need a timestamp or other monotonically increasing column to use as a key for the join.

Suppose you have two tables, bids and asks, each containing a timestamp column, and you want to join them by instrument and the nearest timestamp.

CREATE TABLE bids (
    instrument text,
    ts TIMESTAMPTZ,
    value NUMERIC
);
--
CREATE INDEX bids_instrument_ts_idx ON bids (instrument, ts DESC);
CREATE INDEX bids_ts_idx ON bids (ts);
--
CREATE TABLE asks (
    instrument text,
    ts TIMESTAMPTZ,
    value NUMERIC
);
CREATE INDEX asks_instrument_ts_idx ON asks (instrument, ts DESC);
CREATE INDEX asks_ts_idx ON asks (ts);
--

Normally you'd make both these tables into hypertables with the create_hypertable function (because you're a super educated Timescale user), but in this case, we aren't going to, as we won't be inserting much data (and we also have some Timescale magic to show off 🪄).

Step 2: Insert some test data

Next, we'll create data for four instruments, AAA, BBB, NCD, and USD.

INSERT INTO bids (instrument, ts, value)
SELECT 
   -- random 1 of 4 instruments
  (array['AAA', 'BBB', 'NZD', 'USD'])[floor(random() * 4 + 1)], 
   -- timestamp of last month plus some seconds
  now() - interval '1 month' + g.s, 
   -- random value
  random()* 100 +1
FROM (
  -- 2.5M seconds in a month
  SELECT ((random() * 2592000 + 1)::text || ' s')::interval s 
  FROM generate_series(1,3000000)) g;
INSERT INTO asks (instrument, ts, value)
SELECT 
   -- random 1 of 4 instruments
  (array['AAA', 'BBB', 'NZD', 'USD'])[floor(random() * 4 + 1)], 
   -- timestamp of last month plus some seconds
  now() - interval '1 month' + g.s, 
   -- random value
  random()* 100 +1
FROM (
  -- 2.5M seconds in a month
  SELECT ((random() * 2592000 + 1)::text || ' s')::interval s 
  FROM generate_series(1,2000000)) g;

Step 3: Query the data using a sub-select

To mimic the behavior of an ASOF join, use a SUBSELECT join operation along with conditions to match rows based on your criteria. This will run the sub-query once per row returned from the target table. We need to use the DISTINCT clause to limit the number of rows returned to one.

This will work in vanilla Postgres, but when we are using Timescale (even though we aren't using hypertables yet), we get the benefits of a Skip Scan, which will supercharge the query (for more information on this check our docs or blog post about how Skip Scan can give you an 8,000x speed-up).

SELECT bids.ts timebid, bids.value bid,
    (SELECT DISTINCT ON (asks.instrument) value ask
    FROM asks
    WHERE asks.instrument = bids.instrument
    AND asks.ts <= bids.ts
    ORDER BY instrument, ts DESC) ask
FROM bids
WHERE bids.ts > now() - interval '1 week'

                              QUERY PLAN                                                                               
-------------------------------------------------------------------------
 Index Scan using bids_ts_idx on public.bids  
    (cost=0.43..188132.58 rows=62180 width=56) 
    (actual time=0.067..1700.957 rows=57303 loops=1)
   Output: bids.instrument, bids.ts, bids.value, (SubPlan 1)
   Index Cond: (bids.ts > (now() - '7 days'::interval))
   SubPlan 1
     ->  Unique  (cost=0.43..2.71 rows=5 width=24) 
                (actual time=0.027..0.029 rows=1 loops=57303)
           Output: asks.value, asks.instrument, asks.ts
           ->  Custom Scan (SkipScan) on public.asks  
                  (cost=0.43..2.71 rows=5 width=24) 
                  (actual time=0.027..0.027 rows=1 loops=57303)
                 Output: asks.value, asks.instrument, asks.ts
                 ->  Index Scan using asks_instrument_ts_idx on public.asks  
                        (cost=0.43..15996.56 rows=143152 width=24) 
                        (actual time=0.027..0.027 rows=1 loops=57303)
                       Output: asks.value, asks.instrument, asks.ts
                       Index Cond: ((asks.instrument = bids.instrument) 
                          AND (asks.ts <= bids.ts))
 Planning Time: 1.231 ms
 Execution Time: 1703.821 ms

Conclusion

While PostgreSQL does not have an ASOF keyword, it does offer the flexibility and functionality to perform similar operations. When you're using Timescale, things only get better with the enhancements like Skip Scan.

Ingest and query in milliseconds, even at terabyte scale.

This post was written by

Implementing ASOF Joins in PostgreSQL and Timescale

What Is an `ASOF` Join?

Understanding `ASOF` Joins

Implementing `ASOF` Joins in Timescale

Step 1: Prepare your data

Step 2: Insert some test data

Step 3: Query the data using a sub-select

Conclusion

Related posts

How to Build LLM Applications With pgvector Vector Store in LangChain

PostgreSQL as a Vector Database: A pgvector Tutorial

Implementing ASOF Joins in PostgreSQL and Timescale

What Is an ASOF Join?

Understanding ASOF Joins

Implementing ASOF Joins in Timescale

Step 1: Prepare your data

Step 2: Insert some test data

Step 3: Query the data using a sub-select

Conclusion

Related posts

What Is an `ASOF` Join?

Understanding `ASOF` Joins

Implementing `ASOF` Joins in Timescale