Create Dynamic Views with jOOQ 3.17’s new Virtual Client Side Computed Columns

One of jOOQ 3.17‘s coolest new features are client side computed columns. jOOQ 3.16 already added support for server side computed columns, which many of you appreciate for various reasons.

What’s a computed column?

A computed column is a column that is derived (“computed”) from an expression. It cannot be written to. It works like any column in a view. There are two types of computed columns:

  • VIRTUAL computed columns, which are computed “on read”
  • STORED computed columns, which are computed “on write”

Some SQL dialects use these exact terms to distinguish between the two features. Some dialects support both of them, some only support one of them.

Some typical use-cases for server side computed columns include:

CREATE TABLE customer (
  id BIGINT NOT NULL PRIMARY KEY,
  first_name TEXT NOT NULL,
  last_name TEXT NOT NULL,
  full_name TEXT GENERATED ALWAYS AS 
    (first_name || ' ' || last_name) STORED
);

Now, try to insert some data into this table:

INSERT INTO customer (id, first_name, last_name)
VALUES (1, 'John', 'Doe')
RETURNING *;

And you will get:

|id |first_name|last_name|full_name|
|---|----------|---------|---------|
|1  |John      |Doe      |John Doe |

What are some limitations?

That’s a wonderful feature. Unfortunately, as always:

  • Not all dialects support it
  • Not all dialects support both VIRTUAL and/or STORED (both approaches have their benefits)
  • The feature itself is quite limited in SQL

Let’s look at the third bullet. What if we wanted to “compute” a column by using joins or correlated subqueries? We cannot, in SQL. E.g. PostgreSQL rejects this:

CREATE TABLE customer (
  id BIGINT NOT NULL PRIMARY KEY,
  first_name TEXT NOT NULL,
  last_name TEXT NOT NULL,
  address_id BIGINT REFERENCES address,
  full_address TEXT GENERATED ALWAYS AS ((
    SELECT a.address 
    FROM address AS a
    WHERE a.address_id = customer.address_id
  )) VIRTUAL
);

Why?

  • It doesn’t support VIRTUAL, only STORED
  • Even if it did support VIRTUAL, it currently throws SQL Error [0A000]: ERROR: cannot use subquery in column generation expression

There isn’t really any good reason that I can see for this limitation. After all, you can easily create a view like this:

CREATE VIEW v_customer AS
SELECT 
  id, first_name, last_name, address_id,
  (
    SELECT a.address 
    FROM address AS a
    WHERE a.address_id = customer.address_id
  ) AS full_address
FROM customer

And now, you have the desired behaviour. This approach has its own caveats, including:

  • The view is not the table. Every dialect has limitations with respect to updatable views, i.e. it can be difficult to write to this view.
  • Views are stored objects, and as such need to be versioned and installed. This isn’t a huge problem per se, but there are folks who try to avoid this, because of… well, the extra effort of doing database change management correctly?
  • You always have to decide whether to query the view or the table.

Enter jOOQ’s client side computed columns

This is why jOOQ 3.17 now offers this wonderful feature on the client side. Both versions are supported:

  • VIRTUAL computed columns are columns that are replaced by their respective expression when the column appears in any non-write position, e.g. SELECT, WHERE, but also RETURNING
  • STORED computed columns are columns that are computed when written to, e.g. in INSERT, UPDATE, MERGE

Let’s first look at VIRTUAL computed columns. The above two use-cases can be configured as follows in the code generator, assuming a Maven config.

<configuration>
    <generator>
        <database>

            <!-- Tell the code generator to add synthetic columns, i.e.
                 columns that the server does not know anything about -->
            <syntheticObjects>
                <columns>
                    <column>
                        <tables>customer|staff|store</tables>
                        <name>full_address</name>
                        <type>text</type>
                    </column>
                    <column>
                        <tables>customer|staff</tables>
                        <name>full_name</name>
                        <type>text</type>
                    </column>
                </columns>
            </syntheticObjects>

            <!-- Now tell the code generator how to compute the values -->
            <forcedTypes>
                <forcedType>
                    <generator>ctx -> DSL.concat(
                        FIRST_NAME, DSL.inline(" "), LAST_NAME)
                    </generator>
                    <includeExpression>full_name</includeExpression>
                </forcedType>
                <forcedType>
                    <generator>ctx -> DSL.concat(
                        address().ADDRESS_, 
                        DSL.inline(", "), 
                        address().POSTAL_CODE, 
                        DSL.inline(", "), 
                        address().city().CITY_, 
                        DSL.inline(", "), 
                        address().city().country().COUNTRY_
                    )</generator>
                    <includeExpression>full_address</includeExpression>
                </forcedType>
            </forcedTypes>
        </database>
    </generator>
</configuration>

The above example uses two new code generation features:

That’s it. With these two things, you can register a single jOOQ Field expression that computes the value of your desired columns. Notice how the FULL_ADDRESS makes use of implicit joins to simplify access to other tables. Of course, you could have also written a correlated subquery, which is one way to implement these implicit joins. It would have just been a bit more laborious.

You can query these columns like any other in jOOQ:

Result<Record2<String, String>> result =
ctx.select(CUSTOMER.FULL_NAME, CUSTOMER.FULL_ADDRESS)
   .from(CUSTOMER)
   .fetch();

The generated query does all the joining for you, transparently:

select
  customer.first_name || ' ' || customer.last_name 
    as full_name,
  alias_114975355.address || ', ' || 
  alias_114975355.postal_code || ', ' || 
  alias_57882359.city || ', ' || 
  alias_1060004.country
    as full_address
from (
  customer
    join (
      address as alias_114975355
        join (
          city as alias_57882359
            join country as alias_1060004
              on alias_57882359.country_id = alias_1060004.country_id
        )
          on alias_114975355.city_id = alias_57882359.city_id
    )
      on customer.address_id = alias_114975355.address_id
  )

The result being:

+----------------+------------------------------------------------+
|full_name       |full_address                                    |
+----------------+------------------------------------------------+
|ANDREA HENDERSON|320 Baiyin Parkway, 37307, Mahajanga, Madagascar|
|CLARA SHAW      |1027 Songkhla Manor, 30861, Molodetno, Belarus  |
|SHANE MILLARD   |184 Mandaluyong Street, 94239, La Paz, Mexico   |
|DANNY ISOM      |734 Bchar Place, 30586, Okara, Pakistan         |
|VALERIE BLACK   |782 Mosul Street, 25545, Brockton, United States|
|...             |...                                             |
+----------------+------------------------------------------------+

Note that as you’d expect, if you omit one of these columns, the relevant parts of the query aren’t generated, including implicit joins. So, for example, if you query this:

Result<Record1<String>> result =
ctx.select(CUSTOMER.FULL_NAME)
   .from(CUSTOMER)
   .fetch();

The generated SQL is much simpler:

select customer.first_name || ' ' || customer.last_name as full_name
from customer

A more complex example

SQL JOINs can be quite boring and repetitive at times. Consider this schema:

CREATE TABLE currency (
  code CHAR(3) NOT NULL,
  
  PRIMARY KEY (code)
);

CREATE TABLE conversion (
  from_currency CHAR(3) NOT NULL,
  to_currency CHAR(3) NOT NULL,
  rate NUMERIC(18, 2) NOT NULL,
  
  PRIMARY KEY (from_currency, to_currency),
  FOREIGN KEY (from_currency) REFERENCES currency,
  FOREIGN KEY (to_currency) REFERENCES currency
);

CREATE TABLE transaction (
  id BIGINT NOT NULL,
  amount NUMERIC(18, 2) NOT NULL,
  currency CHAR(3) NOT NULL,
  
  PRIMARY KEY (id),
  FOREIGN KEY (currency) REFERENCES currency
);

A typical (but simplified) finance application that has transactions with amounts and a currency associated with the amount. Imagine the CONVERSION table could have bitemporal versioning to make sure we can calculate the correct conversion rate at any given date, if needed.

Now, any time we want to sum up transactions, we have to convert the amount to some user currency, irrespective of the transaction currency. Isn’t that boring? We shouldn’t have to repeat this logic all over the place.

You can of course write views like this:

CREATE VIEW v_transaction AS
SELECT
  id, amount, currency,
  amount * (
    SELECT c.rate
    FROM conversion AS c
    WHERE c.from_currency = t.currency
    AND c.to_currency = 'USD'
  ) AS amount_usd
FROM transaction AS t

That would help summing up all transactions in USD. If we needed a different currency, we can either create different views, create stored (table valued, even?) functions, or generate the SQL with jOOQ.

But why not just store the computation with the jOOQ generated table directly?

<configuration>
    <generator>
        <database>

            <!-- Again, add the synthetic columns -->
            <syntheticObjects>
                <columns>
                    <column>
                        <tables>TRANSACTION</tables>
                        <name>AMOUNT_USD</name>
                        <type>NUMERIC</type>
                    </column>
                    <column>
                        <tables>TRANSACTION</tables>
                        <name>AMOUNT_USER_CURRENCY</name>
                        <type>NUMERIC</type>
                    </column>
                </columns>
            </syntheticObjects>

            <!-- And define the computations -->
            <forcedTypes>
                <forcedType>
                    <generator>ctx -> AMOUNT.times(DSL.field(
   DSL.select(Conversion.CONVERSION.RATE)
      .from(Conversion.CONVERSION)
      .where(Conversion.CONVERSION.FROM_CURRENCY.eq(CURRENCY))
      .and(Conversion.CONVERSION.TO_CURRENCY.eq(
           DSL.inline("USD")))))
                    </generator>
                    <includeExpression>
                        TRANSACTION\.AMOUNT_USD
                    </includeExpression>
                </forcedType>
                <forcedType>
                    <generator>ctx -> AMOUNT.times(DSL.field(
    DSL.select(Conversion.CONVERSION.RATE)
       .from(Conversion.CONVERSION)
       .where(Conversion.CONVERSION.FROM_CURRENCY.eq(CURRENCY))
       .and(Conversion.CONVERSION.TO_CURRENCY.eq(
           (String) ctx.configuration().data("USER_CURRENCY")))))
                    </generator>
                    <includeExpression>
                        TRANSACTION\.AMOUNT_USER_CURRENCY
                    </includeExpression>
                </forcedType>
            </forcedTypes>
        </database>
    </generator>
</configuration>

It does what you’d expect it does. Attach a correlated subquery to the two columns. The interesting bit, however is the AMOUNT_USER_CURRENCY column. It accesses ctx.configuration().data("USER_CURRENCY"). That’s just arbitrary user-defined data, which you can pass around jOOQ and access from anywhere.

When you run this query without setting that "USER_CURRENCY" information like this:

ctx.select(
        TRANSACTION.ID,
        TRANSACTION.AMOUNT,
        TRANSACTION.CURRENCY,
        TRANSACTION.AMOUNT_USD,
        TRANSACTION.AMOUNT_USER_CURRENCY,
        sum(TRANSACTION.AMOUNT_USD).over().as("total_usd"),
        sum(TRANSACTION.AMOUNT_USER_CURRENCY).over()
            .as("total_user_currency"))
   .from(TRANSACTION)
   .orderBy(TRANSACTION.ID))
   .fetch()

The generated SQL is this:

select
  TRANSACTION.ID,
  TRANSACTION.AMOUNT,
  TRANSACTION.CURRENCY,
  (TRANSACTION.AMOUNT * (
    select CONVERSION.RATE
    from CONVERSION
    where (
      CONVERSION.FROM_CURRENCY = TRANSACTION.CURRENCY
      and CONVERSION.TO_CURRENCY = 'USD'
    )
  )) AMOUNT_USD,
  (TRANSACTION.AMOUNT * (
    select CONVERSION.RATE
    from CONVERSION
    where (
      CONVERSION.FROM_CURRENCY = TRANSACTION.CURRENCY
      and CONVERSION.TO_CURRENCY = null
    )
  )) AMOUNT_USER_CURRENCY,
  sum((TRANSACTION.AMOUNT * (
    select CONVERSION.RATE
    from CONVERSION
    where (
      CONVERSION.FROM_CURRENCY = TRANSACTION.CURRENCY
      and CONVERSION.TO_CURRENCY = 'USD'
    )
  ))) over () total_usd,
  sum((TRANSACTION.AMOUNT * (
    select CONVERSION.RATE
    from CONVERSION
    where (
      CONVERSION.FROM_CURRENCY = TRANSACTION.CURRENCY
      and CONVERSION.TO_CURRENCY = null
    )
  ))) over () total_user_currency
from TRANSACTION
order by TRANSACTION.ID

The user currency is undefined (i.e. NULL), so we don’t get anything for it:

+----+------+--------+----------+--------------------+---------+-------------------+
|  ID|AMOUNT|CURRENCY|AMOUNT_USD|AMOUNT_USER_CURRENCY|total_usd|total_user_currency|
+----+------+--------+----------+--------------------+---------+-------------------+
|   1| 12.25|EUR     |   12.7400|              {null}| 150.0978|             {null}|
|   2| 15.37|USD     |   15.3700|              {null}| 150.0978|             {null}|
|   3| 99.99|GBP     |  121.9878|              {null}| 150.0978|             {null}|
+----+------+--------+----------+--------------------+---------+-------------------+

Now, let’s run the query again after setting the configuration().data() value like this:

// This mutates the configuration.
// To make a copy, use configuration().derive()
ctx.configuration().data("USER_CURRENCY", "CHF");

And suddenly, we get a different result:

+----+------+--------+----------+--------------------+---------+-------------------+
|  ID|AMOUNT|CURRENCY|AMOUNT_USD|AMOUNT_USER_CURRENCY|total_usd|total_user_currency|
+----+------+--------+----------+--------------------+---------+-------------------+
|   1| 12.25|EUR     |   12.7400|             12.2500| 150.0978|           142.9936|
|   2| 15.37|USD     |   15.3700|             14.7552| 150.0978|           142.9936|
|   3| 99.99|GBP     |  121.9878|            115.9884| 150.0978|           142.9936|
+----+------+--------+----------+--------------------+---------+-------------------+

This is extremely powerful! Imagine an application where you:

  • Get a connection from the connection pool
  • Initialise a few context variables, such as the user, and their settings
  • Automatically have arbitrary “views” (i.e. jOOQ expressions, both from dynamic SQL or from this feature) update their contents

A true jOOQ and SQL power user dream.

More possibilities

The above examples were just showing simple cases of scalar subqueries. But nothing keeps you from using:

After all, a client side computed column is just a “variable” referencing an expression that is expanded when you run the query using that column.

Caveats

Unlike server side virtual computed columns, you cannot put an index on these, because the server doesn’t know anything about the column, or the expression. That means the feature is mainly useful for projections and aggregations / computations, less for queries. For example, maybe don’t run filters on such columns.

Stored client side computed columns

A future blog post will talk about the STORED version of client side computed columns, which also includes the new audit column feature. As a teaser, just implement a Generator like the above on a non-synthetic column (i.e. an actual column from your schema), and voilà, the behaviour is now completely different.

More information about this:

Leave a Reply