PostgreSQL JDBC Statement Caching

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we’re going to see how the PostgreSQL JDBC Driver implements Statement Caching and what settings we need to configure in order to optimize the performance of our data access layer.

Prepared Statements

The JDBC API allows you to create a PreparedStatement by calling prepareStatement(java.lang.String) method on a given Connection reference.

For this reason, it’s very common for Java developers to think that Prepared Statements work like this:

Server-Side Prepared Statement

This misconception has been propagated throughout the years, as demonstrated by this official Java tutorial, which states that:

The main feature of a PreparedStatement object is that, unlike a Statement object, it is given a SQL statement when it is created. The advantage to this is that in most cases, this SQL statement is sent to the DBMS right away, where it is compiled. As a result, the PreparedStatement object contains not just a SQL statement, but a SQL statement that has been precompiled. This means that when the PreparedStatement is executed, the DBMS can just run the PreparedStatement SQL statement without having to compile it first.

However, the Connection.prepareStatement Javadoc is quite straightforward about how the JDBC PreparedStatement works:

If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed.

So, if a relational database system does not provide precompilation by default, the PreparedStatement will be executed like this:

Statement

If a prepared statement is executed only once, it’s more efficient to send the statement in a single database roundtrip instead of using two database roundtrips (one for prepare and the other for execute).

And, since database vendors have noticed that prepared statements are often executed only once, some of them chose to delay the precompilation phase.

PostgreSQL JDBC prepareThreshold configuration

As explained in the PostgreSQL documentation, PostgreSQL delays the precompilation until the 5th execution of a given statement, where the default value of 5 is given by the prepareThreshold setting.

After the 5th execution, PostgreSQL will compile the PreparedStatement and avoid the prepare phase for the future statement execution.

PostgreSQL JDBC Statement Caching

The PostgreSQL JDBC Driver has a built-in statement cache that can be configured using the following settings:

preparedStatementCacheQueries (default value is 256), which defines the number of statements that can stored in the cache
preparedStatementCacheSizeMiB (default value is 5 MB), which defines the statement cache size per connection.

Testing Time

To demonstrate how the PostgreSQL JDBC Driver Statement Caching mechanism works, we are going to use a Hikari connection pool that can accommodate at most 4 database connections:

protected HikariConfig hikariConfig(DataSource dataSource) {
    HikariConfig hikariConfig = new HikariConfig();
    hikariConfig.setMaximumPoolSize(4);
    hikariConfig.setDataSource(dataSource);
    return hikariConfig;
}

Now, we will execute the same SQL statement on 4 concurrent Java Threads, and we are going to inspect if a given statement is reused:

ThreadLocalRandom random = ThreadLocalRandom.current();
ExecutorService executorService = Executors.newFixedThreadPool(4);
ConcurrentMap<CachedQuery, CachedQueryStats> cachedQueryStatsMap = 
    new ConcurrentHashMap<>();

int statementExecutionCount = 100;
List<Future<?>> futures = new ArrayList<>(statementExecutionCount);

for (long i = 1; i <= statementExecutionCount; i++) {
    futures.add(
        executorService.submit(() -> {
            doInJDBC(connection -> {
                try (PreparedStatement statement = connection.prepareStatement(
                    "SELECT title FROM post WHERE id = ?"
                )) {
                    PgStatement pgStatement = statement.unwrap(PgStatement.class);
                    CachedQuery cachedQuery = ReflectionUtils.getFieldValue(
                        pgStatement, 
                        "preparedQuery"
                    );
                    CachedQueryStats cachedQueryStats = cachedQueryStatsMap
                        .computeIfAbsent(
                            cachedQuery, pgc -> new CachedQueryStats()
                        );
                    cachedQueryStats.incrementExecutionCount();
                    
                    if(pgStatement.isUseServerPrepare()) {
                        cachedQueryStats.incrementPreparedExecutions();
                    } else {
                        cachedQueryStats.incrementUnpreparedExecutions();
                    }
                    
                    statement.setLong(1, random.nextLong(postCount()));
                    statement.executeQuery();
                }
            });
        })
    );
}

for(Future<?> future : futures) {
    future.get();
}

for(Map.Entry<CachedQuery, CachedQueryStats> statisticsMapEntry : 
        cachedQueryStatsMap.entrySet()) {
    CachedQuery cachedQuery = statisticsMapEntry.getKey();
    CachedQueryStats cachedQueryStats = statisticsMapEntry.getValue();
    LOGGER.error(
        "Statement [{}] stats: [{}]", 
        cachedQuery, 
        cachedQueryStats
    );
}

When executing the test case above, we get the following output:

Statement [
    CachedQuery{
        executeCount=29, 
        query=SELECT title FROM post WHERE id = ?, 
        isFunction=false
    }
] stats: [
    executionCount=29, 
    unpreparedExecutions=4, 
    preparedExecutions=25
]
Statement [
    CachedQuery{
        executeCount=21, 
        query=SELECT title FROM post WHERE id = ?, 
        isFunction=false
    }
] stats: [
    executionCount=21, 
    unpreparedExecutions=4, 
    preparedExecutions=17
]
Statement [
    CachedQuery{
        executeCount=21, 
        query=SELECT title FROM post WHERE id = ?, 
        isFunction=false
    }
] stats: [
    executionCount=21, 
    unpreparedExecutions=4, 
    preparedExecutions=17
]
Statement [
    CachedQuery{
        executeCount=29, 
        query=SELECT title FROM post WHERE id = ?, 
        isFunction=fals
    }
] stats: [
    executionCount=29, 
    unpreparedExecutions=4, 
    preparedExecutions=25
]

Because the statement cache is bound to the underlying database connection and since we are using a connection pool that caches 4 database connections, we can see that the PostgreSQL JDBC Driver has cached 4 prepared statements.

The first 4 executions are unprepared, meaning that the statement is sent to the database in one database roundtrip, where it will be prepared and executed in one go.

On the 5th execution onward, the statement is compiled, and its execution plan will be cached on the database server for as long as the database connection is open.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

As demonstrated by this Twitter pool, PostgreSQL is a very popular choice among Java developers:

What relational database your project is using?
— Vlad Mihalcea (@vlad_mihalcea) September 12, 2020

Knowing how the PostgreSQL JDBC Driver statement caching mechanism works is very important, as leveraging it can help you improve the performance of your data access layer.

Follow @vlad_mihalcea