How to avoid the Hibernate Query Cache N+1 issue

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

I recently answered this question on the Hibernate forum, and since it’s a very good one, I decided to turn it into an article.

In this post, we will describe how the N+1 query issue is generated when using the second-level Hibernate Query Cache.

How the N+1 query issue is generated when using the second-level #Hibernate Query Cache - @vlad_mihalcea https://t.co/ysel1ZBYU3 pic.twitter.com/Dg8gzlO6ST
— Java (@java) June 7, 2018

Domain Model

Assuming we have the following Domain Model classes:

Which are mapped as follows:

@Entity(name = "Post")
@Table(name = "post")
@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    //Getters and setters omitted for brevity
}

@Entity(name = "PostComment")
@Table(name = "post_comment")
@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
public class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    private String review;

    //Getters and setters omitted for brevity
}

So, both the Post and PostComment entities are cacheable and use the READ_WRITE CacheConcurrencyStrategy.

Query Cache

To enable the Query Cache, we need to enable the second-level cache as well. Therefore, the following configuration properties must be supplied:

<property 
    name="hibernate.cache.use_second_level_cache" 
    value="true" 
/>

<property 
    name="hibernate.cache.use_query_cache" 
    value="true" 
/>

<property 
    name="hibernate.cache.region.factory_class" 
    value="ehcache" 
/>

Although we have enabled the Query Cache, it does not automatically apply to any query and we need to explicitly tell Hibernate which queries are to be cached. To do so, you need to use the org.hibernate.cacheable query hint as illustrated by the following example:

public List<PostComment> getLatestPostComments(
        EntityManager entityManager) {
    return entityManager.createQuery(
        "select pc " +
        "from PostComment pc " +
        "order by pc.post.id desc", PostComment.class)
    .setMaxResults(10)
    .setHint(QueryHints.HINT_CACHEABLE, true)
    .getResultList();
}

Now, if we call the getLatestPostComments twice, we can see that the result is fetched from the cache the second time we execute this method.

Therefore, when executing this test case:

printCacheRegionStatistics(
    StandardQueryCache.class.getName()
);
assertEquals(
    3, 
    getLatestPostComments(entityManager).size()
);

printCacheRegionStatistics(
    StandardQueryCache.class.getName()
);
assertEquals(
    3, 
    getLatestPostComments(entityManager).size()
);

Hibernate generates the following output:

Region: org.hibernate.cache.internal.StandardQueryCache,
Statistics: SecondLevelCacheStatistics[
    hitCount=0,
    missCount=0,
    putCount=0,
    elementCountInMemory=0,
    elementCountOnDisk=0,
    sizeInMemory=0
],
Entries: {}

-- Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
-- Query results were not found in cache

SELECT pc.id AS id1_1_,
       pc.post_id AS post_id3_1_,
       pc.review AS review2_1_
FROM post_comment pc
ORDER BY pc.post_id DESC
LIMIT 10

-- Caching query results in region: org.hibernate.cache.internal.StandardQueryCache; timestamp=6244549098291200

Region: org.hibernate.cache.internal.StandardQueryCache,
Statistics: SecondLevelCacheStatistics[
    hitCount=0,
    missCount=1,
    putCount=1,
    elementCountInMemory=1,
    elementCountOnDisk=0,
    sizeInMemory=776
],
Entries: {
sql: select pc.id as id1_1_, pc.post_id as post_id3_1_, pc.review as review2_1_ from post_comment pc order by pc.post_id desc; parameters: ; 
named parameters: {}; 
max rows: 10; 
transformer: org.hibernate.transform.CacheableResultTransformer@110f2=[
    6244549098291200, 
    4, 
    3, 
    2
]}

-- Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
-- Checking query spaces are up-to-date: [post_comment]
-- [post_comment] last update timestamp: 6244549098266628, result set timestamp: 6244549098291200
-- Returning cached query results

As you can see in the log, only the first call has executed the SQL query as the second one used the cached result set.

N+1 query issue

Now, let’s see what happens if we evict all PostComment entities prior to running the second call to the getLatestPostComments method.

doInJPA(entityManager -> {
    entityManager
    .getEntityManagerFactory()
    .getCache()
    .evict(PostComment.class);
});

doInJPA(entityManager -> {
    assertEquals(
        3, 
        getLatestPostComments(entityManager).size()
    );
});

When running the test case above, Hibernate generates the following output:

-- Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
-- Checking query spaces are up-to-date: [post_comment]
-- [post_comment] last update timestamp: 6244574473195524, result set timestamp: 6244574473207808
-- Returning cached query results

SELECT pc.id AS id1_1_0_,
       pc.post_id AS post_id3_1_0_,
       pc.review AS review2_1_0_
FROM post_comment pc
WHERE pc.id = 4

SELECT pc.id AS id1_1_0_,
       pc.post_id AS post_id3_1_0_,
       pc.review AS review2_1_0_
FROM post_comment pc
WHERE pc.id = 3

SELECT pc.id AS id1_1_0_,
       pc.post_id AS post_id3_1_0_,
       pc.review AS review2_1_0_
FROM post_comment pc
WHERE pc.id = 2

As you can see in the logs, even if the entity identifiers were fetched from the Query Cache, because the entities are not found in the second-level cache, the PostComment entities are fetched using SQL queries.

If the Query Cache result contains N entity identifiers, N secondary queries will be executed, which might actually be worse than executing the query that we have previously cached.

This is a typical N+1 query issue, just that the first query is being served from the cache while the N ones hit the database.

As I explained before, you can detect all N+1 query issues using my Hypersistence Utils unit test assert mechanism and fix this problem long before deploying into production.

Avoiding the issue

To avoid this issue, you have to make sure that the cached entity is stored in the second-level cache.

. Make sure that the PostComment entity is cacheable, meaning that you have annotated it with the Hibernate-specific @Cache annotation. Although JPA defines the @Cacheable annotation, that’s not enough since Hibernate needs to know what CacheConcurrencycStrategy you want to use for the entity in question.
. Also, make sure that the Ehcache timeToIdleSeconds or the equivalent TTL(Time to live) setting of the other second-level cache providers is greater for entities than for the Query Cache. This will ensure that entities will stay in the cache longer than the query cache result set, which only stores the entity identifiers.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While the Hibernate Query Cache can help you off-load the database server, you have to be very careful when caching entities because, unlike DTO projections, the result set is not served entirely from the Query Cache, being dependent on the second-level cache entity regions as well.

That’s why monitoring is very important when you enable the Hibernate second-level cache and its associated Query Cache.

Follow @vlad_mihalcea