How to calculate percentiles with SQL PERCENTILE_CONT

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to investigate how to calculate percentiles with the SQL PERCENTILE_CONT function.

Domain Model

Let’s consider we have the following quotes table that holds the historical price values of various stocks and indexes:

SQL PERCENTILE_CONT quotes table

The quotes table is populated with data that looks like this:

| quote_date | ticker | close_price |
| ---------- | ------ | ----------- |
| 2023-12-04 | SPX    | 4569.7800   |
| 2023-12-05 | SPX    | 4567.1800   |
| 2023-12-06 | SPX    | 4549.3400   |
| 2023-12-07 | SPX    | 4585.5900   |
| 2023-12-08 | SPX    | 4604.3700   |
| 2023-12-11 | SPX    | 4622.4400   |
| 2023-12-12 | SPX    | 4643.7000   |
| 2023-12-13 | SPX    | 4707.0900   |
| 2023-12-14 | SPX    | 4719.5500   |
| 2023-12-15 | SPX    | 4719.1900   |

SQL PERCENTILE_CONT

The SQL Standard provides support for a great variety of Window Functions, such as DENSE_RANK, and Aggregation Functions, like PERCENTILE_CONT, PERCENTILE_DISC, or LISTAGG.

The PERCENTILE_CONT is an inverse distribution aggregate function that we can use to calculate percentiles.

While Window Functions use the OVER clause, the Aggregate Functions use the WITHIN GROUP clause. Therefore, the PERCENTILE_CONT is defined as follows:

PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY sort_criteria)

To calculate the Median (50th), 75th, 95th, and 99th percentiles of the S&P 500 index value between 2019 and 2023, we can use the following SQL query with PERCENTILE_CONT:

SELECT
    PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) AS p75,
    PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) AS p99
FROM 
    quotes
WHERE
    ticker = 'SPX' AND
    quote_date BETWEEN DATE '2019-01-01' AND DATE '2023-12-31'

The SQL query above runs on both PostgreSQL and Oracle. For SQL Server, you will need to modify it as explained further on in the SQL server section.

When running the above SQL query, we get the following values:

| median  | p75     | p95               | p99               |
| ------- | ------- | ----------------- | ----------------- |
| 3907.45 | 4305.09 | 4602.737999999999 | 4750.138400000001 |

If we want to calculate these percentiles per year, then we can use the following SQL query:

SELECT
    EXTRACT(YEAR FROM quote_date) AS year,
    PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) AS p75,
    PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) AS p99
FROM 
    quotes
WHERE
    ticker = 'SPX' AND
    quote_date BETWEEN DATE '2019-01-01' AND DATE '2023-12-31'
GROUP BY 
    EXTRACT(YEAR FROM quote_date)
ORDER BY 
    year

And we will get the following results:

| year | median   | p75               | p95               | p99                |
| ---- | -------- | ----------------- | ----------------- | ------------------ |
| 2019 | 2919.025 | 3001.615          | 3160.353          | 3227.3273000000004 |
| 2020 | 3276.02  | 3408.63           | 3690.79           | 3729.44            |
| 2021 | 4304     | 4486.287499999999 | 4699.282999999999 | 4782.4638          |
| 2022 | 4027.26  | 4338.87           | 4588.280000000001 | 4719.71            |
| 2023 | 4291.16  | 4466.95           | 4634.133          | 4772.3392          |

PERCENTILE_CONT on SQL Server

In SQL Server, the PERCENTILE_CONT function requires both the WITHIN GROUP and the OVER clause, and the result set is closer to the typical of a Window Function, rather than the one we would expect from an Aggregate Function. For this reason, you will also need to use DISTINCT, otherwise the percentile value will be repeated for every matching record in the quotes table.

Therefore, if you’re using SQL Server, you will need to change the first SQL query like this:

SELECT DISTINCT 
    PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY ticker) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY ticker) AS p75,
    PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY ticker) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY ticker) AS p99
FROM 
    quotes
WHERE
    ticker = 'SPX' AND
    quote_date BETWEEN '2019-01-01' AND '2023-12-31'

Notice that we need to use DISTINCT and provide an OVER clause as well.

The second SQL query that calculates the percentile values per year looks like this for SQL Server:

SELECT DISTINCT
    YEAR(quote_date) AS year,
    PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY YEAR(quote_date)) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY YEAR(quote_date)) AS p75,
    PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY YEAR(quote_date)) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) 
        OVER (PARTITION BY YEAR(quote_date)) AS p99
FROM 
    quotes
WHERE
    ticker = 'SPX' AND
    quote_date BETWEEN '2019-01-01' AND '2023-12-31'
ORDER BY 
    year

That’s it!

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Calculating percentile values is a very common task when dealing with metrics or other time series.

While the SQL Standard defines the PERCENTILE_CONT function, the syntax may differ from one relational database to another, so make sure you check the database manual before starting using it.

Transactions and Concurrency Control eBook

2 Comments on “How to calculate percentiles with SQL PERCENTILE_CONT

  1. PERCENTILE_CONT is an aggregate function, not a window function

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.