How to calculate percentiles with SQL PERCENTILE_CONT
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to investigate how to calculate percentiles with the SQL PERCENTILE_CONT function.
Domain Model
Let’s consider we have the following quotes
table that holds the historical price values of various stocks and indexes:
The quotes
table is populated with data that looks like this:
| quote_date | ticker | close_price | | ---------- | ------ | ----------- | | 2023-12-04 | SPX | 4569.7800 | | 2023-12-05 | SPX | 4567.1800 | | 2023-12-06 | SPX | 4549.3400 | | 2023-12-07 | SPX | 4585.5900 | | 2023-12-08 | SPX | 4604.3700 | | 2023-12-11 | SPX | 4622.4400 | | 2023-12-12 | SPX | 4643.7000 | | 2023-12-13 | SPX | 4707.0900 | | 2023-12-14 | SPX | 4719.5500 | | 2023-12-15 | SPX | 4719.1900 |
SQL PERCENTILE_CONT
The SQL Standard provides support for a great variety of Window Functions, such as DENSE_RANK
, and Aggregation Functions, like PERCENTILE_CONT
, PERCENTILE_DISC
, or LISTAGG
.
The PERCENTILE_CONT
is an inverse distribution aggregate function that we can use to calculate percentiles.
While Window Functions use the OVER
clause, the Aggregate Functions use the WITHIN GROUP
clause. Therefore, the PERCENTILE_CONT
is defined as follows:
PERCENTILE_CONT(percentage) WITHIN GROUP (ORDER BY sort_criteria)
To calculate the Median (50th), 75th, 95th, and 99th percentiles of the S&P 500 index value between 2019 and 2023, we can use the following SQL query with PERCENTILE_CONT
:
SELECT PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) AS p75, PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) AS p95, PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) AS p99 FROM quotes WHERE ticker = 'SPX' AND quote_date BETWEEN DATE '2019-01-01' AND DATE '2023-12-31'
The SQL query above runs on both PostgreSQL and Oracle. For SQL Server, you will need to modify it as explained further on in the SQL server section.
When running the above SQL query, we get the following values:
| median | p75 | p95 | p99 | | ------- | ------- | ----------------- | ----------------- | | 3907.45 | 4305.09 | 4602.737999999999 | 4750.138400000001 |
If we want to calculate these percentiles per year, then we can use the following SQL query:
SELECT EXTRACT(YEAR FROM quote_date) AS year, PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) AS p75, PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) AS p95, PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) AS p99 FROM quotes WHERE ticker = 'SPX' AND quote_date BETWEEN DATE '2019-01-01' AND DATE '2023-12-31' GROUP BY EXTRACT(YEAR FROM quote_date) ORDER BY year
And we will get the following results:
| year | median | p75 | p95 | p99 | | ---- | -------- | ----------------- | ----------------- | ------------------ | | 2019 | 2919.025 | 3001.615 | 3160.353 | 3227.3273000000004 | | 2020 | 3276.02 | 3408.63 | 3690.79 | 3729.44 | | 2021 | 4304 | 4486.287499999999 | 4699.282999999999 | 4782.4638 | | 2022 | 4027.26 | 4338.87 | 4588.280000000001 | 4719.71 | | 2023 | 4291.16 | 4466.95 | 4634.133 | 4772.3392 |
PERCENTILE_CONT on SQL Server
In SQL Server, the PERCENTILE_CONT
function requires both the WITHIN GROUP
and the OVER
clause, and the result set is closer to the typical of a Window Function, rather than the one we would expect from an Aggregate Function. For this reason, you will also need to use DISTINCT
, otherwise the percentile value will be repeated for every matching record in the quotes
table.
Therefore, if you’re using SQL Server, you will need to change the first SQL query like this:
SELECT DISTINCT PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY ticker) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY ticker) AS p75, PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY ticker) AS p95, PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY ticker) AS p99 FROM quotes WHERE ticker = 'SPX' AND quote_date BETWEEN '2019-01-01' AND '2023-12-31'
Notice that we need to use DISTINCT
and provide an OVER
clause as well.
The second SQL query that calculates the percentile values per year looks like this for SQL Server:
SELECT DISTINCT YEAR(quote_date) AS year, PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY YEAR(quote_date)) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY YEAR(quote_date)) AS p75, PERCENTILE_CONT(0.95) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY YEAR(quote_date)) AS p95, PERCENTILE_CONT(0.99) WITHIN GROUP(ORDER BY close_price) OVER (PARTITION BY YEAR(quote_date)) AS p99 FROM quotes WHERE ticker = 'SPX' AND quote_date BETWEEN '2019-01-01' AND '2023-12-31' ORDER BY year
That’s it!
I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
Calculating percentile values is a very common task when dealing with metrics or other time series.
While the SQL Standard defines the PERCENTILE_CONT
function, the syntax may differ from one relational database to another, so make sure you check the database manual before starting using it.
PERCENTILE_CONT is an aggregate function, not a window function
That’s true. According to the standard, it’s an aggregate function. In SQL Server, it behaves like a window function, hence the mandatory OVER and the need of DISTINCT.
What’s even more confusing is that the MariaDB manual also calls it a window function:
https://mariadb.com/kb/en/percentile_cont/