What’s Faster? COUNT(*) or COUNT(1)?

One of the biggest and undead myths in SQL is that COUNT(*) is faster than COUNT(1). Or was it that COUNT(1) is faster than COUNT(*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified?

Let’s measure!

How does COUNT(…) work?

But first, let’s look into some theory. The two ways to count things are not exactly the same thing. Why?

COUNT(*) counts all the tuples in a group
COUNT(<expr>) counts all the tuples in a group for which <expr> evaluates to something that IS NOT NULL

This distinction can be quite useful. Most of the time, we’ll simply COUNT(*) for convenience, but there are (at least) two cases where we don’t want that, for example:

When outer joining

Imagine that in the Sakila database, we have some actors that did not play in any films. Making sure such an actor actually exists:

INSERT INTO actor (actor_id, first_name, last_name)
VALUES (201, 'SUSAN', 'DAVIS');

When inner joining, we might write the following (using PostgreSQL syntax):

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

And we won’t get the newly added SUSAN DAVIS, because of the nature of inner join:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

So we might change our query to use LEFT JOIN instead

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

There she is now, but oops, wrong count! She doesn’t have any films, which we have proven before with the INNER JOIN query. Yet we get 1:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 1|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

Her COUNT(*) value is 1, because we do get 1 film tuple for her in the group, with all columns being NULL. The solution is to count the FILM_ID instead, which cannot be NULL in the table (being a primary key), but only because of the LEFT JOIN:

SELECT actor_id, a.first_name, a.last_name, count(film_id) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

Notice, we could count other things than the primary key, but with the primary key, we’re quite certain we don’t get any other “accidental” nulls in our groups, which we did not want to exclude from the count value.

Now, we’re getting the correct result:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 0|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

When counting subsets of a group

An even more powerful application of counting only non-null evaluations of an expression is counting only subsets of a group. We’ve already blogged about this technique in our previous post about aggregating several expressions in one single query.

For example, counting in a single query:

All actors
Actors with their first_name starting with A
Actors with their first_name ending with A
Actors with their first_name containing A

In SQL:

SELECT 
  count(*),
  count(CASE WHEN first_name LIKE 'A%' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A%' THEN 1 END)
FROM actor;

This yields:

count|count|count|count|
-----|-----|-----|-----|
  201|   13|   30|  105|

This is very useful when pivoting data sets (see also Oracle/SQL Server PIVOT clause).

Notice that PostgreSQL supports the SQL standard FILTER clause for this, which is more convenient and more readable. The above query can be written like this, in PostgreSQL:

SELECT 
  count(*),
  count(*) FILTER (WHERE first_name LIKE 'A%'),
  count(*) FILTER (WHERE first_name LIKE '%A'),
  count(*) FILTER (WHERE first_name LIKE '%A%')
FROM actor;

Back to COUNT(*) vs COUNT(1)

Now that we know the theory behind these COUNT expressions, what’s the difference between COUNT(*) and COUNT(1). There is none, effectively. The 1 expression in COUNT(1) evaluates a constant expression for each row in the group, and it can be proven that this constant expression will never evaluate to NULL, so effectively, we’re running COUNT(*), counting ALL the rows in the group again.

There should be no difference, and parsers / optimisers should be able to recognise this and not do the extra work of checking every expression evaluation for NULL-ness.

I recently saw this discussion on Twitter, though, where Vik Fearing looked up the PostgreSQL sources, showing that PostgreSQL does do the extra work instead of optimising this:

Okay, are you ready? The difference between count(*) and count(1) is a spin through this:

for (i = 1; i <= numTransInputs; i++)
{
if (fcinfo->args[i].isnull)
return;
}

But for each row, so it's probably like tens of microseconds on a large grouping!
— Vik Fearing (@pg_xocolatl) September 15, 2019

So, I was curious to see if it mattered. I ran a benchmark on the 4 most popular RDBMS, with these results:

MySQL: Doesn’t matter. Sometimes COUNT(1) was faster, sometimes COUNT(*) was faster, so all differences were only benchmark artifacts
Oracle: Doesn’t matter. Like MySQL
PostgreSQL: Does matter (!). COUNT(*) was consistently faster by around 10% on 1M rows, that’s much more than I had expected
SQL Server: Doesn’t matter. Like MySQL

The benchmark code can be found in the following gists:

The results are below. Each benchmark run repeated SELECT COUNT(*) FROM t or SELECT COUNT(1) FROM t 100 times on a 1M row table, and then the benchmark was repeated 5 times to mitigate any warmup penalties and be fair with respect to caching.

The times displayed are relative to the fastest run per database product. This removes any distraction that may be caused by interpreting actual execution times as we do not want to compare database products against each other.

The database versions I’ve used are:

MySQL 8.0.16 (in Docker)
Oracle 18c XE (in Docker)
PostgreSQL 11.3 (in Docker)
SQL Server 2017 Express (in Windows)

MySQL

No relevant difference, nor a clear winner:

RUN     STMT    RELATIVE_TIME
-----------------------------
0	1	1.0079
0	2	1.0212
1	1	1.0229
1	2	1.0256
2	1	1.0009
2	2	1.0031
3	1	1.0291
3	2	1.0256
4	1	1.0618
4	2	1.0000

Oracle

No relevant difference, nor a clear winner

Run 1, Statement 1 : 1.06874
Run 1, Statement 2 : 1.01982
Run 2, Statement 1 : 1.09175
Run 2, Statement 2 : 1.0301
Run 3, Statement 1 : 1.00308
Run 3, Statement 2 : 1.02499
Run 4, Statement 1 : 1.02503
Run 4, Statement 2 : 1
Run 5, Statement 1 : 1.01259
Run 5, Statement 2 : 1.05828

PostgreSQL

A significant, consistent difference of almost 10%:

RUN 1, Statement 1: 1.00134
RUN 1, Statement 2: 1.09538
RUN 2, Statement 1: 1.00190
RUN 2, Statement 2: 1.09115
RUN 3, Statement 1: 1.00000
RUN 3, Statement 2: 1.09858
RUN 4, Statement 1: 1.00266
RUN 4, Statement 2: 1.09260
RUN 5, Statement 1: 1.00454
RUN 5, Statement 2: 1.09694

Again, I’m surprised by the order of magnitude of this difference. I would have expected it to be less. Curious to hear about your own results in the comments, or further ideas why this is so significant in PostgreSQL.

SQL Server

No relevant difference, nor a clear winner

Run 1, Statement 1: 1.00442
Run 1, Statement 2: 1.00702
Run 2, Statement 1: 1.00468
Run 2, Statement 2: 1.00000
Run 3, Statement 1: 1.00208
Run 3, Statement 2: 1.00624
Run 4, Statement 1: 1.00780
Run 4, Statement 2: 1.00364
Run 5, Statement 1: 1.00468
Run 5, Statement 2: 1.00702

Conclusion

As it is now in 2019, given the database versions mentioned above, unfortunately, there is a significant difference between COUNT(*) and COUNT(1) in PostgreSQL. Luckily (and this is rare in SQL), all the other dialects don’t care and thus, consistently using COUNT(*), rather than COUNT(1) is a slightly better choice for ALL measured database products from this article.

Do note that the benchmark only tried a very simple query! The results might be different when using joins, unions, or any other SQL constructs, or in other edge cases, e.g. when using COUNT() in HAVING or ORDER BY or with window functions, etc.

In any case, there shouldn’t be any difference, and I’m sure that a future PostgreSQL version will optimise the constant expression in the COUNT(<expr>) aggregate function directly in the parser to avoid the extra work.

For other interesting optimisations that do not depend on the cost model, see this article here.

16 thoughts on “What’s Faster? COUNT(*) or COUNT(1)?”

Hans says:

September 19, 2019 at 12:05

This is what I get on Postgres 11.5, Windows 10 Laptop with a SSD:

RUN 1, Statement 1: 1.06690
RUN 1, Statement 2: 1.24424
RUN 2, Statement 1: 1.00000
RUN 2, Statement 2: 1.16920
RUN 3, Statement 1: 1.13591
RUN 3, Statement 2: 1.28373
RUN 4, Statement 1: 1.01390
RUN 4, Statement 2: 1.26324
RUN 5, Statement 1: 1.08433
RUN 5, Statement 2: 1.18853

Postgres was using a Parallel Seq Scan with 2 workers

Loading...

1. lukaseder says:
  
  September 19, 2019 at 12:06
  
  Thanks for the confirmation! So, also significant on your machine.
  
  Loading...
  
Félix says:

September 19, 2019 at 12:30

And what happens if you use count(75697897) or even count(‘dummy’) instead of count(1) ?
(i.e. are the “smarter” RDMS really capable of detecting that the expression is not nullable in an efficient way or have they “hardcoded” the count(1) case ?)

Loading...

1. lukaseder says:
  
  September 19, 2019 at 13:54
  
  Worth a try ;)
  
  I tried counting the ‘oracle is so cool’ expression.
  
  MySQL: Not impressed
  Oracle: Not impressed
  PostgreSQL: Same result
  SQL Server: Not impressed
  
  Loading...
  
Colin says:

September 19, 2019 at 17:02

I use count(*) all over the place and never really considered if there was a performance hit. ill test it versus Oracle and hsql (use it for embedded database) and see if it makes much difference for me also!

Loading...

1. lukaseder says:
  
  September 20, 2019 at 09:04
  
  Thanks for the feedback. Will be very curious to learn how HSQL performs here. I suspect it will be like PostgreSQL, without any optimisations on COUNT(1)
  
  Loading...
  
Pep Pla says:

September 20, 2019 at 19:33

Not sure if the mysql test case is relevant in this case. Using a cursor forces the use of a temporary table and this has a huge penalty in performance compared with the size of the dataset.

Loading...

1. lukaseder says:
  
  September 20, 2019 at 19:43
  
  Very interesting, thanks for sharing. Historically, I’ve been able to detect significant performance differences in MySQL, using this approach, but I’m very open to learning how to improve it. Do you have a suggestion that allows for averaging out hundreds of executions?
  
  Loading...
  
Monika Mitura (@monika_mitura) says:

September 25, 2019 at 14:18

Well, in Oracle it’s quite simple – when we have count(1) oracle rewrite this into count() so there is no way to count(1) be faster than count(). It’s the same really. You can check this on your on :)

https://pretius.com/oracle-count-or-count1-that-is-the-question/
https://pretius.com/oracle-count-vs-count1-the-final-battle/

Loading...

1. lukaseder says:
  
  September 25, 2019 at 15:40
  
  Yes indeed, Oracle gets it right. Thanks for your comment, Monika
  
  Loading...
  
Mark W says:

October 26, 2019 at 18:57

The myth I always heard, especially in the early 2000s, was that COUNT(1) was faster than the more popular COUNT(). Sometimes some smarty-pants would try to get people to edit their code to change COUNT() to COUNT(1) – the (incorrect) justification usually being something like COUNT(*) has to check if all columns are null whereas COUNT(1) is just checking if a single constant is null.

Tom Kyte (now-retired Oracle guru and the guy formerly behind AskTom) had to disprove this myth a few times by posting test cases and showing the plans as well as timings. Some people just want to repeat performance tidbits they heard without ever verifying them. :(

Loading...

1. lukaseder says:
  
  October 27, 2019 at 08:27
  
  The SQL ecosystem is full of people who make spurious claims about SQL performance based on some syntax, without the least bit of verification. I mean, it’s ridiculous in the first place, given SQL’s 4GL promise, to think that one syntax should be faster than another. They should be equally fast, if run by a perfect optimiser. So, all differences are really just implementation details, or if you will, “bugs” of specific vendors and versions.
  
  For example, almost at the same time as you made your comment, this comment came in, just to prove the point :)
  https://blog.jooq.org/2016/05/27/correlated-subqueries-are-evil-and-slow-or-are-they/#comment-329222
  
  Loading...
  
Martin L. Buchanan says:

September 21, 2022 at 20:19

Recently needed to change count(*) to count(1) within some exists/not exists sub queries in views because in PostgreSQL count(*) created dependencies on the columns in the queried tables, so I was dropping columns and needing to recreate those views, which I did not expect. Very much appreciate your blog content as SQL development is my main job.

Loading...

1. lukaseder says:
  
  September 23, 2022 at 08:37
  Interesting, I can’t reproduce this naively even on PostgreSQL 9.3, let alone 15. I’ve tried:
```
create table t (i int, j int);

create view v as
select count(*) 
from t;

alter table t drop j;

select * from v;
```
  May be an edge case / bug worth reporting to the community, if you can find a minimal reproducer? https://www.postgresql.org/docs/current/bug-reporting.html
  
  I don’t see why you should have experienced the behaviour you did
  
  Loading...
iheb says:

October 2, 2023 at 14:41

Is this still the case for recent versions of PG ?

Loading...

1. lukaseder says:
  
  October 3, 2023 at 11:34
  Here’s the linked benchmark from the blog post. You can run it on your own PG instance. On PG 15, I’m still getting the same difference:
```
RUN 1, Statement 1: 1.01133
RUN 1, Statement 2: 1.10533
RUN 2, Statement 1: 1.00212
RUN 2, Statement 2: 1.11307
RUN 3, Statement 1: 1.00000
RUN 3, Statement 2: 1.10907
RUN 4, Statement 1: 1.03223
RUN 4, Statement 2: 1.11331
RUN 5, Statement 1: 1.04064
RUN 5, Statement 2: 1.16061
```
  Loading...

What’s Faster? COUNT(*) or COUNT(1)?

How does COUNT(…) work?

Back to COUNT(*) vs COUNT(1)

Conclusion

Like this:

Published by lukaseder

16 thoughts on “What’s Faster? COUNT(*) or COUNT(1)?”

Leave a ReplyCancel reply

How does COUNT(…) work?

Back to COUNT(*) vs COUNT(1)

Conclusion

Like this:

Published by lukaseder

16 thoughts on “What’s Faster? COUNT(*) or COUNT(1)?”

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.