Count(*) Vs. Count(1) Vs. Count(Pk): Which Is Better

COUNT(*) vs. COUNT(1) vs. COUNT(pk): which is better?

Bottom Line

Use either COUNT(field) or COUNT(*), and stick with it consistently, and if your database allows COUNT(tableHere) or COUNT(tableHere.*), use that.

In short, don't use COUNT(1) for anything. It's a one-trick pony, which rarely does what you want, and in those rare cases is equivalent to count(*)

Use count(*) for counting

Use * for all your queries that need to count everything, even for joins, use *

SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

But don't use COUNT(*) for LEFT joins, as that will return 1 even if the subordinate table doesn't match anything from parent table

SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

Don't be fooled by those advising that when using * in COUNT, it fetches entire row from your table, saying that * is slow. The * on SELECT COUNT(*) and SELECT * has no bearing to each other, they are entirely different thing, they just share a common token, i.e. *.

An alternate syntax

In fact, if it is not permitted to name a field as same as its table name, RDBMS language designer could give COUNT(tableNameHere) the same semantics as COUNT(*). Example:

For counting rows we could have this:

SELECT COUNT(emp) FROM emp

And they could make it simpler:

SELECT COUNT() FROM emp

And for LEFT JOINs, we could have this:

SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

But they cannot do that (COUNT(tableNameHere)) since SQL standard permits naming a field with the same name as its table name:

CREATE TABLE fruit -- ORM-friendly name
(
fruit_id int NOT NULL,
fruit varchar(50), /* same name as table name,
and let's say, someone forgot to put NOT NULL */
shape varchar(50) NOT NULL,
color varchar(50) NOT NULL
)

Counting with null

And also, it is not a good practice to make a field nullable if its name matches the table name. Say you have values 'Banana', 'Apple', NULL, 'Pears' on fruit field. This will not count all rows, it will only yield 3, not 4

SELECT count(fruit) FROM fruit

Though some RDBMS do that sort of principle (for counting the table's rows, it accepts table name as COUNT's parameter), this will work in Postgresql (if there is no subordinate field in any of the two tables below, i.e. as long as there is no name conflict between field name and table name):

SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

But that could cause confusion later if we will add a subordinate field in the table, as it will count the field(which could be nullable), not the table rows.

So to be on the safe side, use:

SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

count(1): The one-trick pony

In particular to COUNT(1), it is a one-trick pony, it works well only on one table query:

SELECT COUNT(1) FROM tbl

But when you use joins, that trick won't work on multi-table queries without its semantics being confused, and in particular you cannot write:

-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

So what's the meaning of COUNT(1) here?

SELECT boss.boss_id, COUNT(1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

Is it this...?

-- counting all the subordinates only
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

Or this...?

-- or is that COUNT(1) will also count 1 for boss regardless if boss has a subordinate
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

By careful thought, you can infer that COUNT(1) is the same as COUNT(*), regardless of type of join. But for LEFT JOINs result, we cannot mold COUNT(1) to work as: COUNT(subordinate.boss_id), COUNT(subordinate.*)

So just use either of the following:

-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

Works on Postgresql, it's clear that you want to count the cardinality of the set

-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id

Another way to count the cardinality of the set, very English-like (just don't make a column with a name same as its table name) : http://www.sqlfiddle.com/#!1/98515/7

select boss.boss_name, count(subordinate)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

You cannot do this: http://www.sqlfiddle.com/#!1/98515/8

select boss.boss_name, count(subordinate.1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

You can do this, but this produces wrong result: http://www.sqlfiddle.com/#!1/98515/9

select boss.boss_name, count(1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

Count(*) vs Count(1) - SQL Server

There is no difference.

Reason:

Books on-line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"

"1" is a non-null expression: so it's the same as COUNT(*).
The optimizer recognizes it for what it is: trivial.

The same as EXISTS (SELECT * ... or EXISTS (SELECT 1 ...

Example:

SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID

SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID

Same IO, same plan, the works

Edit, Aug 2011

Similar question on DBA.SE.

Edit, Dec 2011

COUNT(*) is mentioned specifically in ANSI-92 (look for "Scalar expressions 125")

Case:

a) If COUNT(*) is specified, then the result is the cardinality of T.

That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1) has been optimized out by RDBMS vendors because of this superstition. Otherwise it would be evaluated as per ANSI

b) Otherwise, let TX be the single-column table that is the
result of applying the <value expression> to each row of T
and eliminating null values. If one or more null values are
eliminated, then a completion condition is raised: warning-

Difference between count(1) and count(*) in oracle

I believe count(1) used to be faster in older versions of Oracle. But by now, I'm pretty sure the optimizer is smart enough to know that count(*) and count(1) mean you want the number of rows and creates an appropriate execution plan.

Here you go:

create table t as select * from all_objects;

Table T created.

create index tindx on t( object_name );

Index TINDX created.

select count(*) from t;

COUNT(*)
----------
21534

select * from table(dbms_xplan.display_cursor( NULL, NULL, 'allstats last' ));

Plan hash value: 2940353011

--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 100 | 93 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 100 | 93 |
| 2 | INDEX FAST FULL SCAN| TINDX | 1 | 18459 | 21534 |00:00:00.01 | 100 | 93 |
--------------------------------------------------------------------------------------------------

select count(1) from t;

COUNT(1)
----------
21534

Plan hash value: 2940353011

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 100 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 100 |
| 2 | INDEX FAST FULL SCAN| TINDX | 1 | 18459 | 21534 |00:00:00.01 | 100 |
-----------------------------------------------------------------------------------------

So not only is it smart enough to know it can use the index to optimize this query, but it uses the exact same execution plan for the different versions (the plan has value is the same).

What is better in MYSQL count(*) or count(1)?

This is a MySQL answer.

They perform exactly the same - unless you are using MyISAM, then a special case for COUNT(*) exists. I always use COUNT(*) anyway.

https://dev.mysql.com/doc/refman/5.6/en/aggregate-functions.html#function_count

For MyISAM tables, COUNT(*) is optimized to return very quickly if the
SELECT retrieves from one table, no other columns are retrieved, and
there is no WHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;

This optimization only applies to MyISAM
tables, because an exact row count is stored for this storage engine
and can be accessed very quickly. COUNT(1) is only subject to the same
optimization if the first column is defined as NOT NULL.


###EDIT
Some of you may have missed the dark attempt at humour. I prefer to keep this as a non-duplicate question for any such day when MySQL will do something different to SQL Server. So I threw a vote to reopen the question (with a clearly wrong answer).

The above MyISAM optimization applies equally to

COUNT(*)
COUNT(1)
COUNT(pk-column)
COUNT(any-non-nullable-column)

So the real answer is that they are always the same.

Is mysql count(*) much less efficient than count(specific_field)?

For InnoDB

If specific_field is not nullable, they are equivalent and have the same performance.

If specific_field is nullable, they don't do the same thing. COUNT(specific_field) counts the rows which have a not null value of specific_field. This requires looking at the value of specific_field for each row. COUNT(*) simply counts the number of rows and in this case can be faster as it does not require examining the value of specific_field.

For MyISAM

There is a special optimization for the following so that it does not even need to fetch all rows:

SELECT COUNT(*) FROM yourtable

whats faster, count(*) or count(table_field_name) in mysql?

The difference is Count(field) returns count of NOT NULL values in the field, whether COUNT(*) returns COUNT of rows.
COUNT(*) in MyIsam should be faster.

http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count



Related Topics



Leave a reply



Submit