MySQL Count Performance

MySQL count performance on very big tables

In the end, the fastest approach was to query the first X rows and count them in C#.

My application processes the data in batches. The amount of time between two batches depends on the number of rows that need to be processed:

SELECT pk FROM table WHERE fk = 1 LIMIT X

I got the result in 0.9 seconds.

Thanks all for your ideas!

MySQL: Fastest way to count number of rows

COUNT(*) takes column indexes into account, so it will give the best result. MySQL with the MyISAM engine actually stores the row count (based on the primary key's column), so it doesn't scan all rows each time you ask for the total.

Using PHP to count rows is not very smart, because you have to send data from MySQL to PHP. Why do it when you can achieve the same on the MySQL side?

If the COUNT(*) is slow, you should run EXPLAIN on the query, and check if indexes are really used, and where they should be added.
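
As a minimal sketch of that check, assuming a hypothetical orders table filtered on customer_id:

EXPLAIN SELECT COUNT(*) FROM orders WHERE customer_id = 42;

Look at the key column of the output: it should name an index on customer_id. If it shows NULL, the query is scanning the whole table and an index is missing.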


The following is not the fastest way, but there is a case where COUNT(*) doesn't really fit: when you start grouping results, COUNT may not count all rows the way you expect.

The solution is SQL_CALC_FOUND_ROWS. This is usually used when you are selecting rows but still need to know the total row count (for example, for paging).
When you select data rows, just append the SQL_CALC_FOUND_ROWS keyword after SELECT:

SELECT SQL_CALC_FOUND_ROWS [needed fields or *] FROM table LIMIT 20 OFFSET 0;

After you have selected needed rows, you can get the count with this single query:

SELECT FOUND_ROWS();

FOUND_ROWS() has to be called immediately after the data selecting query.
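
Put together, a paging query against a hypothetical articles table (20 rows per page, page 3) would look like this sketch. Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17, where a separate COUNT(*) with the same WHERE clause is recommended instead:

SELECT SQL_CALC_FOUND_ROWS id, title
FROM articles
WHERE published = 1
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

SELECT FOUND_ROWS();  -- total matching rows, ignoring the LIMIT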


In conclusion, everything actually comes down to how many entries you have and what is in the WHERE clause. You should really pay attention to how indexes are being used when there are lots of rows (tens of thousands, millions, and up).

MySQL count performance on very big tables with a between-two-dates condition

I improved the performance to under 2 seconds by applying MySQL partitions.

I used PARTITION BY RANGE on the viewed_at column, changed the viewed_at type from TIMESTAMP to DATETIME, and made it part of the primary key together with id.
A cron job runs on the first day of each month and reorganizes the last partition into further partitions, and so on.
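
A minimal sketch of that setup, assuming a hypothetical views table and example month boundaries (the real dates and partition names will differ):

ALTER TABLE views
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, viewed_at);  -- the partitioning column must be part of the primary key

ALTER TABLE views
PARTITION BY RANGE (TO_DAYS(viewed_at)) (
  PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- run by the monthly cron job: split the catch-all partition for the new month
ALTER TABLE views
REORGANIZE PARTITION pmax INTO (
  PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);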

Improve InnoDB count(*) performance

Difference between InnoDB and MyISAM concerning counting

Please notice that counting with WHERE is not slower with InnoDB than it would be with MyISAM. Only a very bare

SELECT COUNT(*) FROM table

can be computed faster with MyISAM, as this number is stored in MyISAM's table metadata.

If you have a query with a WHERE constraint, for example:

SELECT COUNT(*) FROM table WHERE active_calls = 1

the query needs to access the table data in both storage engines and there should be no notable performance difference between MyISAM and InnoDB.

Concerning your specific problem

Note that your query does not use any suitable index. This is not because InnoDB "prefers" a full table scan, but because no suitable index exists.

You have a combined index (campaign_id, active_calls), but active_calls is the second part of the index. As long as the first part is not used in the query, MySQL has no easy access to the second part.

What you want for this simple count query is another index (active_calls) only on this one column. It should run fast then.
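
As a sketch, with calls as a hypothetical table name and the column names taken from the question:

ALTER TABLE calls ADD INDEX idx_active_calls (active_calls);

SELECT COUNT(*) FROM calls WHERE active_calls = 1;  -- can now be resolved from the new index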

How to optimize COUNT(*) performance on InnoDB by using index

For the time being I've solved the problem by using this approximation:

EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)

The approximate number of rows can be read from the rows column of the EXPLAIN output when using InnoDB, as shown above. When using MyISAM this column will remain empty because the table reference is optimized away, so if it is empty, fall back to a traditional SELECT COUNT(*) instead.

Are full count queries really so slow on large MySQL InnoDB tables?

In addition to what Bill says...

Smallest index

InnoDB picks the 'smallest' index for doing COUNT(*). It could be that all of the indexes of communication are bigger than the smallest of transaction, hence the time difference. When judging the size of an index, include the PRIMARY KEY column(s) with any secondary index:

PRIMARY KEY(id),   -- INT (4 bytes)
INDEX(flag), -- TINYINT (1 byte)
INDEX(name), -- VARCHAR(255) (? bytes)

For measuring size, the PRIMARY KEY is big since it includes (due to clustering) all the columns of the table. INDEX(flag) is "5 bytes" (the 1-byte TINYINT plus the 4-byte PK column). INDEX(name) probably averages a few dozen bytes per row. SELECT COUNT(*) will clearly pick INDEX(flag).

Apparently transaction has a 'small' index, but communication does not.

TEXT/BLOB columns are sometimes stored "off-record"; hence, they do not count toward the size of the PK index.
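
If you want to check which index is smallest, InnoDB's persistent statistics expose an approximate size in pages. This is only a sketch; it assumes MySQL 5.6+ with persistent stats enabled (the default) and read access to the mysql schema, and mydb/communication are placeholder names:

SELECT index_name,
       stat_value                      AS pages,
       stat_value * @@innodb_page_size AS approx_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'mydb'
  AND table_name    = 'communication'
  AND stat_name     = 'size';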

Query Cache

If the "Query cache" is turned on, the second running of a query may be immensely faster than the first. But that is only if there were no changes to the table in the mean time. Since any change to the table invalidates all QC entries for that table, the QC is rarely useful in production systems. By "faster" I mean on the order of 0.001 seconds; not 1.44 seconds.

The difference between 1m38s and 1.44s is probably due to what was cached in the buffer_pool -- the general caching area for InnoDB. The first run probably found none of the 'smallest' index in RAM so it did a lot of I/O, taking 98 seconds to fetch all 4.5M rows of that index. The second run found all that data cached in the buffer_pool, so it ran at CPU speed (no I/O), hence much faster.

Good Enough

In situations like this, I question the necessity of doing the COUNT(*) at all. Notice how you said "2.8 mio entries", as if 2 significant digits was "good enough". If you are displaying the count to users on a UI, won't that be "good enough"? If so, one solution to the performance is to do the count once a day and store it some place. This would allow instantaneous access to a "good enough" value.
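
One way to do that once-a-day count is the MySQL event scheduler. This is only a sketch; the cache table, the event name, and the communication table name are assumptions:

CREATE TABLE row_count_cache (
  table_name  VARCHAR(64) PRIMARY KEY,
  approx_rows BIGINT NOT NULL,
  counted_at  DATETIME NOT NULL
);

SET GLOBAL event_scheduler = ON;  -- the scheduler must be running

CREATE EVENT refresh_row_count
ON SCHEDULE EVERY 1 DAY
DO
  REPLACE INTO row_count_cache (table_name, approx_rows, counted_at)
  SELECT 'communication', COUNT(*), NOW() FROM communication;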

There are other techniques. One is to keep the counter updated, either with active code, or with some form of Summary Table.

Throwing hardware at it

You already found that changing the hardware did not help.

  • The 98s was as fast as any of RDS's I/O offerings can run.
  • The 1.44s was as fast as any one RDS CPU can run.
  • MySQL (and its variants) do not use more than one CPU per query.
  • You had enough RAM for the entire 'small' index to fit in the buffer_pool by your second SELECT COUNT(*). (Too little RAM would have made the second run very slow as well.)

Performance of MySQL counting rows in a big table

It can indeed be slow when running on an InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.

The suggested solution is a counter table. This is a separate table with a single row and column holding the current record count. It can be kept up to date via triggers. Something like this:

create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;

create trigger big_insert after insert on big_table
for each row
update big_table_count set rec_count = rec_count + 1;

create trigger big_delete after delete on big_table
for each row
update big_table_count set rec_count = rec_count - 1;

You can see a fiddle here; alter the insert/delete statements in the build section to see the effect on:

select rec_count from big_table_count;

You could extend this to several tables, either by creating such a counter table for each, or by reserving a row per table in the above counter table, keyed by a "table_name" column.
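
A minimal sketch of the keyed variant, with other_table as a hypothetical second table and a new trigger name to keep it distinct from the one above:

create table table_counts (
  table_name varchar(64) primary key,
  rec_count  int not null default 0
);

-- one-shot initialisation per table:
insert into table_counts select 'big_table', count(*) from big_table;
insert into table_counts select 'other_table', count(*) from other_table;

create trigger big_insert_keyed after insert on big_table
for each row
update table_counts set rec_count = rec_count + 1
where table_name = 'big_table';
-- a matching delete trigger would decrement the same row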

Improving concurrency

The above method does have an impact if you have many concurrent sessions inserting or deleting records, because they need to wait for each other to complete the update of the counter.

A solution is to not let the triggers update the same, single record, but to let them insert a new record, like this:

create trigger big_insert after insert on big_table
for each row
insert into big_table_count (rec_count) values (1);

create trigger big_delete after delete on big_table
for each row
insert into big_table_count (rec_count) values (-1);

The way to get the count then becomes:

select sum(rec_count) from big_table_count;

Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:

truncate table big_table_count;
insert into big_table_count select count(*) from big_table;

Select count vs select statement performance when the query will return 0 rows

You are correct. [But read all the way to the end!]

Two steps is inefficient -- in any version of MySQL. If there are zero rows, the Optimizer will do essentially the same amount of work for either SELECT.

If there are some rows, then the first SELECT is a waste of time. See the programming acronym 'KISS'.

The pseudo-code implies that the 'return' is an empty array; I am assuming that is the case for either query when there is no matching row?

A side note: I hope ColumnInIndex means "a column that is first in some index". If the column is not first, neither query will use the index. (This comment does not affect the main Question.)

Another side note (aimed at PTank): When there are no matching rows, neither of these

SELECT col FROM ... WHERE ...
SELECT * FROM ... WHERE ...

return any rows; they do not return NULL for column(s). (The use of * is 'bad' for multiple reasons unrelated to the original Question.)

Oh, yet another comment. COUNT(col) checks col for being not-NULL before counting the row. COUNT(*) simply counts the rows. In almost all cases, you should use COUNT(*); it's simpler, faster, and probably gives the same answer. (The answer will be the same if col is the PRIMARY KEY.)

Good grief!! That means that your first query is not redundant! The odd-ball case is when col1 is NULL in every row. The first query will return a count of zero, but the second query will return some rows (with the first column NULL).
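
A tiny sketch of that odd-ball case, with hypothetical table and column names:

CREATE TABLE t (id INT PRIMARY KEY, col1 INT NULL);
INSERT INTO t VALUES (1, NULL), (2, NULL);

SELECT COUNT(col1) FROM t;  -- 0: NULL values are not counted
SELECT COUNT(*)    FROM t;  -- 2: every row is counted
SELECT col1 FROM t;         -- two rows, both NULL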

Was this a trick question?


