How to Find the Row Count for All Your Tables in Postgres

How do you find the row count for all your tables in Postgres

There's three ways to get this sort of count, each with their own tradeoffs.

If you want a true count, you have to execute the SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You're getting a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.

WITH tbl AS
(SELECT table_schema,
TABLE_NAME
FROM information_schema.tables
WHERE TABLE_NAME not like 'pg_%'
AND table_schema in ('public'))
SELECT table_schema,
TABLE_NAME,
(xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;

The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be off by a bit under heavy activity, but is generally a good estimate:

SELECT schemaname,relname,n_live_tup 
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

That can also show you how many rows are dead, which is itself an interesting number to monitor.

The third way is to note that the system ANALYZE command, which is executed by the autovacuum process regularly as of PostgreSQL 8.3 to update table statistics, also computes a row estimate. You can grab that one like this:

SELECT 
nspname AS schemaname,relname,reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
nspname NOT IN ('pg_catalog', 'information_schema') AND
relkind='r'
ORDER BY reltuples DESC;

Which of these queries is better to use is hard to say. Normally I make that decision based on whether there's more useful information I also want to use inside of pg_class or inside of pg_stat_user_tables. For basic counting purposes just to see how big things are in general, either should be accurate enough.

Fast way to discover the row count of a table in PostgreSQL

Counting rows in big tables is known to be slow in PostgreSQL. The MVCC model requires a full count of live rows for a precise number. There are workarounds to speed this up dramatically if the count does not have to be exact like it seems to be in your case.

(Remember that even an "exact" count is potentially dead on arrival under concurrent write load.)

Exact count

Slow for big tables.

With concurrent write operations, it may be outdated the moment you get it.

SELECT count(*) AS exact_count FROM myschema.mytable;
Estimate

Extremely fast:

SELECT reltuples AS estimate FROM pg_class where relname = 'mytable';

Typically, the estimate is very close. How close, depends on whether ANALYZE or VACUUM are run enough - where "enough" is defined by the level of write activity to your table.

Safer estimate

The above ignores the possibility of multiple tables with the same name in one database - in different schemas. To account for that:

SELECT c.reltuples::bigint AS estimate
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'mytable'
AND n.nspname = 'myschema';

The cast to bigint formats the real number nicely, especially for big counts.

Better estimate

SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE oid = 'myschema.mytable'::regclass;

Faster, simpler, safer, more elegant. See the manual on Object Identifier Types.

Replace 'myschema.mytable'::regclass with to_regclass('myschema.mytable') in Postgres 9.4+ to get nothing instead of an exception for invalid table names. See:

  • How to check if a table exists in a given schema

Better estimate yet (for very little added cost)

We can do what the Postgres planner does. Quoting the Row Estimation Examples in the manual:

These numbers are current as of the last VACUUM or ANALYZE on the
table. The planner then fetches the actual current number of pages in
the table (this is a cheap operation, not requiring a table scan). If
that is different from relpages then reltuples is scaled
accordingly to arrive at a current number-of-rows estimate.

Postgres uses estimate_rel_size defined in src/backend/utils/adt/plancat.c, which also covers the corner case of no data in pg_class because the relation was never vacuumed. We can do something similar in SQL:

Minimal form

SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint
FROM pg_class
WHERE oid = 'mytable'::regclass; -- your table here

Safe and explicit

SELECT (CASE WHEN c.reltuples < 0 THEN NULL       -- never vacuumed
WHEN c.relpages = 0 THEN float8 '0' -- empty table
ELSE c.reltuples / c.relpages END
* (pg_catalog.pg_relation_size(c.oid)
/ pg_catalog.current_setting('block_size')::int)
)::bigint
FROM pg_catalog.pg_class c
WHERE c.oid = 'myschema.mytable'::regclass; -- schema-qualified table here

Doesn't break with empty tables and tables that have never seen VACUUM or ANALYZE. The manual on pg_class:

If the table has never yet been vacuumed or analyzed, reltuples contains -1 indicating that the row count is unknown.

If this query returns NULL, run ANALYZE or VACUUM for the table and repeat. (Alternatively, you could estimate row width based on column types like Postgres does, but that's tedious and error-prone.)

If this query returns 0, the table seems to be empty. But I would ANALYZE to make sure. (And maybe check your autovacuum settings.)

Typically, block_size is 8192. current_setting('block_size')::int covers rare exceptions.

Table and schema qualifications make it immune to any search_path and scope.

Either way, the query consistently takes < 0.1 ms for me.

More Web resources:

  • The Postgres Wiki FAQ
  • The Postgres wiki pages for count estimates and count(*) performance


TABLESAMPLE SYSTEM (n) in Postgres 9.5+

SELECT 100 * count(*) AS estimate FROM mytable TABLESAMPLE SYSTEM (1);

Like @a_horse commented, the added clause for the SELECT command can be useful if statistics in pg_class are not current enough for some reason. For example:

  • No autovacuum running.
  • Immediately after a large INSERT / UPDATE / DELETE.
  • TEMPORARY tables (which are not covered by autovacuum).

This only looks at a random n % (1 in the example) selection of blocks and counts rows in it. A bigger sample increases the cost and reduces the error, your pick. Accuracy depends on more factors:

  • Distribution of row size. If a given block happens to hold wider than usual rows, the count is lower than usual etc.
  • Dead tuples or a FILLFACTOR occupy space per block. If unevenly distributed across the table, the estimate may be off.
  • General rounding errors.

Typically, the estimate from pg_class will be faster and more accurate.

Answer to actual question

First, I need to know the number of rows in that table, if the total
count is greater than some predefined constant,

And whether it ...

... is possible at the moment the count pass my constant value, it will
stop the counting (and not wait to finish the counting to inform the
row count is greater).

Yes. You can use a subquery with LIMIT:

SELECT count(*) FROM (SELECT 1 FROM token LIMIT 500000) t;

Postgres actually stops counting beyond the given limit, you get an exact and current count for up to n rows (500000 in the example), and n otherwise. Not nearly as fast as the estimate in pg_class, though.

Get a row count of every table in Postgres database

if you want a perticular table's rowcount then it will work

SELECT reltuples FROM pg_class WHERE oid = 'my_schema.my_table'::regclass;

reltuples is a column from pg_class table, it holds data about "number of rows >in the table. This is only an estimate used by the planner.

and if your want a list of all tables with its rowcount then it will do the job

SELECT
pgClass.relname AS tableName,
pgClass.reltuples AS rowCount
FROM
pg_class pgClass
INNER JOIN
pg_namespace pgNamespace ON (pgNamespace.oid = pgClass.relnamespace)
WHERE
pgNamespace.nspname NOT IN ('pg_catalog', 'information_schema') AND
pgClass.relkind='r'

"Why is "SELECT count(*) FROM bigtable;" slow?" : count(*)

how to get the table row count of all the tables present in particular schema in postgresql 9.5?

This can be done with some XML magic:

select table_schema, table_name,
(xpath('/row/count/text()', query_to_xml('select count(*) from '||format('%I.%I', table_schema, table_name), true, true, '')))[1]::text::int as row_count
from information_schema.tables
where table_schema = 'public'

How to count all rows and column of the tables in the database?

You need to join the result of the query to information_schema.columns and aggregate the columns into a string. It's easier to do the aggregation in a derived table (aka "sub-query"):

select t.table_schema, 
t.table_name,
(xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count,
c.columns
from (
select table_name, table_schema,
query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
from information_schema.tables
where table_schema = 'public' --<< change here for the schema you want
) t
join (
select col.table_schema, col.table_name,
string_agg(col.column_name::text, ',' order by col.ordinal_position) as columns
from information_schema.columns col
group by col.table_schema, col.table_name
) c on t.table_schema = c.table_schema
and t.table_name = c.table_name;

Calculate difference between the row counts of tables in two schemas in PostgreSQL

with counts as (
select
(select count(distinct id) from schema1.compound) as count_old,
(select count(distinct id) from schema2.compound) as count_new
)
select
'compound' as table_name,
count_old,
count_new,
count_old - count_new as diff
from counts;


Related Topics



Leave a reply



Submit