Best Way to Get Result Count Before Limit Was Applied

Best way to get result count before LIMIT was applied


Pure SQL

Things have changed since 2008. You can use a window function to get the full count and the limited result in one query; window functions were introduced with PostgreSQL 8.4 in 2009.

SELECT foo
, count(*) OVER() AS full_count
FROM bar
WHERE <some condition>
ORDER BY <some col>
LIMIT <pagesize>
OFFSET <offset>;

Note that this can be considerably more expensive than without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.

Doesn't matter much with small tables or full_count <= OFFSET + LIMIT. Matters for a substantially bigger full_count.

Corner case: when OFFSET is at least as great as the number of rows from the base query, no row is returned. So you also get no full_count. Possible alternative:

  • Run a query with a LIMIT/OFFSET and also get the total number of rows

Sequence of events in a SELECT query

( 0. CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.

  1. The WHERE clause (and JOIN conditions, though none in your example) filters qualifying rows from the base table(s). The rest is based on the filtered subset.

( 2. GROUP BY and aggregate functions would go here.) Not here.

( 3. Other SELECT list expressions are evaluated, based on grouped / aggregated columns.) Not here.

  4. Window functions are applied depending on the OVER clause and the frame specification of the function. The simple count(*) OVER() is based on all qualifying rows (see the demo after this list).

  5. ORDER BY

( 6. DISTINCT or DISTINCT ON would go here.) Not here.

  7. LIMIT / OFFSET are applied based on the established order to select rows to return.
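
To make this ordering concrete, here is a minimal, self-contained demo against a made-up table reusing the name bar from the query above (columns and data are purely illustrative): the window count reflects all rows that survive the WHERE clause, not just the page returned by LIMIT / OFFSET.

-- hypothetical demo table; names and data are illustrative only
CREATE TEMP TABLE bar (foo int, active boolean);
INSERT INTO bar (foo, active)
SELECT g, g % 2 = 0
FROM   generate_series(1, 100) g;

SELECT foo
     , count(*) OVER() AS full_count  -- step 4: counted over all 50 qualifying rows
FROM   bar
WHERE  active                         -- step 1: filtering happens first
ORDER  BY foo                         -- step 5: ordering
LIMIT  10 OFFSET 20;                  -- step 7: paging happens last
-- returns 10 rows (foo = 42 .. 60), each with full_count = 50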

LIMIT / OFFSET becomes increasingly inefficient with a growing number of rows in the table. Consider alternative approaches if you need better performance:

  • Optimize query with OFFSET on large table
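
One widely used alternative is keyset pagination: instead of skipping rows with OFFSET, remember the boundary value of the last row of the previous page and filter on it. A minimal sketch, assuming a unique, indexed sort column id (column and placeholder names are illustrative):

SELECT *
FROM   tbl
WHERE  id > <last_seen_id>   -- boundary value remembered from the previous page
ORDER  BY id
LIMIT  <pagesize>;           -- an index on id keeps this cheap at any page depth

The trade-off is that you lose random access to an arbitrary page number, and you still need one of the counting techniques above if you want a total count.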

Alternatives to get final count

There are completely different approaches to get the count of affected rows (not the full count before OFFSET & LIMIT were applied). Postgres keeps internal bookkeeping of how many rows were affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).

For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with:

GET DIAGNOSTICS integer_var = ROW_COUNT;

Details in the manual.
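
A minimal PL/pgSQL sketch (the function, table, and column names are made up for illustration):

CREATE OR REPLACE FUNCTION f_touch_rows(_new_val int)
  RETURNS integer
  LANGUAGE plpgsql AS
$func$
DECLARE
   _rows integer;
BEGIN
   UPDATE tbl SET col1 = _new_val WHERE col1 IS DISTINCT FROM _new_val;
   GET DIAGNOSTICS _rows = ROW_COUNT;  -- rows affected by the UPDATE just above
   RETURN _rows;
END
$func$;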

Or you can use pg_num_rows in PHP. Or similar functions in other clients.

Related:

  • Calculate number of rows affected by batch query in PostgreSQL

Run a query with a LIMIT/OFFSET and also get the total number of rows

Yes. With a simple window function:

SELECT *, count(*) OVER() AS full_count
FROM tbl
WHERE /* whatever */
ORDER BY col1
OFFSET ?
LIMIT ?

Be aware that the cost will be substantially higher than without the total number, but typically still cheaper than two separate queries. Postgres has to actually count all rows either way, which imposes a cost depending on the total number of qualifying rows. Details:

  • Best way to get result count before LIMIT was applied

However, as Dani pointed out, when OFFSET is at least as great as the number of rows returned from the base query, no rows are returned. So we also don't get full_count.

If that's not acceptable, a possible workaround to always return the full count would be with a CTE and an OUTER JOIN:

WITH cte AS (
   SELECT *
   FROM   tbl
   WHERE  /* whatever */
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY col1
   LIMIT  ?
   OFFSET ?
   ) sub
RIGHT JOIN (SELECT count(*) FROM cte) c(full_count) ON true;

You get one row of NULL values with the full_count appended if OFFSET is too big. Otherwise, full_count is appended to every row like in the first query.

If a row with all NULL values is a possible valid result, you have to check offset >= full_count to disambiguate the origin of the empty row.

This still executes the base query only once. But it adds more overhead to the query and only pays if that's less than repeating the base query for the count.

If indexes supporting the final sort order are available, it might pay to include the ORDER BY in the CTE (redundantly).
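
A sketch of that variant (same placeholder query as above); the inner ORDER BY mirrors the outer one, so an index on col1 can hand over pre-sorted rows:

WITH cte AS (
   SELECT *
   FROM   tbl
   WHERE  /* whatever */
   ORDER  BY col1          -- redundant, but lets a matching index deliver sorted rows
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY col1
   LIMIT  ?
   OFFSET ?
   ) sub
RIGHT JOIN (SELECT count(*) FROM cte) c(full_count) ON true;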

How to get the total number of results when there is a LIMIT in the query?

Add a column, total, for example:

select t.*
, (select count(*) from tbl where col = t.col) as total
from tbl t
where t.col = 'anything'
limit 5

As stated by @Tim Biegeleisen: the LIMIT keyword is applied after everything else, so the count(*) still returns the right answer.
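
For what it's worth, the window-function form from the first answer should produce the same total for this query (a sketch against the same table):

select t.*
     , count(*) over() as total  -- evaluated before limit, over all qualifying rows
from tbl t
where t.col = 'anything'
limit 5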

Return count before limit and skip are applied with mongoose

I believe two queries are necessary, but you can execute them in parallel:

var Promise = require('bluebird');

Promise.all([
    Item.find().limit(limit).skip(skip).exec(),
    Item.count().exec()
]).spread(function(items, count) {
    res.json(200, { items: items, count: count });
}, function(err) {
    handleError(res, err);
});

Note that I've adapted the JSON response format to fit in a new property containing the count: the response changes from an array to { items: Array, count: Number }.

Calling .exec() without passing a callback returns a Mongoose Promise. In the example above, I'm using Bluebird (npm install bluebird) to manage the asynchronous flow, but you could use any other library of your choice as well.

Postgresql get the distinct count when using limit and offset

Use the COUNT(*) window function after you filter the table student_teacher for teacherId = 2, and then join to the student table.

SELECT s.*, st.total_count
FROM   student s
JOIN  (
   SELECT *, COUNT(*) OVER() AS total_count
   FROM   student_teacher
   WHERE  teacherId = 2
   ) st ON st.studentId = s.id
LIMIT  10 OFFSET 0;

There is no need to join teachers.

Get total number of rows while using limit clause

SQLite computes results on the fly when they are actually needed.

The only way to get the total count is to run the actual query (or better, SELECT COUNT(*)) without the LIMIT.
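
A minimal sketch of that two-query approach (table and filter are made up for illustration):

-- query 1: the page itself
SELECT *
FROM   items
WHERE  status = 'open'
ORDER  BY id
LIMIT  20 OFFSET 40;

-- query 2: the total count, same WHERE clause, no LIMIT
SELECT COUNT(*)
FROM   items
WHERE  status = 'open';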


