Run a Query with a Limit/Offset and Also Get the Total Number of Rows

Yes. With a simple window function:

SELECT *, count(*) OVER() AS full_count
FROM tbl
WHERE /* whatever */
ORDER BY col1
OFFSET ?
LIMIT ?
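As a quick, runnable sanity check of this pattern, here is a minimal sketch using Python's bundled sqlite3 driver (SQLite also supports `count(*) OVER ()` since version 3.25); the table and data are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO tbl (col1) VALUES (?)",
                 [(c,) for c in "abcdefgh"])

# Page 2 (rows 4-6), with the total count of qualifying rows on every row.
rows = conn.execute("""
    SELECT id, col1, count(*) OVER () AS full_count
    FROM   tbl
    ORDER  BY col1
    LIMIT  3 OFFSET 3
""").fetchall()
# Each of the 3 returned rows carries full_count = 8.
```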

Be aware that the cost will be substantially higher than without the total number, but typically still cheaper than two separate queries. Postgres has to actually count all rows either way, which imposes a cost depending on the total number of qualifying rows. Details:

  • Best way to get result count before LIMIT was applied

However, as Dani pointed out, when OFFSET is at least as great as the number of rows returned from the base query, no rows are returned. So we also don't get full_count.

If that's not acceptable, a possible workaround to always return the full count would be with a CTE and an OUTER JOIN:

WITH cte AS (
   SELECT *
   FROM   tbl
   WHERE  /* whatever */
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY col1
   LIMIT  ?
   OFFSET ?
   ) sub
RIGHT  JOIN (SELECT count(*) FROM cte) c(full_count) ON true;

You get one row of NULL values with the full_count appended if OFFSET is too big. Else, it's appended to every row like in the first query.
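A self-contained sketch of this workaround, using Python's sqlite3 with invented data. SQLite only gained RIGHT JOIN in 3.39, so the join direction is flipped here — a LEFT JOIN from the one-row count side, which is equivalent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO tbl (col1) VALUES (?)", [(c,) for c in "abc"])

# OFFSET 10 is past the end: the page is empty, but the count row survives.
row = conn.execute("""
    WITH cte AS (SELECT * FROM tbl)
    SELECT sub.id, sub.col1, c.full_count
    FROM   (SELECT count(*) AS full_count FROM cte) c
    LEFT   JOIN (SELECT * FROM cte ORDER BY col1 LIMIT 3 OFFSET 10) sub ON true
""").fetchone()
# One row of NULLs for the page columns, with full_count = 3 appended.
```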

If a row with all NULL values is a possible valid result, you have to check offset >= full_count to disambiguate the origin of the empty row.

This still executes the base query only once. But it adds more overhead to the query and only pays if that's less than repeating the base query for the count.

If indexes supporting the final sort order are available, it might pay to include the ORDER BY in the CTE (redundantly).

Best way to get result count before LIMIT was applied



Pure SQL

Things have changed since 2008. You can use a window function to get the full count and the limited result in one query. Introduced with PostgreSQL 8.4 in 2009.

SELECT foo
, count(*) OVER() AS full_count
FROM bar
WHERE <some condition>
ORDER BY <some col>
LIMIT <pagesize>
OFFSET <offset>;

Note that this can be considerably more expensive than without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.

Doesn't matter much with small tables or full_count <= OFFSET + LIMIT. Matters for a substantially bigger full_count.

Corner case: when OFFSET is at least as great as the number of rows from the base query, no row is returned. So you also get no full_count. Possible alternative:

  • Run a query with a LIMIT/OFFSET and also get the total number of rows

Sequence of events in a SELECT query

( 0. CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.

  1. WHERE clause (and JOIN conditions, though none in your example) filter qualifying rows from the base table(s). The rest is based on the filtered subset.

( 2. GROUP BY and aggregate functions would go here.) Not here.

( 3. Other SELECT list expressions are evaluated, based on grouped / aggregated columns.) Not here.

  4. Window functions are applied depending on the OVER clause and the frame specification of the function. The simple count(*) OVER() is based on all qualifying rows.

  5. ORDER BY

( 6. DISTINCT or DISTINCT ON would go here.) Not here.

  7. LIMIT / OFFSET are applied based on the established order to select rows to return.
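This sequence can be observed directly: the window count is computed on the WHERE-filtered set, before LIMIT trims the output. A small sketch with Python's sqlite3 and an invented table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar VALUES (?)", [(i,) for i in range(10)])

# WHERE filters first (6 qualifying rows), the window count then sees all 6,
# and only afterwards does LIMIT cut the result down to 2 rows.
rows = conn.execute("""
    SELECT foo, count(*) OVER () AS full_count
    FROM   bar
    WHERE  foo >= 4
    ORDER  BY foo
    LIMIT  2
""").fetchall()
```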

LIMIT / OFFSET becomes increasingly inefficient with a growing number of rows in the table. Consider alternative approaches if you need better performance:

  • Optimize query with OFFSET on large table

Alternatives to get final count

There are completely different approaches to get the count of affected rows (not the full count before OFFSET & LIMIT were applied). Postgres keeps internal bookkeeping of how many rows were affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).

For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with:

GET DIAGNOSTICS integer_var = ROW_COUNT;

Details in the manual.

Or you can use pg_num_rows in PHP. Or similar functions in other clients.
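In the same spirit, most client drivers expose the affected-row count of the last command. A sketch with Python's sqlite3 (`cursor.rowcount`) and an invented table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, flag INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(5)])

# Like GET DIAGNOSTICS ... ROW_COUNT in plpgsql, the driver reports how many
# rows the last DML statement touched.
cur = conn.execute("UPDATE t SET flag = 1 WHERE id < 3")
affected = cur.rowcount
```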

Related:

  • Calculate number of rows affected by batch query in PostgreSQL

Get paginated rows and total count in single query

First things first: you can use results from a CTE multiple times in the same query; that's a main feature of CTEs. What you have would work like this (while still using the CTE once only):

WITH cte AS (
   SELECT * FROM (
      SELECT *, row_number()  -- see below
             OVER (PARTITION BY person_id
                   ORDER BY submission_date DESC NULLS LAST  -- see below
                          , last_updated DESC NULLS LAST     -- see below
                          , id DESC) AS rn
      FROM   tbl
      ) sub
   WHERE  rn = 1
   AND    status IN ('ACCEPTED', 'CORRECTED')
   )
SELECT *, count(*) OVER () AS total_rows_in_cte
FROM   cte
LIMIT  10
OFFSET 0;  -- see below
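A runnable miniature of this query, using Python's sqlite3 with invented data (NULLS LAST is dropped here for portability across SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tbl (id INTEGER PRIMARY KEY, person_id INTEGER,
                                  submission_date TEXT, status TEXT)""")
conn.executemany(
    "INSERT INTO tbl (person_id, submission_date, status) VALUES (?, ?, ?)",
    [(1, "2024-01-01", "ACCEPTED"),
     (1, "2024-02-01", "CORRECTED"),   # latest row for person 1
     (2, "2024-01-15", "ACCEPTED"),    # only row for person 2
     (3, "2024-03-01", "REJECTED")])   # latest row fails the status filter

# One latest row per person_id, filtered by status, with the CTE's total.
rows = conn.execute("""
    WITH cte AS (
       SELECT * FROM (
          SELECT *, row_number() OVER (PARTITION BY person_id
                                       ORDER BY submission_date DESC) AS rn
          FROM   tbl) sub
       WHERE  rn = 1
       AND    status IN ('ACCEPTED', 'CORRECTED')
       )
    SELECT person_id, submission_date, count(*) OVER () AS total_rows_in_cte
    FROM   cte
    LIMIT  10 OFFSET 0
""").fetchall()
```

Note that person 3 drops out entirely: its latest row has status REJECTED, and the status filter is applied after picking rn = 1.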

Caveat 1: rank()

rank() can return multiple rows per person_id with rank = 1. DISTINCT ON (person_id) (like Gordon provided) is an applicable replacement for row_number() and works for you, as the added information clarified. See:

  • Select first row in each GROUP BY group?

Caveat 2: ORDER BY submission_date DESC

Neither submission_date nor last_updated is defined NOT NULL. That can be an issue with ORDER BY submission_date DESC, last_updated DESC ... See:

  • PostgreSQL sort by datetime asc, null first?

Should those columns really be NOT NULL?

You replied:

Yes, all those columns should be non-null. I can add that constraint. I put it as nullable since we get data in files which are not always perfect. But this is very rare condition and I can put in empty string instead.

Empty strings are not allowed for type date. Keep the columns nullable. NULL is the proper value for those cases. Use NULLS LAST as demonstrated to avoid NULL being sorted on top.
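To see what NULLS LAST buys you, here is a tiny sketch (Python's sqlite3, which supports the clause since SQLite 3.30; the data is invented). Keep in mind that PostgreSQL, unlike SQLite, sorts NULL first in descending order by default, which is exactly why the explicit clause matters there:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT)")  # nullable date-like column
conn.executemany("INSERT INTO t VALUES (?)",
                 [("2024-01-01",), (None,), ("2024-03-01",)])

# NULLS LAST keeps rows with unknown dates at the end of the sort order
# instead of letting them float to the top of the first page.
rows = conn.execute(
    "SELECT d FROM t ORDER BY d DESC NULLS LAST").fetchall()
```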

Caveat 3: OFFSET

If OFFSET is equal or greater than the number of rows returned by the CTE, you get no row, so also no total count. See:

  • Run a query with a LIMIT/OFFSET and also get the total number of rows

Interim solution

Addressing all caveats so far, and based on added information, we might arrive at this query:

WITH cte AS (
   SELECT DISTINCT ON (person_id) *
   FROM   tbl
   WHERE  status IN ('ACCEPTED', 'CORRECTED')
   ORDER  BY person_id, submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY person_id  -- ?? see below
   LIMIT  10
   OFFSET 0
   ) sub
RIGHT  JOIN (SELECT count(*) FROM cte) c(total_rows_in_cte) ON true;

Now the CTE is actually used twice. The RIGHT JOIN guarantees we get the total count, no matter the OFFSET. DISTINCT ON should perform OK for just a few rows per person_id in the base query.
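A runnable approximation with Python's sqlite3 and invented data. DISTINCT ON and RIGHT JOIN are PostgreSQL-specific, so row_number() stands in for DISTINCT ON and a LEFT JOIN from the one-row count side plays the role of the RIGHT JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tbl (id INTEGER PRIMARY KEY, person_id INTEGER,
                                  submission_date TEXT, status TEXT)""")
conn.executemany(
    "INSERT INTO tbl (person_id, submission_date, status) VALUES (?, ?, ?)",
    [(1, "2024-01-01", "ACCEPTED"),
     (1, "2024-02-01", "CORRECTED"),
     (2, "2024-01-15", "ACCEPTED")])

# Latest qualifying row per person_id, paged, with a guaranteed count row.
rows = conn.execute("""
    WITH cte AS (
       SELECT * FROM (
          SELECT *, row_number() OVER (PARTITION BY person_id
                       ORDER BY submission_date DESC, id DESC) AS rn
          FROM   tbl
          WHERE  status IN ('ACCEPTED', 'CORRECTED')) d
       WHERE  rn = 1
       )
    SELECT sub.person_id, c.total_rows_in_cte
    FROM   (SELECT count(*) AS total_rows_in_cte FROM cte) c
    LEFT   JOIN (SELECT * FROM cte ORDER BY person_id
                 LIMIT 10 OFFSET 0) sub ON true
""").fetchall()
```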

But you have wide rows. How wide on average? The query will likely result in a sequential scan on the whole table. Indexes won't help (much). All of this will remain hugely inefficient for paging. See:

  • Optimize query with OFFSET on large table

You cannot involve an index for paging, as that is based on the derived table from the CTE. And your actual sort criteria for paging are still unclear (ORDER BY id?). If paging is the goal, you desperately need a different query style. If you are only interested in the first few pages, you need yet another query style. The best solution depends on information still missing in the question ...

Radically faster

For your updated objective:

Find latest entries for a person_id by submission_date

(Ignoring "for specified filter criteria, type, plan, status" for simplicity.)

And:

Find the latest row per person_id only if that has status IN ('ACCEPTED','CORRECTED')

Based on these two specialized indexes:

CREATE INDEX ON tbl (submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST)
WHERE status IN ('ACCEPTED', 'CORRECTED'); -- optional

CREATE INDEX ON tbl (person_id, submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST);

Run this query:

WITH RECURSIVE cte AS (
   (
   SELECT t  -- whole row
   FROM   tbl t
   WHERE  status IN ('ACCEPTED', 'CORRECTED')
   AND    NOT EXISTS (SELECT FROM tbl
                      WHERE  person_id = t.person_id
                      AND   (submission_date, last_updated, id)
                          > (t.submission_date, t.last_updated, t.id)  -- row-wise comparison
                     )
   ORDER  BY submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST
   LIMIT  1
   )

   UNION ALL
   SELECT (SELECT t1  -- whole row
           FROM   tbl t1
           WHERE (t1.submission_date, t1.last_updated, t1.id)
               < ((t).submission_date, (t).last_updated, (t).id)  -- row-wise comparison
           AND    t1.status IN ('ACCEPTED', 'CORRECTED')
           AND    NOT EXISTS (SELECT FROM tbl
                              WHERE  person_id = t1.person_id
                              AND   (submission_date, last_updated, id)
                                  > (t1.submission_date, t1.last_updated, t1.id)  -- row-wise comparison
                             )
           ORDER  BY submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST
           LIMIT  1)
   FROM   cte c
   WHERE  (t).id IS NOT NULL
   )
SELECT (t).*
FROM   cte
LIMIT  10
OFFSET 0;

Every set of parentheses here is required.

This level of sophistication should retrieve a relatively small set of top rows radically faster by using the given indices and no sequential scan. See:

  • Optimize GROUP BY query to retrieve latest row per user

submission_date should most probably be type timestamptz or date, not character varying(255) - which is an odd type definition in Postgres in any case. See:

  • Refactor foreign key to fields

Many more details might be optimized, but this is getting out of hand. You might consider professional consulting.

Is this possible to get total number of rows count with offset limit

In MySQL, you can use SQL_CALC_FOUND_ROWS like this:

SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 0, 5;

It gets the row count before applying any LIMIT clause. It does need another query to fetch the count, but that query can simply be

SELECT FOUND_ROWS();

and hence you don't have to repeat your complicated query. (Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17; a separate SELECT COUNT(*) is the recommended replacement.)

How to get the number of total results when there is LIMIT in query?

Add a column, total, for example:

select t.*
, (select count(*) from tbl where col = t.col) as total
from tbl t
where t.col = 'anything'
limit 5

As stated by @Tim Biegeleisen: the LIMIT keyword is applied after everything else, so the count(*) still returns the right answer.
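A runnable sketch of the correlated-subquery approach, using Python's sqlite3 with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO tbl (col) VALUES (?)",
                 [("anything",)] * 7 + [("other",)] * 2)

# The correlated subquery counts every matching row, independent of LIMIT.
rows = conn.execute("""
    SELECT t.id,
           (SELECT count(*) FROM tbl WHERE col = t.col) AS total
    FROM   tbl t
    WHERE  t.col = 'anything'
    LIMIT  5
""").fetchall()
# 5 rows come back, each carrying total = 7.
```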

Total row count + select with limit

What about:

WITH a AS (
   SELECT *, count(*) OVER (RANGE UNBOUNDED PRECEDING)
   FROM   resgroups
   )
SELECT * FROM a ORDER BY foo LIMIT 10 OFFSET 10;

Now, I think you are actually better off breaking this into two queries, because it looks like you are effectively doing paging. If you select the count(*) first, and then decide how many pages you need (and maybe cache that result), your subsequent partial queries can use an index; but in this case, every group of 10 will require a full sequential scan.
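The two-query pattern recommended above looks like this from a client (Python's sqlite3 here, with an invented table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resgroups (foo INTEGER)")
conn.executemany("INSERT INTO resgroups VALUES (?)", [(i,) for i in range(25)])

# Two-query paging: fetch (and possibly cache) the count once, then let each
# page query use an index on the sort column instead of rescanning everything.
total = conn.execute("SELECT count(*) FROM resgroups").fetchone()[0]
page = conn.execute(
    "SELECT foo FROM resgroups ORDER BY foo LIMIT 10 OFFSET 10").fetchall()
```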

Get total number of rows while using limit clause

SQLite computes results on the fly when they are actually needed.

The only way to get the total count is to run the actual query (or better, SELECT COUNT(*)) without the LIMIT.

Get first row of all pages while also paginating (LIMIT / OFFSET)

If you want one query that returns the nth row, you can use row_number() and modulo arithmetic:

select w.*
from (select w.*, row_number() over (order by word) as seqnum
from words w
) w
where w.seqnum % 50 = 1;

You may be able to do this using just id, if it is sequential, gapless, starts with 1, and represents the ordering of the words.
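A self-contained demo of the modulo trick, using Python's sqlite3 with invented words and a page size of 50:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [(f"w{i:03d}",) for i in range(120)])

# row_number() numbers the words in order; seqnum % 50 = 1 keeps rows
# 1, 51, 101, ... i.e. the first word of each 50-row page.
firsts = conn.execute("""
    SELECT word FROM (
       SELECT word, row_number() OVER (ORDER BY word) AS seqnum
       FROM   words) w
    WHERE  w.seqnum % 50 = 1
""").fetchall()
```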


