Run a query with a LIMIT/OFFSET and also get the total number of rows
Yes. With a simple window function:
SELECT *, count(*) OVER() AS full_count
FROM tbl
WHERE /* whatever */
ORDER BY col1
OFFSET ?
LIMIT ?
Be aware that the cost will be substantially higher than without the total number, but typically still cheaper than two separate queries. Postgres has to actually count all rows either way, which imposes a cost depending on the total number of qualifying rows. Details:
- Best way to get result count before LIMIT was applied
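The single-query technique can be sketched end to end. This is a hypothetical demo (table and filter invented for illustration) using Python's sqlite3 as a stand-in for Postgres; it needs SQLite >= 3.25 (bundled with Python 3.8+) for window-function support:

```python
import sqlite3

# Hypothetical 25-row table standing in for "tbl".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER, payload TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 26)])

rows = conn.execute("""
    SELECT col1, payload, count(*) OVER () AS full_count
    FROM   tbl
    WHERE  col1 > 5        -- stand-in filter: 20 qualifying rows
    ORDER  BY col1
    LIMIT  10 OFFSET 0
""").fetchall()

print(len(rows))      # 10 rows on the page
print(rows[0][-1])    # full_count = 20, unaffected by LIMIT
```

Note that `full_count` reflects the rows that pass the WHERE clause, not the page size: the window function runs before LIMIT/OFFSET.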
However, as Dani pointed out, when OFFSET is at least as great as the number of rows returned from the base query, no rows are returned. So we also don't get full_count.
If that's not acceptable, a possible workaround to always return the full count would be with a CTE and an OUTER JOIN:
WITH cte AS (
   SELECT *
   FROM   tbl
   WHERE  /* whatever */
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY col1
   LIMIT  ?
   OFFSET ?
   ) sub
RIGHT JOIN (SELECT count(*) FROM cte) c(full_count) ON true;
You get one row of NULL values with the full_count appended if OFFSET is too big. Else, it's appended to every row like in the first query. If a row with all NULL values is a possible valid result, you have to check offset >= full_count to disambiguate the origin of the empty row.
This still executes the base query only once. But it adds more overhead to the query and only pays if that's less than repeating the base query for the count.
If indexes supporting the final sort order are available, it might pay to include the ORDER BY in the CTE (redundantly).
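A runnable sketch of this workaround, under two stated adaptations so it also runs on SQLite (used here as a stand-in for Postgres, version >= 3.25): the RIGHT JOIN is flipped into an equivalent LEFT JOIN from the count side, and `TABLE cte` is spelled out as `SELECT * FROM cte`. Table and filter are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(i,) for i in range(1, 21)])

query = """
WITH cte AS (
   SELECT * FROM tbl WHERE col1 > 5            -- 15 qualifying rows
)
SELECT sub.col1, c.full_count
FROM  (SELECT count(*) AS full_count FROM cte) c
LEFT  JOIN (SELECT * FROM cte
            ORDER BY col1
            LIMIT 10 OFFSET ?) sub ON 1=1
ORDER BY sub.col1
"""
ok  = conn.execute(query, (0,)).fetchall()    # OFFSET within range
big = conn.execute(query, (99,)).fetchall()   # OFFSET past the end

print(ok[0])   # (6, 15)     -> data row plus full_count
print(big)     # [(None, 15)] -> NULL row, but full_count survives
```

The join to the one-row count subquery guarantees exactly one output row even when the paged subquery comes back empty.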
Best way to get result count before LIMIT was applied
Pure SQL
Things have changed since 2008. You can use a window function to get the full count and the limited result in one query; window functions were introduced with PostgreSQL 8.4 in 2009.
SELECT foo
, count(*) OVER() AS full_count
FROM bar
WHERE <some condition>
ORDER BY <some col>
LIMIT <pagesize>
OFFSET <offset>;
Note that this can be considerably more expensive than without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.
Doesn't matter much with small tables or full_count <= OFFSET + LIMIT. Matters for a substantially bigger full_count.
Corner case: when OFFSET is at least as great as the number of rows from the base query, no row is returned, so you also get no full_count. Possible alternative:
- Run a query with a LIMIT/OFFSET and also get the total number of rows
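The corner case is easy to reproduce. A hypothetical 7-row table, again using Python's sqlite3 (>= 3.25) as a stand-in engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar VALUES (?)", [(i,) for i in range(7)])

q = """SELECT foo, count(*) OVER () AS full_count
       FROM bar ORDER BY foo LIMIT 5 OFFSET ?"""

print(conn.execute(q, (5,)).fetchall())   # [(5, 7), (6, 7)] - still fine
print(conn.execute(q, (7,)).fetchall())   # [] - no rows, so no full_count either
```

Once OFFSET reaches the row count, the result set is empty and the total count is lost with it, which is what motivates the OUTER JOIN workaround above.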
Sequence of events in a SELECT query
(0. CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.
1. The WHERE clause (and JOIN conditions, though none in your example) filters qualifying rows from the base table(s). The rest is based on the filtered subset.
(2. GROUP BY and aggregate functions would go here.) Not here.
(3. Other SELECT list expressions are evaluated, based on grouped / aggregated columns.) Not here.
4. Window functions are applied depending on the OVER clause and the frame specification of the function. The simple count(*) OVER() is based on all qualifying rows.
5. ORDER BY is applied.
(6. DISTINCT or DISTINCT ON would go here.) Not here.
7. LIMIT / OFFSET are applied based on the established order to select rows to return.
LIMIT / OFFSET becomes increasingly inefficient with a growing number of rows in the table. Consider alternative approaches if you need better performance:
- Optimize query with OFFSET on large table
Alternatives to get final count
There are completely different approaches to get the count of affected rows (not the full count before OFFSET & LIMIT were applied). Postgres has internal bookkeeping for how many rows were affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).
For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Details in the manual.
Or you can use pg_num_rows in PHP. Or similar functions in other clients.
Related:
- Calculate number of rows affected by batch query in PostgreSQL
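As a client-side counterpart to GET DIAGNOSTICS / pg_num_rows, most database drivers expose the affected-row count of the last command. In Python's DB-API (sqlite3 here, psycopg2 works the same way) it is cursor.rowcount; table and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, 'NEW')", [(i,) for i in range(8)])

# The driver tracks how many rows the last command touched.
cur = conn.execute("UPDATE tbl SET status = 'DONE' WHERE id < 5")
print(cur.rowcount)   # 5 rows affected
```

Note this reports rows *affected* by the last command, not the pre-LIMIT total of a SELECT.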
Get paginated rows and total count in single query
First things first: you can use results from a CTE multiple times in the same query; that's a main feature of CTEs. What you have would work like this (while still using the CTE once only):
WITH cte AS (
SELECT * FROM (
SELECT *, row_number() -- see below
OVER (PARTITION BY person_id
ORDER BY submission_date DESC NULLS LAST -- see below
, last_updated DESC NULLS LAST -- see below
, id DESC) AS rn
FROM tbl
) sub
WHERE rn = 1
AND status IN ('ACCEPTED', 'CORRECTED')
)
SELECT *, count(*) OVER () AS total_rows_in_cte
FROM cte
LIMIT 10
OFFSET 0; -- see below
Caveat 1: rank()
rank() can return multiple rows per person_id with rank = 1. DISTINCT ON (person_id) (like Gordon provided) is an applicable replacement for row_number(), which works for you, as additional info clarified. See:
- Select first row in each GROUP BY group?
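The row_number() pattern can be exercised on a small hypothetical data set (Python sqlite3 >= 3.25 as stand-in engine; the NULLS LAST clauses are dropped since these sample dates are never NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tbl
    (id INTEGER, person_id INTEGER, submission_date TEXT, status TEXT)""")
conn.executemany("INSERT INTO tbl VALUES (?,?,?,?)", [
    (1, 1, '2024-01-01', 'ACCEPTED'),
    (2, 1, '2024-02-01', 'ACCEPTED'),    # latest for person 1
    (3, 2, '2024-03-01', 'REJECTED'),    # latest for person 2 -> filtered out
    (4, 3, '2024-01-15', 'CORRECTED'),   # latest for person 3
])

rows = conn.execute("""
WITH cte AS (
   SELECT * FROM (
      SELECT *, row_number() OVER (PARTITION BY person_id
                                   ORDER BY submission_date DESC, id DESC) AS rn
      FROM tbl
   ) sub
   WHERE rn = 1
   AND   status IN ('ACCEPTED', 'CORRECTED')
)
SELECT *, count(*) OVER () AS total_rows_in_cte
FROM cte
LIMIT 10 OFFSET 0
""").fetchall()

print(len(rows))      # 2 (persons 1 and 3; person 2's latest row is REJECTED)
print(rows[0][-1])    # total_rows_in_cte = 2
```

Note how filtering on status *after* picking rn = 1 means person 2 disappears entirely, rather than falling back to an older ACCEPTED row; that matches the stated requirement.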
Caveat 2: ORDER BY submission_date DESC
Neither submission_date nor last_updated is defined NOT NULL. That can be an issue with ORDER BY submission_date DESC, last_updated DESC ...
See:
- PostgreSQL sort by datetime asc, null first?
Should those columns really be NOT NULL?
You replied:
Yes, all those columns should be non-null. I can add that constraint. I put it as nullable since we get data in files which are not always perfect. But this is very rare condition and I can put in empty string instead.
Empty strings are not allowed for type date. Keep the columns nullable. NULL is the proper value for those cases. Use NULLS LAST as demonstrated to avoid NULL being sorted on top.
Caveat 3: OFFSET
If OFFSET is equal to or greater than the number of rows returned by the CTE, you get no row, so also no total count. See:
- Run a query with a LIMIT/OFFSET and also get the total number of rows
Interim solution
Addressing all caveats so far, and based on added information, we might arrive at this query:
WITH cte AS (
SELECT DISTINCT ON (person_id) *
FROM tbl
WHERE status IN ('ACCEPTED', 'CORRECTED')
ORDER BY person_id, submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC
)
SELECT *
FROM (
TABLE cte
ORDER BY person_id -- ?? see below
LIMIT 10
OFFSET 0
) sub
RIGHT JOIN (SELECT count(*) FROM cte) c(total_rows_in_cte) ON true;
Now the CTE is actually used twice. The RIGHT JOIN guarantees we get the total count, no matter the OFFSET. DISTINCT ON should perform OK-ish for the few rows per person_id in the base query.
But you have wide rows. How wide on average? The query will likely result in a sequential scan on the whole table. Indexes won't help (much). All of this will remain hugely inefficient for paging. See:
- Optimize query with OFFSET on large table
You cannot involve an index for paging, as that is based on the derived table from the CTE. And your actual sort criteria for paging are still unclear (ORDER BY id?). If paging is the goal, you desperately need a different query style. If you are only interested in the first few pages, you still need a different query style. The best solution depends on information still missing in the question ...
Radically faster
For your updated objective:
Find latest entries for a person_id by submission_date
(Ignoring "for specified filter criteria, type, plan, status" for simplicity.)
And:
Find the latest row per person_id only if that has status IN ('ACCEPTED', 'CORRECTED')
Based on these two specialized indices:
CREATE INDEX ON tbl (submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST)
WHERE status IN ('ACCEPTED', 'CORRECTED'); -- optional
CREATE INDEX ON tbl (person_id, submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST);
Run this query:
WITH RECURSIVE cte AS (
(
SELECT t -- whole row
FROM tbl t
WHERE status IN ('ACCEPTED', 'CORRECTED')
AND NOT EXISTS (SELECT FROM tbl
WHERE person_id = t.person_id
AND ( submission_date, last_updated, id)
> (t.submission_date, t.last_updated, t.id) -- row-wise comparison
)
ORDER BY submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST
LIMIT 1
)
UNION ALL
SELECT (SELECT t1 -- whole row
FROM tbl t1
WHERE ( t1.submission_date, t1.last_updated, t1.id)
< ((t).submission_date,(t).last_updated,(t).id) -- row-wise comparison
AND t1.status IN ('ACCEPTED', 'CORRECTED')
AND NOT EXISTS (SELECT FROM tbl
WHERE person_id = t1.person_id
AND ( submission_date, last_updated, id)
> (t1.submission_date, t1.last_updated, t1.id) -- row-wise comparison
)
ORDER BY submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST
LIMIT 1)
FROM cte c
WHERE (t).id IS NOT NULL
)
SELECT (t).*
FROM cte
LIMIT 10
OFFSET 0;
Every set of parentheses here is required.
This level of sophistication should retrieve a relatively small set of top rows radically faster by using the given indices and no sequential scan. See:
- Optimize GROUP BY query to retrieve latest row per user
submission_date should most probably be type timestamptz or date, not character varying(255) - which is an odd type definition in Postgres in any case. See:
- Refactor foreign key to fields
Many more details might be optimized, but this is getting out of hand. You might consider professional consulting.
Is this possible to get total number of rows count with offset limit
You can use SQL_CALC_FOUND_ROWS like this:
SELECT SQL_CALC_FOUND_ROWS * FROM users limit 0,5;
It gets the row count before applying any LIMIT clause. It does need another query to fetch the results but that query can simply be
SELECT FOUND_ROWS()
and hence you don't have to repeat your complicated query. Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17; MySQL recommends a separate SELECT COUNT(*) query instead.
How to get the number of total results when there is LIMIT in query?
Add a column, total, for example:
select t.*
, (select count(*) from tbl where col = t.col) as total
from tbl t
where t.col = 'anything'
limit 5
As stated by @Tim Biegeleisen: the limit keyword is applied after everything else, so the count(*) still returns the right answer.
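This variant needs no window functions at all, so it runs on any engine with correlated subqueries. A hypothetical demo via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
# 9 matching rows plus one non-matching row.
conn.executemany("INSERT INTO tbl VALUES (?)",
                 [('anything',)] * 9 + [('other',)])

rows = conn.execute("""
    SELECT t.*, (SELECT count(*) FROM tbl WHERE col = t.col) AS total
    FROM   tbl t
    WHERE  t.col = 'anything'
    LIMIT  5
""").fetchall()

print(len(rows))     # 5 rows on the page
print(rows[0][1])    # total = 9, computed before LIMIT
```

The trade-off: the correlated subquery is re-evaluated per returned row, so it pays to keep the page size small or let the planner cache/index the count.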
Total row count + select with limit
What about:
WITH a AS (select *, count(*) over (range unbounded preceding)
FROM resgroups)
SELECT * from a order by foo limit 10 offset 10;
Now, I think you are actually better off breaking this into two queries, because it looks like you are effectively doing paging. If you select the count(*) first and then decide how many pages you need (and maybe cache that result), your subsequent partial queries can use an index; with the single combined query, every group of 10 requires a full sequential scan.
Get total number of rows while using limit clause
SQLite computes results on the fly when they are actually needed.
The only way to get the total count is to run the actual query (or better, a SELECT COUNT(*)) without the LIMIT.
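The two-query approach is straightforward in practice. A hypothetical example with Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(42)])

# Query 1: total count of qualifying rows (no LIMIT).
total = conn.execute(
    "SELECT count(*) FROM items WHERE id >= 10").fetchone()[0]

# Query 2: the same filter, limited to one page.
page = conn.execute(
    "SELECT id FROM items WHERE id >= 10 ORDER BY id LIMIT 10").fetchall()

print(total)        # 32 qualifying rows in total
print(len(page))    # 10 rows on this page
```

Keeping the WHERE clause identical in both queries is what makes the count and the page consistent (assuming no concurrent writes between them).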
Get first row of all pages while also paginating (LIMIT / OFFSET)
If you want one query that returns every nth row, you can use row_number() and modulo arithmetic:
select w.*
from (select w.*, row_number() over (order by word) as seqnum
from words w
) w
where w.seqnum % 50 = 1;
You may be able to do this using just id, if it is sequential, gapless, starts with 1, and represents the ordering of the words.
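The modulo trick on a small hypothetical words table (page size 3 instead of 50 to keep the sample readable; Python sqlite3 >= 3.25 as stand-in engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [(w,) for w in "ant bee cat dog elk fox gnu hen".split()])

# seqnum % pagesize = 1 picks the first row of each page.
heads = conn.execute("""
    SELECT w.word
    FROM (SELECT word, row_number() OVER (ORDER BY word) AS seqnum
          FROM words) w
    WHERE w.seqnum % 3 = 1
    ORDER BY w.word
""").fetchall()

print(heads)   # [('ant',), ('dog',), ('gnu',)] - heads of pages 1, 2, 3
```

Rows 1, 4, and 7 of the sorted list come back: exactly the first entry of each 3-row page.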