Postgresql Equivalent for Top N with Ties: Limit "With Ties"

PostgreSQL equivalent for TOP n WITH TIES: LIMIT with ties ?

Postgres 13 finally adds WITH TIES . See:

  • Get top row(s) with highest value, with ties

There is no WITH TIES clause up to PostgreSQL 12, like there is in SQL Server.

In PostgreSQL I would substitute this for TOP n WITH TIES .. ORDER BY <something>:

WITH cte AS (
SELECT *, rank() OVER (ORDER BY <something>) AS rnk
FROM tbl
)
SELECT *
FROM cte
WHERE rnk <= n;

To be clear, rank() is right, dense_rank() would be wrong (return too many rows).

Consider this quote from the SQL Server docs (from the link above):

For example, if expression is set to 5 but 2 additional rows match the
values of the ORDER BY columns in row 5, the result set will contain 7 rows.

The job of WITH TIES is to include all peers of the last row in the top n as defined by the ORDER BY clause. rank() gives the exact same result.

To make sure, I tested with SQL Server, here is a live demo.

db<>fiddle here

Faster alternatives for big tables in Postgres 12 or older:

  • Equivalent for FETCH FIRST WITH TIES in Postgres 11 with comparable performance

Equivalent for FETCH FIRST WITH TIES in Postgres 11 with comparable performance

You won't get the performance of the extremely convenient WITH TIES in Postgres 13 or later. See:

  • Get top row(s) with highest value, with ties

But there are always options ...

Faster SQL variant

Should be faster for big tables because it avoids to rank the whole table like the simple solution in the referenced answer.

WITH cte AS (
SELECT *, row_number() OVER (ORDER BY number, id) AS rn -- ORDER must match
FROM numbers
WHERE number > 1
ORDER BY number, id -- tiebreaker to make it deterministic
LIMIT 1000
)
TABLE cte
UNION ALL
(
SELECT n.*, 1000 + row_number() OVER (ORDER BY n.id) AS rn
FROM numbers n, (SELECT number, id FROM cte WHERE rn = 1000) AS max
WHERE n.number = max.number
AND n.id > max.id
ORDER BY n.id
);

id is any UNIQUE NOT NULL column in your table that lends itself as tiebreaker, typically the PRIMARY KEY.

Sort by that additionally, which has the (maybe welcome) side effect that you get a deterministically ordered result.

Ideally, you have a multicolumn index on (number, id) for this. But it will use the existing index on just (number), too. Resulting performance depends on cardinalities and data types.

Related:

  • Is a composite index also good for queries on the first field?

Since the CTE query counts as one command, there is no race condition under concurrent write load. All parts of the command see the same snapshot of the table in default isolation level READ COMMITTED.

I might wrap this into a simple SQL function (or prepared statement) and pass LIMIT to it for convenience.

Alternative with PL/pgSQL

Only running a single SELECT probably outweighs the added overhead. Works optimally with the existing index on just (number):

CREATE OR REPLACE FUNCTION foo(_bound int, _limit int, OUT _row public.numbers)
RETURNS SETOF public.numbers
LANGUAGE plpgsql AS
$func$
DECLARE
_ct int := 0;
_number int; -- match type!
BEGIN
FOR _row IN
SELECT *
FROM public.numbers
WHERE number > _bound
ORDER BY number
LOOP
_ct := _ct + 1;
CASE
WHEN _ct < _limit THEN
-- do nothing
WHEN _ct > _limit THEN
EXIT WHEN _row.number > _number;
WHEN _ct = _limit THEN
_number := _row.number;
END CASE;

RETURN NEXT;
END LOOP;
END
$func$;

But it's tailored for just the one query. Gets tedious for varying queries.

Get top row(s) with highest value, with ties

The first query fails if any row has quantity IS NULL (as Gordon demonstrates).

The second query only fails if all rows have quantity IS NULL. So it should be usable in most cases. (And it's faster.)

Postgres 13 or newer

Use the standard SQL clause WITH TIES:

SELECT id
FROM product
ORDER BY quantity DESC NULLS LAST
FETCH FIRST 1 ROWS WITH TIES;

db<>fiddle here

Works with any amount of NULL values.

The manual:

SQL:2008 introduced a different syntax to achieve the same result,
which PostgreSQL also supports. It is:

OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }

In this syntax, the start or count value is required by the standard
to be a literal constant, a parameter, or a variable name; as a
PostgreSQL extension, other expressions are allowed, but will
generally need to be enclosed in parentheses to avoid ambiguity. If
count is omitted in a FETCH clause, it defaults to 1. The WITH TIES
option is used to return any additional rows that tie for the last
place in the result set according to the ORDER BY clause; ORDER BY is
mandatory in this case. ROW and ROWS as well as FIRST and NEXT are
noise words that don't influence the effects of these clauses.

Notably, WITH TIES cannot be used with the (non-standard) short syntax LIMIT n.

It's the fastest possible solution. Faster than either of your current queries. More important for performance: have an index on (quantity). Or a more specialized covering index to allow index-only scans (a bit faster, yet):

CREATE INDEX ON product (quantity DESC NULLS LAST) INCLUDE (id);

See:

  • Do covering indexes in PostgreSQL help JOIN columns?

We need NULLS LAST to keep NULL values last in descending order. See:

  • Sort by column ASC, but NULL values first?

Postgres 12 or older

A NULL-safe query:

SELECT id, quantity
FROM product
WHERE quantity IS NOT DISTINCT FROM (SELECT MAX(quantity) FROM product);

Or, probably faster:

SELECT id, quantity
FROM (
SELECT *, rank() OVER (ORDER BY quantity DESC NULLS LAST) AS rnk
FROM product
) sub
WHERE rnk = 1;

See:

  • PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

Faster alternatives for big tables:

  • Equivalent for FETCH FIRST WITH TIES in PostgreSQL 11 with comparable performance

Fetching a minimum of N rows, plus all peers of the last row


Postgres 13 or later

There is a dead simple solution using WITH TIES in Postgres 13 or later:

SELECT *
FROM assets
WHERE block_no >= 2 -- your starting block
ORDER BY block_no
FETCH FIRST 100 ROWS WITH TIES;

This will return at least 100 rows (if enough qualify), plus all peers of the 100th row.

If your table isn't trivially small, an index on (block_no) is essential for performance.

See:

  • Get top row(s) with highest value, with ties

Older versions

Use the window function rank() in a subquery:

SELECT (a).*
FROM (
SELECT a, rank() OVER (ORDER BY block_no) AS rnk
FROM assets a
) sub
WHERE rnk <= 100;

Same result.

I use a little trick with the row type to strip the added rnk from the result. That's an optional addition.

See:

  • PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

In Postgresql, how to select top n percent of rows by a column?

I would use a subquery:

select student_id, student_name, avg_grade, rank() over (order by avg_grade desc)
from (select s.student_id,
s.student_name,
avg(ce.grade) as avg_grade,
rank() over (order by avg(ce.grade) desc nulls last) as seqnum,
count(*) over () as cnt
from students s
left join
course_enrollment ce
on s.student_id = ce.student_id
group by s.student_id
) as ce_avg
where seqnum <= cnt * 0.1;

There are other window functions you can use instead, such as NTILE() and PERCENTILE_DISC(). I prefer the direct calculation because it gives more control over how ties are handled.

Select Top N rows and also handle tie situation

Use a sub-query to get the third highest cost:

select * from table
where cost >= (SELECT COST FROM TABLE
ORDER BY COST DESC
LIMIT 3,1)
ORDER BY c_cost DESC

(I'm not fully sure about the LIMIT 3,1 part, since I'm not a MySQL guy. Test and see!)



Related Topics



Leave a reply



Submit