Implementing a Total Order Ranking in PostgreSQL 8.3

Resequence (reorder) in column return false

In MySQL, the expression @new_ordering := @new_ordering + @ordering_inc assigns to the variable.

Postgres, on the other hand, evaluates the expression new_ordering = new_ordering + ordering_inc according to standard SQL: it compares new_ordering to new_ordering + ordering_inc with the equality operator =, which yields the boolean false. When concatenated with concat(), that is coerced to the text 'false'.
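To make the comparison-versus-assignment point concrete, here is a minimal sketch that runs the same kind of expression through Python's built-in sqlite3 module. SQLite is only a convenient stand-in: it reports the comparison result as 0 where PostgreSQL reports false. The sample values are assumptions made for illustration.

```python
import sqlite3

# SQLite is only a stand-in engine here; PostgreSQL would report false.
conn = sqlite3.connect(":memory:")
new_ordering, ordering_inc = 0, 10  # hypothetical sample values

# "=" in a SELECT list is a comparison, not an assignment:
# this asks whether new_ordering equals new_ordering + ordering_inc.
(is_equal,) = conn.execute(
    "SELECT ? = ? + ?", (new_ordering, new_ordering, ordering_inc)
).fetchone()
print(is_equal)
```

Nothing is assigned anywhere; the query only reports whether the two sides are equal.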

For actual variable assignment see:

  • Store query result in a variable using in PL/pgSQL

But that's not what you need here. There are various ways to assign sequential numbers.

You could use a (temporary) SEQUENCE for the task:

CREATE TEMP SEQUENCE foo_seq;
UPDATE tasks
SET "order" = 'p/' || nextval('foo_seq');

See:

  • Implementing a total order ranking in PostgreSQL 8.3

To get a serial number in the table column (with arbitrary order), I would just:

ALTER TABLE tasks
DROP COLUMN "order"
, ADD COLUMN "order" serial;

And if you don't want an arbitrary sort order, you have to ORDER BY something, such as id. Use the (standard SQL) window function row_number() to generate the sequence in a subquery, and join to it in the (non-standard) FROM clause of the UPDATE:

UPDATE tasks t
SET "order" = t1.rn
FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM tasks) t1
WHERE t.id = t1.id;
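As a runnable illustration of renumbering with row_number(), the sketch below uses Python's built-in sqlite3 module instead of PostgreSQL. Two deliberate differences from the statement above: the column is called ord (a made-up name that avoids the reserved word), and the numbers are copied back with a correlated subquery rather than UPDATE ... FROM, because older SQLite versions lack that clause. The sample ids are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ord" is a hypothetical column name chosen to avoid the reserved word "order".
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, ord INTEGER)")
conn.executemany("INSERT INTO tasks (id) VALUES (?)", [(35,), (10,), (20,)])

# Number the rows by id, then copy each number back to the matching row.
conn.execute("""
    UPDATE tasks
    SET ord = (
        SELECT rn
        FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM tasks) AS t1
        WHERE t1.id = tasks.id
    )
""")

renumbered = conn.execute("SELECT id, ord FROM tasks ORDER BY id").fetchall()
print(renumbered)
```

The ids keep their gaps, while ord becomes a gapless sequence in id order.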


See:

  • Update a column of a table with a column of another table in PostgreSQL

But, seriously, you wouldn't want to use the reserved word order as an identifier. That's begging for trouble.

How should I handle ranked x out of y data in PostgreSQL?

If you want the rank, do something like

SELECT id, num, rank
FROM (
   SELECT id, num, rank() OVER (ORDER BY num)
   FROM foo
) AS bar
WHERE id = 4;

Or if you actually want the row number, use

SELECT id, num, row_number
FROM (
   SELECT id, num, row_number() OVER (ORDER BY num)
   FROM foo
) AS bar
WHERE id = 4;

They'll differ when you have equal values somewhere. There is also dense_rank() if you need that.
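To see how the three functions differ on ties, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (both implement these standard window functions). The table contents and the aliases rn, rnk and drnk are assumptions made for illustration; id is added as a tie-breaker for row_number() only so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, num INTEGER)")
# Hypothetical data: ids 2 and 3 share the same num.
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)", [(1, 10), (2, 20), (3, 20), (4, 30)]
)

ranking = conn.execute("""
    SELECT id,
           num,
           row_number() OVER (ORDER BY num, id) AS rn,    -- always unique
           rank()       OVER (ORDER BY num)     AS rnk,   -- ties share a rank, then a gap
           dense_rank() OVER (ORDER BY num)     AS drnk   -- ties share a rank, no gap
    FROM foo
    ORDER BY id
""").fetchall()

for row in ranking:
    print(row)
```

For the tied rows, rank() and dense_rank() both return 2; the row after the tie gets 4 from rank() and 3 from dense_rank().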

This requires PostgreSQL 8.4 or later, of course, since that is the release that introduced window functions.

postgresql selecting highest values

Generally the algorithm is very simple:

  1. Retrieve rows from Student table
  2. Sort the whole resultset from #1 using ORDER BY expressions
  3. Apply LIMIT clause (+ offset) to get a portion of rows from ordered resultset obtained from #2
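The three steps above can be sketched in plain Python. This only models the logical order of operations, not how PostgreSQL executes them; the student rows and the helper name order_by_limit are made up for illustration.

```python
# Hypothetical rows standing in for the Student table (step 1: retrieve rows).
students = [
    {"student_name": "Ann", "student_marks1": 70, "student_marks2": 85},
    {"student_name": "Bob", "student_marks1": 90, "student_marks2": 60},
    {"student_name": "Cy", "student_marks1": 80, "student_marks2": 95},
    {"student_name": "Dee", "student_marks1": 50, "student_marks2": 40},
]


def order_by_limit(rows, key, limit, offset=0, descending=False):
    """Sort the whole row set, then keep `limit` rows starting at `offset`."""
    ordered = sorted(rows, key=key, reverse=descending)  # step 2: ORDER BY
    return ordered[offset:offset + limit]                # step 3: LIMIT / OFFSET


top_two = order_by_limit(
    students,
    key=lambda r: r["student_marks1"] + r["student_marks2"],
    limit=2,
    descending=True,
)
print([r["student_name"] for r in top_two])
```

The call corresponds roughly to ORDER BY student_marks1 + student_marks2 DESC LIMIT 2.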

You can read more on LIMIT here: http://www.postgresql.org/docs/9.4/static/queries-limit.html

In some cases LIMIT is taken into account during the ORDER BY operation (sorting) to speed up the query, especially when an index can be used to eliminate the sort operation.

You can see how this works by looking at explain plans.

Let's say there is an index created on this table:

create index student_mark1 on student(student_marks1);

This query gives the following explain plan:

select * From student
order by student_marks2
limit 1;

Limit (cost=5.06..5.06 rows=1 width=178) (actual time=0.088..0.089 rows=1 loops=1)
  Output: student_name, student_rollno, student_marks1, student_marks2
  -> Sort (cost=5.06..5.57 rows=204 width=178) (actual time=0.088..0.088 rows=1 loops=1)
       Output: student_name, student_rollno, student_marks1, student_marks2
       Sort Key: student.student_marks2
       Sort Method: top-N heapsort Memory: 25kB
       -> Seq Scan on public.student (cost=0.00..4.04 rows=204 width=178) (actual time=0.007..0.021 rows=204 loops=1)
            Output: student_name, student_rollno, student_marks1, student_marks2

You need to read this plan from the bottom up.

The first operation is Seq Scan: all rows of the table are read (the whole table; see actual rows=204).

Then the sort operation is performed (ORDER BY). The last operation is LIMIT 1 (at the top of the plan).
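The sort method reported in the plan, top-N heapsort, rests on the idea that only the best N candidates need to be kept while the rows stream past, instead of fully sorting all 204 rows. A rough Python analogy (not PostgreSQL's actual implementation) is heapq.nsmallest; the sample marks below are assumptions.

```python
import heapq

# Hypothetical student_marks2 values, in arbitrary (table) order.
marks = [57, 12, 88, 3, 41, 3, 76]


def top_n(values, n):
    """Return the n smallest values in ascending order, like ORDER BY ... LIMIT n."""
    return heapq.nsmallest(n, values)


lowest_one = top_n(marks, 1)
lowest_three = top_n(marks, 3)
print(lowest_one, lowest_three)
```

The result matches a full sort followed by a slice, but every input value still has to be read once, which is why the Seq Scan below the Sort node reports all 204 rows.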

Compare the above plan to this query:

select * From student
order by student_marks1
limit 1;

Limit (cost=0.14..0.24 rows=1 width=178) (actual time=0.010..0.010 rows=1 loops=1)
  Output: student_name, student_rollno, student_marks1, student_marks2
  -> Index Scan using student_mark1 on public.student (cost=0.14..19.20 rows=204 width=178) (actual time=0.009..0.009 rows=1 loops=1)
       Output: student_name, student_rollno, student_marks1, student_marks2

Here the sorting phase is skipped, since the index can be used to retrieve rows in the required order (ORDER BY student_marks1 => INDEX ON Student( student_marks1 )).

Note actual rows=1 in the bottom-most operation, Index Scan.

This means that PostgreSQL doesn't scan the whole index but retrieves only the first row from it, because it knows that the query has a LIMIT 1 clause. (One sometimes says that PostgreSQL "pushed down" the LIMIT 1 clause to the index scan operation and used it to reduce the number of scanned entries in the index.)
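The early-stop behaviour can be pictured with a lazy iterator: the consumer (LIMIT) pulls one entry and never asks for more, so the producer (the index scan) never reads the rest. The sketch below is a plain-Python analogy; the CountingIndex class and its data are hypothetical and do not reflect PostgreSQL internals.

```python
from itertools import islice


class CountingIndex:
    """Hypothetical stand-in for an ordered index that counts entries read."""

    def __init__(self, keys):
        self.entries = sorted(keys)  # an index keeps its keys in order
        self.entries_read = 0

    def scan(self):
        # Lazily yield entries in index order, one at a time.
        for key in self.entries:
            self.entries_read += 1
            yield key


index = CountingIndex([57, 12, 88, 3, 41])
first = list(islice(index.scan(), 1))  # analogous to LIMIT 1
print(first, index.entries_read)
```

Only one entry is read, mirroring actual rows=1 on the Index Scan node.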

You can find more on using indexes to speed up ORDER BY here: http://www.postgresql.org/docs/8.3/static/indexes-ordering.html

In the case of the query in your question, the ORDER BY clause contains an expression, Student_Marks1+Student_Marks2, not a simple column. An explain plan for this query looks like this:

select *
From student
order by student_marks1 + student_marks2
limit 2;

Limit (cost=7.10..7.11 rows=2 width=178) (actual time=0.207..0.207 rows=2 loops=1)
  Output: student_name, student_rollno, student_marks1, student_marks2, (((student_marks1)::numeric + student_marks2))
  -> Sort (cost=7.10..7.61 rows=204 width=178) (actual time=0.205..0.205 rows=2 loops=1)
       Output: student_name, student_rollno, student_marks1, student_marks2, (((student_marks1)::numeric + student_marks2))
       Sort Key: (((student.student_marks1)::numeric + student.student_marks2))
       Sort Method: top-N heapsort Memory: 25kB
       -> Seq Scan on public.student (cost=0.00..5.06 rows=204 width=178) (actual time=0.019..0.107 rows=204 loops=1)
            Output: student_name, student_rollno, student_marks1, student_marks2, ((student_marks1)::numeric + student_marks2)

But you can still speed up this query by creating an expression index (also called a function-based index), like this:

create index student_mark12 on student( ( student_marks1 + student_marks2) );

After creating the index, the plan becomes:

Limit (cost=0.14..0.34 rows=2 width=178) (actual time=0.044..0.047 rows=2 loops=1)
  Output: student_name, student_rollno, student_marks1, student_marks2, (((student_marks1)::numeric + student_marks2))
  -> Index Scan using student_mark12 on public.student (cost=0.14..20.22 rows=204 width=178) (actual time=0.043..0.046 rows=2 loops=1)
       Output: student_name, student_rollno, student_marks1, student_marks2, ((student_marks1)::numeric + student_marks2)

Notice that PostgreSQL uses the index in this case and retrieves only 2 entries from it (actual rows=2), in line with the LIMIT 2 clause.
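For experimenting with an index on an expression without a PostgreSQL server, the sketch below uses Python's built-in sqlite3 module, which also supports such indexes. It is a stand-in only: the sample rows are assumptions, both marks columns are plain integers here, and SQLite's planner output differs from PostgreSQL's, so the plan is printed for inspection rather than relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_name TEXT, student_marks1 INTEGER, student_marks2 INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Ann", 70, 85), ("Bob", 90, 60), ("Cy", 80, 95), ("Dee", 50, 40)],
)

# Index on the expression itself, written exactly as in the ORDER BY below.
conn.execute(
    "CREATE INDEX student_mark12 ON student (student_marks1 + student_marks2)"
)

query = (
    "SELECT student_name FROM student "
    "ORDER BY student_marks1 + student_marks2 LIMIT 2"
)
lowest_two = conn.execute(query).fetchall()
print(lowest_two)

# Show what SQLite's planner decided; the exact text varies by version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

The ORDER BY expression has to match the indexed expression for the planner to consider the index.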

Grouped LIMIT in PostgreSQL: show the first N rows for each group?

New solution (PostgreSQL 8.4)

SELECT *
FROM (
   SELECT ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
          t.*
   FROM xxx t
) x
WHERE x.r <= 2;
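A runnable version of the same per-group query, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table name xxx and the columns section_id and name come from the query above; the id column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx (id INTEGER PRIMARY KEY, section_id INTEGER, name TEXT)"
)
# Hypothetical rows: three in section 1, two in section 2, one in section 3.
conn.executemany(
    "INSERT INTO xxx (section_id, name) VALUES (?, ?)",
    [(1, "C"), (1, "A"), (1, "B"), (2, "E"), (2, "D"), (3, "F")],
)

first_two = conn.execute("""
    SELECT section_id, name, r
    FROM (
        -- Restart the numbering at 1 for every section_id.
        SELECT row_number() OVER (PARTITION BY section_id ORDER BY name) AS r,
               t.*
        FROM xxx AS t
    ) AS x
    WHERE x.r <= 2
    ORDER BY section_id, r
""").fetchall()

for row in first_two:
    print(row)
```

Section 1 loses its third row, while sections with two rows or fewer are returned in full.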

