"Order by ... Using" Clause in Postgresql

ORDER BY ... USING clause in PostgreSQL

A very simple example would be:

> SELECT * FROM tab ORDER BY col USING <

But this is boring, because this is nothing you can't get with the traditional ORDER BY col ASC.

Also the standard catalog doesn't mention anything exciting about strange comparison functions/operators. You can get a list of them:

    > SELECT amoplefttype::regtype, amoprighttype::regtype, amopopr::regoper 
FROM pg_am JOIN pg_amop ON pg_am.oid = pg_amop.amopmethod
WHERE amname = 'btree' AND amopstrategy IN (1,5);

You will notice, that there are mostly < and > functions for primitive types like integer, date etc and some more for arrays and vectors and so on. None of these operators will help you to get a custom ordering.

In most cases where custom ordering is required you can get away using something like ... ORDER BY somefunc(tablecolumn) ... where somefunc maps the values appropriately. Because that works with every database this is also the most common way. For simple things you can even write an expression instead of a custom function.

Switching gears up

ORDER BY ... USING makes sense in several cases:

  • The ordering is so uncommon, that the somefunc trick doesn't work.
  • You work with a non-primitive type (like point, circle or imaginary numbers) and you don't want to repeat yourself in your queries with strange calculations.
  • The dataset you want to sort is so large, that support by an index is desired or even required.

I will focus on the complex datatypes: often there is more than one way to sort them in a reasonable way. A good example is point: You can "order" them by the distance to (0,0), or by x first, then by y or just by y or anything else you want.

Of course, PostgreSQL has predefined operators for point:

    > CREATE TABLE p ( p point );
> SELECT p <-> point(0,0) FROM p;

But none of them is declared usable for ORDER BY by default (see above):

    > SELECT * FROM p ORDER BY p;
ERROR: could not identify an ordering operator for type point
TIP: Use an explicit ordering operator or modify the query.

Simple operators for point are the "below" and "above" operators <^ and >^. They compare simply the y part of the point. But:

    >  SELECT * FROM p ORDER BY p USING >^;
ERROR: operator > is not a valid ordering operator
TIP: Ordering operators must be "<" or ">" members of __btree__ operator families.

ORDER BY USING requires an operator with defined semantics: Obviously it must be a binary operator, it must accept the same type as arguments and it must return boolean. I think it must also be transitive (if a < b and b < c then a < c). There may be more requirements. But all these requirements are also necessary for proper btree-index ordering. This explains the strange error messages containing the reference to btree.

ORDER BY USING also requires not just one operator to be defined but an operator class and an operator family. While one could implement sorting with only one operator, PostgreSQL tries to sort efficiently and minimize comparisons. Therefore, several operators are used even when you specify only one - the others must adhere to certain mathematical constraints - I've already mentioned transitivity, but there are more.

Switching Gears up

Let's define something suitable: An operator for points which compares only the y part.

The first step is to create a custom operator family which can be used by the btree index access method. see

    > CREATE OPERATOR FAMILY xyzfam USING btree;   -- superuser access required!
CREATE OPERATOR FAMILY

Next we must provide a comparator function which returns -1, 0, +1 when comparing two points. This function WILL be called internally!

    > CREATE FUNCTION xyz_v_cmp(p1 point, p2 point) RETURNS int 
AS $$BEGIN RETURN btfloat8cmp(p1[1],p2[1]); END $$ LANGUAGE plpgsql;
CREATE FUNCTION

Next we define the operator class for the family. See the manual for an explanation of the numbers.

    > CREATE OPERATOR CLASS xyz_ops FOR TYPE point USING btree FAMILY xyzfam AS 
OPERATOR 1 <^ ,
OPERATOR 3 ?- ,
OPERATOR 5 >^ ,
FUNCTION 1 xyz_v_cmp(point, point) ;
CREATE OPERATOR CLASS

This step combines several operators and functions and also defines their relationship and meaning. For example OPERATOR 1 means: This is the operator for less-than tests.

Now the operators <^ and >^ can be used in ORDER BY USING:

> INSERT INTO p SELECT point(floor(random()*100), floor(random()*100)) FROM generate_series(1, 5);
INSERT 0 5
> SELECT * FROM p ORDER BY p USING >^;
p
---------
(17,8)
(74,57)
(59,65)
(0,87)
(58,91)

Voila - sorted by y.

To sum it up: ORDER BY ... USING is an interesting look under the hood of PostgreSQL. But nothing you will require anytime soon unless you work in very specific areas of database technology.

Another example can be found in the Postgres docs. with source code for the example here and here. This example also shows how to create the operators.

ORDER BY the IN value list

You can do it quite easily with (introduced in PostgreSQL 8.2) VALUES (), ().

Syntax will be like this:

select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering

PostgreSQL ORDER BY values in IN() clause

You can wrap it into a derived table:

SELECT *
FROM (
SELECT DISTINCT id
FROM items
WHERE id IN (5,2,9)
) t
ORDER BY
CASE id
WHEN 5 THEN 1
WHEN 2 THEN 2
WHEN 9 THEN 3
END

PostgreSQL Multiple USING clause

With inner joins, this query:

SELECT A.c1, A.c2, ...
FROM A JOIN
B
USING (userid) JOIN
C
USING (userid) ;

Is equivalent to:

SELECT A.c1, A.c2, ...
FROM A JOIN
B
ON B.userid = A.userid JOIN
C
ON C.userid = A.userid ; -- or "C.userid = B.userid", it doesn't matter

Note that NULL values fail the comparison for an INNER JOIN, so they are not an issue because they are not in the result set.

For LEFT OUTER JOIN, the logic would be the same as above. For a FULL JOIN:

SELECT A.c1, A.c2, ...
FROM A FULL JOIN
B
ON B.userid = A.userid FULL JOIN
C
ON C.userid = COALESCE(A.userid, B.userid) ;

Note that throughout the rest of the query userid, with no qualification, refers to the non-NULL value from whatever table.

Retain the order while select in postgres

You cannot; the order will depend on the execution plan, the physical order of the table and other things.

You could do that with an array and an explicit ORDER BY:

SELECT u.id
FROM users AS u
JOIN unnest(ARRAY[2,2,1]) WITH ORDINALITY AS arr(elem, ord)
ON u.id = arr.elem
ORDER BY arr.ord;

PostgreSQL -- is it possible to use named window in query's ORDER BY clause?

You cannot take the shortcut of using the window name in the ORDER BY clause. You will have to repeat the expressions there.

How to use an ALIAS in a PostgreSQL ORDER BY clause?

You can always ORDER BY this way:

select 
title,
( stock_one + stock_two ) as global_stock
from product
order by 2, 1

or wrap it in another SELECT:

SELECT *
from
(
select
title,
( stock_one + stock_two ) as global_stock
from product
) x
order by (case when global_stock = 0 then 1 else 0 end) desc, title

How does postgres order the results when ORDER BY is not provided

If no order by clause is given, postgres (and any other reasonable database, for that sake), should return the rows in the order it was able to produce them (be it from an internal cache, an index, or directly from the table).

Since the same algorithm is used on the same data, it isn't surprising you're getting the same rows in the same order. However, this does not mean you should rely on this ordering. If you do something to change the data's layout on the disk (e.g., back it up and restore it, or even rebuild the tables' indexes), you're likely to get a different ordering.

problem using ORDER BY in postgres with UNION clause

Leave out the table aliases and refer to the columns only by their table aliases:

SELECT users."uid", tax."tid"
FROM tax LEFT JOIN
users
ON tax."uid" = users."uid"
UNION
SELECT users."uid", tax."tid"
FROM tax LEFT JOIN
users
ON tax."uid" = users."uid"
ORDER BY "uid", "tid";

The SELECT DISTINCT is redundant, because UNION removes duplicates.

Postgres CASE in ORDER BY using an alias

Try to wrap it as a subquery:

SELECT * 
FROM
(
SELECT users.id,
GREATEST(
COALESCE(MAX(messages.created_at), '2012-07-25 16:05:41.870117'),
COALESCE(MAX(phone_calls.created_at), '2012-07-25 16:05:41.870117')
) AS latest_interaction
FROM users LEFT JOIN messages ON users.id = messages.user_id
LEFT JOIN phone_calls ON users.id = phone_calls.user_id
GROUP BY users.id
) Sub
ORDER BY
CASE WHEN(
latest_interaction > '2012-09-05 16:05:41.870117')
THEN 0
WHEN(latest_interaction > '2012-09-04 16:05:41.870117')
THEN 2
WHEN(latest_interaction > '2012-09-04 16:05:41.870117')
THEN 3
ELSE 4
END
LIMIT 5;


Related Topics



Leave a reply



Submit