Distinct() Function (Not Select Qualifier) in Postgres


(The question is old, but it comes high in Google results for “sql distinct is not a function” (second, and first from Stack Overflow) and yet is still missing a satisfying answer, so...)

Actually this is the ordinary DISTINCT qualifier on a SELECT -- but with a misleading syntax (you are right about that point).

DISTINCT is never a function, always a keyword. Here it is used (wrongly) as if it were a function, but

select distinct(pattern) as pattern, style, ... etc ...
from styleview
where ... etc ...

is in fact equivalent to all the following forms:

-- add a space after distinct:

select distinct (pattern) as pattern, style, ... etc ...
from styleview
where ... etc ...

-- remove parentheses around column name:

select distinct pattern as pattern, style, ... etc ...
from styleview
where ... etc ...

-- indent clause contents:

select distinct
    pattern as pattern, style, ... etc ...
from
    styleview
where
    ... etc ...

-- remove redundant alias identical to column name:

select distinct
    pattern, style, ... etc ...
from
    styleview
where
    ... etc ...
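A quick way to convince yourself of the equivalence is to run all four spellings against the same data. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `styleview` table, its two columns and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical styleview table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE styleview (pattern TEXT, style TEXT)")
conn.executemany(
    "INSERT INTO styleview VALUES (?, ?)",
    [("dots", "red"), ("dots", "red"), ("dots", "blue"), ("stripes", "red")],
)

queries = [
    "SELECT DISTINCT(pattern) AS pattern, style FROM styleview",   # looks like a function call
    "SELECT DISTINCT (pattern) AS pattern, style FROM styleview",  # space after DISTINCT
    "SELECT DISTINCT pattern AS pattern, style FROM styleview",    # no parentheses
    "SELECT DISTINCT pattern, style FROM styleview",               # no redundant alias
]

# Sort in Python because none of the queries has an ORDER BY.
results = [sorted(conn.execute(q).fetchall()) for q in queries]

print(all(r == results[0] for r in results))  # True
print(results[0])  # [('dots', 'blue'), ('dots', 'red'), ('stripes', 'red')]
```

Note that the result has three rows, not two: DISTINCT deduplicates the whole (pattern, style) row, whatever the parentheses suggest.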

Complementary reading:

  • http://weblogs.sqlteam.com/jeffs/archive/2007/10/12/sql-distinct-group-by.aspx
  • https://stackoverflow.com/a/1164529

Note: OMG Ponies in an answer to the present question mentioned the DISTINCT ON extension featured by PostgreSQL.

But (as Jay rightly remarked in a comment) it is not what is used here, because the query (and the results) would have been different, e.g.:

select distinct on(pattern) pattern, style, ... etc ...
from styleview
where ... etc ...
order by pattern, ... etc ...

equivalent to:

select distinct on (pattern)
    pattern, style, ... etc ...
from
    styleview
where
    ... etc ...
order by
    pattern, ... etc ...
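To see how the results differ, here is a small sketch that emulates both semantics in plain Python (DISTINCT ON is PostgreSQL-specific, so this does not run real SQL). The rows are hypothetical (pattern, style) pairs, and the emulation assumes the ORDER BY is pattern, style.

```python
from itertools import groupby

# Hypothetical (pattern, style) rows, for illustration only.
rows = [("dots", "red"), ("stripes", "red"), ("dots", "blue"), ("dots", "red")]


def select_distinct(rows):
    """Plain DISTINCT: one copy of each distinct whole row."""
    return sorted(set(rows))


def select_distinct_on_first_column(rows):
    """Emulates DISTINCT ON (pattern) ... ORDER BY pattern, style:
    sort, then keep only the first row of each group sharing the same pattern."""
    ordered = sorted(rows)  # plays the role of ORDER BY pattern, style
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


print(select_distinct(rows))
# [('dots', 'blue'), ('dots', 'red'), ('stripes', 'red')]
print(select_distinct_on_first_column(rows))
# [('dots', 'blue'), ('stripes', 'red')]
```

Plain DISTINCT keeps every distinct combination; DISTINCT ON keeps one row per pattern, and the ORDER BY decides which one.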

Complementary reading:

  • http://www.noelherrick.com/blog/postgres-distinct-on

Note: Lukas Eder in an answer to the present question mentioned the syntax of using the DISTINCT keyword inside an aggregate function:

the COUNT(DISTINCT (foo, bar, ...)) syntax featured by HSQLDB

(or COUNT(DISTINCT foo, bar, ...), which works for MySQL too, but not for PostgreSQL, SQL Server, Oracle, and maybe others).

But (clearly enough) it is not what is used here.

Group By Vs Distinct in SQL

Why does this not work?

SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population > 200000000;

That is simple. You have an aggregation query, because you have COUNT() in the SELECT. You have no GROUP BY, so any other column references in the SELECT must be arguments of aggregation functions. So, continent generates an error.

You also seem to be under the impression that the parentheses around continent have some significance. They do not. Not at all. SQL has a construct, SELECT DISTINCT, which returns only distinct rows.

Also note that SELECT DISTINCT is almost never needed in an aggregation query.
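The usual fix is to add a GROUP BY for the non-aggregated column. Below is a minimal sketch using Python's sqlite3 module with a hypothetical `world` table (made-up names and populations). Be aware that SQLite itself is permissive and would accept the original query with a bare `continent` column, so the error described above comes from stricter engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical world table; names and populations are invented for illustration.
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("country_a", "Asia", 1_400_000_000),
        ("country_b", "Asia", 270_000_000),
        ("country_c", "Americas", 330_000_000),
        ("country_d", "Europe", 80_000_000),  # filtered out by the WHERE clause
    ],
)

# Every non-aggregated column in the SELECT list (continent) also appears in GROUP BY.
rows = conn.execute(
    """
    SELECT continent, COUNT(name)
    FROM world
    WHERE population > 200000000
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()
print(rows)  # [('Americas', 1), ('Asia', 2)]
```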

How do I (or can I) SELECT DISTINCT on multiple columns?

SELECT DISTINCT a,b,c FROM t

is roughly equivalent to:

SELECT a,b,c FROM t GROUP BY a,b,c

It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
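A minimal check of the equivalence, plus the extra power of GROUP BY, using Python's sqlite3 module and a hypothetical table `t` with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 1, 1)],
)

distinct_rows = sorted(conn.execute("SELECT DISTINCT a, b, c FROM t").fetchall())
grouped_rows = sorted(conn.execute("SELECT a, b, c FROM t GROUP BY a, b, c").fetchall())
print(distinct_rows == grouped_rows)  # True

# What DISTINCT cannot do: attach an aggregate to each group.
counts = conn.execute(
    "SELECT a, b, c, COUNT(*) FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()
print(counts)  # [(1, 1, 1, 2), (1, 2, 1, 1), (2, 1, 1, 1)]
```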

For your query, I'd do it like this:

UPDATE sales
SET status = 'ACTIVE'
WHERE id IN
(
    SELECT id
    FROM sales S
    INNER JOIN
    (
        SELECT saleprice, saledate
        FROM sales
        GROUP BY saleprice, saledate
        HAVING COUNT(*) = 1
    ) T
    ON S.saleprice = T.saleprice AND S.saledate = T.saledate
)
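Here is a sketch that runs this UPDATE against a tiny hypothetical `sales` table in SQLite (via Python's sqlite3), so you can see which rows get marked: only rows whose (saleprice, saledate) pair occurs exactly once become ACTIVE. Some engines are stricter about this pattern; MySQL, for instance, refuses an UPDATE whose subquery selects from the table being updated, so check your platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table with only the columns the query touches.
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, saleprice REAL, saledate TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO sales (id, saleprice, saledate, status) VALUES (?, ?, ?, 'INACTIVE')",
    [
        (1, 10.0, "2024-01-01"),
        (2, 10.0, "2024-01-01"),  # same (saleprice, saledate) pair as id 1
        (3, 20.0, "2024-01-01"),
        (4, 10.0, "2024-01-02"),
    ],
)

conn.execute(
    """
    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN (
        SELECT id
        FROM sales S
        INNER JOIN (
            SELECT saleprice, saledate
            FROM sales
            GROUP BY saleprice, saledate
            HAVING COUNT(*) = 1
        ) T
        ON S.saleprice = T.saleprice AND S.saledate = T.saledate
    )
    """
)

print(conn.execute("SELECT id, status FROM sales ORDER BY id").fetchall())
# [(1, 'INACTIVE'), (2, 'INACTIVE'), (3, 'ACTIVE'), (4, 'ACTIVE')]
```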

MySQL: Select DISTINCT / UNIQUE, but return all columns?

You're looking for a group by:

select *
from my_table
group by field1

Which, in PostgreSQL, can be written with a distinct on clause:

select distinct on (field1) *
from my_table

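On most platforms, however, neither of the above will work because the values returned for the other columns are unspecified. (The first works in MySQL only when the ONLY_FULL_GROUP_BY SQL mode is disabled, which is not the default since MySQL 5.7.5; even then, the values picked for the other columns are indeterminate.)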

You could fetch the distinct fields and stick to picking a single arbitrary row each time.

On some platforms (e.g. PostgreSQL, Oracle, T-SQL) this can be done directly using window functions:

select *
from (
    select *,
           row_number() over (partition by field1 order by field2) as rn
    from my_table
) as ranked
where rn = 1
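A runnable sketch of the window-function approach, using Python's sqlite3 module (window functions require SQLite 3.25 or newer). `my_table`, `field1`, `field2` and `other` are placeholder names with made-up rows; the ORDER BY inside OVER decides which row of each field1 group is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; all names are placeholders.
conn.execute("CREATE TABLE my_table (field1 TEXT, field2 INTEGER, other TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", 2, "x"), ("a", 1, "y"), ("b", 5, "z"), ("b", 3, "w")],
)

# Number the rows within each field1 group by ascending field2, keep row 1.
rows = conn.execute(
    """
    SELECT field1, field2, other
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field2) AS rn
        FROM my_table
    ) AS ranked
    WHERE rn = 1
    ORDER BY field1
    """
).fetchall()
print(rows)  # [('a', 1, 'y'), ('b', 3, 'w')]
```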

On platforms without window functions (MySQL before 8.0, SQLite before 3.25), you'll need subqueries that join the entire table with itself, which is not recommended.
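For engines without window functions, one common self-join variant joins the table to its per-group minimum. This sketch uses a hypothetical `my_table(field1, field2, other)` in SQLite (via Python's sqlite3) and picks, for each field1, the row with the smallest field2; if several rows tie on that minimum, all of them come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; all names are placeholders.
conn.execute("CREATE TABLE my_table (field1 TEXT, field2 INTEGER, other TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", 2, "x"), ("a", 1, "y"), ("b", 5, "z"), ("b", 3, "w")],
)

# Join each row to its group's minimum field2; no window functions needed.
rows = conn.execute(
    """
    SELECT t.field1, t.field2, t.other
    FROM my_table AS t
    JOIN (
        SELECT field1, MIN(field2) AS min_field2
        FROM my_table
        GROUP BY field1
    ) AS m
      ON t.field1 = m.field1 AND t.field2 = m.min_field2
    ORDER BY t.field1
    """
).fetchall()
print(rows)  # [('a', 1, 'y'), ('b', 3, 'w')]
```

The extra pass over the table for the grouped subquery is the cost the answer warns about.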

optimizing query by removing distinct key word

There are two points about your query that should be mentioned:

  1. Implicit join vs. explicit join: the two have approximately the same performance.

People often ask if there is a performance difference between implicit and explicit joins. The answer is: “Usually not”.


  2. DISTINCT vs. GROUP BY: DISTINCT is optimal for memory usage and GROUP BY is optimal for speed, so the latter can outperform the former but may require a large amount of memory.

The distinct approach is executed like:

  • Copy all business_key values to a temporary table

  • Sort the temporary table

  • Scan the temporary table, returning each item that is different from the one before it

The group by could be executed like:

  • Scan the full table, storing each business_key value in a hash table

  • Return the keys of the hash table
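The two execution strategies described above can be sketched in plain Python. This only illustrates sort-then-scan versus hashing; which strategy a real database uses for DISTINCT or GROUP BY is decided by its optimizer, and some optimizers (PostgreSQL's, for example) can use either one for both.

```python
def distinct_by_sorting(values):
    """Sort-based strategy: copy and sort, then keep each item that differs from the one before it."""
    result = []
    for value in sorted(values):  # copy + sort
        if not result or value != result[-1]:
            result.append(value)  # differs from the previous item
    return result


def distinct_by_hashing(values):
    """Hash-based strategy: one pass, remembering each key in a hash table."""
    seen = {}
    for value in values:
        seen.setdefault(value, None)  # dict keeps keys in first-seen order
    return list(seen)


keys = [3, 1, 3, 2, 1]
print(distinct_by_sorting(keys))  # [1, 2, 3]
print(distinct_by_hashing(keys))  # [3, 1, 2]
```

The sort-based version returns keys in sorted order and the hash-based one in first-seen order, which is one reason neither SQL form guarantees an output order without ORDER BY.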

More detailed explanations are at the links below:

implicit join vs explicit join

distinct vs group by

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!
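A minimal timing harness, sketched with Python's sqlite3 and timeit modules on a synthetic table `t`. SQLite numbers say nothing about MySQL, so treat this as a template: point the same two queries at your own server (and look at EXPLAIN) before drawing conclusions.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
# Synthetic table t: 50,000 rows with 1,000 distinct values of k (timing data only).
conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ((i % 1000, i) for i in range(50_000)))

distinct_sql = "SELECT DISTINCT k FROM t"
group_by_sql = "SELECT k FROM t GROUP BY k"

# Make sure both forms return the same rows before comparing their speed.
assert sorted(conn.execute(distinct_sql).fetchall()) == sorted(conn.execute(group_by_sql).fetchall())

for label, sql in [("DISTINCT", distinct_sql), ("GROUP BY", group_by_sql)]:
    # Run each query 10 times and report the total wall-clock time.
    seconds = timeit.timeit(lambda sql=sql: conn.execute(sql).fetchall(), number=10)
    print(f"{label}: {seconds:.3f} s for 10 runs")
```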

Postgresql COUNT() on distinct composite PK

SELECT COUNT(DISTINCT ROW("t".id, "t".library_id))
FROM "ab_collection" "t"
WHERE t.library_id=1

Source: https://github.com/yiisoft/yii/issues/3004#issuecomment-27601733
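If your database does not support row values inside COUNT(DISTINCT ...), a portable alternative is to deduplicate in a derived table and count its rows. A sketch with Python's sqlite3 and a hypothetical `ab_collection` table whose (id, library_id) pairs repeat on purpose:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ab_collection rows, for illustration only.
conn.execute("CREATE TABLE ab_collection (id INTEGER, library_id INTEGER)")
conn.executemany(
    "INSERT INTO ab_collection VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1), (3, 2)],
)

# Deduplicate (id, library_id) pairs in a derived table, then count its rows.
(count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT DISTINCT t.id, t.library_id
        FROM ab_collection AS t
        WHERE t.library_id = 1
    ) AS pairs
    """
).fetchone()
print(count)  # 2
```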


