How to sort table alphabetically by name initial?
The table data shown is the same as the original, indicating ORDER BY does not modify the data. Is this correct?
Yes, this is correct. A SELECT statement does not change the data in a table. Only UPDATE, DELETE, INSERT or TRUNCATE statements will change the data.
However, your question shows a misconception about how a relational database works.
Rows in a table (of a relational database) are not sorted in any way. You can picture them as balls in a basket.
If you want to display data in a specific sort order, the only (really: the only) way to do that is to use an ORDER BY in your SELECT statement. There is no alternative to that.
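A small runnable sketch of the point above, using an in-memory SQLite database as a stand-in for Postgres (an assumption: ORDER BY behaves the same way here). The employee table mirrors the example below, but its rows are made up. The sorted output comes only from the ORDER BY, and the stored rows are identical before and after the SELECT.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
# Table and column names follow the example; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, employeename TEXT)")
conn.executemany(
    "INSERT INTO employee (id, employeename) VALUES (?, ?)",
    [(1, "Carol"), (2, "Alice"), (3, "Bob")],
)

def snapshot(conn):
    # Everything stored in the table, as an unordered set of rows.
    return set(conn.execute("SELECT id, employeename FROM employee"))

before = snapshot(conn)

# The only reliable way to get a particular order: ask for it with ORDER BY.
sorted_names = [
    name
    for (name,) in conn.execute(
        "SELECT employeename FROM employee ORDER BY employeename ASC"
    )
]

after = snapshot(conn)

print(sorted_names)     # ['Alice', 'Bob', 'Carol']
print(before == after)  # True: the SELECT did not modify the table
```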
Postgres allows you to define a VIEW that includes an ORDER BY, which might be an acceptable workaround for you:
CREATE VIEW sorted_employees
AS
SELECT *
FROM employee
ORDER BY employeename ASC;
Then you can simply use
select *
from sorted_employees;
But be aware of the drawbacks. If you run select * from sorted_employees order by id, then the data will be sorted twice. Postgres is not smart enough to remove the (useless) order by from the view's definition.
Some related questions:
- Default row order in SELECT query - SQL Server 2008 vs SQL 2012
- What is the default SQL result sort order with 'select *'?
- Is PostgreSQL order fully guaranteed if sorting on a non-unique attribute?
- Why do results from a SQL query not come back in the order I expect?
Postgres ordering is not consistent
It sounds like categories.position and photos.display_priority are not unique for all of the result rows. SQL does not specify the order of rows whose sort-key values are all equal; the database server is free to return them in any order, even if the table data has not changed between queries.
To get consistent ordering you will have to add a third sorting key that is guaranteed to be unique for all rows, such as the identity value for that particular row.
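A minimal sketch of adding a unique tiebreaker, again using in-memory SQLite as a stand-in. For brevity it assumes a single hypothetical photos table that holds both sort columns (in the question they come from a join) and a unique id column. Rows 1 and 3 tie on (position, display_priority); only the trailing id makes their relative order fixed.

```python
import sqlite3

# Hypothetical table combining both sort columns; "id" is assumed unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos ("
    "id INTEGER PRIMARY KEY, position INTEGER, display_priority INTEGER)"
)
conn.executemany(
    "INSERT INTO photos (id, position, display_priority) VALUES (?, ?, ?)",
    [(3, 1, 10), (1, 1, 10), (2, 1, 5), (4, 2, 1)],
)

# Without "id" at the end, rows 1 and 3 could come back in either order.
ordered_ids = [
    row_id
    for (row_id,) in conn.execute(
        "SELECT id FROM photos ORDER BY position, display_priority, id"
    )
]
print(ordered_ids)  # [2, 1, 3, 4]
```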
Are postgresql `SELECT DISTINCT` queries deterministic?
This answer assumes that the expressions in the select are deterministic. Otherwise, the question seems trivial.
The order of the rows is not specified, so it could change between runs of the query, or on a different system. However, the set of rows returned should be the same.
Your second quote from the documentation is about DISTINCT ON. That is non-deterministic unless the ORDER BY guarantees which row comes first in each group.
Note: You might get non-deterministic results if you are using a case-insensitive collation. The built-in collations are case-sensitive; and case insensitivity means that the original expressions are not deterministic.
What is the difference between Postgres DISTINCT vs DISTINCT ON?
DISTINCT and DISTINCT ON have completely different semantics.
First the theory
DISTINCT applies to an entire tuple. Once the result of the query is computed, DISTINCT removes any duplicate tuples from the result.
For example, assume a table R with the following contents:
#table r;
a | b
---+---
1 | a
2 | b
3 | c
3 | d
2 | e
1 | a
(6 rows)
SELECT distinct * from R will return:
# select distinct * from r;
a | b
---+---
1 | a
3 | d
2 | e
2 | b
3 | c
(5 rows)
Note that distinct applies to the entire list of projected attributes: thus
select distinct * from R
is semantically equivalent to
select distinct a,b from R
You cannot issue
select a, distinct b From R
DISTINCT must follow SELECT. It applies to the entire tuple, not to an attribute of the result.
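As a rough Python model (not how Postgres implements it), DISTINCT treats each whole tuple as the unit of comparison. Using the rows of table R above, only the exact duplicate (1, 'a') disappears, while (3, 'c') and (3, 'd') both survive because they differ in b.

```python
# Model of DISTINCT over whole tuples, using the rows of table R shown above.
# A row is dropped only if *every* attribute equals an earlier row's.
r = [(1, "a"), (2, "b"), (3, "c"), (3, "d"), (2, "e"), (1, "a")]

# dict.fromkeys keeps the first occurrence of each tuple; the resulting order
# is only this model's, since SQL promises no order without ORDER BY.
distinct_rows = list(dict.fromkeys(r))

print(len(distinct_rows))  # 5: only the duplicate (1, 'a') is removed
print((3, "c") in distinct_rows and (3, "d") in distinct_rows)  # True
```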
DISTINCT ON is a PostgreSQL extension to the language. It is similar, but not identical, to group by.
Its syntax is:
SELECT DISTINCT ON (attributeList) <rest as any query>
For example:
SELECT DISTINCT ON (a) * from R
Its semantics can be described as follows. Compute the query as usual (without the DISTINCT ON (a)), but before the projection of the result, sort the current result and group it according to the attribute list in DISTINCT ON (similar to group by). Now, do the projection using the first tuple in each group and ignore the other tuples.
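The sort-group-take-first description can be sketched in a few lines of Python. This is a model of the semantics under the stated description, not Postgres internals; the function name distinct_on is hypothetical. As in Postgres, the sort key must begin with the DISTINCT ON expressions so that equal keys end up adjacent.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order_key):
    """Keep the first row of each group of equal `key` values.

    Rows are first sorted by `order_key`, which must begin with the
    DISTINCT ON expressions so that equal keys end up adjacent.
    """
    ordered = sorted(rows, key=order_key)
    return [next(group) for _, group in groupby(ordered, key=key)]


r = [(1, "a"), (2, "b"), (3, "c"), (3, "d"), (2, "e"), (1, "a")]

# Models: SELECT DISTINCT ON (a) * FROM r ORDER BY a, b;
result = distinct_on(r, key=itemgetter(0), order_key=itemgetter(0, 1))
print(result)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```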
Example:
select * from r order by a, b;
a | b
---+---
1 | a
1 | a
2 | b
2 | e
3 | c
3 | d
(6 rows)
Then for every different value of a (in this case, 1, 2 and 3), take the first tuple. The result is the same as:
SELECT DISTINCT on (a) * from r order by a, b;
a | b
---+---
1 | a
2 | b
3 | c
(3 rows)
Some DBMS (most notably sqlite) will allow you to run this query:
SELECT a,b from R group by a;
This gives you a similar result.
PostgreSQL will allow this query only if it can establish a functional dependency from a to b, which in practice means that a is the primary key of R. In other words, the query is valid when, for any instance of the relation R, there is only one tuple for every value of a (thus selecting the first tuple is deterministic: there is only one tuple).
For instance, if the primary key of R is a, then a->b and:
SELECT a,b FROM R group by a
is identical to:
SELECT DISTINCT on (a) a, b from r;
Now, back to your problem:
First query:
SELECT DISTINCT count(dimension1)
FROM data_table;
computes the count of dimension1 (the number of tuples in data_table in which dimension1 is not null). This query returns one tuple, which is always unique (hence DISTINCT is redundant).
Query 2:
SELECT count(*)
FROM (SELECT DISTINCT ON (dimension1) dimension1
FROM data_table
GROUP BY dimension1) AS tmp_table;
This is a query within a query. Let me rewrite it for clarity:
WITH tmp_table AS (
SELECT DISTINCT ON (dimension1)
dimension1 FROM data_table
GROUP by dimension1)
SELECT count(*) from tmp_table
Let us compute tmp_table first. As mentioned above, let us first ignore the DISTINCT ON and do the rest of the query. This is a group by on dimension1, so this part of the query will result in one tuple per distinct value of dimension1.
Now, the DISTINCT ON. It uses dimension1 again, but dimension1 is already unique (due to the group by). Hence the DISTINCT ON is superfluous (it does nothing).
The final count is simply the number of groups produced by the group by, i.e. the number of distinct values of dimension1 (a NULL, if present, forms one group of its own).
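To compare the two queries on concrete data, here is a sketch using in-memory SQLite with made-up contents for data_table. SQLite has no DISTINCT ON, so query 2 is run without it, which the reasoning above shows makes no difference. Query 1 counts non-null values; query 2 counts groups.

```python
import sqlite3

# Hypothetical data_table contents; DISTINCT ON is dropped from query 2
# because SQLite does not support it (and it is redundant there anyway).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (dimension1 TEXT)")
conn.executemany(
    "INSERT INTO data_table (dimension1) VALUES (?)", [("x",), ("x",), ("y",)]
)

# Query 1: one result row holding the number of non-null dimension1 values.
(q1,) = conn.execute(
    "SELECT DISTINCT count(dimension1) FROM data_table"
).fetchone()

# Query 2: the number of groups produced by GROUP BY dimension1.
(q2,) = conn.execute(
    "SELECT count(*) FROM "
    "(SELECT dimension1 FROM data_table GROUP BY dimension1) AS tmp_table"
).fetchone()

print(q1, q2)  # 3 2
```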
As you can see, the following queries are equivalent (this applies to any relation with an attribute a):
SELECT DISTINCT ON (a) a
FROM R
and
SELECT a FROM R group by a
and
SELECT DISTINCT a FROM R
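A quick check of the last two forms on table R, using in-memory SQLite as a stand-in. Only the GROUP BY and DISTINCT versions can be run there, since SQLite lacks DISTINCT ON; the ORDER BY is added solely so the two lists can be compared.

```python
import sqlite3

# Table R from the example above, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO r (a, b) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (3, "d"), (2, "e"), (1, "a")],
)

grouped = [a for (a,) in conn.execute("SELECT a FROM r GROUP BY a ORDER BY a")]
distinct = [a for (a,) in conn.execute("SELECT DISTINCT a FROM r ORDER BY a")]

print(grouped == distinct, grouped)  # True [1, 2, 3]
```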
Warning
A query using DISTINCT ON might be non-deterministic for a given instance of the database, unless its ORDER BY determines which tuple comes first in each group.
In other words, the query might return different results for the same tables.
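A toy Python model of this warning. Python's sort is stable, so rows that tie on the sort key keep their input order; here that input order stands in for "whatever order the database happened to read the rows in", which is a simplifying assumption. Sorting by a alone lets the chosen tuple depend on that order, while sorting by a, b does not. The helper first_per_a is hypothetical.

```python
from operator import itemgetter


def first_per_a(rows, sort_key):
    # Model of SELECT DISTINCT ON (a) * FROM r ORDER BY <sort_key>:
    # keep the first row seen for each value of a after sorting.
    # sorted() is stable, so ties on sort_key keep their input order.
    first = {}
    for row in sorted(rows, key=sort_key):
        first.setdefault(row[0], row)
    return sorted(first.values())


r = [(1, "a"), (2, "b"), (3, "c"), (3, "d"), (2, "e"), (1, "a")]
r_reordered = list(reversed(r))  # same table contents, different read order

by_a = itemgetter(0)       # ORDER BY a    (ties within each a)
by_a_b = itemgetter(0, 1)  # ORDER BY a, b (ties only between identical rows)

print(first_per_a(r, by_a))            # [(1, 'a'), (2, 'b'), (3, 'c')]
print(first_per_a(r_reordered, by_a))  # [(1, 'a'), (2, 'e'), (3, 'd')]
print(first_per_a(r, by_a_b) == first_per_a(r_reordered, by_a_b))  # True
```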
One interesting aspect
DISTINCT ON emulates a bad behaviour of sqlite in a much cleaner way. Assume that R has two attributes a and b:
SELECT a, b FROM R group by a
is, in general, invalid SQL, because b is neither grouped nor aggregated. Yet, it runs on sqlite, which simply takes the value of b from an arbitrary tuple in each group of equal values of a.
In PostgreSQL this statement is rejected (unless a is the primary key of R, as described above). Instead, you can use DISTINCT ON and write:
SELECT DISTINCT ON (a) a,b from R
Corollary
DISTINCT ON is useful in a group by when you want to access a value that is functionally dependent on the group by attributes. In other words, if you know that every tuple in a group has the same value for a third attribute, use DISTINCT ON those group attributes to retrieve it. Otherwise you would have to make a JOIN to retrieve that third attribute.