Ordering distinct column values by (first value of) other column in aggregate function
If this is part of a larger expression, it might be inconvenient to do a select distinct
in a subquery. In this case, you can take advantage of the fact that string_agg() ignores NULL
input values and do something like:
-- ',' is an assumed delimiter; string_agg() requires one
select string_agg((case when seqnum = 1 then sometext end), ',' order by numval)
from (select t.*,
             row_number() over (partition by <whatever>, sometext order by numval) as seqnum
      from t
     ) t
group by <whatever>
The subquery adds a column but does not require aggregating the data.
use distinct and order by in STRING_AGG function
The error message is quite clear. The expression that you use in the ORDER BY
clause must also appear in the aggregated part.
You could do:
SELECT STRING_AGG(DISTINCT foo.a::TEXT, ',' ORDER BY foo.a::TEXT DESC)
FROM (
SELECT 1 As a
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 2
) AS foo
While this will work, the problem with this solution is that it orders the numbers as strings, which do not follow the same ordering rules: as a string, 10 is less than 2.
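To make the ordering difference concrete, here is a small Python sketch (not part of the original answer; the sample values are made up). It compares a descending sort of distinct values as text, which is what ordering by a::TEXT amounts to, with a descending numeric sort.

```python
# Hypothetical sample values, including a two-digit number.
values = [1, 2, 10, 2]

# Distinct values sorted as text, descending (string comparison).
as_text = ",".join(sorted({str(v) for v in values}, reverse=True))

# Distinct values sorted as numbers, descending, then converted to text.
as_numbers = ",".join(str(v) for v in sorted(set(values), reverse=True))

print(as_text)     # 2,10,1
print(as_numbers)  # 10,2,1
```

The text sort puts 2 ahead of 10 because strings are compared character by character.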
Another option is to use arrays: first, ARRAY_AGG() can be used to aggregate the numbers (with proper, numeric ordering), then you can turn the result into a comma-separated list of strings with ARRAY_TO_STRING().
SELECT ARRAY_TO_STRING(ARRAY_AGG(DISTINCT a ORDER BY a DESC), ',')
FROM (
SELECT 1 As a
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 2
) AS foo
Select Distinct on one column, without ordering by that column
The general answer to your question is that when you use DISTINCT ON (x, ...) in a SELECT statement in PostgreSQL, the database sorts by the values in the distinct clause in order to make it easy to tell whether rows have distinct values. Once the rows are ordered by those values, it only takes one pass for the database to remove duplicates, and it only needs to compare adjacent rows. Because of this, the database forces you to sort by the same columns that appear in the distinct clause.
You can work around this by making your original query a subquery, like so:
SELECT t.id FROM
(SELECT DISTINCT ON (countries.id) countries.id
, province_infos.population
, country_infos.founding_date
FROM countries
...
ORDER BY countries.id, province_infos.population DESC, country_infos.founding_date ASC
) t
ORDER BY t.population DESC, t.founding_date ASC
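The sort, keep-first, then re-sort idea described above can be imitated in plain Python. This is only an illustrative sketch of the logic, not how PostgreSQL is implemented; the rows and the helper names distinct_on_id and reorder are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical joined rows: one row per (country, province) pair.
rows = [
    {"id": 1, "population": 500, "founding_date": "1900-01-01"},
    {"id": 1, "population": 900, "founding_date": "1900-01-01"},
    {"id": 2, "population": 1200, "founding_date": "1850-06-01"},
    {"id": 2, "population": 300, "founding_date": "1850-06-01"},
    {"id": 3, "population": 900, "founding_date": "1800-03-15"},
]

def distinct_on_id(rows):
    # Inner query: sort by id first, then by the tie-breaking columns,
    # and keep the first row of each run of equal ids (adjacent rows only).
    ordered = sorted(
        rows, key=lambda r: (r["id"], -r["population"], r["founding_date"])
    )
    return [next(group) for _, group in groupby(ordered, key=itemgetter("id"))]

def reorder(rows):
    # Outer query: re-sort the de-duplicated rows however you like.
    return sorted(rows, key=lambda r: (-r["population"], r["founding_date"]))

result = [r["id"] for r in reorder(distinct_on_id(rows))]
print(result)  # [2, 3, 1]
```

The inner step has to sort by id so that duplicates end up next to each other; only the outer step is free to use a different order.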
Get distinct on one column, order by another
Leading expressions in ORDER BY have to agree with expressions in DISTINCT ON:
SELECT DISTINCT ON ("threadId") *
FROM messages
ORDER BY "threadId", "createdAt" DESC;
Detailed explanation:
- Select first row in each GROUP BY group?
If you want to order results in a different way (like commented), you'll have to wrap the query in an outer query with a second ORDER BY. See:
- PostgreSQL DISTINCT ON with different ORDER BY
Or similar, depending on your exact situation and requirements. Getting the best performance may take a more sophisticated query. Recent example:
- How do I take a DISTINCT ON subquery that is ordered by a separate column, and make it fast?
Number of distinct column A group by B
Please try using the DISTINCT keyword inside your count function:
SELECT travelmode, COUNT(DISTINCT segment_id) NumOfSegments
FROM temp_table
GROUP BY travelmode
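If you want to try this without a database server, the following sketch runs the same query against an in-memory SQLite database from Python. The sample rows are invented, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_table (travelmode TEXT, segment_id INTEGER)")
# Hypothetical data: 'bus' uses segments 1 and 2, 'walk' uses segment 3 only.
conn.executemany(
    "INSERT INTO temp_table VALUES (?, ?)",
    [("bus", 1), ("bus", 1), ("bus", 2), ("walk", 3), ("walk", 3)],
)

counts = conn.execute(
    """
    SELECT travelmode, COUNT(DISTINCT segment_id) AS NumOfSegments
    FROM temp_table
    GROUP BY travelmode
    ORDER BY travelmode
    """
).fetchall()
print(counts)  # [('bus', 2), ('walk', 1)]
```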
How to get distinct prices by product with single row per product
Given your error message, and from what I read here on Stack Overflow from gurus like @GordonLinoff, you can't use DISTINCT inside STRING_AGG together with an ORDER BY on an expression that differs from the aggregated one (here price versus price::text). A quick workaround would be to just subquery your table first and use DISTINCT there to remove duplicates.
SELECT t.product, STRING_AGG(t.price::text, ',' ORDER BY price)
FROM
(
SELECT DISTINCT product, price
FROM (VALUES ('A', 100), ('A', 100), ('A', 200), ('B', 200), ('B', 200))
orderdetail (product, price)
) t
GROUP BY t.product
I tested this query on Postgres, and it returns this (both columns are of type text):
 product | string_agg
---------+------------
 A       | 100,200
 B       | 200
distinct rows in R based on the order of other columns
Is this what you are looking for?
## data
data <- structure(list(id = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L), decision = c(1L, 1L, 1L,
1L, 1L, 2L, 3L, 1L, 3L, 5L, 1L), nature = c(5L, 5L, 5L, 5L, 5L,
2L, 2L, 2L, 2L, 2L, 2L), period = c(1L, 2L, 1L, 2L, 1L, 1L, 2L,
1L, 2L, 1L, 2L), trial = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L,
3L, 3L)), row.names = c(NA, -11L), class = "data.frame")
library(dplyr)
data %>%
mutate(rownum = 1:n()) %>%
group_by(id, trial, period) %>%
mutate(maxrownum = max(rownum)) %>%
filter(rownum == maxrownum) %>%
select(-c(rownum, maxrownum))
I have created an identifier for the row number. Assuming that your data is ordered by attempt, choosing the rows where the row number is equal to max(row number) picks up the last attempt for each (id, trial, period) triple.
Output:
# Groups:   id, trial, period [6]
     id decision nature period trial
  <int>    <int>  <int>  <int> <int>
1  1000        2      2      1     1
2  1000        3      2      2     1
3  1000        1      2      1     2
4  1000        3      2      2     2
5  1000        5      2      1     3
6  1000        1      2      2     3
Select first row in each GROUP BY group?
On databases that support CTE and windowing functions:
WITH summary AS (
SELECT p.id,
p.customer,
p.total,
ROW_NUMBER() OVER(PARTITION BY p.customer
ORDER BY p.total DESC) AS rank
FROM PURCHASES p)
SELECT *
FROM summary
WHERE rank = 1
Supported by any database:
But you need to add logic to break ties:
SELECT MIN(x.id), -- change to MAX if you want the highest
x.customer,
x.total
FROM PURCHASES x
JOIN (SELECT p.customer,
MAX(total) AS max_total
FROM PURCHASES p
GROUP BY p.customer) y ON y.customer = x.customer
AND y.max_total = x.total
GROUP BY x.customer, x.total
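The tie-breaking behaviour of the second query can be checked with a small sketch that runs it in an in-memory SQLite database from Python. The table contents are hypothetical: Sally has two purchases sharing her highest total, so MIN(x.id) decides which one is reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)
# Hypothetical data: ids 2 and 4 tie for Sally's highest total.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 3)],
)

top = conn.execute(
    """
    SELECT MIN(x.id), x.customer, x.total
    FROM purchases x
    JOIN (SELECT p.customer, MAX(p.total) AS max_total
          FROM purchases p
          GROUP BY p.customer) y
      ON y.customer = x.customer AND y.max_total = x.total
    GROUP BY x.customer, x.total
    ORDER BY x.customer
    """
).fetchall()
print(top)  # [(1, 'Joe', 5), (2, 'Sally', 3)]
```

Without the MIN(x.id) and the outer GROUP BY, the join would return both of Sally's tied rows.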
How to use DISTINCT and ORDER BY in same SELECT statement?
The problem is that the columns used in the ORDER BY aren't specified in the DISTINCT. To do this, you need to use an aggregate function to sort on, and use a GROUP BY to make the DISTINCT work.
Try something like this:
SELECT DISTINCT Category, MAX(CreationDate)
FROM MonitoringJob
GROUP BY Category
ORDER BY MAX(CreationDate) DESC, Category
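As a rough check of the idea, this sketch runs a similar query in an in-memory SQLite database from Python. The rows are made up, and the DISTINCT keyword is left out here on the assumption that GROUP BY already yields one row per category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MonitoringJob (Category TEXT, CreationDate TEXT)")
# Hypothetical data; ISO-formatted dates sort correctly as text.
conn.executemany(
    "INSERT INTO MonitoringJob VALUES (?, ?)",
    [
        ("a", "2020-01-01"),
        ("b", "2020-03-01"),
        ("a", "2020-02-01"),
        ("c", "2020-01-15"),
    ],
)

latest = conn.execute(
    """
    SELECT Category, MAX(CreationDate)
    FROM MonitoringJob
    GROUP BY Category
    ORDER BY MAX(CreationDate) DESC, Category
    """
).fetchall()
print(latest)  # [('b', '2020-03-01'), ('a', '2020-02-01'), ('c', '2020-01-15')]
```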