Ordering Distinct Column Values by (First Value Of) Other Column in Aggregate Function

If this is part of a larger expression, it may be inconvenient to run SELECT DISTINCT in a subquery. In that case, you can take advantage of the fact that string_agg() ignores NULL input values and do something like:

select string_agg( (case when seqnum = 1 then sometext end), ',' order by numval)
from (select <whatever>, sometext, numval,
             row_number() over (partition by <whatever>, sometext order by numval) as seqnum
      from t
     ) t
group by <whatever>

The subquery adds a column but does not require aggregating the data.
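
The idea can be tried out with a small Python sketch using the standard sqlite3 module. The table contents and the grp column (standing in for <whatever>) are made up for illustration, and window functions need SQLite 3.25 or later. Because ORDER BY inside an aggregate is not available in every SQLite version, the final join is done in Python, skipping NULLs the way string_agg() would:

```python
import sqlite3

# Hypothetical sample data: (grp, sometext, numval); grp stands in for <whatever>.
rows = [
    ("g1", "b", 1), ("g1", "a", 2), ("g1", "b", 3),
    ("g2", "x", 5), ("g2", "x", 4), ("g2", "y", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, sometext TEXT, numval INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# seqnum = 1 marks the first row of each (grp, sometext) pair by numval;
# every other row yields NULL from the CASE expression.
query = """
    SELECT grp,
           CASE WHEN seqnum = 1 THEN sometext END AS first_text
    FROM (SELECT grp, sometext, numval,
                 ROW_NUMBER() OVER (PARTITION BY grp, sometext
                                    ORDER BY numval) AS seqnum
          FROM t) AS numbered
    ORDER BY grp, numval
"""

# Join per group in numval order, skipping NULLs (None) as string_agg() would.
collected = {}
for grp, first_text in conn.execute(query):
    parts = collected.setdefault(grp, [])
    if first_text is not None:
        parts.append(first_text)

result = {grp: ",".join(parts) for grp, parts in collected.items()}
print(result)
```

Each distinct text appears once per group, positioned by its smallest numval.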

use distinct and order by in STRING_AGG function

The error message is quite clear: when an aggregate uses DISTINCT, every expression in its ORDER BY clause must also appear in the aggregate's argument list.

You could do:

SELECT STRING_AGG(DISTINCT foo.a::TEXT, ',' ORDER BY foo.a::TEXT DESC)
FROM (
    SELECT 1 As a
    UNION ALL SELECT 1
    UNION ALL SELECT 1
    UNION ALL SELECT 2
) AS foo

While this works, the problem with this solution is that it orders the numbers as strings, which follow different ordering rules. Compared as strings, '10' sorts before '2'.
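
The difference between text ordering and numeric ordering is easy to see in a few lines of Python. The sample values are made up; the sketch only mimics the DISTINCT plus ORDER BY ... DESC logic, not the SQL itself:

```python
# Hypothetical input values, including a duplicate and a two-digit number.
values = [1, 2, 10, 2]

# DISTINCT + ORDER BY on the text form: compared character by character.
as_text = sorted({str(v) for v in values}, reverse=True)
text_result = ",".join(as_text)

# DISTINCT + ORDER BY on the numeric form: compared by value.
as_numbers = sorted(set(values), reverse=True)
numeric_result = ",".join(str(v) for v in as_numbers)

print(text_result)     # 2,10,1
print(numeric_result)  # 10,2,1
```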

Another option is to use arrays: first, ARRAY_AGG() aggregates the numbers (with proper numeric ordering), then ARRAY_TO_STRING() turns the array into a comma-separated list of strings.

SELECT ARRAY_TO_STRING(ARRAY_AGG(DISTINCT a ORDER BY a DESC), ',')
FROM (
    SELECT 1 As a
    UNION ALL SELECT 1
    UNION ALL SELECT 1
    UNION ALL SELECT 2
) AS foo

Select Distinct on one column, without ordering by that column

The general answer to your question is that when you use DISTINCT ON (x, ...) in a SELECT statement in PostgreSQL, the database sorts by the expressions in the DISTINCT ON clause so that duplicates are easy to detect: once the rows are ordered by those values, it takes only one pass to remove duplicates, comparing adjacent rows. Because of this, the database forces you to sort by the DISTINCT ON columns first.

You can work around this by making your original query a subquery, like so:

SELECT t.id
FROM (
   SELECT DISTINCT ON (countries.id) countries.id
        , province_infos.population
        , country_infos.founding_date
   FROM countries
   ...
   ORDER BY countries.id, province_infos.population DESC, country_infos.founding_date ASC
) t
ORDER BY t.population DESC, t.founding_date ASC
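
The two stages of this workaround can be mimicked in plain Python to make the logic explicit. This is only an emulation with made-up (country_id, population) rows, not how PostgreSQL executes the query: the inner step sorts and keeps the first row per id, the outer step re-sorts the survivors.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (country_id, population) rows standing in for the joined tables.
rows = [(1, 500), (1, 900), (2, 1200), (2, 300), (3, 700)]

# Inner query: ORDER BY id, population DESC, then keep the first row per id,
# which is what DISTINCT ON (id) does.
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
first_per_id = [next(group) for _, group in groupby(ordered, key=itemgetter(0))]

# Outer query: re-sort the deduplicated rows by population DESC.
final_ids = [country_id for country_id, _ in
             sorted(first_per_id, key=itemgetter(1), reverse=True)]

print(first_per_id)  # [(1, 900), (2, 1200), (3, 700)]
print(final_ids)     # [2, 1, 3]
```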

Get distinct on one column, order by another

Leading expressions in ORDER BY have to agree with expressions in DISTINCT ON:

SELECT DISTINCT ON ("threadId") *
FROM   messages
ORDER  BY "threadId", "createdAt" DESC;

Detailed explanation:

Select first row in each GROUP BY group?

If you want to order the results differently (as mentioned in the comments), you'll have to wrap the query in an outer query with a second ORDER BY. See:

PostgreSQL DISTINCT ON with different ORDER BY

Or similar, depending on your exact situation and requirements. Getting the best performance may take a more sophisticated query. Recent example:

How do I take a DISTINCT ON subquery that is ordered by a separate column, and make it fast?

Number of distinct column A group by B

Use the DISTINCT keyword inside the COUNT function:

SELECT travelmode, COUNT(DISTINCT segment_id) NumOfSegments
FROM temp_table
GROUP BY travelmode
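
To check the behaviour quickly, the same query can be run against an in-memory SQLite database from Python. The rows below are invented for illustration, and an ORDER BY is added only to make the output order deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_table (travelmode TEXT, segment_id INTEGER)")
# Hypothetical rows: 'bus' uses segments 1 and 2, 'walk' uses only segment 3.
conn.executemany("INSERT INTO temp_table VALUES (?, ?)", [
    ("bus", 1), ("bus", 1), ("bus", 2), ("walk", 3), ("walk", 3),
])

counts = conn.execute("""
    SELECT travelmode, COUNT(DISTINCT segment_id) AS NumOfSegments
    FROM temp_table
    GROUP BY travelmode
    ORDER BY travelmode
""").fetchall()

print(counts)  # [('bus', 2), ('walk', 1)]
```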

How to get distinct prices by product with single row per product

Given your error message, and from what I have read here on Stack Overflow from gurus like @GordonLinoff, you can't use DISTINCT inside STRING_AGG together with an ORDER BY on an expression that is not in the aggregate's argument list. A quick workaround is to subquery your table first and use DISTINCT there to remove duplicates.

SELECT t.product, STRING_AGG(t.price::text, ',' ORDER BY price)
FROM
(
    SELECT DISTINCT product, price
    FROM (VALUES ('A', 100), ('A', 100), ('A', 200), ('B', 200), ('B', 200))
    orderdetail (product, price)
) t
GROUP BY t.product

I tested this query on Postgres, and it returns this:

product | string_agg
text    | text
A       | 100,200
B       | 200

distinct rows in R based on the order of other columns

Is this what you are looking for?

## data
data <- structure(list(id = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L), decision = c(1L, 1L, 1L,
1L, 1L, 2L, 3L, 1L, 3L, 5L, 1L), nature = c(5L, 5L, 5L, 5L, 5L,
2L, 2L, 2L, 2L, 2L, 2L), period = c(1L, 2L, 1L, 2L, 1L, 1L, 2L,
1L, 2L, 1L, 2L), trial = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L,
3L, 3L)), row.names = c(NA, -11L), class = "data.frame")

library(dplyr)
data %>% 
    mutate(rownum = 1:n()) %>% 
    group_by(id, trial, period) %>%
    mutate(maxrownum = max(rownum)) %>% 
    filter(rownum == maxrownum) %>% 
    select(-c(rownum, maxrownum))

I have created an identifier for the row number. Assuming that your data is ordered by attempt, choosing the rows where the row number is equal to max(row number) picks up the last attempt for each (id, trial, period) triple.

Output:

# Groups:   id, trial, period [6]
     id decision nature period trial
  <int>    <int>  <int>  <int> <int>
1  1000        2      2      1     1
2  1000        3      2      2     1
3  1000        1      2      1     2
4  1000        3      2      2     2
5  1000        5      2      1     3
6  1000        1      2      2     3

Select first row in each GROUP BY group?

On databases that support CTEs and window functions:

WITH summary AS (
    SELECT p.id, 
           p.customer, 
           p.total, 
           ROW_NUMBER() OVER(PARTITION BY p.customer 
                                 ORDER BY p.total DESC) AS rank
      FROM PURCHASES p)
 SELECT *
   FROM summary
 WHERE rank = 1

Supported by any database, but you need to add logic to break ties:

  SELECT MIN(x.id),  -- change to MAX if you want the highest
         x.customer, 
         x.total
    FROM PURCHASES x
    JOIN (SELECT p.customer,
                 MAX(total) AS max_total
            FROM PURCHASES p
        GROUP BY p.customer) y ON y.customer = x.customer
                              AND y.max_total = x.total
GROUP BY x.customer, x.total
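
A quick way to see the tie-breaking at work is to run the join-based query on an in-memory SQLite database from Python. The sample purchases are hypothetical; customer Joe has two purchases tied at the maximum total, and MIN(x.id) picks the lower id. An ORDER BY is added only to make the output order deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (id INTEGER, customer TEXT, total INTEGER)")
# Hypothetical data: ids 1 and 5 tie for Joe's highest total.
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", [
    (1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1), (5, "Joe", 5),
])

winners = conn.execute("""
    SELECT MIN(x.id), x.customer, x.total
    FROM purchases x
    JOIN (SELECT p.customer, MAX(p.total) AS max_total
          FROM purchases p
          GROUP BY p.customer) y
      ON y.customer = x.customer AND y.max_total = x.total
    GROUP BY x.customer, x.total
    ORDER BY x.customer
""").fetchall()

print(winners)  # [(1, 'Joe', 5), (2, 'Sally', 3)]
```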

How to use DISTINCT and ORDER BY in same SELECT statement?

The problem is that the columns used in the ORDER BY aren't specified in the DISTINCT list. To sort by such a column, you need to wrap it in an aggregate function and use a GROUP BY to collapse the rows to one per distinct value.

Try something like this:

SELECT DISTINCT Category, MAX(CreationDate) 
FROM MonitoringJob 
GROUP BY Category 
ORDER BY MAX(CreationDate) DESC, Category
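
As a rough check, the grouping approach can be run on an in-memory SQLite database from Python. The rows and the LastCreated alias are invented for illustration, and the sketch leaves out DISTINCT because GROUP BY Category already returns one row per category:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MonitoringJob (Category TEXT, CreationDate TEXT)")
# Hypothetical jobs; ISO-formatted dates sort correctly as text.
conn.executemany("INSERT INTO MonitoringJob VALUES (?, ?)", [
    ("A", "2020-01-01"), ("B", "2020-03-01"),
    ("A", "2020-02-01"), ("C", "2020-01-15"),
])

latest = conn.execute("""
    SELECT Category, MAX(CreationDate) AS LastCreated
    FROM MonitoringJob
    GROUP BY Category
    ORDER BY MAX(CreationDate) DESC, Category
""").fetchall()

print(latest)  # [('B', '2020-03-01'), ('A', '2020-02-01'), ('C', '2020-01-15')]
```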
