Is There Something Equivalent to Argmax in SQL

SQL groupby argmax

This can work if the `group_id, member_id, value` combination is unique:

SELECT 
x.group_id,
x.member_id,
x.value
FROM
table x
join
(SELECT
group_id,
max(t.value) as max_v
FROM table t
GROUP BY group_id
) y
ON y.max_v = x.value
AND y.group_id = x.group_id
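
As a quick sanity check, here is the same pattern run against a small inline table (the sample rows are made up for illustration). Note that if several members tie for the max value within a group, the join returns all of them:

WITH t(group_id, member_id, value) AS (
VALUES (1, 'a', 10),
(1, 'b', 30),
(2, 'c', 5),
(2, 'd', 7)
)
SELECT x.group_id, x.member_id, x.value
FROM t x
JOIN (
SELECT group_id, max(value) AS max_v
FROM t
GROUP BY group_id
) y
ON y.group_id = x.group_id
AND y.max_v = x.value
-- returns (1, 'b', 30) and (2, 'd', 7)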

How to get moving window argmax in PostgreSQL

Although a bit hard to read, you could do the following:

  1. Put all values of price inside the window frame into an array;
  2. Use array_position to find the position of the rolling max price within that array;
  3. Convert that position back to a global row number by adding the frame's starting offset of row_number() - 11 (the frame rows 10 preceding spans the current row plus the ten preceding ones, i.e. 11 rows);
  4. Wrap the offset in GREATEST(row_number() - 11, 0) to prevent negative numbers at the start of the partition:
WITH sample_table(s_id, s_date, price) AS (
VALUES ('ABC', '2020-06-10'::date, 322.390),
('ABC', '2020-06-11'::date, 312.150),
('ABC', '2020-06-12'::date, 309.080),
('ABC', '2020-06-15'::date, 308.280),
('ABC', '2020-06-16'::date, 315.640),
('ABC', '2020-06-17'::date, 314.390),
('ABC', '2020-06-18'::date, 312.300),
('ABC', '2020-06-19'::date, 314.380),
('ABC', '2020-06-22'::date, 311.050),
('ABC', '2020-06-23'::date, 314.500),
('ABC', '2020-06-24'::date, 310.510),
('ABC', '2020-06-25'::date, 307.640),
('ABC', '2020-06-26'::date, 306.390),
('ABC', '2020-06-29'::date, 304.610),
('ABC', '2020-06-30'::date, 310.200),
('ABC', '2020-07-01'::date, 311.890),
('ABC', '2020-07-02'::date, 315.700),
('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id,
s_date,
price,
row_number() over (PARTITION BY s_id ORDER BY s_date),
max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max,
GREATEST(row_number() over (PARTITION BY s_id ORDER BY s_date) - 11, 0)
+ array_position(
array_agg(price) over (partition by s_id order by s_date rows 10 preceding),
max(price) over (partition by s_id order by s_date rows 10 preceding)
) as argmax
FROM sample_table

Or, with a subquery, which is easier to read (note that array_position returns the first occurrence, so if the max appears more than once in the frame, the earliest row wins):

WITH sample_table(s_id, s_date, price) AS (
VALUES ('ABC', '2020-06-10'::date, 322.390),
('ABC', '2020-06-11'::date, 312.150),
('ABC', '2020-06-12'::date, 309.080),
('ABC', '2020-06-15'::date, 308.280),
('ABC', '2020-06-16'::date, 315.640),
('ABC', '2020-06-17'::date, 314.390),
('ABC', '2020-06-18'::date, 312.300),
('ABC', '2020-06-19'::date, 314.380),
('ABC', '2020-06-22'::date, 311.050),
('ABC', '2020-06-23'::date, 314.500),
('ABC', '2020-06-24'::date, 310.510),
('ABC', '2020-06-25'::date, 307.640),
('ABC', '2020-06-26'::date, 306.390),
('ABC', '2020-06-29'::date, 304.610),
('ABC', '2020-06-30'::date, 310.200),
('ABC', '2020-07-01'::date, 311.890),
('ABC', '2020-07-02'::date, 315.700),
('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id, s_date, price, row_number, roll_max,
GREATEST(row_number - 11, 0)
+ array_position(
prices,
roll_max
) as argmax
FROM (
SELECT s_id,
s_date,
price,
row_number() over (PARTITION BY s_id ORDER BY s_date) as row_number,
max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max,
array_agg(price)
over (partition by s_id order by s_date rows 10 preceding) as prices
FROM sample_table
) as s
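
As a spot check of the offset arithmetic: for 2020-06-25 (row_number 12) the frame covers rows 2 through 12, so the array holds 11 prices. The rolling max 315.640 comes from 2020-06-16 (row 5), which sits at array position 4, and GREATEST(12 - 11, 0) + 4 = 5, the correct global row number.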

Getting row with MAX value together with SUM

This can be easily achieved using window functions:

SELECT a, b, c, s
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) AS rn,
SUM(b) OVER (PARTITION BY a) AS s
FROM example) AS t
WHERE t.rn = 1
  • ROW_NUMBER enumerates records within each a partition: the record with the highest b value is assigned 1, the next one 2, and so on.
  • SUM(b) OVER (PARTITION BY a) returns the sum of all b within each a partition.
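
As a quick illustration on made-up data (the example table below is hypothetical):

WITH example(a, b, c) AS (
VALUES (1, 10, 'x'),
(1, 20, 'y'),
(2, 5, 'z')
)
SELECT a, b, c, s
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) AS rn,
SUM(b) OVER (PARTITION BY a) AS s
FROM example) AS t
WHERE t.rn = 1
-- returns (1, 20, 'y', 30) and (2, 5, 'z', 5)

If two rows tie on the highest b, ROW_NUMBER still picks just one of them arbitrarily; add a tiebreaker to the ORDER BY if you need the choice to be deterministic.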

Rows with max value of each group

You need a subselect:

SELECT yourtable.*
FROM yourtable
JOIN (
SELECT grp_id, MAX(created) AS max_created
FROM yourtable
GROUP BY grp_id
) AS maxgroup ON (
(yourtable.grp_id = maxgroup.grp_id) AND (yourtable.created = maxgroup.max_created)
)

The subselect gets the group ID and max value for each group, and the parent/outer query joins against the subselect results to pick up the rest of the fields for the row(s) on which the max value appears. The join must be an inner join: a LEFT JOIN would keep every row of yourtable, padding the non-max rows with NULLs instead of filtering them out.
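
For example, on a small hypothetical table, ties on the max are all returned:

WITH yourtable(grp_id, created, note) AS (
VALUES (1, DATE '2024-01-01', 'a'),
(1, DATE '2024-01-02', 'b'),
(2, DATE '2024-01-05', 'c'),
(2, DATE '2024-01-05', 'd')
)
SELECT yourtable.*
FROM yourtable
JOIN (
SELECT grp_id, MAX(created) AS max_created
FROM yourtable
GROUP BY grp_id
) AS maxgroup
ON yourtable.grp_id = maxgroup.grp_id
AND yourtable.created = maxgroup.max_created
-- returns 'b' for group 1, and both 'c' and 'd' for group 2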

BigQuery argmax: Is array order maintained when doing CROSS JOIN UNNEST

Short answer: no, order is not guaranteed to be maintained.

Long answer: in practice, you'll most likely see that order is maintained, but you should not depend on it. The example that you provided is similar to this type of query:

SELECT *
FROM (
SELECT 3 AS x UNION ALL
SELECT 2 UNION ALL
SELECT 1
ORDER BY x
)

What is the expected order of the output? The ORDER BY is in the subquery, and the outer query doesn't impose any ordering, so BigQuery (or whatever engine you run this in) is free to reorder the rows in the output as it sees fit. You may end up getting back 1, 2, 3, or you may receive 3, 2, 1 or any other ordering. The more general principle is that projections are not order-preserving.

While arrays have a well-defined order of their elements, when you use the UNNEST function, you're converting the array into a relation, which doesn't have a well-defined order unless you use ORDER BY. For example, consider this query:

SELECT ARRAY(SELECT x + 1 FROM UNNEST(arr) AS x) AS new_arr
FROM (SELECT [1, 2, 3] AS arr)

The new_arr array isn't actually guaranteed to have the elements [2, 3, 4] in that order, since the query inside the ARRAY function doesn't use ORDER BY. You can address this non-determinism by ordering based on the element offsets, however:

SELECT ARRAY(SELECT x + 1 FROM UNNEST(arr) AS x WITH OFFSET ORDER BY OFFSET) AS new_arr
FROM (SELECT [1, 2, 3] AS arr)

Now the output is guaranteed to be [2, 3, 4].

Going back to your original question, you can ensure that you get deterministic output by imposing an ordering in the subquery that computes the row numbers:

ranked_predictions AS (
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY OFFSET) AS rownum,
DENSE_RANK() OVER (PARTITION BY id ORDER BY flattened_prediction DESC) AS array_rank
FROM
predictions P
CROSS JOIN
UNNEST(P.prediction) AS flattened_prediction WITH OFFSET
)

I added the WITH OFFSET after the UNNEST, and ORDER BY OFFSET inside the ROW_NUMBER window in order to ensure that the row numbers are computed based on the original ordering of the array elements.
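
For a self-contained illustration, here is the same idea end to end; the predictions data below is made up, and the final SELECT is just one way to read off the argmax:

WITH predictions AS (
SELECT 1 AS id, [0.2, 0.9, 0.5] AS prediction
UNION ALL
SELECT 2, [0.7, 0.1]
),
ranked_predictions AS (
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY OFFSET) AS rownum,
DENSE_RANK() OVER (PARTITION BY id ORDER BY flattened_prediction DESC) AS array_rank
FROM predictions P
CROSS JOIN UNNEST(P.prediction) AS flattened_prediction WITH OFFSET
)
SELECT id, rownum AS argmax
FROM ranked_predictions
WHERE array_rank = 1
-- id 1 -> position 2 (0.9), id 2 -> position 1 (0.7)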

How do I do argmax in KDB?

There are a few different ways, but here is one using the find operator (?):

 1 2 3?max 1 2 3

Here max 1 2 3 evaluates to 3, and ? returns the index of the first occurrence of its right argument in the left-hand list, so the expression yields 2 (indices are zero-based).

argmax in Spark DataFrames: how to retrieve the row with the maximum value

If the schema is orderable (it contains only atomic types, arrays of atomics, or recursively orderable structs), you can use simple aggregations:

Python:

from pyspark.sql import functions as F

df.select(F.max(
F.struct("values", *(x for x in df.columns if x != "values"))
)).first()

Scala:

import org.apache.spark.sql.functions.{col, max, struct}
// the $"..." column syntax assumes spark.implicits._ is in scope

df.select(max(struct(
$"values" +: df.columns.collect { case x if x != "values" => col(x) }: _*
))).first

Otherwise you can reduce over the Dataset (Scala only), but it requires additional deserialization:

type T = ???

df.reduce((a, b) => if (a.getAs[T]("values") > b.getAs[T]("values")) a else b)

You can also orderBy and limit(1) / take(1):

Scala:

df.orderBy(desc("values")).limit(1)
// or
df.orderBy(desc("values")).take(1)

Python:

df.orderBy(F.desc('values')).limit(1)
# or
df.orderBy(F.desc("values")).take(1)

Select only rows with max date

Your query already returns what you need: only one row for each _id, with the max value of column _status_set_at.
You do not need to change anything in your original query.

count(_id) counts how many rows each _id has in the original table, not in the query result.
The result has only one row per _id because you group by _id.

This query confirms that the result contains only one row for each _id:

SELECT _id, max_status_set_at, count(_id) FROM (
SELECT _id, max(_status_set_at) max_status_set_at
FROM pikta.candidates_states
GROUP BY _id) t
GROUP BY _id

If you need to apply a condition on max(_status_set_at), you can use HAVING.
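
For example, to keep only the _ids whose latest status change falls on or after some cutoff (the date literal here is just a placeholder):

SELECT _id, max(_status_set_at) AS max_status_set_at
FROM pikta.candidates_states
GROUP BY _id
HAVING max(_status_set_at) >= DATE '2020-01-01'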


