Find Max Per Group and Return Another Column

Get the max value one column but group on another

Window function ROW_NUMBER() to the rescue.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (userId INT, action VARCHAR(10), actionTime DATETIME)
INSERT INTO @tbl (userId, action, actionTime) VALUES
(1, 'jump', '2022-01-01 07:50:00.000'),
(2, 'run' , '2022-01-02 07:50:00.000'),
(3, 'walk', '2022-01-01 07:50:00.000'),
(3, 'run' , '2022-01-04 07:50:00.000'),
(4, 'jump', '2022-01-01 07:50:00.000'),
(2, 'walk', '2022-01-01 07:50:00.000'),
(1, 'walk', '2022-01-01 01:50:00.000'),
(1, 'walk', '2022-01-03 04:50:00.000'),
(4, 'run' , '2022-01-03 07:50:00.000');
-- DDL and sample data population, end

;WITH rs AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY userId ORDER BY actionTime DESC) AS seq
FROM @tbl
)
SELECT *
FROM rs
WHERE seq = 1;

Output

+--------+--------+-------------------------+-----+
| userId | action | actionTime | seq |
+--------+--------+-------------------------+-----+
| 1 | walk | 2022-01-03 04:50:00.000 | 1 |
| 2 | run | 2022-01-02 07:50:00.000 | 1 |
| 3 | run | 2022-01-04 07:50:00.000 | 1 |
| 4 | run | 2022-01-03 07:50:00.000 | 1 |
+--------+--------+-------------------------+-----+

Pandas groupby and then find max value per group in another column

I think the following is gonna work.

test = your_dataset.groupby('product_id')['uxp_total_bought'].max()
test = test.reset_index()
test = your_dataset.loc[uxp.groupby("user_id")["uxp_total_bought"].idxmax()]
del test["uxp_total_bought"]
test.rename(columns = {"product_id":"favourite_product_id"}, inplace = True)

Find max per group and return another column

Base solution, not as simple as Dan M's:

testMatrix <- data.frame(GroupID = c(1,1,2), ElementID = c(10,20,30), 
Value=c(300,100,200), Name=c("A","B","C"))

A <- lapply(split(testMatrix, testMatrix$GroupID), function(x) {
x[which.max(x$Value), c(1, 4)]
}
)
do.call(rbind, A)

Get records with max value for each group of grouped SQL results

There's a super-simple way to do this in mysql:

select * 
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`

This works because in mysql you're allowed to not aggregate non-group-by columns, in which case mysql just returns the first row. The solution is to first order the data such that for each group the row you want is first, then group by the columns you want the value for.

You avoid complicated subqueries that try to find the max() etc, and also the problems of returning multiple rows when there are more than one with the same maximum value (as the other answers would do)

Note: This is a mysql-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution uses undocumented behavior, the more cautious may want to include a test to assert that it remains working should a future version of MySQL change this behavior.

Version 5.7 update:

Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option (edit the option file for the server to remove this setting).

Using group by to get the value corresponding to the max value of another column

Up front, "never" (okay, almost never) use df$ within a dplyr pipe. In this case, df$value[which.max(df$age)] is referencing the original data each time, not the grouped data. Inside each group in this dataset, value is length 3 whereas df$value is length 9.

The only times I feel it is appropriate to use df$ (referencing the original value of the current dataset) inside a pipe is when it is required to look at pre-pipeline data, in absence of any grouping, reordering, or new variables created outside of the currently-saved (pre-pipeline) version of df.

dplyr

library(dplyr)
df %>%
group_by(groups) %>%
mutate(new_value = value[which.max(age)]) %>%
ungroup()
# # A tibble: 9 x 4
# groups age value new_value
# <dbl> <dbl> <dbl> <dbl>
# 1 1 12 1 3
# 2 1 23 2 3
# 3 1 34 3 3
# 4 2 13 4 6
# 5 2 24 5 6
# 6 2 35 6 6
# 7 3 13 7 9
# 8 3 25 8 9
# 9 3 36 9 9

data.table

library(data.table)
DT <- as.data.table(df)
DT[, new_value := value[which.max(age)], by = .(groups)]

base R

df$new_value <- ave(seq_len(nrow(df)), df$groups,
FUN = function(i) df$value[i][which.max(df$age[i])])
df
# groups age value new_value
# 1 1 12 1 3
# 2 1 23 2 3
# 3 1 34 3 3
# 4 2 13 4 6
# 5 2 24 5 6
# 6 2 35 6 6
# 7 3 13 7 9
# 8 3 25 8 9
# 9 3 36 9 9

The base R approach seems to be the least-elegant-looking solution. I believe that ave is the best approach, but it has many limitations, first being that it only works on one value-object (value) in the absence of others (we need to know age).

select max, group by and display other column that's not in group by clause

Try the following query:

Solution #1:

SELECT 
products.name,
products.type,
products.price
FROM products
INNER JOIN
(
SELECT type,MAX(price) max_price
FROM products
GROUP BY type ) t
ON products.type = t.type
AND products.price = t.max_price;

Demo Here

Solution #2:

SELECT
products.name,
products.type,
products.price
FROM
products
WHERE (type, price) IN (
SELECT type, MAX(price) max_price
FROM products
GROUP BY type )

See Demo

EDIT:


Note: Both solutions might give you multiple products under same type if they share the same maximum price.

If you strictly want at most one item from each type then you need to group by again in the last line.

So for both solutions the last line would be:

GROUP BY products.type, products.price

See Demo of it

Find maximum value of one column based on group_by multiple other columns

We can use slice_max instead of summarise to return all the columns after the select step

library(dplyr)
df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
slice_max(order_by = 'ord', n = 1)

If we need to create a new column, use mutate

df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
mutate(ordMax = max(ord, na.rm = TRUE)) %>%
ungroup

How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?

You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:

SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime


Related Topics



Leave a reply



Submit