Get the max value one column but group on another
Window function ROW_NUMBER()
to the rescue.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (userId INT, action VARCHAR(10), actionTime DATETIME)
INSERT INTO @tbl (userId, action, actionTime) VALUES
(1, 'jump', '2022-01-01 07:50:00.000'),
(2, 'run' , '2022-01-02 07:50:00.000'),
(3, 'walk', '2022-01-01 07:50:00.000'),
(3, 'run' , '2022-01-04 07:50:00.000'),
(4, 'jump', '2022-01-01 07:50:00.000'),
(2, 'walk', '2022-01-01 07:50:00.000'),
(1, 'walk', '2022-01-01 01:50:00.000'),
(1, 'walk', '2022-01-03 04:50:00.000'),
(4, 'run' , '2022-01-03 07:50:00.000');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY userId ORDER BY actionTime DESC) AS seq
FROM @tbl
)
SELECT *
FROM rs
WHERE seq = 1;
Output
+--------+--------+-------------------------+-----+
| userId | action | actionTime | seq |
+--------+--------+-------------------------+-----+
| 1 | walk | 2022-01-03 04:50:00.000 | 1 |
| 2 | run | 2022-01-02 07:50:00.000 | 1 |
| 3 | run | 2022-01-04 07:50:00.000 | 1 |
| 4 | run | 2022-01-03 07:50:00.000 | 1 |
+--------+--------+-------------------------+-----+
Pandas groupby and then find max value per group in another column
I think the following is gonna work.
test = your_dataset.groupby('product_id')['uxp_total_bought'].max()
test = test.reset_index()
test = your_dataset.loc[uxp.groupby("user_id")["uxp_total_bought"].idxmax()]
del test["uxp_total_bought"]
test.rename(columns = {"product_id":"favourite_product_id"}, inplace = True)
Find max per group and return another column
Base solution, not as simple as Dan M's:
testMatrix <- data.frame(GroupID = c(1,1,2), ElementID = c(10,20,30),
Value=c(300,100,200), Name=c("A","B","C"))
A <- lapply(split(testMatrix, testMatrix$GroupID), function(x) {
x[which.max(x$Value), c(1, 4)]
}
)
do.call(rbind, A)
Get records with max value for each group of grouped SQL results
There's a super-simple way to do this in mysql:
select *
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`
This works because in mysql you're allowed to not aggregate non-group-by columns, in which case mysql just returns the first row. The solution is to first order the data such that for each group the row you want is first, then group by the columns you want the value for.
You avoid complicated subqueries that try to find the max()
etc, and also the problems of returning multiple rows when there are more than one with the same maximum value (as the other answers would do)
Note: This is a mysql-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution uses undocumented behavior, the more cautious may want to include a test to assert that it remains working should a future version of MySQL change this behavior.
Version 5.7 update:
Since version 5.7, the sql-mode
setting includes ONLY_FULL_GROUP_BY
by default, so to make this work you must not have this option (edit the option file for the server to remove this setting).
Using group by to get the value corresponding to the max value of another column
Up front, "never" (okay, almost never) use df$
within a dplyr pipe. In this case, df$value[which.max(df$age)]
is referencing the original data each time, not the grouped data. Inside each group in this dataset, value
is length 3 whereas df$value
is length 9.
The only times I feel it is appropriate to use df$
(referencing the original value of the current dataset) inside a pipe is when it is required to look at pre-pipeline data, in absence of any grouping, reordering, or new variables created outside of the currently-saved (pre-pipeline) version of df
.
dplyr
library(dplyr)
df %>%
group_by(groups) %>%
mutate(new_value = value[which.max(age)]) %>%
ungroup()
# # A tibble: 9 x 4
# groups age value new_value
# <dbl> <dbl> <dbl> <dbl>
# 1 1 12 1 3
# 2 1 23 2 3
# 3 1 34 3 3
# 4 2 13 4 6
# 5 2 24 5 6
# 6 2 35 6 6
# 7 3 13 7 9
# 8 3 25 8 9
# 9 3 36 9 9
data.table
library(data.table)
DT <- as.data.table(df)
DT[, new_value := value[which.max(age)], by = .(groups)]
base R
df$new_value <- ave(seq_len(nrow(df)), df$groups,
FUN = function(i) df$value[i][which.max(df$age[i])])
df
# groups age value new_value
# 1 1 12 1 3
# 2 1 23 2 3
# 3 1 34 3 3
# 4 2 13 4 6
# 5 2 24 5 6
# 6 2 35 6 6
# 7 3 13 7 9
# 8 3 25 8 9
# 9 3 36 9 9
The base R approach seems to be the least-elegant-looking solution. I believe that ave
is the best approach, but it has many limitations, first being that it only works on one value-object (value
) in the absence of others (we need to know age
).
select max, group by and display other column that's not in group by clause
Try the following query:
Solution #1:
SELECT
products.name,
products.type,
products.price
FROM products
INNER JOIN
(
SELECT type,MAX(price) max_price
FROM products
GROUP BY type ) t
ON products.type = t.type
AND products.price = t.max_price;
Demo Here
Solution #2:
SELECT
products.name,
products.type,
products.price
FROM
products
WHERE (type, price) IN (
SELECT type, MAX(price) max_price
FROM products
GROUP BY type )
See Demo
EDIT:
Note: Both solutions might give you multiple products under same type
if they share the same maximum
price.
If you strictly want at most one item from each type then you need to group by
again in the last line.
So for both solutions the last line would be:
GROUP BY products.type, products.price
See Demo of it
Find maximum value of one column based on group_by multiple other columns
We can use slice_max
instead of summarise
to return all the columns after the select
step
library(dplyr)
df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
slice_max(order_by = 'ord', n = 1)
If we need to create a new column, use mutate
df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
mutate(ordMax = max(ord, na.rm = TRUE)) %>%
ungroup
How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?
You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten
table on BOTH fields:
SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
Related Topics
Mlogit: Missing Value Where True/False Needed
Create Group Based on Fuzzy Criteria
How to Round Percentage to 2 Decimal Places in Ggplot2
How to Create a Rank Variable Under Certain Conditions
Change Position of Tick Marks of a Single Graph, Using Ggplot2
How to Use Stat_Bin2D() to Compute Counts Labels in Ggplot2
Linear Regression with Constraints on The Coefficients
Aggregating Rows for Multiple Columns in R
Error in Xj[I]: Invalid Subscript Type 'List'
Summing Multiple Columns in an R Data-Frame Quickly
Merge Data Based on Nearest Date R
Piecewise Function Fitting with Nls() in R
Ggplot2: Making Changes to Symbols in The Legend
Ggplot2 Ggsave Function Causes Graphics Device to Not Display Plots