Finding Running Maximum by Group

Finding running maximum by group

You can do it like this, applying cummax within each group via ave():

df$curMax <- ave(df$var, df$group, FUN=cummax)
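To make explicit what a per-group running maximum computes, here is a minimal pure-Python sketch. The function name `running_max_by_group` and the sample data are hypothetical and not taken from the original question.

```python
def running_max_by_group(groups, values):
    """Return, for each row, the largest value seen so far within that row's group."""
    best = {}    # group key -> running maximum seen so far
    result = []
    for g, v in zip(groups, values):
        best[g] = v if g not in best else max(best[g], v)
        result.append(best[g])
    return result

# Hypothetical data, for illustration only.
groups = ["a", "a", "b", "a", "b", "b"]
values = [1, 3, 5, 2, 4, 7]
cur_max = running_max_by_group(groups, values)
print(cur_max)
```

Rows keep their original order; only the running maximum is tracked separately per group.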

R/dplyr Get running maximum value

You can simply use cummax after sorting by time:

library(dplyr)
df %>%
  arrange(time) %>%
  mutate(maxvalue = cummax(value))
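The same sort-then-accumulate idea can be sketched in plain Python with `itertools.accumulate`. The `records` list of (time, value) pairs is a hypothetical stand-in for the data frame.

```python
from itertools import accumulate

# Hypothetical (time, value) records, deliberately out of time order.
records = [(3, 4), (1, 2), (2, 5), (4, 1)]

# Sort by time first, then take the cumulative maximum of the values.
ordered = sorted(records, key=lambda r: r[0])
running = list(accumulate((value for _, value in ordered), max))
print(running)
```

Sorting matters here: a running maximum depends on the order in which the rows are visited.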

Get the row(s) which have the max value in groups using groupby

In [1]: df
Out[1]:
    Sp  Mt Value  count
0  MM1  S1     a      3
1  MM1  S1     n      2
2  MM1  S3    cb      5
3  MM2  S3    mk      8
4  MM2  S4    bg     10
5  MM2  S4   dgd      1
6  MM4  S2    rd      2
7  MM4  S2    cb      2
8  MM4  S2   uyi      7

In [2]: df.groupby(['Mt'], sort=False)['count'].max()
Out[2]:
Mt
S1     3
S3     8
S4    10
S2     7
Name: count

To select the matching rows of the original DataFrame, build a boolean mask that is True wherever a row's count equals its group maximum:

In [3]: idx = df.groupby(['Mt'])['count'].transform('max') == df['count']

In [4]: df[idx]
Out[4]:
    Sp  Mt Value  count
0  MM1  S1     a      3
3  MM2  S3    mk      8
4  MM2  S4    bg     10
8  MM4  S2   uyi      7

Note that if you have multiple max values per group, all will be returned.
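The mask logic, including the behavior on ties, can be sketched without pandas. The helper `rows_with_group_max` is a hypothetical name; the first data set mirrors the table above, and the tied example is made up for illustration.

```python
def rows_with_group_max(rows, key, value):
    """Return every row whose `value` field equals the maximum of its `key` group."""
    group_max = {}
    for row in rows:
        k, v = row[key], row[value]
        group_max[k] = max(v, group_max.get(k, v))
    # Compare each row against its group's maximum, like the boolean mask.
    return [row for row in rows if row[value] == group_max[row[key]]]

rows = [
    {"Sp": "MM1", "Mt": "S1", "Value": "a", "count": 3},
    {"Sp": "MM1", "Mt": "S1", "Value": "n", "count": 2},
    {"Sp": "MM1", "Mt": "S3", "Value": "cb", "count": 5},
    {"Sp": "MM2", "Mt": "S3", "Value": "mk", "count": 8},
    {"Sp": "MM2", "Mt": "S4", "Value": "bg", "count": 10},
    {"Sp": "MM2", "Mt": "S4", "Value": "dgd", "count": 1},
    {"Sp": "MM4", "Mt": "S2", "Value": "rd", "count": 2},
    {"Sp": "MM4", "Mt": "S2", "Value": "cb", "count": 2},
    {"Sp": "MM4", "Mt": "S2", "Value": "uyi", "count": 7},
]
selected = rows_with_group_max(rows, "Mt", "count")

# Hypothetical tie: both rows share the group maximum, so both come back.
tied = rows_with_group_max(
    [{"Mt": "S1", "count": 3}, {"Mt": "S1", "count": 3}], "Mt", "count"
)
print([r["Value"] for r in selected], len(tied))
```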

Update

On the off chance that what the OP actually wants is the group maximum added as a column on every row:

In [5]: df['count_max'] = df.groupby(['Mt'])['count'].transform('max')

In [6]: df
Out[6]:
    Sp  Mt Value  count  count_max
0  MM1  S1     a      3          3
1  MM1  S1     n      2          3
2  MM1  S3    cb      5          8
3  MM2  S3    mk      8          8
4  MM2  S4    bg     10         10
5  MM2  S4   dgd      1         10
6  MM4  S2    rd      2          7
7  MM4  S2    cb      2          7
8  MM4  S2   uyi      7          7

Get records with max value for each group of grouped SQL results

There's a super-simple way to do this in MySQL:

select * 
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`

This works because MySQL allows you to select non-aggregated columns that are not in the GROUP BY clause, in which case it returns, in practice, the first row it encounters for each group. The trick is to first order the data so that, for each group, the row you want comes first, and then group by the columns you want the value for.

You avoid complicated subqueries that try to find the max() and so on, and you also avoid returning multiple rows when more than one row shares the same maximum value (as the other answers would do).

Note: This is a MySQL-only solution. All other databases I know of will throw an SQL syntax error with a message such as "non aggregated columns are not listed in the group by clause". Because this solution relies on undocumented behavior, the more cautious may want to include a test asserting that it still works, in case a future version of MySQL changes this behavior.

Version 5.7 update:

Since version 5.7, the default sql-mode setting includes ONLY_FULL_GROUP_BY, so to make this work you must disable that option (edit the server's option file to remove it from sql-mode).
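The order-then-take-first idea does not depend on MySQL itself. As a rough sketch in plain Python, with hypothetical (Person, Group, age) rows standing in for mytable, the ordering is made explicit and the first row of each group is kept:

```python
from itertools import groupby

# Hypothetical rows (Person, Group, age); not data from the original question.
people = [
    ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
    ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39), ("Anna", 2, 39),
]

# Same ordering as the inner query: Group ascending, age descending, Person ascending.
ordered = sorted(people, key=lambda r: (r[1], -r[2], r[0]))

# Keep only the first row of each group, so ties yield exactly one row.
oldest = [next(rows) for _, rows in groupby(ordered, key=lambda r: r[1])]
print(oldest)
```

Because the tie between the two 39-year-olds is broken by Person in the sort key, the result is deterministic rather than reliant on how a server happens to pick a row.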

How to get the maximum value by group

If you know SQL, this is easier to understand:

library(sqldf)
sqldf('select year, max(score) from mydata group by year')

Update (2016-01): Now you can also use dplyr:

library(dplyr)
mydata %>% group_by(year) %>% summarise(max = max(score))
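For comparison, the same one-row-per-group summary can be sketched in plain Python. The (year, score) pairs below are hypothetical sample data.

```python
# Hypothetical (year, score) pairs standing in for mydata.
mydata = [(2014, 71), (2014, 88), (2015, 64), (2015, 90), (2015, 77)]

# One entry per year, holding the highest score seen for that year.
max_by_year = {}
for year, score in mydata:
    max_by_year[year] = max(score, max_by_year.get(year, score))
print(max_by_year)
```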

R find min and max for each group based on other row

With tidyverse you can try the following approach. First, put your data into long form, targeting your year columns. Then group_by both group and name (which contains the year), include only subgroups that have a value of x, and keep rows that have a condition of 1. Then group_by just group and summarise to get the min and max years. Note that you may wish to convert your year data to numeric after removing x by filtering on condition.

library(tidyverse)

df1 %>%
  pivot_longer(cols = -c(group, condition)) %>%
  group_by(group, name) %>%
  filter(any(value == "x"), condition == 1) %>%
  group_by(group) %>%
  summarise(min = min(value),
            max = max(value))

Output

# A tibble: 3 x 3
  group min   max
  <chr> <chr> <chr>
1 a     2010  2013
2 b     2011  2015
3 c     2010  2014

In R, how do I add a max by group?

Try this:

# This is how you create your data.frame
group <- c("A","A","A","A","A","B","B","C","C","C")
replicate <- c(1,2,3,4,5,1,2,1,2,3)
x <- data.frame(group, replicate) # here you don't need c()

# Here's my solution
Max <- tapply(x$replicate, x$group, max)
data.frame(x, max.per.group = rep(Max, table(x$group)))
   group replicate max.per.group
1      A         1             5
2      A         2             5
3      A         3             5
4      A         4             5
5      A         5             5
6      B         1             2
7      B         2             2
8      C         1             3
9      C         2             3
10     C         3             3
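Note that rep(Max, table(x$group)) lines up with the rows only when they are already sorted by group, as they are here. A keyed lookup avoids that assumption; the following plain-Python sketch uses the same data, with variable names chosen for illustration.

```python
group = ["A", "A", "A", "A", "A", "B", "B", "C", "C", "C"]
replicate = [1, 2, 3, 4, 5, 1, 2, 1, 2, 3]

# First pass: find the maximum replicate for each group.
max_by_group = {}
for g, r in zip(group, replicate):
    max_by_group[g] = max(r, max_by_group.get(g, r))

# Second pass: look the maximum up by key, so row order does not matter.
max_per_group = [max_by_group[g] for g in group]
print(max_per_group)
```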

