Extract the maximum value within each group in a dataframe
There are many possibilities to do this in R. Here are some of them:
df <- read.table(header = TRUE, text = 'Gene Value
A 12
A 10
B 3
B 5
B 6
C 1
D 3
D 4')
# aggregate
aggregate(df$Value, by = list(df$Gene), max)
aggregate(Value ~ Gene, data = df, max)
# tapply
tapply(df$Value, df$Gene, max)
# split + lapply
lapply(split(df, df$Gene), function(y) max(y$Value))
# plyr
require(plyr)
ddply(df, .(Gene), summarise, Value = max(Value))
# dplyr
require(dplyr)
df %>% group_by(Gene) %>% summarise(Value = max(Value))
# data.table
require(data.table)
dt <- data.table(df)
dt[ , max(Value), by = Gene]
# doBy
require(doBy)
summaryBy(Value~Gene, data = df, FUN = max)
# sqldf
require(sqldf)
sqldf("select Gene, max(Value) as Value from df group by Gene", drv = 'SQLite')
# ave
df[as.logical(ave(df$Value, df$Gene, FUN = function(x) x == max(x))),]
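For comparison, the `ave()` trick above (keeping every row that attains its group maximum) has a direct pandas analogue via `transform`; a minimal sketch on the same data:

```python
import pandas as pd

# same data as the R example above
df = pd.DataFrame({"Gene": ["A", "A", "B", "B", "B", "C", "D", "D"],
                   "Value": [12, 10, 3, 5, 6, 1, 3, 4]})

# max per group (analogue of aggregate / tapply)
per_group = df.groupby("Gene", as_index=False)["Value"].max()

# rows attaining the group max (analogue of the ave() trick)
rows = df[df["Value"] == df.groupby("Gene")["Value"].transform("max")]
```

As with `ave()`, the `transform` version returns all tied rows when a group has several maxima.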
Get the row(s) which have the max value in groups using groupby
In [1]: df
Out[1]:
Sp Mt Value count
0 MM1 S1 a 3
1 MM1 S1 n 2
2 MM1 S3 cb 5
3 MM2 S3 mk 8
4 MM2 S4 bg 10
5 MM2 S4 dgd 1
6 MM4 S2 rd 2
7 MM4 S2 cb 2
8 MM4 S2 uyi 7
In [2]: df.groupby(['Mt'], sort=False)['count'].max()
Out[2]:
Mt
S1 3
S3 8
S4 10
S2 7
Name: count
To get the indices of the original DF you can do:
In [3]: idx = df.groupby(['Mt'])['count'].transform(max) == df['count']
In [4]: df[idx]
Out[4]:
Sp Mt Value count
0 MM1 S1 a 3
3 MM2 S3 mk 8
4 MM2 S4 bg 10
8 MM4 S2 uyi 7
Note that if you have multiple max values per group, all will be returned.
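If you instead want exactly one row per group (the first maximum), `idxmax` gives the row labels directly; a sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"Sp": ["MM1","MM1","MM1","MM2","MM2","MM2","MM4","MM4","MM4"],
                   "Mt": ["S1","S1","S3","S3","S4","S4","S2","S2","S2"],
                   "Value": ["a","n","cb","mk","bg","dgd","rd","cb","uyi"],
                   "count": [3,2,5,8,10,1,2,2,7]})

# idxmax returns the index label of the FIRST max in each group;
# ties are dropped, unlike the transform-based filter above
one_per_group = df.loc[df.groupby("Mt")["count"].idxmax()]
```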
Update
On a hail mary chance that this is what the OP is requesting:
In [5]: df['count_max'] = df.groupby(['Mt'])['count'].transform(max)
In [6]: df
Out[6]:
Sp Mt Value count count_max
0 MM1 S1 a 3 3
1 MM1 S1 n 2 3
2 MM1 S3 cb 5 8
3 MM2 S3 mk 8 8
4 MM2 S4 bg 10 10
5 MM2 S4 dgd 1 10
6 MM4 S2 rd 2 7
7 MM4 S2 cb 2 7
8 MM4 S2 uyi 7 7
Get the max value from each group with pandas.DataFrame.groupby
From your original DataFrame you can call .value_counts, which returns counts in descending order within each group; given this sorting, drop_duplicates will then keep the most frequent value within each group.
df1 = (df.groupby('col1')['col2'].value_counts()
.rename('counts').reset_index()
.drop_duplicates('col1'))
col1 col2 counts
0 A AY 3
2 B BX 3
4 C CX 5
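For reference, here is a small input that reproduces the output above. The column names `col1`/`col2` come from the snippet; the actual values are an assumption chosen to match the counts shown:

```python
import pandas as pd

# hypothetical data whose within-group counts match the output above
df = pd.DataFrame({"col1": ["A"]*4 + ["B"]*5 + ["C"]*7,
                   "col2": ["AY"]*3 + ["AX"] +
                           ["BX"]*3 + ["BY"]*2 +
                           ["CX"]*5 + ["CY"]*2})

df1 = (df.groupby("col1")["col2"].value_counts()
         .rename("counts").reset_index()
         .drop_duplicates("col1"))
```

The rename is needed because .value_counts names the resulting Series after an existing column, which would otherwise clash on reset_index.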
Select the row with the maximum value in each group
Here's a data.table solution:
require(data.table) ## 1.9.2
group <- as.data.table(group)
If you want to keep all the entries corresponding to max values of pt within each group:
group[group[, .I[pt == max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2
If you'd like just the first max value of pt:
group[group[, .I[which.max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2
In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
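Both data.table idioms carry over to pandas. The data below is a minimal reconstruction (only the max rows are known from the output shown, so the non-max rows are assumptions):

```python
import pandas as pd

# minimal reconstruction of the 'group' data; non-max rows are invented
group = pd.DataFrame({"Subject": [1, 1, 2, 2, 3, 3],
                      "pt":      [2, 5, 17, 10, 5, 1],
                      "Event":   [1, 2, 2, 1, 2, 1]})

# all rows attaining the group max (analogue of pt == max(pt))
all_max = group[group["pt"] == group.groupby("Subject")["pt"].transform("max")]

# just the first max per group (analogue of which.max)
first_max = group.loc[group.groupby("Subject")["pt"].idxmax()]
```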
How to find the maximum value within each group and then recode all other values in the group as zero?
You can try this
df %>%
group_by(Id) %>%
mutate(maxByGroup = (which.max(value) == seq_along(value)) * value) %>%
ungroup()
which gives
Id value maxByGroup
<dbl> <dbl> <dbl>
1 1 500 500
2 1 500 0
3 1 500 0
4 2 250 250
5 2 250 0
6 2 250 0
7 3 300 300
8 3 300 0
9 3 300 0
10 4 400 400
11 4 400 0
12 4 400 0
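A pandas sketch of the same recode (keep the first maximum per group, zero everywhere else), assuming the same `Id`/`value` layout as the dplyr example:

```python
import pandas as pd

# same shape of data as the dplyr example
df = pd.DataFrame({"Id": [1]*3 + [2]*3 + [3]*3 + [4]*3,
                   "value": [500]*3 + [250]*3 + [300]*3 + [400]*3})

# index label of the FIRST maximum within each Id group
first_max_idx = df.groupby("Id")["value"].idxmax()

# zero everywhere, then restore the value at the first-max positions
df["maxByGroup"] = 0
df.loc[first_max_idx, "maxByGroup"] = df.loc[first_max_idx, "value"]
```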
Multiple column groupby with pandas to find maximum value for each group
I would do it by using merge
on the grouped data.
Based on this data:
df = pd.DataFrame({'Feature':['age']*9+['talk']*9,
'value':(['No']*3+['Yes']*3+['[Null]']*3)*2,
'frequency':[2700,1707,83,222,15,8,323,8,5,20,170,500,210,1500,809,234,43,85],
'label':['N','P','O']*6})
Using (note the merge key must be the lowercase column name 'value', and the result is assigned to df_1 for the next step):
df_1 = df.groupby(['Feature','value'], as_index=False)['frequency'].max().merge(df, on=['Feature','value','frequency'])
Outputs:
Feature value frequency label
0 age No 2700 N
1 age Yes 222 N
2 age [Null] 323 N
3 talk No 500 O
4 talk Yes 1500 P
5 talk [Null] 234 N
Adding the extra column can be done via a simple assignment:
df_1['sum_no_max'] = df.groupby(['Feature','value'])['frequency'].sum().values - df_1['frequency'].values
Finally outputting:
Feature value frequency label sum_no_max
0 age No 2700 N 1790
1 age Yes 222 N 23
2 age [Null] 323 N 13
3 talk No 500 O 190
4 talk Yes 1500 P 1019
5 talk [Null] 234 N 128
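The final assignment above works because the grouped sum happens to come back in the same row order as df_1; as a sketch of a variant that joins the totals by key instead of relying on row position:

```python
import pandas as pd

df = pd.DataFrame({'Feature': ['age']*9 + ['talk']*9,
                   'value': (['No']*3 + ['Yes']*3 + ['[Null]']*3)*2,
                   'frequency': [2700,1707,83,222,15,8,323,8,5,
                                 20,170,500,210,1500,809,234,43,85],
                   'label': ['N','P','O']*6})

# keep the row(s) with the max frequency per (Feature, value)
df_1 = df[df['frequency'] ==
          df.groupby(['Feature', 'value'])['frequency'].transform('max')].copy()

# per-group totals, joined on the group keys rather than by position
totals = (df.groupby(['Feature', 'value'], as_index=False)['frequency']
            .sum().rename(columns={'frequency': 'total'}))
df_1 = df_1.merge(totals, on=['Feature', 'value'])
df_1['sum_no_max'] = df_1['total'] - df_1['frequency']
```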
Groupby and filter by max value in pandas
You can do this:
latest = df.query('Value == 1').groupby("ID", as_index=False).max().assign(Latest="Latest")
pd.merge(df, latest, how="outer")
Value ID Date Latest
0 1 5 2012 NaN
1 1 5 2013 Latest
2 0 12 2017 NaN
3 0 12 2022 NaN
4 1 27 2005 NaN
5 1 27 2011 Latest
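The same flag can be set in place without a merge, using idxmax on the Value == 1 subset. Note idxmax keeps only the first row on ties, unlike the merge approach. The data below reconstructs the frame from the output shown:

```python
import pandas as pd

df = pd.DataFrame({"Value": [1, 1, 0, 0, 1, 1],
                   "ID":    [5, 5, 12, 12, 27, 27],
                   "Date":  [2012, 2013, 2017, 2022, 2005, 2011]})

# row labels of the latest Date per ID, among Value == 1 rows only
latest_idx = df[df["Value"] == 1].groupby("ID")["Date"].idxmax()

df["Latest"] = None
df.loc[latest_idx, "Latest"] = "Latest"
```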