Finding Maximum Value of One Column (By Group) and Inserting Value into Another Data Frame in R

It sounds like you're just looking for aggregate:

> aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, Data, max)
  country1 year x1 x2 x3 x4
1        B 1998 30 10 30  2
2        A 2000 95 90 25 90
3        C 2005 90 90  5 40

It's not very clear from your question how you want to proceed from there, though.
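One possible way to get those maxima into another data frame is merge(). This is only a sketch: the values in Data and the second data frame other are made up for illustration, and it assumes both data frames share the country1 and year columns.

```r
# Hypothetical data; column names mirror the answer above, values are made up.
Data <- data.frame(
  country1 = c("A", "A", "B", "B"),
  year     = c(2000, 2000, 1998, 1998),
  x1       = c(95, 40, 30, 10),
  x2       = c(90, 20, 5, 10)
)

# Hypothetical second data frame that should receive the maxima.
other <- data.frame(country1 = c("A", "B"), year = c(2000, 1998))

# Group maxima, computed the same way as in the answer above.
maxima <- aggregate(cbind(x1, x2) ~ country1 + year, data = Data, FUN = max)

# Left join: keep every row of `other` and attach the matching maxima.
result <- merge(other, maxima, by = c("country1", "year"), all.x = TRUE)
print(result)
```

With all.x = TRUE, rows of other that have no matching group are kept and get NA in the x columns.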

Find maximum value of one column based on group_by multiple other columns

We can use slice_max() instead of summarise() to keep all the columns that remain after the select() step:

library(dplyr)
df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
slice_max(order_by = ord, n = 1)

If we need to create a new column, use mutate

df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
mutate(ordMax = max(ord, na.rm = TRUE)) %>%
ungroup()
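To see how the two approaches differ, here is a small self-contained sketch. The contents of df_k are invented (only the column names come from the answer above), and it assumes dplyr 1.0.0 or later, where slice_max() is available.

```r
library(dplyr)

# Hypothetical data; only the column names come from the answer above.
df_k <- data.frame(
  COUNTRY    = c("US", "US", "FR", "FR"),
  date_start = as.Date(c("2020-01-01", "2020-01-01", "2021-06-01", "2021-06-01")),
  code       = c("a", "b", "c", "d"),
  ord        = c(1, 3, 2, 5)
)

# slice_max(): keeps only the row(s) holding the group maximum.
top_rows <- df_k %>%
  group_by(COUNTRY, date_start) %>%
  slice_max(order_by = ord, n = 1) %>%
  ungroup()

# mutate(): keeps every row and repeats the group maximum alongside it.
with_max <- df_k %>%
  group_by(COUNTRY, date_start) %>%
  mutate(ordMax = max(ord, na.rm = TRUE)) %>%
  ungroup()

print(top_rows)
print(with_max)
```

Note that order_by takes the bare column name; a quoted string would be treated as a constant rather than as the column.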

How to find the maximum value within each group and then recode all other values in the group as zero?

You can try this

df %>%
group_by(Id) %>%
mutate(maxByGroup = (which.max(value) == seq_along(value)) * value) %>%
ungroup()

which gives

      Id value maxByGroup
   <dbl> <dbl>      <dbl>
 1     1   500        500
 2     1   500          0
 3     1   500          0
 4     2   250        250
 5     2   250          0
 6     2   250          0
 7     3   300        300
 8     3   300          0
 9     3   300          0
10     4   400        400
11     4   400          0
12     4   400          0

Extract the maximum value within each group in a dataframe

There are many possibilities to do this in R. Here are some of them:

df <- read.table(header = TRUE, text = 'Gene   Value
A 12
A 10
B 3
B 5
B 6
C 1
D 3
D 4')

# aggregate
aggregate(df$Value, by = list(df$Gene), max)
aggregate(Value ~ Gene, data = df, max)

# tapply
tapply(df$Value, df$Gene, max)

# split + lapply
lapply(split(df, df$Gene), function(y) max(y$Value))

# plyr
require(plyr)
ddply(df, .(Gene), summarise, Value = max(Value))

# dplyr
require(dplyr)
df %>% group_by(Gene) %>% summarise(Value = max(Value))

# data.table
require(data.table)
dt <- data.table(df)
dt[ , max(Value), by = Gene]

# doBy
require(doBy)
summaryBy(Value~Gene, data = df, FUN = max)

# sqldf
require(sqldf)
sqldf("select Gene, max(Value) as Value from df group by Gene", drv = 'SQLite')

# ave
df[as.logical(ave(df$Value, df$Gene, FUN = function(x) x == max(x))),]

Select the row with the maximum value in each group

Here's a data.table solution:

require(data.table) ## 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2

If you'd like just the first max value of pt:

group[group[, .I[which.max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
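The difference does show up once a group has ties. Here is a small sketch with made-up data (not the data from the question) in which Subject 1 has two rows sharing the maximum pt:

```r
library(data.table)

# Hypothetical data with a tie in Subject 1 (two rows have pt = 5).
group <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(5, 5, 2, 17, 3),
  Event   = c(1, 2, 2, 2, 1)
)

# .I holds the row numbers in the full table; keep all rows at the maximum.
all_max <- group[group[, .I[pt == max(pt)], by = Subject]$V1]

# Keep only the first row at the maximum within each Subject.
first_max <- group[group[, .I[which.max(pt)], by = Subject]$V1]

print(all_max)
print(first_max)
```

Here all_max keeps both tied rows for Subject 1, while first_max keeps only the first of them.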

Assigning max value of column grouped by another column dynamically in dplyr

If we need to apply this to multiple columns, use mutate_at (superseded by across() in dplyr 1.0.0 and later, but still available):

my.events %>% 
group_by(x) %>%
mutate_at(vars(starts_with("type")), max)
# A tibble: 5 x 4
# Groups: x [4]
# x title typethx typesea
# <date> <chr> <dbl> <dbl>
#1 2016-11-24 Thanksgiving 1 0
#2 2016-11-25 Thanksgiving 2 0
#3 2016-11-26 Thanksgiving 3 1
#4 2016-11-26 Season 3 1
#5 2016-11-27 Season 0 2
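On dplyr 1.0.0 or later, the same idea can be written with across(). This is a sketch: the input values for my.events are guessed so that the result matches the output shown above, since the original input is not given here.

```r
library(dplyr)

# Hypothetical input; column names follow the output above, values are guesses.
my.events <- data.frame(
  x = as.Date(c("2016-11-24", "2016-11-25", "2016-11-26",
                "2016-11-26", "2016-11-27")),
  title   = c("Thanksgiving", "Thanksgiving", "Thanksgiving", "Season", "Season"),
  typethx = c(1, 2, 3, 0, 0),
  typesea = c(0, 0, 0, 1, 2)
)

# Replace every "type*" column with its maximum within each date.
res <- my.events %>%
  group_by(x) %>%
  mutate(across(starts_with("type"), max)) %>%
  ungroup()

print(res)
```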

In R, how do I add a max by group?

Try

# This is how you create your data.frame
group<-c("A","A","A","A","A","B","B","C","C","C")
replicate<-c(1,2,3,4,5,1,2,1,2,3)
x<-data.frame(group,replicate) # here you don't need c()

# Here's my solution
Max <- tapply(x$replicate, x$group,max)
data.frame(x, max.per.group=rep(Max, table(x$group)))
group replicate max.per.group
1 A 1 5
2 A 2 5
3 A 3 5
4 A 4 5
5 A 5 5
6 B 1 2
7 B 2 2
8 C 1 3
9 C 2 3
10 C 3 3
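One caveat: rep(Max, table(x$group)) lines up correctly only when the rows are already sorted by group. If that isn't guaranteed, ave() is a possible alternative, since it returns the result in the original row order. A sketch with deliberately unsorted, made-up data:

```r
# Hypothetical data, deliberately NOT sorted by group.
group     <- c("B", "A", "C", "A", "B")
replicate <- c(1, 2, 3, 5, 2)
x <- data.frame(group, replicate)

# ave() computes max within each group and returns one value per row,
# aligned with the original rows regardless of their order.
x$max.per.group <- ave(x$replicate, x$group, FUN = max)

print(x)
```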

creating new column based on highest values

You can use the pmax function from base R to take the row-wise maximum across a set of columns in your data frame. In this case, those columns are education and education_partner.

new_data <- data %>%
mutate(highest_degree = pmax(education, education_partner, na.rm = TRUE))

Output:

  ID marital education education_partner highest_degree
1  1       1        14                18             18
2  2       4        18                NA             18
3  3       0        10                NA             10
4  4       2        12                14             14
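The na.rm = TRUE argument matters here. A small base R sketch, using the values from the output above, shows what happens with and without it:

```r
# Data taken from the output above.
data <- data.frame(
  ID                = 1:4,
  marital           = c(1, 4, 0, 2),
  education         = c(14, 18, 10, 12),
  education_partner = c(18, NA, NA, 14)
)

# Row-wise maximum, ignoring missing values.
data$highest_degree <- pmax(data$education, data$education_partner, na.rm = TRUE)

# Without na.rm = TRUE, any NA in a row makes that row's result NA.
with_na <- pmax(data$education, data$education_partner)

print(data)
print(with_na)
```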

Find the maximum value for a specific value in a column?

An option with base R

aggregate(cbind(Highest_Mutation_Frequency = Mutation_Frequency) ~ Gene.Name, data, FUN = max)

How to get the maximum value by group

If you know SQL, this is easier to understand:

library(sqldf)
sqldf('select year, max(score) from mydata group by year')

Update (2016-01): Now you can also use dplyr

library(dplyr)
mydata %>% group_by(year) %>% summarise(max = max(score))

