Calculate the mean of some columns using dplyr::mutate
You can use rowMeans with select(., BL1:BL9). Here select(., BL1:BL9) selects the columns from BL1 to BL9, and rowMeans calculates the row average. You can't directly use a character vector of column names in mutate; the strings are treated as literal values rather than as columns:
test %>% mutate(ave = rowMeans(select(., BL1:BL9)))
# BL1 BL2 BL3 BL4 BL5 BL6 BL7 BL8 BL9 BL10 BL11 BL12 ave
#1 5 11 1 1 12 5 10 12 6 11 12 9 7.000000
#2 1 10 5 11 7 6 5 9 9 1 8 4 7.000000
#3 8 10 1 2 7 12 5 9 5 3 3 11 6.555556
#4 5 2 5 4 9 5 5 3 5 2 8 1 4.777778
#5 9 1 1 10 3 5 1 9 9 6 3 12 5.333333
#6 9 7 9 6 3 2 5 4 9 5 1 2 6.000000
#7 3 3 1 9 7 8 7 9 9 11 12 9 6.222222
#8 12 9 3 3 9 11 4 2 5 12 12 12 6.444444
#9 1 7 7 12 6 6 5 3 10 12 5 10 6.333333
#10 12 7 7 1 2 8 5 8 11 9 1 5 6.777778
#11 9 1 5 8 12 6 6 11 3 12 3 9 6.777778
#12 5 6 1 11 10 12 6 7 8 7 8 2 7.333333
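Here is a minimal sketch of the character-vector point. It uses a small made-up data frame, and the names `test` and `cols` are hypothetical. A string used directly in mutate() becomes a literal value. Wrapping the vector in all_of() (from tidyselect, re-exported by dplyr) lets select() treat it as column names.

```r
library(dplyr)

# Hypothetical stand-in for `test`
test <- data.frame(BL1 = c(1, 4), BL2 = c(2, 5), BL3 = c(3, 6), BL10 = c(10, 20))

# Column names stored in a character vector
cols <- c("BL1", "BL2", "BL3")

# Used directly, the string is just a value: every row gets "BL1"
bad <- test %>% mutate(first = cols[1])

# all_of() tells select() to treat the vector as column names
good <- test %>% mutate(ave = rowMeans(select(., all_of(cols))))

bad$first  # "BL1" "BL1"
good$ave   # 2 5
```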
How to mutate a new column with row means for select columns in grouped_tbl using dplyr r?
It would be inefficient to use rowwise; a better option is rowMeans after selecting the columns of interest:
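library(dplyr)
# Ungroup first: with magrittr's `.`, select(., ...) sees the whole data, so on
# a grouped tibble rowMeans() would return one value per row of the full data
# (not per group), and select() would add the grouping columns back in
clean_bmk %>%
  ungroup() %>%
  mutate(
    # all_of() marks each character vector as a set of column names
    BMK_Mean_Strategic = rowMeans(select(., all_of(strategic)), na.rm = TRUE),
    BMK_Mean_DiffChange = rowMeans(select(., all_of(diffchange)), na.rm = TRUE),
    BMK_Mean_Failure = rowMeans(select(., all_of(failure)), na.rm = TRUE),
    BMK_Mean_Narrow = rowMeans(select(., all_of(narrow)), na.rm = TRUE),
    BMK_R1_Performance = rowMeans(select(., all_of(performance_vars)), na.rm = TRUE),
    BMK_R2_Promotion = rowMeans(select(., all_of(promote_vars)), na.rm = TRUE),
    BMK_R3_Derail = rowMeans(select(., all_of(derail_vars)), na.rm = TRUE))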
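The sketch below compares the two approaches on a tiny grouped tibble. The data, `scores`, and `q_vars` are hypothetical. The vectorised rowMeans() route and the rowwise()/c_across() route give the same values; the rowwise version just calls mean() once per row. If the grouping is still needed afterwards, group_by() can be applied again after the mutate.

```r
library(dplyr)

# Hypothetical grouped data: two question columns, grouped by team
scores <- data.frame(team = c("a", "a", "b"),
                     q1 = c(1, NA, 5),
                     q2 = c(3, 4, 7)) %>%
  group_by(team)
q_vars <- c("q1", "q2")

# Vectorised: drop the grouping, then a single rowMeans() call
fast <- scores %>%
  ungroup() %>%
  mutate(q_mean = rowMeans(select(., all_of(q_vars)), na.rm = TRUE))

# Row-by-row alternative: one mean() call per row, slower on large data
slow <- scores %>%
  ungroup() %>%
  rowwise() %>%
  mutate(q_mean = mean(c_across(all_of(q_vars)), na.rm = TRUE)) %>%
  ungroup()

fast$q_mean  # 2 4 6
```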
Using a reproducible example
library(dplyr)
data(mtcars)
v1 <- c('mpg', 'disp')
mtcars %>%
  transmute(newMean = rowMeans(select(., all_of(v1)), na.rm = TRUE)) %>%
  head()
# newMean
#Mazda RX4 90.50
#Mazda RX4 Wag 90.50
#Datsun 710 65.40
#Hornet 4 Drive 139.70
#Hornet Sportabout 189.35
#Valiant 121.55
Use of a like operator in dplyr
To elaborate on @r2evans's comment, what you are looking for is grepl(). This function tells you whether a pattern occurs in a string, returning TRUE or FALSE for each element. You don't actually need mutate or case_when, and could do it with base R:
Var1 <- c("Free Throw", "stepback jumpshot", "pull up jumpshot", "hail mary")
df <- data.frame(Var1, stringsAsFactors = FALSE)  # keep text as character in R < 4.0
df$Var2 <- ifelse(grepl("jumpshot", df$Var1, fixed = TRUE), "Jumpshot", df$Var1)
df
# Var1 Var2
# 1 Free Throw Free Throw
# 2 stepback jumpshot Jumpshot
# 3 pull up jumpshot Jumpshot
# 4 hail mary hail mary
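Here is a short illustration of the fixed = TRUE argument used above, with made-up strings. By default grepl() treats the pattern as a regular expression, so characters such as "." are special. With fixed = TRUE, the pattern is matched as literal text. For a case-insensitive match, somewhat like SQL's ILIKE '%jumpshot%', ignore.case = TRUE can be used with a regular-expression pattern.

```r
# Made-up shot descriptions
shots <- c("Free Throw", "stepback JumpShot", "pull up jumpshot", "3.5 ft floater")

# Default: the pattern is a regular expression, so "." matches any character
regex_hit <- grepl("3.5", c("3.5 ft", "325 ft"))

# fixed = TRUE: the pattern is matched as literal text
fixed_hit <- grepl("3.5", c("3.5 ft", "325 ft"), fixed = TRUE)

# ignore.case = TRUE: case-insensitive regular-expression match
any_case <- grepl("jumpshot", shots, ignore.case = TRUE)

regex_hit  # TRUE TRUE
fixed_hit  # TRUE FALSE
any_case   # FALSE TRUE TRUE FALSE
```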
But if you really want to use dplyr functions, the case_when statement @r2evans gave will work:
library(dplyr)
Var1 <- c("Free Throw", "stepback jumpshot", "pull up jumpshot", "hail mary")
df <- data.frame(Var1, stringsAsFactors = FALSE)
df2 <- df %>%
  mutate(Var2 = case_when(grepl("jumpshot", Var1) ~ "Jumpshot",
                          grepl("block", Var1) ~ "Block",
                          TRUE ~ Var1))
df2
# Var1 Var2
# 1 Free Throw Free Throw
# 2 stepback jumpshot Jumpshot
# 3 pull up jumpshot Jumpshot
# 4 hail mary hail mary
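One detail of case_when() matters once several patterns are listed. The conditions are checked in order, and the first TRUE one wins. The final TRUE ~ ... acts as the fallback. The sketch below uses made-up strings, the first of which matches both patterns.

```r
library(dplyr)

# Made-up descriptions; the first one matches both patterns
shots <- c("jumpshot then block", "block", "layup")

# case_when() uses the first condition that is TRUE for each element;
# the final TRUE ~ shots keeps the original text when nothing matched
labels <- case_when(
  grepl("jumpshot", shots) ~ "Jumpshot",
  grepl("block", shots) ~ "Block",
  TRUE ~ shots
)

labels  # "Jumpshot" "Block" "layup"
```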
Using dplyr and mutate to create new columns based on groups and last n rows
You might be able to do something like this:
- Convert the data to a data.table:
library(data.table)
setDT(data)
- Create a small function that, given a vector as input, returns a list of vectors holding the sequential last n points:
f <- function(x,n=3) lapply(n:1,\(i) x[i:(i+length(x)-1)])
- Apply that function by the grouping variables of interest, remembering to order first. For example, to get the prior points by player id only, you can use f() like this:
data[
order(-match_id),
c("last3", "last2", "last1"):=f(points,3),
by=player_id][]
- If you want to group by venue and opponent as well, do this:
data[
order(-match_id),
c("last3", "last2", "last1"):=f(points,3),
by=.(player_id, venue,opponent)][]
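The following sketch shows what f() produces. It uses a single made-up vector of points for one player, already sorted newest match first as order(-match_id) does. Each list element is the vector shifted forward by i - 1 positions and padded with NA. With the newest-first ordering, last3, last2 and last1 therefore hold the points from two matches earlier, one match earlier and the current match. The `\(i)` shorthand needs R 4.1 or later.

```r
f <- function(x, n = 3) lapply(n:1, \(i) x[i:(i + length(x) - 1)])

# Hypothetical points for one player, newest match first (match 4, 3, 2, 1)
pts <- c(11, 5, 1, 5)

lagged <- f(pts, 3)
names(lagged) <- c("last3", "last2", "last1")

lagged$last3  # 1 5 NA NA  (two matches earlier)
lagged$last2  # 5 1 5 NA   (one match earlier)
lagged$last1  # 11 5 1 5   (current match)
```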
Output (by player_id):
match_id player_id venue opponent points last3 last2 last1
<char> <char> <char> <char> <num> <num> <num> <num>
1: match_1 player_1 venue A opponent A 5 NA NA 5
2: match_1 player_2 venue A opponent A 10 NA NA 10
3: match_1 player_3 venue A opponent A 15 NA NA 15
4: match_2 player_1 venue B opponent B 1 NA 5 1
5: match_2 player_2 venue B opponent B 2 NA 10 2
6: match_2 player_3 venue B opponent B 3 NA 15 3
7: match_3 player_1 venue C opponent C 5 5 1 5
8: match_3 player_2 venue C opponent C 7 10 2 7
9: match_3 player_3 venue C opponent C 9 15 3 9
10: match_4 player_1 venue C opponent C 11 1 5 11
11: match_4 player_2 venue C opponent C 2 2 7 2
12: match_4 player_3 venue C opponent C 6 3 9 6
If you want a combined column, you can do this (assuming you assigned the result of the above to r1):
r1[, combined:=paste(last3,last2,last1,sep = ","), by=1:nrow(r1)][]
Output:
match_id player_id venue opponent points last3 last2 last1 combined
<char> <char> <char> <char> <num> <num> <num> <num> <char>
1: match_1 player_1 venue A opponent A 5 NA NA 5 NA,NA,5
2: match_1 player_2 venue A opponent A 10 NA NA 10 NA,NA,10
3: match_1 player_3 venue A opponent A 15 NA NA 15 NA,NA,15
4: match_2 player_1 venue B opponent B 1 NA 5 1 NA,5,1
5: match_2 player_2 venue B opponent B 2 NA 10 2 NA,10,2
6: match_2 player_3 venue B opponent B 3 NA 15 3 NA,15,3
7: match_3 player_1 venue C opponent C 5 5 1 5 5,1,5
8: match_3 player_2 venue C opponent C 7 10 2 7 10,2,7
9: match_3 player_3 venue C opponent C 9 15 3 9 15,3,9
10: match_4 player_1 venue C opponent C 11 1 5 11 1,5,11
11: match_4 player_2 venue C opponent C 2 2 7 2 2,7,2
12: match_4 player_3 venue C opponent C 6 3 9 6 3,9,6
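Because paste() works element-wise over whole vectors, the combined column can also be built without per-row grouping, e.g. data[, combined := paste(last3, last2, last1, sep = ",")]. The sketch below shows this on plain hypothetical vectors. It also shows an optional variant (not part of the answer above) that leaves missing values out of the string instead of printing "NA".

```r
# Hypothetical lag columns for three rows
last3 <- c(NA, NA, 5)
last2 <- c(NA, 5, 1)
last1 <- c(5, 1, 5)

# paste() works element-wise over whole columns; NA becomes the text "NA"
combined <- paste(last3, last2, last1, sep = ",")

# Optional variant: drop missing values before collapsing each row
combined_no_na <- apply(cbind(last3, last2, last1), 1,
                        function(r) paste(r[!is.na(r)], collapse = ","))

combined        # "NA,NA,5" "NA,5,1" "5,1,5"
combined_no_na  # "5" "5,1" "5,1,5"
```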
Here is the minimal set of code required:
library(data.table)
setDT(data)
f <- function(x,n=3) lapply(n:1,\(i) x[i:(i+length(x)-1)])
data[order(-match_id),c("last3", "last2", "last1"):=f(points,3),by=player_id]
data[, combined:=paste(last3,last2,last1,sep = ","), by=1:nrow(data)]
Update:
The OP now wants to exclude some rows (skip over them) under certain conditions. If a mask indicating which rows to include can be passed to f(), we can adjust f() like this:
f <- function(x,n=3,m=rep(TRUE,length(x))) {
  # rows where the mask m is FALSE are blanked out (set to NA)
  x[!m] <- NA
  lapply(n:1,function(i) x[i:(i+length(x)-1)])
}
This example uses the adjusted version of f() above to skip over rows where game_x==0. The masked rows keep their position; only their points become NA:
data[
order(-match_id),
c("last3", "last2", "last1"):=f(points,3,game_x==1),
by=.(player_id)][order(player_id,-match_id)][]
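Another update: the OP now wants to exclude game_x==0 rows completely. Here those rows are set aside, f() is applied to the game_x==1 rows only, and the two parts are stacked again with rbind(..., fill=TRUE):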
rbind(
data[game_x==0],
data[game_x==1][
order(-match_id),
c("last5", "last4", "last3", "last2", "last1"):=f(points,5),
by=.(player_id)][order(player_id,-match_id)],
fill=TRUE
)
Output:
match_id player_id venue opponent game_x points last5 last4 last3 last2 last1
<num> <char> <char> <char> <num> <num> <num> <num> <num> <num> <num>
1: 3 player_1 venue B opponent A 0 15 NA NA NA NA NA
2: 5 player_1 venue B opponent C 0 2 NA NA NA NA NA
3: 4 player_1 venue B opponent C 1 1 NA NA 5 10 1
4: 2 player_1 venue A opponent B 1 10 NA NA NA 5 10
5: 1 player_1 venue A opponent A 1 5 NA NA NA NA 5
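The difference between the two updates can be seen on one made-up vector; the points and game_x values are hypothetical and sorted newest match first. With the mask, game_x == 0 rows stay in place with NA points, so they still occupy a lag slot. After filtering them out, each lag column holds the previous included match.

```r
f <- function(x, n = 3, m = rep(TRUE, length(x))) {
  x[!m] <- NA  # masked-out rows contribute NA
  lapply(n:1, function(i) x[i:(i + length(x) - 1)])
}

# Hypothetical points for one player, newest match first
pts    <- c(1, 2, 10, 15, 5)
game_x <- c(1, 0, 1, 0, 1)

masked  <- f(pts, 3, game_x == 1)  # first update: keep rows, blank them
dropped <- f(pts[game_x == 1], 3)  # second update: remove rows first

masked[[2]]   # NA 10 NA 5 NA
dropped[[2]]  # 10 5 NA
```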
Custom function to mutate a new column for row means using starts_with()
We can use quo_name to assign column names:
library(dplyr)
library(rlang)
continent_mean <- function(df, continent) {
df %>%
select(starts_with(continent)) %>%
mutate(!!quo_name(continent) := rowMeans(., na.rm = TRUE))
}
continent_mean(df, "asia")
# asia_bangkok asia_tokyo asia_kathmandu asia
#1 NA 41 51 46
#2 NA 42 52 47
#3 33 43 NA 38
#4 NA 44 54 49
#5 35 45 55 45
#6 36 46 56 46
#7 NA 47 57 52
#8 38 48 NA 43
#9 39 49 NA 44
#10 40 NA 60 50
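Current rlang documentation marks quo_name() as superseded. A plain string can be unquoted with !! directly on the left of :=, as in the sketch below. The data and the name continent_mean2 are hypothetical. Matching on paste0(continent, "_") rather than the bare prefix is a design choice that avoids picking up columns whose names merely start with the same letters.

```r
library(dplyr)

# Hypothetical data in the same wide layout
df <- data.frame(asia_tokyo = c(41, 42),
                 asia_kathmandu = c(51, NA),
                 europe_paris = c(1, 2))

continent_mean2 <- function(df, continent) {
  df %>%
    # match "asia_" rather than "asia" so e.g. "asiaminor_x" is not picked up
    select(starts_with(paste0(continent, "_"))) %>%
    # a string can be unquoted with !! directly on the left of :=
    mutate(!!continent := rowMeans(., na.rm = TRUE))
}

res <- continent_mean2(df, "asia")
res$asia  # 46 42
```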
Using base R, we can do a similar thing with:
continent_mean <- function(df, continent) {
  df1 <- df[startsWith(names(df), continent)]
  df1[continent] <- rowMeans(df1, na.rm = TRUE)
  df1
}
If we want the rowMeans of all the continents together, we can use split.default:
sapply(split.default(df, sub("_.*", "", names(df))), rowMeans, na.rm = TRUE)
# asia europe
# [1,] 46 1
# [2,] 47 17
# [3,] 38 13
# [4,] 49 14
# [5,] 45 20
# [6,] 46 6
# [7,] 52 17
# [8,] 43 23
# [9,] 44 19
#[10,] 50 20
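split.default() is used here rather than split() for a reason. A data frame is a list of columns, so split.default() groups the columns by the prefix extracted with sub(). split() on a data frame would split its rows instead. The sketch below uses a small hypothetical data frame.

```r
# Hypothetical data: two "asia_" columns and one "europe_" column
df <- data.frame(asia_a = c(1, 3), asia_b = c(3, NA), europe_x = c(10, 20))

# Prefix of each column name: "asia" "asia" "europe"
prefix <- sub("_.*", "", names(df))

# split.default() splits the columns; split() would split the rows
parts <- split.default(df, prefix)

# One rowMeans() call per prefix, simplified by sapply() into a matrix
res <- sapply(parts, rowMeans, na.rm = TRUE)

res[, "asia"]    # 2 3
res[, "europe"]  # 10 20
```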
Mutate across multiple columns using dplyr
Two possibilities using dplyr:
library(dplyr)
mtcars %>%
rowwise() %>%
mutate(varmean = mean(c_across(mpg:vs)))
This returns
# A tibble: 32 x 12
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb varmean
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 40.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 40.1
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 31.7
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 52.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 73.2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 47.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 81.2
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 33.1
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 36.7
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 42.8
# ... with 22 more rows
and, without rowwise(), using base R's rowMeans():
mtcars %>%
mutate(varmean = rowMeans(across(mpg:vs)))
returns
mpg cyl disp hp drat wt qsec vs am gear carb varmean
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 39.99750
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 40.09938
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 31.69750
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 52.76687
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 73.16375
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 47.69250
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 81.24000
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 33.12250
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 36.69625
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 42.80750
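For comparison, the same column can be computed without dplyr. In the sketch below, the columns from mpg through vs are located by name, which mirrors the mpg:vs range, and rowMeans() keeps the mtcars row names on the result.

```r
# Columns from mpg through vs, located by name (like mpg:vs in dplyr)
cols <- names(mtcars)[match("mpg", names(mtcars)):match("vs", names(mtcars))]

# rowMeans() on the selected columns; names come from mtcars' row names
varmean <- rowMeans(mtcars[cols])

round(varmean[["Mazda RX4"]], 5)  # 39.9975
```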