dplyr - Using mutate() Like rowMeans()

Calculate the mean of some columns using dplyr::mutate

You can use rowMeans() with select(., BL1:BL9). Here select(., BL1:BL9) selects the columns from BL1 to BL9, and rowMeans() calculates the average of each row. You can't pass a character vector of column names directly inside mutate(): it is treated as plain strings rather than as columns.

test %>% mutate(ave = rowMeans(select(., BL1:BL9)))

# BL1 BL2 BL3 BL4 BL5 BL6 BL7 BL8 BL9 BL10 BL11 BL12 ave
#1 5 11 1 1 12 5 10 12 6 11 12 9 7.000000
#2 1 10 5 11 7 6 5 9 9 1 8 4 7.000000
#3 8 10 1 2 7 12 5 9 5 3 3 11 6.555556
#4 5 2 5 4 9 5 5 3 5 2 8 1 4.777778
#5 9 1 1 10 3 5 1 9 9 6 3 12 5.333333
#6 9 7 9 6 3 2 5 4 9 5 1 2 6.000000
#7 3 3 1 9 7 8 7 9 9 11 12 9 6.222222
#8 12 9 3 3 9 11 4 2 5 12 12 12 6.444444
#9 1 7 7 12 6 6 5 3 10 12 5 10 6.333333
#10 12 7 7 1 2 8 5 8 11 9 1 5 6.777778
#11 9 1 5 8 12 6 6 11 3 12 3 9 6.777778
#12 5 6 1 11 10 12 6 7 8 7 8 2 7.333333
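
The point about character vectors can be sketched as follows. The data frame toy and the vector cols are hypothetical names that do not come from the original question; all_of() is the tidyselect helper that tells select() to treat the strings as column names.

```r
library(dplyr)

# Hypothetical toy data; the column names mimic the BL1..BL9 pattern above
toy <- data.frame(BL1 = c(1, 4), BL2 = c(2, 5), BL3 = c(3, 9), other = c(100, 200))

# Column names stored as strings
cols <- c("BL1", "BL2", "BL3")

toy_ave <- toy %>%
  mutate(
    ave_range = rowMeans(select(., BL1:BL3)),      # range of bare column names
    ave_chr   = rowMeans(select(., all_of(cols)))  # names held in a character vector
  )

toy_ave
```

Both new columns should hold the same row averages, since they select the same three columns in two different ways.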

How to mutate a new column with row means for select columns in grouped_tbl using dplyr r?

Using rowwise() would be inefficient; a better option is rowMeans() after selecting the columns of interest:

library(dplyr)
clean_bmk %>%
  ungroup() %>%
  mutate(
    BMK_Mean_Strategic  = rowMeans(select(., strategic), na.rm = TRUE),
    BMK_Mean_DiffChange = rowMeans(select(., diffchange), na.rm = TRUE),
    BMK_Mean_Failure    = rowMeans(select(., failure), na.rm = TRUE),
    BMK_Mean_Narrow     = rowMeans(select(., narrow), na.rm = TRUE),
    BMK_R1_Performance  = rowMeans(select(., performance_vars), na.rm = TRUE),
    BMK_R2_Promotion    = rowMeans(select(., promote_vars), na.rm = TRUE),
    BMK_R3_Derail       = rowMeans(select(., derail_vars), na.rm = TRUE))
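
A minimal sketch of the same pattern on built-in data, assuming the variables such as strategic are character vectors of column names (the question's data is not shown here). The names grouped and score_vars are hypothetical. One reason ungroup() matters: on a grouped tibble, select() keeps the grouping columns, so they would be averaged in as well.

```r
library(dplyr)

# Hypothetical stand-in for the grouped input
grouped <- mtcars %>% group_by(cyl)

# Hypothetical character vector of the columns to average
score_vars <- c("mpg", "qsec")

res <- grouped %>%
  ungroup() %>%  # drop the grouping so select() returns only the requested columns
  mutate(score_mean = rowMeans(select(., all_of(score_vars)), na.rm = TRUE))

head(res$score_mean)
```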

Using a reproducible example

data(mtcars)
v1 <- c('mpg', 'disp')  # names of the columns to average
mtcars %>%
  transmute(newMean = rowMeans(select(., all_of(v1)), na.rm = TRUE)) %>%
  head
# newMean
#Mazda RX4 90.50
#Mazda RX4 Wag 90.50
#Datsun 710 65.40
#Hornet 4 Drive 139.70
#Hornet Sportabout 189.35
#Valiant 121.55

Use of a like operator in dplyr

To elaborate on @r2evans's comment, what you are looking for is grepl(). This function tells you whether a pattern occurs in a string, returning TRUE or FALSE. You don't actually need mutate() or case_when(), and could do it with base R:

Var1 <-  c("Free Throw", "stepback jumpshot", "pull up jumpshot", "hail mary")

df <- data.frame(Var1)

df$Var2 <- ifelse(grepl("jumpshot", df$Var1, fixed = TRUE), "Jumpshot", df$Var1)

df

# Var1 Var2
# 1 Free Throw Free Throw
# 2 stepback jumpshot Jumpshot
# 3 pull up jumpshot Jumpshot
# 4 hail mary hail mary

But if you really want to use dplyr functions, the case statement @r2evans gave will work:


library(dplyr)

Var1 <- c("Free Throw", "stepback jumpshot", "pull up jumpshot", "hail mary")

df <- data.frame(Var1)

# The first condition that matches wins; TRUE ~ Var1 keeps everything else unchanged
df2 <- df %>%
  mutate(Var2 = case_when(grepl("jumpshot", Var1) ~ "Jumpshot",
                          grepl("block", Var1) ~ "Block",
                          TRUE ~ Var1))
df2

# Var1 Var2
# 1 Free Throw Free Throw
# 2 stepback jumpshot Jumpshot
# 3 pull up jumpshot Jumpshot
# 4 hail mary hail mary
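
For readers thinking in terms of SQL's LIKE, here is a small base R sketch of how a few common patterns could be expressed with grepl(). The vector shots is made up for illustration, and the LIKE patterns in the comments are only rough analogies.

```r
# Hypothetical input; note the capital J in the last element
shots <- c("Free Throw", "stepback jumpshot", "pull up jumpshot", "Jumpshot (late)")

# Roughly LIKE '%jumpshot%': literal, case-sensitive substring match
contains_js <- grepl("jumpshot", shots, fixed = TRUE)

# Roughly LIKE '%jumpshot': regular expression anchored at the end of the string
ends_js <- grepl("jumpshot$", shots)

# Case-insensitive substring match
any_case <- grepl("jumpshot", shots, ignore.case = TRUE)

data.frame(shots, contains_js, ends_js, any_case)
```

Only the case-insensitive version matches the last element, because the other two patterns are case-sensitive.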

Using dplyr and mutate to create new columns based on groups and last n rows

You might be able to do something like this:


  1. Set the data as a data.table:
setDT(data)
  2. Create a small function that, given a vector as input, returns a list of vectors showing the sequential last n points:
f <- function(x,n=3) lapply(n:1,\(i) x[i:(i+length(x)-1)])

  3. Apply that function by the grouping variables of interest, remembering to order first. For example, to get the prior points by player id only, you can use f() like this:
data[
  order(-match_id),
  c("last3", "last2", "last1"):=f(points,3),
  by=player_id][]

  4. If you want to group by venue and opponent as well, do this:
data[
  order(-match_id),
  c("last3", "last2", "last1"):=f(points,3),
  by=.(player_id, venue,opponent)][]
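
To see what f() returns, it can help to call it on a plain vector outside of data.table. The input vector below is made up for illustration, and function(i) is used instead of the \(i) shorthand only so that the sketch also runs on R versions before 4.1.

```r
# Same logic as f() above, written with function(i) instead of \(i)
f <- function(x, n = 3) lapply(n:1, function(i) x[i:(i + length(x) - 1)])

# Hypothetical points, already sorted from most recent to oldest
pts <- c(10, 20, 30, 40)

res <- f(pts, 3)
names(res) <- c("last3", "last2", "last1")
res
# last3 is pts shifted by two positions, last2 by one, last1 is pts itself;
# positions past the end of the vector are filled with NA
```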

Output (by player_id):

    match_id player_id   venue   opponent points last3 last2 last1
<char> <char> <char> <char> <num> <num> <num> <num>
1: match_1 player_1 venue A opponent A 5 NA NA 5
2: match_1 player_2 venue A opponent A 10 NA NA 10
3: match_1 player_3 venue A opponent A 15 NA NA 15
4: match_2 player_1 venue B opponent B 1 NA 5 1
5: match_2 player_2 venue B opponent B 2 NA 10 2
6: match_2 player_3 venue B opponent B 3 NA 15 3
7: match_3 player_1 venue C opponent C 5 5 1 5
8: match_3 player_2 venue C opponent C 7 10 2 7
9: match_3 player_3 venue C opponent C 9 15 3 9
10: match_4 player_1 venue C opponent C 11 1 5 11
11: match_4 player_2 venue C opponent C 2 2 7 2
12: match_4 player_3 venue C opponent C 6 3 9 6

If you want the combined column, you can do this, assuming that you have assigned the result of the above to r1:

r1[, combined:=paste(last3,last2,last1,sep = ","), by=1:nrow(r1)][]

Output:

    match_id player_id   venue   opponent points last3 last2 last1 combined
<char> <char> <char> <char> <num> <num> <num> <num> <char>
1: match_1 player_1 venue A opponent A 5 NA NA 5 NA,NA,5
2: match_1 player_2 venue A opponent A 10 NA NA 10 NA,NA,10
3: match_1 player_3 venue A opponent A 15 NA NA 15 NA,NA,15
4: match_2 player_1 venue B opponent B 1 NA 5 1 NA,5,1
5: match_2 player_2 venue B opponent B 2 NA 10 2 NA,10,2
6: match_2 player_3 venue B opponent B 3 NA 15 3 NA,15,3
7: match_3 player_1 venue C opponent C 5 5 1 5 5,1,5
8: match_3 player_2 venue C opponent C 7 10 2 7 10,2,7
9: match_3 player_3 venue C opponent C 9 15 3 9 15,3,9
10: match_4 player_1 venue C opponent C 11 1 5 11 1,5,11
11: match_4 player_2 venue C opponent C 2 2 7 2 2,7,2
12: match_4 player_3 venue C opponent C 6 3 9 6 3,9,6

Here is the minimal set of code required:

library(data.table)

setDT(data)

f <- function(x,n=3) lapply(n:1,\(i) x[i:(i+length(x)-1)])

data[order(-match_id),c("last3", "last2", "last1"):=f(points,3),by=player_id]
data[, combined:=paste(last3,last2,last1,sep = ","), by=1:nrow(data)]
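
Since the input data is not shown, here is a reproducible sketch that rebuilds a reduced version of it from the output above. It assumes a numeric match_id (rather than the "match_1" strings) and drops the venue and opponent columns, which are not needed when grouping by player_id only.

```r
library(data.table)

# Points copied from the output table above; match_id is numeric by assumption
data <- data.table(
  match_id  = rep(1:4, each = 3),
  player_id = rep(c("player_1", "player_2", "player_3"), times = 4),
  points    = c(5, 10, 15, 1, 2, 3, 5, 7, 9, 11, 2, 6)
)

f <- function(x, n = 3) lapply(n:1, function(i) x[i:(i + length(x) - 1)])

# Order from the latest match backwards, then fill the three columns per player
data[order(-match_id), c("last3", "last2", "last1") := f(points, 3), by = player_id]

# paste() is vectorised, so no per-row grouping is needed here
data[, combined := paste(last3, last2, last1, sep = ",")]

data[player_id == "player_1"]
```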

Update:

The OP now wants to exclude (skip over) some rows under certain conditions. If a mask indicating which rows to include can be passed to f(), we can adjust f() like this:

f <- function(x,n=3,m=rep(TRUE,length(x))) {
  x[!m] <- NA  # blank out the values of excluded rows
  lapply(n:1,function(i) x[i:(i+length(x)-1)])
}

This example uses the adjusted version of f() to skip over rows where game_x==0:

data[
order(-match_id),
c("last3", "last2", "last1"):=f(points,3,game_x==1),
by=.(player_id)][order(player_id,-match_id)][]
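
The effect of the mask is easiest to see on a plain vector. In this sketch the points and the mask are made up; the masked value is replaced by NA rather than removed, so every other value keeps its position.

```r
# Adjusted f() from above: values where the mask is FALSE are set to NA first
f <- function(x, n = 3, m = rep(TRUE, length(x))) {
  x[!m] <- NA
  lapply(n:1, function(i) x[i:(i + length(x) - 1)])
}

pts  <- c(10, 20, 30, 40)          # hypothetical points, most recent first
keep <- c(TRUE, FALSE, TRUE, TRUE) # hypothetical mask: exclude the second row

res <- f(pts, 3, keep)
names(res) <- c("last3", "last2", "last1")
res
```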

Another update:

Now the OP wants to exclude the game_x == 0 rows completely:

rbind(
data[game_x==0],
data[game_x==1][
order(-match_id),
c("last5", "last4", "last3", "last2", "last1"):=f(points,5),
by=.(player_id)][order(player_id,-match_id)],
fill=TRUE
)

Output:

   match_id player_id   venue   opponent game_x points last5 last4 last3 last2 last1
<num> <char> <char> <char> <num> <num> <num> <num> <num> <num> <num>
1: 3 player_1 venue B opponent A 0 15 NA NA NA NA NA
2: 5 player_1 venue B opponent C 0 2 NA NA NA NA NA
3: 4 player_1 venue B opponent C 1 1 NA NA 5 10 1
4: 2 player_1 venue A opponent B 1 10 NA NA NA 5 10
5: 1 player_1 venue A opponent A 1 5 NA NA NA NA 5

Custom function to mutate a new column for row means using starts_with()

We can use quo_name() to build the name of the new column from the continent argument:

library(dplyr)
library(rlang)

continent_mean <- function(df, continent) {
df %>%
select(starts_with(continent)) %>%
mutate(!!quo_name(continent) := rowMeans(., na.rm = TRUE))
}

continent_mean(df, "asia")


# asia_bangkok asia_tokyo asia_kathmandu asia
#1 NA 41 51 46
#2 NA 42 52 47
#3 33 43 NA 38
#4 NA 44 54 49
#5 35 45 55 45
#6 36 46 56 46
#7 NA 47 57 52
#8 38 48 NA 43
#9 39 49 NA 44
#10 40 NA 60 50
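
As an alternative sketch, the new column can also be named with the glue-style syntax that dplyr supports on the left-hand side of :=, assuming dplyr 1.0 or later. The function name continent_mean_glue, the "_mean" suffix and the europe_oslo column are hypothetical; the asia values are the first three rows of the output above.

```r
library(dplyr)

df <- data.frame(
  asia_bangkok   = c(NA, NA, 33),
  asia_tokyo     = c(41, 42, 43),
  asia_kathmandu = c(51, 52, NA),
  europe_oslo    = c(1, 2, 3)  # hypothetical column, only here to be filtered out
)

continent_mean_glue <- function(df, continent) {
  df %>%
    select(starts_with(continent)) %>%
    # "{continent}_mean" is expanded to e.g. "asia_mean"
    mutate("{continent}_mean" := rowMeans(., na.rm = TRUE))
}

asia <- continent_mean_glue(df, "asia")
asia
```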

Using base R, we can do something similar:

continent_mean <- function(df, continent)  {
  # keep only the columns whose names start with the requested continent
  df1 <- df[startsWith(names(df), continent)]
  df1[continent] <- rowMeans(df1, na.rm = TRUE)
  df1
}

If we want the row means of all the continents at once, we can use split.default():

sapply(split.default(df, sub("_.*", "", names(df))), rowMeans, na.rm = TRUE)

# asia europe
# [1,] 46 1
# [2,] 47 17
# [3,] 38 13
# [4,] 49 14
# [5,] 45 20
# [6,] 46 6
# [7,] 52 17
# [8,] 43 23
# [9,] 44 19
#[10,] 50 20
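
A self-contained sketch of both base R ideas on a small data frame. The asia values are taken from the first rows of the output shown earlier; the europe columns and the function name region_mean are hypothetical, since the original europe data is not shown.

```r
df <- data.frame(
  asia_tokyo     = c(41, 42, 43),
  asia_kathmandu = c(51, 52, NA),
  europe_london  = c(1, 2, 3),   # hypothetical
  europe_paris   = c(3, NA, 5)   # hypothetical
)

# Row means for one prefix, appended as a column named after the prefix
region_mean <- function(df, prefix) {
  cols <- df[startsWith(names(df), prefix)]
  cols[[prefix]] <- rowMeans(cols, na.rm = TRUE)
  cols
}
asia <- region_mean(df, "asia")

# Row means for every prefix at once: split the columns by the text before "_"
all_means <- sapply(split.default(df, sub("_.*", "", names(df))), rowMeans, na.rm = TRUE)
all_means
```

split.default() is used rather than split() because it splits the data frame as a list of columns instead of splitting its rows.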

Mutate across multiple columns using dplyr

Two possibilities using dplyr:

library(dplyr)

mtcars %>%
rowwise() %>%
mutate(varmean = mean(c_across(mpg:vs)))

This returns

# A tibble: 32 x 12
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb varmean
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 40.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 40.1
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 31.7
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 52.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 73.2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 47.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 81.2
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 33.1
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 36.7
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 42.8
# ... with 22 more rows

and without rowwise(), using base R's rowMeans():

mtcars %>% 
mutate(varmean = rowMeans(across(mpg:vs)))

returns

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  varmean
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 39.99750
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 40.09938
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 31.69750
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 52.76687
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 73.16375
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 47.69250
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 81.24000
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 33.12250
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 36.69625
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 42.80750
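
As a quick sanity check, the two approaches can be compared directly. This sketch uses select(., mpg:vs) for the vectorised version, as in the earlier answers; the object names by_row and vectorised are arbitrary.

```r
library(dplyr)

# Row-by-row version: mean() is called once per row
by_row <- mtcars %>%
  rowwise() %>%
  mutate(varmean = mean(c_across(mpg:vs))) %>%
  ungroup()

# Vectorised version: a single call to rowMeans() over the selected columns
vectorised <- mtcars %>%
  mutate(varmean = rowMeans(select(., mpg:vs)))

# Compare the values only, ignoring any names carried over from the row names
all.equal(unname(by_row$varmean), unname(vectorised$varmean))
```

The results agree, so the choice between the two is mainly about speed and readability; rowMeans() avoids the per-row overhead of rowwise().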

