dplyr - Using mutate() Like rowMeans()

Calculate the mean of some columns using dplyr::mutate

You can use rowMeans() together with select(., BL1:BL9). Here select(., BL1:BL9) picks the columns from BL1 to BL9, and rowMeans() computes the average of each row. You can't pass a character vector of column names directly to mutate(): the strings are treated as literal values rather than as columns.

test %>% mutate(ave = rowMeans(select(., BL1:BL9)))

#   BL1 BL2 BL3 BL4 BL5 BL6 BL7 BL8 BL9 BL10 BL11 BL12      ave
#1    5  11   1   1  12   5  10  12   6   11   12    9 7.000000
#2    1  10   5  11   7   6   5   9   9    1    8    4 7.000000
#3    8  10   1   2   7  12   5   9   5    3    3   11 6.555556
#4    5   2   5   4   9   5   5   3   5    2    8    1 4.777778
#5    9   1   1  10   3   5   1   9   9    6    3   12 5.333333
#6    9   7   9   6   3   2   5   4   9    5    1    2 6.000000
#7    3   3   1   9   7   8   7   9   9   11   12    9 6.222222
#8   12   9   3   3   9  11   4   2   5   12   12   12 6.444444
#9    1   7   7  12   6   6   5   3  10   12    5   10 6.333333
#10  12   7   7   1   2   8   5   8  11    9    1    5 6.777778
#11   9   1   5   8  12   6   6  11   3   12    3    9 6.777778
#12   5   6   1  11  10  12   6   7   8    7    8    2 7.333333
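If the column names are stored in a character vector, tidyselect's all_of() tells select() to treat the strings as column names. A minimal sketch, assuming a small made-up data frame `toy` and a vector `cols` (both hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical data; the column names only mimic the BL1:BL9 pattern above
toy <- data.frame(BL1 = c(1, 4), BL2 = c(2, 5), BL3 = c(3, 6), other = c(100, 200))

# Column names held as strings
cols <- c("BL1", "BL2", "BL3")

# all_of() makes select() look the strings up as column names
res <- toy %>% mutate(ave = rowMeans(select(., all_of(cols))))
res$ave
```

The `other` column is left out of the average because it is not listed in `cols`.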

How to mutate a new column with row means for selected columns in a grouped_tbl using dplyr in R?

Using rowwise() would be inefficient; a better option is to apply rowMeans() after selecting the columns of interest:

library(dplyr)
clean_bmk %>%
  ungroup() %>%
  mutate(
    BMK_Mean_Strategic = rowMeans(select(., strategic), na.rm = TRUE),
    BMK_Mean_DiffChange = rowMeans(select(., diffchange), na.rm = TRUE),
    BMK_Mean_Failure = rowMeans(select(., failure), na.rm = TRUE),
    BMK_Mean_Narrow = rowMeans(select(., narrow), na.rm = TRUE),
    BMK_R1_Performance = rowMeans(select(., performance_vars), na.rm = TRUE),
    BMK_R2_Promotion = rowMeans(select(., promote_vars), na.rm = TRUE),
    BMK_R3_Derail = rowMeans(select(., derail_vars), na.rm = TRUE)
  )

Using a reproducible example

data(mtcars)
v1 <- c('mpg', 'disp')  # character vector of column names
mtcars %>%
   transmute(newMean = rowMeans(select(., all_of(v1)), na.rm = TRUE)) %>%
   head()
#                  newMean
#Mazda RX4           90.50
#Mazda RX4 Wag       90.50
#Datsun 710          65.40
#Hornet 4 Drive     139.70
#Hornet Sportabout  189.35
#Valiant            121.55
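The ungroup() step matters because select() on a grouped tibble keeps the grouping column, which would then leak into rowMeans(). A small sketch of that behaviour, using a made-up grouped data frame `toy` and a vector `cols` (both hypothetical):

```r
library(dplyr)

# Hypothetical grouped data: g is the grouping column, a and b are the values
toy <- data.frame(g = c(1, 1, 2), a = c(1, 2, 3), b = c(3, 4, 5)) %>%
  group_by(g)
cols <- c("a", "b")

# On a grouped tibble, select() adds the grouping column back (with a message)
kept <- names(suppressMessages(select(toy, all_of(cols))))
kept

# After ungroup(), only the requested columns reach rowMeans()
res <- toy %>%
  ungroup() %>%
  mutate(m = rowMeans(select(., all_of(cols)), na.rm = TRUE))
res$m
```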

Use of a like operator in dplyr

To elaborate on @r2evans's comment, what you are looking for is grepl(). This function tells you whether a pattern occurs in each string and returns TRUE or FALSE. You don't actually need mutate() or case_when(); you can do it in base R:

Var1 <-  c("Free Throw", "stepback jumpshot", "pull up jumpshot", "hail mary")

df <- data.frame(Var1) 

df$Var2 <- ifelse(grepl("jumpshot", Var1, fixed = TRUE), "Jumpshot", Var1)

df

#                Var1       Var2
# 1        Free Throw Free Throw
# 2 stepback jumpshot   Jumpshot
# 3  pull up jumpshot   Jumpshot
# 4         hail mary  hail mary

But if you really want to use dplyr functions, the case_when() statement @r2evans gave will work:

library(dplyr)

Var1 <-  c("Free Throw", "stepback jumpshot", "pull up jumpshot", "hail mary")

df <- data.frame(Var1) 

df2 <- df %>% 
  mutate(Var2 = case_when(grepl("jumpshot", Var1) ~ "Jumpshot", 
                          grepl("block", Var1) ~ "Block", 
                          TRUE ~ Var1))
df2

#                Var1       Var2
# 1        Free Throw Free Throw
# 2 stepback jumpshot   Jumpshot
# 3  pull up jumpshot   Jumpshot
# 4         hail mary  hail mary
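To see how the matching options of grepl() differ, here is a short sketch on a made-up vector `shots` (hypothetical values chosen to include a capitalised variant):

```r
# Hypothetical input with mixed capitalisation
shots <- c("Free Throw", "stepback jumpshot", "Pull Up Jumpshot")

# Default: case-sensitive regular-expression match
default_match <- grepl("jumpshot", shots)

# ignore.case = TRUE also catches "Jumpshot"
any_case <- grepl("jumpshot", shots, ignore.case = TRUE)

# fixed = TRUE treats the pattern as a literal string, still case-sensitive
literal <- grepl("jumpshot", shots, fixed = TRUE)
```

Note that fixed = TRUE and ignore.case = TRUE are not meant to be combined; pick one depending on whether literal or case-insensitive matching is needed.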

Using dplyr and mutate to create new columns based on groups and last n rows

You might be able to do something like this:

Convert the data to a data.table:

setDT(data)

Create a small function that takes a vector and returns a list of n vectors, each a shifted copy of the input padded with NA at the end, so that every row lines up with its last n points:

f <- function(x,n=3) lapply(n:1,\(i) x[i:(i+length(x)-1)])
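To make the return value concrete, here is f() applied to a short made-up vector `pts` (hypothetical, ordered newest first, as it would be after order(-match_id)). The sketch writes the anonymous function as function(i) instead of \(i) so that it also runs on R versions before 4.1:

```r
# Same logic as f() above, written with function(i)
f <- function(x, n = 3) lapply(n:1, function(i) x[i:(i + length(x) - 1)])

# Hypothetical points for one player, newest match first
pts <- c(11, 5, 1, 5)

out <- f(pts, 3)
out[[1]]  # shifted by two positions: 1 5 NA NA
out[[2]]  # shifted by one position:  5 1 5 NA
out[[3]]  # the input itself:         11 5 1 5
```

Indexing past the end of a vector yields NA, which is what pads the shifted copies.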

Apply that function by the grouping variables of interest, remembering to order first. For example, to get the prior points by player_id only, you can use f() like this:

data[
  order(-match_id),
  c("last3", "last2", "last1"):=f(points,3),
  by=player_id][]

If you want to group by venue and opponent as well, do this:

data[
  order(-match_id),
  c("last3", "last2", "last1"):=f(points,3),
  by=.(player_id, venue,opponent)][]

Output (by player_id):

    match_id player_id   venue   opponent points last3 last2 last1
      <char>    <char>  <char>     <char>  <num> <num> <num> <num>
 1:  match_1  player_1 venue A opponent A      5    NA    NA     5
 2:  match_1  player_2 venue A opponent A     10    NA    NA    10
 3:  match_1  player_3 venue A opponent A     15    NA    NA    15
 4:  match_2  player_1 venue B opponent B      1    NA     5     1
 5:  match_2  player_2 venue B opponent B      2    NA    10     2
 6:  match_2  player_3 venue B opponent B      3    NA    15     3
 7:  match_3  player_1 venue C opponent C      5     5     1     5
 8:  match_3  player_2 venue C opponent C      7    10     2     7
 9:  match_3  player_3 venue C opponent C      9    15     3     9
10:  match_4  player_1 venue C opponent C     11     1     5    11
11:  match_4  player_2 venue C opponent C      2     2     7     2
12:  match_4  player_3 venue C opponent C      6     3     9     6

If you want the combined column, you can do this, assuming that you have assigned the result of the above to r1:

r1[, combined:=paste(last3,last2,last1,sep = ","), by=1:nrow(r1)][]

Output:

    match_id player_id   venue   opponent points last3 last2 last1 combined
      <char>    <char>  <char>     <char>  <num> <num> <num> <num>   <char>
 1:  match_1  player_1 venue A opponent A      5    NA    NA     5  NA,NA,5
 2:  match_1  player_2 venue A opponent A     10    NA    NA    10 NA,NA,10
 3:  match_1  player_3 venue A opponent A     15    NA    NA    15 NA,NA,15
 4:  match_2  player_1 venue B opponent B      1    NA     5     1   NA,5,1
 5:  match_2  player_2 venue B opponent B      2    NA    10     2  NA,10,2
 6:  match_2  player_3 venue B opponent B      3    NA    15     3  NA,15,3
 7:  match_3  player_1 venue C opponent C      5     5     1     5    5,1,5
 8:  match_3  player_2 venue C opponent C      7    10     2     7   10,2,7
 9:  match_3  player_3 venue C opponent C      9    15     3     9   15,3,9
10:  match_4  player_1 venue C opponent C     11     1     5    11   1,5,11
11:  match_4  player_2 venue C opponent C      2     2     7     2    2,7,2
12:  match_4  player_3 venue C opponent C      6     3     9     6    3,9,6

Here is the minimal set of code required:

library(data.table)

setDT(data)

f <- function(x,n=3) lapply(n:1,\(i) x[i:(i+length(x)-1)])

data[order(-match_id),c("last3", "last2", "last1"):=f(points,3),by=player_id]
data[, combined:=paste(last3,last2,last1,sep = ","), by=1:nrow(data)]
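Since the question title asks about dplyr, the same columns can also be sketched with group_by() and lag(). This swaps in a different technique from the data.table answer above, and the data frame below is a made-up subset with a numeric match_id (an assumption; the original data is not shown in full):

```r
library(dplyr)

# Hypothetical data: two players, four matches each
data <- data.frame(
  match_id  = rep(1:4, each = 2),
  player_id = rep(c("player_1", "player_2"), times = 4),
  points    = c(5, 10, 1, 2, 5, 7, 11, 2)
)

res <- data %>%
  group_by(player_id) %>%
  arrange(match_id, .by_group = TRUE) %>%  # oldest match first within each player
  mutate(
    last1 = points,          # current match
    last2 = lag(points, 1),  # previous match
    last3 = lag(points, 2)   # two matches back
  ) %>%
  ungroup()

# Rows for player_1 only
p1 <- res[res$player_id == "player_1", ]
p1
```

Here last1 mirrors the convention of the output above, where last1 is the current match's points.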

Update:

The OP now wants to skip over some rows under certain conditions. If a mask indicating which rows to include can be passed to f(), we can adjust f() like this:

f <- function(x,n=3,m=rep(TRUE,length(x))) {
  x[!m] <- NA
  lapply(n:1,function(i) x[i:(i+length(x)-1)])
}
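A quick check of what the mask does, again on a made-up vector `pts` and a hypothetical mask `keep`:

```r
# Adjusted f() from above
f <- function(x, n = 3, m = rep(TRUE, length(x))) {
  x[!m] <- NA
  lapply(n:1, function(i) x[i:(i + length(x) - 1)])
}

# Hypothetical points (newest first) and a mask that drops the second entry
pts  <- c(11, 5, 1, 5)
keep <- c(TRUE, FALSE, TRUE, TRUE)

out <- f(pts, 3, keep)
out[[3]]  # input with the masked entry blanked: 11 NA 1 5
out[[2]]  # shifted by one position:             NA 1 5 NA
out[[1]]  # shifted by two positions:            1 5 NA NA
```

As the output suggests, masked entries are replaced by NA in place; the remaining values keep their positions and are not pulled forward.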

This example uses the adjusted version of f() to skip over rows where game_x==0:

data[
  order(-match_id),
  c("last3", "last2", "last1"):=f(points,3,game_x==1),
  by=.(player_id)][order(player_id,-match_id)][]

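Another update:

Now the OP wants to completely exclude the game_x==0 rows. One way is to compute the new columns on the game_x==1 rows only and bind the excluded rows back on: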

rbind(
  data[game_x==0], 
  data[game_x==1][
  order(-match_id),
  c("last5", "last4", "last3", "last2", "last1"):=f(points,5),
  by=.(player_id)][order(player_id,-match_id)],
  fill=TRUE
)

Output:

   match_id player_id   venue   opponent game_x points last5 last4 last3 last2 last1
      <num>    <char>  <char>     <char>  <num>  <num> <num> <num> <num> <num> <num>
1:        3  player_1 venue B opponent A      0     15    NA    NA    NA    NA    NA
2:        5  player_1 venue B opponent C      0      2    NA    NA    NA    NA    NA
3:        4  player_1 venue B opponent C      1      1    NA    NA     5    10     1
4:        2  player_1 venue A opponent B      1     10    NA    NA    NA     5    10
5:        1  player_1 venue A opponent A      1      5    NA    NA    NA    NA     5

Custom function to mutate a new column for row means using starts_with()

We can use quo_name() together with !! and := to name the new column after the continent argument:

library(dplyr)
library(rlang)

continent_mean <- function(df, continent)  {
    df %>%
      select(starts_with(continent)) %>%
      mutate(!!quo_name(continent) := rowMeans(., na.rm = TRUE))
}

continent_mean(df, "asia")


#   asia_bangkok asia_tokyo asia_kathmandu asia
#1            NA         41             51   46
#2            NA         42             52   47
#3            33         43             NA   38
#4            NA         44             54   49
#5            35         45             55   45
#6            36         46             56   46
#7            NA         47             57   52
#8            38         48             NA   43
#9            39         49             NA   44
#10           40         NA             60   50

Using base R, we can do something similar:

continent_mean <- function(df, continent)  {
     # keep only the columns whose names start with the requested prefix
     df1 <- df[startsWith(names(df), continent)]
     df1[continent] <- rowMeans(df1, na.rm = TRUE)
     df1
}

If we want the row means of all the continents at once, we can use split.default to split the columns by name prefix:

sapply(split.default(df, sub("_.*", "", names(df))), rowMeans, na.rm = TRUE)

#      asia europe
# [1,]   46      1
# [2,]   47     17
# [3,]   38     13
# [4,]   49     14
# [5,]   45     20
# [6,]   46      6
# [7,]   52     17
# [8,]   43     23
# [9,]   44     19
#[10,]   50     20
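The split.default() step can be followed on a tiny made-up data frame (hypothetical column names using the same prefix_city pattern):

```r
# Hypothetical data with two asia columns and one europe column
df <- data.frame(asia_a = c(1, 3), asia_b = c(3, 5), europe_a = c(10, 20))

# Strip everything from the first underscore on, leaving the prefix
groups <- sub("_.*", "", names(df))
groups

# split.default() splits the data frame by column, one piece per prefix
pieces <- split.default(df, groups)
names(pieces)

# rowMeans() on each piece, combined into a matrix
res <- sapply(pieces, rowMeans, na.rm = TRUE)
res
```

split.default() is used rather than split() because split() on a data frame divides the rows, while here the columns need to be divided.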

Mutate across multiple columns using dplyr

Two possibilities using dplyr:

library(dplyr)

mtcars %>% 
  rowwise() %>% 
  mutate(varmean = mean(c_across(mpg:vs)))

This returns

# A tibble: 32 x 12
# Rowwise: 
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb varmean
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4    40.0
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4    40.1
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1    31.7
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    52.8
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    73.2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    47.7
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    81.2
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2    33.1
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    36.7
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    42.8
# ... with 22 more rows

and without rowwise(), using base R's rowMeans():

mtcars %>% 
  mutate(varmean = rowMeans(across(mpg:vs)))

returns

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  varmean
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 39.99750
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 40.09938
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 31.69750
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 52.76687
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 73.16375
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 47.69250
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 81.24000
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 33.12250
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 36.69625
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 42.80750
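A short sketch to confirm that the two approaches agree on mtcars. The variable names `by_row` and `vectorised` are only illustrative; recent dplyr versions may suggest pick() in place of across() without a function, but across() is used here to match the code above:

```r
library(dplyr)

# Row-by-row version
by_row <- mtcars %>%
  rowwise() %>%
  mutate(varmean = mean(c_across(mpg:vs))) %>%
  ungroup()

# Vectorised version using base R's rowMeans()
vectorised <- mtcars %>%
  mutate(varmean = rowMeans(across(mpg:vs)))

# Compare the results, ignoring any names carried along
same <- all.equal(unname(by_row$varmean), unname(vectorised$varmean))
same
```

On larger data the rowMeans() version is usually the faster of the two, since it avoids evaluating mean() once per row.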