Dplyr Mutate/Replace Several Columns on a Subset of Rows

dplyr mutate/replace several columns on a subset of rows

These solutions (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once:

1a) mutate_cond Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition:

mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)
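To see the effect in isolation, here is a minimal sketch on a three-row toy data frame. The name toy and its values are made up for illustration, and mutate_cond is repeated only so the block runs on its own. Only the 'exit' row should change, and the input is left untouched:

```r
library(dplyr)

# Same helper as above, repeated so this block is self-contained
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# Hypothetical toy input (not the DF from Note 1)
toy <- data.frame(measure = c("exit", "led", "cfl"),
                  qty = c(5, 7, 9),
                  qty.exit = 0,
                  cf = c(0.3, 0.9, 0.4),
                  stringsAsFactors = FALSE)

out <- toy %>% mutate_cond(measure == "exit", qty.exit = qty, cf = 0)

out$qty.exit   # 5 0 0
out$cf         # 0.0 0.9 0.4
toy$qty.exit   # still 0 0 0: the input was not overwritten
```

Since the mutated rows are assigned back into the existing rows, the columns being set should already exist in the input.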

1b) mutate_last This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Note that FALSE sorts before TRUE, so if group_by specifies a logical condition, the TRUE group is the last one and mutate_last will only operate on rows satisfying that condition.

mutate_last <- function(.data, ...) {
  n <- n_groups(.data)
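  # dplyr >= 0.8 no longer stores an "indices" attribute on grouped data;
  # group_rows() returns the 1-based row indices of each group
  indices <- group_rows(.data)[[n]]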
  .data[indices, ] <- .data[indices, ] %>% mutate(...)
  .data
}


DF %>% 
   group_by(is.exit = measure == 'exit') %>%
   mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%
   ungroup() %>%
   select(-is.exit)
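The group ordering that mutate_last relies on can be checked directly. The following is a small sketch on made-up data (toy is hypothetical), assuming dplyr 0.8 or later, where group_keys() and group_rows() are available:

```r
library(dplyr)

# Hypothetical toy input
toy <- data.frame(measure = c("exit", "led", "exit", "cfl"),
                  stringsAsFactors = FALSE)

g <- toy %>% group_by(is.exit = measure == "exit")

# Groups are sorted by key, and FALSE sorts before TRUE
keys <- group_keys(g)$is.exit
keys                                  # FALSE TRUE

# Row indices of the last group, i.e. the rows where measure == "exit"
last_rows <- group_rows(g)[[n_groups(g)]]
last_rows                             # 1 3
```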

2) factor out condition Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.

library(dplyr)

DF %>% mutate(is.exit = measure == 'exit',
              qty.exit = ifelse(is.exit, qty, qty.exit),
              cf = (!is.exit) * cf,
              delta.watts = replace(delta.watts, is.exit, 13)) %>%
       select(-is.exit)

3) sqldf We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them -- this may represent a bug in dplyr. See dplyr issue 1579). It may seem that we are undesirably modifying the input in this code due to the existence of the update but in fact the update is acting on a copy of the input in the temporarily generated database and not on the actual input.

library(sqldf)

DF %>% 
   do(sqldf(c("update '.' 
                 set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13 
                 where measure = 'exit'", 
              "select * from '.'")))

4) row_case_when Also check out row_case_when, defined in the question "Returning a tibble: how to vectorize with case_when?". It is not part of dplyr, so it has to be defined first. It uses a syntax similar to case_when but applies to rows.

library(dplyr)

DF %>%
  row_case_when(
    measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),
    TRUE ~ data.frame(qty.exit, cf, delta.watts)
  )

Note 1: We used this as DF

set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50, 
                               replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50))

Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573 with 631 being the main thread and 1573 being a review of the answers here.

dplyr: Replace multiple values based on condition in a selection of columns

A dplyr solution:

library(dplyr)
dt %>%
  mutate(across(3:5, ~ ifelse(measure == "led", stringr::str_replace_all(
    as.character(.),
    c("2" = "X", "3" = "Y")
  ), .)))

Result:

   measure site space qty qty.exit cf
 1:     led    4     1   4        6  3
 2:    exit    4     2   1        4  6
 3:     cfl    1     4   6        2  3
 4:  linear    3     4   1        3  5
 5:     cfl    5     1   6        1  6
 6:    exit    4     3   2        6  4
 7:    exit    5     1   4        2  5
 8:    exit    1     4   3        6  4
 9:  linear    3     1   5        4  1
10:     led    4     1   1        1  1
11:    exit    5     4   3        5  2
12:     cfl    4     2   4        5  5
13:     led    4     X   Y        Y  4
...

R How to mutate a subset of rows

Using data.table, we'd do:

setDT(data)[colA == "ABC", ColB := "XXXX"]

and the values are modified in place, unlike an ifelse approach, which would copy the entire column just to replace the rows where the condition holds.

We call this sub-assign by reference. You can read more about it in the new HTML vignettes.
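A minimal sketch of sub-assign by reference on made-up data follows. The column names colA and ColB follow the line above, the values are hypothetical, and alias is an illustrative second name used only to make the in-place update visible:

```r
library(data.table)

# Hypothetical toy input
data <- data.frame(colA = c("ABC", "DEF", "ABC"),
                   ColB = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

setDT(data)      # convert to a data.table by reference
alias <- data    # a second name for the same table, not a copy

# Update ColB only in the rows where colA is "ABC"
data[colA == "ABC", ColB := "XXXX"]

data$ColB        # "XXXX" "b" "XXXX"
alias$ColB       # "XXXX" "b" "XXXX" as well: := changed the table in place
```

If an independent copy is wanted, data.table provides copy() for that purpose.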

Mutate a subset of rows, but keep all rows with dplyr

Use ifelse to negate population only in the rows where gender == "kvinnor"; all other rows keep their value.

library(dplyr)

df %>% mutate(neg_kv = ifelse(gender == "kvinnor", -1 * population, population))

# A tibble: 20 × 7
   region marriage_status   age gender  population  year neg_kv
   <chr>  <chr>           <dbl> <chr>        <dbl> <dbl>  <dbl>
 1 Riket  ogifta             15 män          56031  1968  56031
 2 Riket  ogifta             15 kvinnor      52959  1968 -52959
 3 Riket  ogifta             16 män          55917  1968  55917
 4 Riket  ogifta             16 kvinnor      52979  1968 -52979
 5 Riket  ogifta             17 män          55922  1968  55922
 6 Riket  ogifta             17 kvinnor      52050  1968 -52050
 7 Riket  ogifta             18 män          58681  1968  58681
 8 Riket  ogifta             18 kvinnor      51862  1968 -51862
 9 Riket  ogifta             19 män          60387  1968  60387
10 Riket  ogifta             19 kvinnor      49750  1968 -49750
11 Riket  ogifta             20 män          62487  1968  62487
12 Riket  ogifta             20 kvinnor      50089  1968 -50089
13 Riket  ogifta             21 män          60714  1968  60714
14 Riket  ogifta             21 kvinnor      43413  1968 -43413
15 Riket  ogifta             22 män          56801  1968  56801
16 Riket  ogifta             22 kvinnor      36301  1968 -36301
17 Riket  ogifta             23 män          49862  1968  49862
18 Riket  ogifta             23 kvinnor      29227  1968 -29227
19 Riket  ogifta             24 män          42143  1968  42143
20 Riket  ogifta             24 kvinnor      23155  1968 -23155

R - mutate a subset of columns only on a subset of rows

See this post for more info

df1 %>%
  mutate_at(vars(starts_with("B")),
            .funs = list(~ if_else(Date %in% as.Date(c("2020-01-01", "2020-01-06")), 0.2 * ., .)))
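In dplyr 1.0 and later, mutate_at is superseded by across, so the same idea can be written as below. This is a sketch on made-up data: df1 here is a hypothetical stand-in with a Date column and two columns starting with "B", and target is an illustrative helper name.

```r
library(dplyr)

# Hypothetical input: a Date column plus columns starting with "B"
df1 <- data.frame(Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-06")),
                  A  = c(1, 2, 3),
                  B1 = c(10, 20, 30),
                  B2 = c(100, 200, 300))

# Dates whose rows should be rescaled
target <- as.Date(c("2020-01-01", "2020-01-06"))

out <- df1 %>%
  mutate(across(starts_with("B"),
                ~ if_else(Date %in% target, 0.2 * .x, .x)))

out$B1   # 2 20 6
out$A    # 1 2 3 (not selected, so unchanged)
```

Note that starts_with ignores case by default, so it would also pick up columns beginning with a lowercase "b".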

changing multiple column values given a condition in dplyr

You can use mutate_at and pass the columns x1, x2 and x3 to the .vars parameter:

dta <- data.frame(na.ind = 1:3, x1 = 2:4, x2 = 2:4, x3 = 2:4, x4 = 2:4)
dta
#  na.ind x1 x2 x3 x4
#1      1  2  2  2  2
#2      2  3  3  3  3
#3      3  4  4  4  4

# funs() is deprecated since dplyr 0.8; a formula lambda does the same job
dta %>% mutate_at(.vars = c("x1", "x2", "x3"), ~ ifelse(na.ind == 1, NA, .))
#  na.ind x1 x2 x3 x4
#1      1 NA NA NA  2
#2      2  3  3  3  3
#3      3  4  4  4  4

Using the dplyr mutate function to replace multiple values

A simple ifelse works here, but if you have multiple values to replace, consider recode or case_when:

library(dplyr)

dat %>%
  mutate(allele = recode(allele, `0` = 'AA/Aa', `1` = 'aa'), 
         case = recode(case, `0` = 'control', `1` = 'case'))
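For comparison, the case_when variant could look like the sketch below. The data frame dat is a hypothetical stand-in with numeric 0/1 codes in both columns:

```r
library(dplyr)

# Hypothetical input with 0/1 codes
dat <- data.frame(allele = c(0, 1, 1),
                  case   = c(1, 0, 1))

out <- dat %>%
  mutate(allele = case_when(allele == 0 ~ "AA/Aa",
                            allele == 1 ~ "aa"),
         case   = case_when(case == 0 ~ "control",
                            case == 1 ~ "case"))

out$allele   # "AA/Aa" "aa" "aa"
out$case     # "case" "control" "case"
```

Values that match none of the conditions become NA with case_when, so an explicit fallback condition may be worth adding for real data.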

How to mutate a subset of columns with dplyr?

Following a similar question, and taking dft as your input, you can try:

dft %>%
  # mutate_each() and funs() are deprecated; across() needs dplyr >= 1.0
  dplyr::mutate(dplyr::across(dplyr::matches("a_"), ~ replace(.x, .x == "d", "nval")))

which gives:

## A tibble: 10 × 3
#     a_a   a_b     a
#   <chr> <chr> <chr>
#1      a     a     a
#2      b     b     b
#3      c     c     c
#4   nval  nval     d
#5      e     e     e
#6      f     f     f
#7      g     g     g
#8      h     h     h
#9      i     i     i
#10     j     j     j

Replace values based on ID's for determinate columns

We may group by the prefix of 'ID' (obtained by removing everything from the _ onward with str_remove), then mutate across the columns 'a' to 'd', taking in each group the value from the row where 'el' is 'y'.

library(dplyr)
library(stringr)
df1 %>% 
  dplyr::group_by(grp = stringr::str_remove(ID, "_.*")) %>% 
  dplyr::mutate(across(a:d, ~ .[el == 'y'])) %>%
  ungroup %>% 
  dplyr::select(-grp)

-output

# A tibble: 8 × 9
  ID            n  post date       el        a     b     c     d
  <chr>     <int> <int> <chr>      <chr> <int> <dbl> <int> <int>
1 100_left      4    50 10/11/2020 y       190  5.41     4   300
2 100_right     4    50 10/11/2020 n       190  5.41     4   300
3 101_left      4    50 10/11/2020 y       180  5.49     6   360
4 101_right     4    50 10/11/2020 n       180  5.49     6   360
5 102_left      4    50 10/11/2020 y       190  5.5      3   300
6 102_right     4    50 10/11/2020 n       190  5.5      3   300
7 103_left      4    50 10/11/2020 y       190  5.39     3   170
8 103_right     4    50 10/11/2020 n       190  5.39     3   170

data

df1 <- structure(list(ID = c("100_left", "100_right", "101_left", "101_right", 
"102_left", "102_right", "103_left", "103_right"), n = c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L), post = c(50L, 50L, 50L, 50L, 50L, 
50L, 50L, 50L), date = c("10/11/2020", "10/11/2020", "10/11/2020", 
"10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020"
), el = c("y", "n", "y", "n", "y", "n", "y", "n"), a = c(190L, 
NA, 180L, NA, 190L, NA, 190L, NA), b = c(5.41, 5.4, 5.49, 5.48, 
5.5, 5.46, 5.39, 5.44), c = c(4L, 5L, 6L, 6L, 3L, 5L, 3L, 3L), 
    d = c(300L, 200L, 360L, 180L, 300L, 200L, 170L, 360L)), 
class = "data.frame", row.names = c(NA, 
-8L))

How to use mutate() +across() only for specific rows

A dplyr option with mutate and across using matches for the specific columns. You can use the following code:

library(dplyr)

df %>% 
  mutate(across(matches(".I|.V"), ~ if_else(row_number() %in% grep("in %", name), ./100, .)))

Output:

# A tibble: 4 × 4
  name          val.I   val.V `val.%`
  <chr>         <dbl>   <dbl>   <dbl>
1 Peter        123      12.4       14
2 Peter in %     1.11    5.32      57
3 Harald      2222    3333        444
4 Harald in %    0.22    0.15     203
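The row test can also be written as a logical vector with grepl instead of comparing row numbers against grep. The sketch below assumes a made-up input whose shape mirrors the output above (the values are hypothetical) and anchors the column pattern so that the dot is matched literally:

```r
library(dplyr)

# Hypothetical input; only the "in %" rows should be divided by 100
df <- tibble::tibble(
  name    = c("Peter", "Peter in %", "Harald", "Harald in %"),
  val.I   = c(123, 111, 2222, 22),
  val.V   = c(12.4, 532, 3333, 15),
  `val.%` = c(14, 57, 444, 203)
)

out <- df %>%
  mutate(across(matches("\\.I$|\\.V$"),
                # fixed = TRUE treats "in %" as a literal string
                ~ if_else(grepl("in %", name, fixed = TRUE), .x / 100, .x)))

out$val.I     # 123 1.11 2222 0.22
out$`val.%`   # 14 57 444 203 (not selected, so unchanged)
```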
