Dplyr Mutate/Replace Several Columns on a Subset of Rows

These solutions (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once:

1a) mutate_cond Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition:

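mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  # Evaluate the condition with the columns of .data in scope
  condition <- eval(substitute(condition), .data, envir)
  # Mutate only the matching rows and write them back
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}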

DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)
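To make the behaviour of mutate_cond concrete, here is a minimal sketch on a small toy data frame. The data frame toy and its values are made up for illustration and are not the DF from Note 1; mutate_cond is repeated from above so the block runs on its own.

```r
library(dplyr)

# mutate_cond as defined above, repeated so this block is self-contained
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# Hypothetical toy data (an assumption for this example)
toy <- data.frame(
  measure = c("cfl", "exit", "led", "exit"),
  qty = c(5, 7, 2, 9),
  qty.exit = 0,
  cf = c(0.4, 0.6, 0.8, 0.2),
  stringsAsFactors = FALSE
)

# Only rows 2 and 4 (measure == "exit") are changed; toy itself is untouched
res <- toy %>% mutate_cond(measure == "exit", qty.exit = qty, cf = 0)
res
```

Rows where the condition is FALSE keep their original values. If the condition can evaluate to NA, the row assignment will likely need extra handling, for example wrapping the condition in which().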

1b) mutate_last This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Note that TRUE > FALSE so if group_by specifies a condition then mutate_last will only operate on rows satisfying that condition.

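mutate_last <- function(.data, ...) {
  n <- n_groups(.data)
  # dplyr >= 0.8: group_rows() returns the 1-based row indices of each group
  # (older versions used attr(.data, "indices")[[n]] + 1)
  indices <- group_rows(.data)[[n]]
  .data[indices, ] <- .data[indices, ] %>% mutate(...)
  .data
}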


DF %>%
  group_by(is.exit = measure == 'exit') %>%
  mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%
  ungroup() %>%
  select(-is.exit)

2) factor out condition Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.

library(dplyr)

DF %>%
  mutate(is.exit = measure == 'exit',
         qty.exit = ifelse(is.exit, qty, qty.exit),
         cf = (!is.exit) * cf,
         delta.watts = replace(delta.watts, is.exit, 13)) %>%
  select(-is.exit)

3) sqldf We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them -- this may represent a bug in dplyr. See dplyr issue 1579). It may seem that we are undesirably modifying the input in this code due to the existence of the update but in fact the update is acting on a copy of the input in the temporarily generated database and not on the actual input.

library(sqldf)

DF %>%
  do(sqldf(c("update '.'
                set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13
                where measure = 'exit'",
             "select * from '.'")))

4) row_case_when Also check out row_case_when, defined in the answer to "Returning a tibble: how to vectorize with case_when?". It uses a syntax similar to case_when but applies to rows.

library(dplyr)

# row_case_when is not part of dplyr; it must be defined first (see the answer referenced above)
DF %>%
  row_case_when(
    measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),
    TRUE ~ data.frame(qty.exit, cf, delta.watts)
  )

Note 1: We used this as DF

set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,
                                  replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50))

Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573 with 631 being the main thread and 1573 being a review of the answers here.

dplyr: Replace multiple values based on condition in a selection of columns

A dplyr solution:

library(dplyr)
dt %>%
  mutate(across(3:5, ~ ifelse(measure == "led",
                              stringr::str_replace_all(as.character(.),
                                                       c("2" = "X", "3" = "Y")),
                              .)))

Result:

    measure site space qty qty.exit cf
 1:     led    4     1   4        6  3
 2:    exit    4     2   1        4  6
 3:     cfl    1     4   6        2  3
 4:  linear    3     4   1        3  5
 5:     cfl    5     1   6        1  6
 6:    exit    4     3   2        6  4
 7:    exit    5     1   4        2  5
 8:    exit    1     4   3        6  4
 9:  linear    3     1   5        4  1
10:     led    4     1   1        1  1
11:    exit    5     4   3        5  2
12:     cfl    4     2   4        5  5
13:     led    4     X   Y        Y  4
...

R How to mutate a subset of rows

Using data.table, we'd do:

setDT(data)[colA == "ABC", ColB := "XXXX"]

The values are modified in place, unlike ifelse, which copies the entire column just to replace the rows where the condition holds.

This is called sub-assign by reference. You can read more about it in the data.table HTML vignettes.
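The same sub-assign by reference extends to several columns at once through the functional form of :=. The sketch below uses a small made-up table DT (an assumption for illustration, modelled on the column names used earlier on this page).

```r
library(data.table)

# Hypothetical toy data (an assumption for this example)
DT <- data.table(
  measure = c("cfl", "exit", "led", "exit"),
  qty = c(5, 7, 2, 9),
  qty.exit = 0,
  cf = c(0.4, 0.6, 0.8, 0.2)
)

# Update two columns by reference, only on rows where measure == "exit"
DT[measure == "exit", `:=`(qty.exit = qty, cf = 0)]
DT
```

Because the update happens in place, DT itself changes. If the original must be kept, one option is to work on copy(DT) instead.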

Mutate a subset of rows, but keep all rows with dplyr

Use ifelse to pick out the rows where gender == "kvinnor" and leave all other rows unchanged.

library(dplyr)

df %>% mutate(neg_kv = ifelse(gender == "kvinnor", -1 * population, population))

# A tibble: 20 × 7
   region marriage_status   age gender  population  year neg_kv
   <chr>  <chr>           <dbl> <chr>        <dbl> <dbl>  <dbl>
 1 Riket  ogifta             15 män          56031  1968  56031
 2 Riket  ogifta             15 kvinnor      52959  1968 -52959
 3 Riket  ogifta             16 män          55917  1968  55917
 4 Riket  ogifta             16 kvinnor      52979  1968 -52979
 5 Riket  ogifta             17 män          55922  1968  55922
 6 Riket  ogifta             17 kvinnor      52050  1968 -52050
 7 Riket  ogifta             18 män          58681  1968  58681
 8 Riket  ogifta             18 kvinnor      51862  1968 -51862
 9 Riket  ogifta             19 män          60387  1968  60387
10 Riket  ogifta             19 kvinnor      49750  1968 -49750
11 Riket  ogifta             20 män          62487  1968  62487
12 Riket  ogifta             20 kvinnor      50089  1968 -50089
13 Riket  ogifta             21 män          60714  1968  60714
14 Riket  ogifta             21 kvinnor      43413  1968 -43413
15 Riket  ogifta             22 män          56801  1968  56801
16 Riket  ogifta             22 kvinnor      36301  1968 -36301
17 Riket  ogifta             23 män          49862  1968  49862
18 Riket  ogifta             23 kvinnor      29227  1968 -29227
19 Riket  ogifta             24 män          42143  1968  42143
20 Riket  ogifta             24 kvinnor      23155  1968 -23155

R - mutate a subset of columns only on a subset of rows

See this post for more info

df1 %>%
  mutate_at(vars(starts_with("B")),
            .funs = list(~ if_else(Date %in% as.Date(c("2020-01-01", "2020-01-06")), 0.2 * ., .)))
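In dplyr 1.0 and later, the same idea can be written with across(), which supersedes the scoped mutate_at(). The sketch below is one possible equivalent, not the original answer's code; the data frame df1 and the helper vector target are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data (an assumption for this example)
df1 <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-06")),
  A1 = c(1, 2, 3),
  B1 = c(10, 20, 30),
  B2 = c(100, 200, 300)
)

# Dates whose rows should be rescaled (hypothetical helper vector)
target <- as.Date(c("2020-01-01", "2020-01-06"))

# Scale only the columns starting with "B", and only on the target dates
res <- df1 %>%
  mutate(across(starts_with("B"), ~ if_else(Date %in% target, 0.2 * .x, .x)))
res
```

Column A1 and the row for 2020-01-02 are left as they were.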

changing multiple column values given a condition in dplyr

You can use mutate_at and pass the columns x1, x2 and x3 to the .vars parameter:

dta <- data.frame(na.ind = 1:3, x1 = 2:4, x2 = 2:4, x3 = 2:4, x4 = 2:4)
dta
# na.ind x1 x2 x3 x4
#1 1 2 2 2 2
#2 2 3 3 3 3
#3 3 4 4 4 4

# funs() is deprecated in current dplyr, so a formula lambda is used instead
dta %>% mutate_at(.vars = c("x1", "x2", "x3"), ~ ifelse(na.ind == 1, NA, .))
# na.ind x1 x2 x3 x4
#1 1 NA NA NA 2
#2 2 3 3 3 3
#3 3 4 4 4 4
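With dplyr 1.0 or later, the scoped mutate_at() can also be replaced by across(). A minimal sketch on the same dta, offered as one possible equivalent rather than the original answer's code:

```r
library(dplyr)

dta <- data.frame(na.ind = 1:3, x1 = 2:4, x2 = 2:4, x3 = 2:4, x4 = 2:4)

# Set x1..x3 to NA on rows where na.ind == 1; x4 is not selected and stays as is
res <- dta %>%
  mutate(across(c(x1, x2, x3), ~ ifelse(na.ind == 1, NA, .x)))
res
```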

Using the dplyr mutate function to replace multiple values

A simple ifelse works here, but if you have multiple values to replace, consider recode or case_when:

library(dplyr)

dat %>%
  mutate(allele = recode(allele, `0` = 'AA/Aa', `1` = 'aa'),
         case = recode(case, `0` = 'control', `1` = 'case'))
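For comparison, here is a sketch of the case_when variant mentioned above. The data frame dat is made up for illustration, since the original input is not shown; it assumes numeric 0/1 codes in both columns.

```r
library(dplyr)

# Hypothetical toy data (an assumption for this example)
dat <- data.frame(allele = c(0, 1, 1, 0), case = c(1, 0, 1, 0))

# Each condition maps one code to its label; unmatched values become NA
res <- dat %>%
  mutate(
    allele = case_when(allele == 0 ~ "AA/Aa", allele == 1 ~ "aa"),
    case = case_when(case == 0 ~ "control", case == 1 ~ "case")
  )
res
```

case_when is usually the more flexible choice when the conditions involve more than a simple value lookup.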

How to mutate a subset of columns with dplyr?

Following a similar question, and taking dft as your input, you can try:

# mutate_each() and funs() are deprecated; across() is the current equivalent
dft %>%
  dplyr::mutate(across(matches("a_"), ~ replace(., . == "d", "nval")))

which gives:

## A tibble: 10 × 3
# a_a a_b a
# <chr> <chr> <chr>
#1 a a a
#2 b b b
#3 c c c
#4 nval nval d
#5 e e e
#6 f f f
#7 g g g
#8 h h h
#9 i i i
#10 j j j

Replace values based on ID's for determinate columns

We can group by the prefix of 'ID' (everything before the underscore, obtained by removing "_" and what follows with str_remove), then mutate across columns 'a' to 'd', taking within each group the value from the row where 'el' is 'y'.

library(dplyr)
library(stringr)
df1 %>%
  dplyr::group_by(grp = stringr::str_remove(ID, "_.*")) %>%
  dplyr::mutate(across(a:d, ~ .[el == 'y'])) %>%
  ungroup() %>%
  dplyr::select(-grp)

Output:

# A tibble: 8 × 9
  ID            n  post date       el        a     b     c     d
  <chr>     <int> <int> <chr>      <chr> <int> <dbl> <int> <int>
1 100_left      4    50 10/11/2020 y       190  5.41     4   300
2 100_right     4    50 10/11/2020 n       190  5.41     4   300
3 101_left      4    50 10/11/2020 y       180  5.49     6   360
4 101_right     4    50 10/11/2020 n       180  5.49     6   360
5 102_left      4    50 10/11/2020 y       190  5.5      3   300
6 102_right     4    50 10/11/2020 n       190  5.5      3   300
7 103_left      4    50 10/11/2020 y       190  5.39     3   170
8 103_right     4    50 10/11/2020 n       190  5.39     3   170

data

df1 <- structure(list(ID = c("100_left", "100_right", "101_left", "101_right", 
"102_left", "102_right", "103_left", "103_right"), n = c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), post = c(50L, 50L, 50L, 50L, 50L,
50L, 50L, 50L), date = c("10/11/2020", "10/11/2020", "10/11/2020",
"10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020"
), el = c("y", "n", "y", "n", "y", "n", "y", "n"), a = c(190L,
NA, 180L, NA, 190L, NA, 190L, NA), b = c(5.41, 5.4, 5.49, 5.48,
5.5, 5.46, 5.39, 5.44), c = c(4L, 5L, 6L, 6L, 3L, 5L, 3L, 3L),
d = c(300L, 200L, 360L, 180L, 300L, 200L, 170L, 360L)),
class = "data.frame", row.names = c(NA,
-8L))

How to use mutate() +across() only for specific rows

A dplyr option with mutate and across using matches for the specific columns. You can use the following code:

library(dplyr)

df %>%
  mutate(across(matches(".I|.V"),
                ~ if_else(row_number() %in% grep("in %", name), ./100, .)))

Output:

# A tibble: 4 × 4
  name          val.I   val.V `val.%`
  <chr>         <dbl>   <dbl>   <dbl>
1 Peter        123      12.4       14
2 Peter in %     1.11    5.32      57
3 Harald      2222    3333        444
4 Harald in %    0.22    0.15     203

