Conditionally Replace Values of Subset of Rows With Column Name in R Using Only Tidy

You can use a little tidyeval inside the function you pass to mutate_at to capture the column name, then use ifelse (or whatever other logic you prefer) to replace the matching values.

library(tidyverse)

wl %>%
  mutate_at(vars(starts_with("AB")), function(x) {
    x_var <- rlang::enquo(x)
    ifelse(x == "Y", rlang::quo_name(x_var), x)
  })
#> # A tibble: 4 x 5
#>       x multi ABC   ABD   ABE
#>   <int> <chr> <chr> <chr> <chr>
#> 1     1 Y     ""    ""    ABE
#> 2     2 Y     ABC   ""    ABE
#> 3     3 Y     ABC   ""    ""
#> 4     4 Y     ""    ""    ABE

Created on 2018-08-16 by the reprex package (v0.2.0).
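
In current dplyr releases mutate_at is superseded by across(), and cur_column() returns the name of the column being processed, so the same replacement can be sketched without tidyeval. This is a minimal sketch, not the original answer's method; the wl tibble below is a hypothetical stand-in chosen to be consistent with the output shown above.

```r
library(dplyr)

# Hypothetical stand-in for `wl`; the original input data is not shown.
wl <- tibble(
  x     = 1:4,
  multi = "Y",
  ABC   = c("", "Y", "Y", ""),
  ABD   = c("", "", "", ""),
  ABE   = c("Y", "Y", "", "Y")
)

# For every column starting with "AB", swap "Y" for that column's own name.
result <- wl %>%
  mutate(across(starts_with("AB"), ~ ifelse(.x == "Y", cur_column(), .x)))

result
```

Because starts_with("AB") limits the selection, the multi column keeps its "Y" values.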

Using dplyr to conditionally replace values in a column

Assuming your data frame is dat and your column is var:

dat = dat %>% mutate(candy.flag = factor(ifelse(var == "Candy", "Candy", "Non-Candy")))
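
As a small worked example of the line above, here is how it behaves on a hypothetical dat that includes a missing value. The data and the second column name (candy.flag2) are assumptions made for illustration only.

```r
library(dplyr)

# Hypothetical data; `dat` and `var` follow the names assumed in the answer.
dat <- data.frame(var = c("Candy", "Fruit", NA, "Candy"),
                  stringsAsFactors = FALSE)

dat <- dat %>%
  mutate(
    # NA == "Candy" is NA, so a missing `var` gives a missing flag.
    candy.flag  = factor(ifelse(var == "Candy", "Candy", "Non-Candy")),
    # %in% never returns NA, so a missing `var` is flagged as "Non-Candy".
    candy.flag2 = factor(ifelse(var %in% "Candy", "Candy", "Non-Candy"))
  )

dat
```

Which of the two behaviours is right depends on whether a missing value should count as "Non-Candy" in your data.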

Replace only some NA values for selected rows and for only a column in R

df$type[!df$Asked & is.na(df$type)] <- "Replies" gets you to your desired table:

> type <-
+ c(NA, rep("Question",3), NA, NA, rep("Answer",4), rep(NA, 3), rep("Answer",2),
+ NA, "Question", NA, rep("Answer",2), NA,NA)
> Asked <- c(
+ T, rep(F, 9), T, rep(F, 4), T, rep(F, 4), T,F
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
   title comments     type Asked
1      1        1     <NA>  TRUE
2      2        2 Question FALSE
3      3        3 Question FALSE
4      4        4 Question FALSE
5      5        5  Replies FALSE
6      6        6  Replies FALSE
7      7        7   Answer FALSE
8      8        8   Answer FALSE
9      9        9   Answer FALSE
10    10       10   Answer FALSE
11    11       11     <NA>  TRUE
12    12       12  Replies FALSE
13    13       13  Replies FALSE
14    14       14   Answer FALSE
15    15       15   Answer FALSE
16    16       16     <NA>  TRUE
17    17       17 Question FALSE
18    18       18  Replies FALSE
19    19       19   Answer FALSE
20    20       20   Answer FALSE
21    21       21     <NA>  TRUE
22    22       22  Replies FALSE
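
If you would rather keep this inside a dplyr pipeline, the same logical condition can be passed to replace() within mutate(). This is a sketch on a hypothetical four-row data frame (df_small), not the full data above.

```r
library(dplyr)

# Hypothetical, shortened version of the data above.
df_small <- data.frame(
  type  = c(NA, "Question", NA, "Answer"),
  Asked = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Fill NA only where Asked is FALSE; the NA in the Asked row is left alone.
out <- df_small %>%
  mutate(type = replace(type, !Asked & is.na(type), "Replies"))

out
```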

dplyr mutate/replace several columns on a subset of rows

These solutions (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once:

1a) mutate_cond. Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition:

mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)
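
To see what mutate_cond does, here is the same call on a hypothetical four-row data frame (DF_small, made up for illustration). The function definition is repeated so the block runs on its own.

```r
library(dplyr)

mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  # Evaluate the condition with the data frame's columns in scope.
  condition <- eval(substitute(condition), .data, envir)
  # Apply mutate() only to the matching rows and write them back.
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# Hypothetical data with the same column names as DF.
DF_small <- data.frame(
  measure     = c("cfl", "exit", "led", "exit"),
  qty         = c(5, 7, 2, 9),
  qty.exit    = 0,
  cf          = c(0.2, 0.4, 0.6, 0.8),
  delta.watts = c(20.5, 30.5, 40.5, 50.5),
  stringsAsFactors = FALSE
)

out <- DF_small %>%
  mutate_cond(measure == "exit", qty.exit = qty, cf = 0, delta.watts = 13)

out
```

Only rows 2 and 4 change, and DF_small itself is left untouched.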

1b) mutate_last. This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Note that TRUE > FALSE, so if group_by specifies a condition then mutate_last will only operate on rows satisfying that condition.

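mutate_last <- function(.data, ...) {
  n <- n_groups(.data)
  # group_rows() returns 1-based row indices per group (dplyr >= 0.8.0);
  # older versions stored 0-based indices in attr(.data, "indices").
  indices <- group_rows(.data)[[n]]
  .data[indices, ] <- .data[indices, ] %>% mutate(...)
  .data
}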


DF %>%
  group_by(is.exit = measure == 'exit') %>%
  mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%
  ungroup() %>%
  select(-is.exit)

2) factor out condition. Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.

library(dplyr)

DF %>% mutate(is.exit = measure == 'exit',
              qty.exit = ifelse(is.exit, qty, qty.exit),
              cf = (!is.exit) * cf,
              delta.watts = replace(delta.watts, is.exit, 13)) %>%
  select(-is.exit)
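
The three idioms used in 2) can be checked in isolation on plain vectors. The values below are made up for illustration.

```r
# Hypothetical two-element vectors: the second element is an "exit" row.
is.exit     <- c(FALSE, TRUE)
qty         <- c(5, 7)
qty.exit    <- c(0, 0)
cf          <- c(0.2, 0.4)
delta.watts <- c(20.5, 30.5)

# ifelse: take qty where the condition holds, keep qty.exit elsewhere.
new_qty.exit <- ifelse(is.exit, qty, qty.exit)

# Arithmetic with logicals: FALSE counts as 0 and TRUE as 1,
# so (!is.exit) * cf zeroes cf exactly where is.exit is TRUE.
new_cf <- (!is.exit) * cf

# replace: overwrite the selected positions with a constant.
new_delta.watts <- replace(delta.watts, is.exit, 13)
```

The multiplication trick only works when the replacement value is 0; ifelse and replace handle arbitrary values.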

3) sqldf. We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them -- this may represent a bug in dplyr. See dplyr issue 1579). It may seem that we are undesirably modifying the input in this code due to the existence of the update, but in fact the update is acting on a copy of the input in the temporarily generated database and not on the actual input.

library(sqldf)

DF %>%
  do(sqldf(c("update '.'
    set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13
    where measure = 'exit'",
    "select * from '.'")))

4) row_case_when. Also check out row_case_when defined in "Returning a tibble: how to vectorize with case_when?". It uses a syntax similar to case_when but applies to rows.

library(dplyr)

DF %>%
  row_case_when(
    measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),
    TRUE ~ data.frame(qty.exit, cf, delta.watts)
  )

Note 1: We used this as DF:

set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,
                                  replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50))

Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573 with 631 being the main thread and 1573 being a review of the answers here.

R: Conditionally replacing values based on column pre-fixes and suffixes

Another attempt which should essentially only be one assignment operation. Using @alistaire's data again:

vars <- c("x","y")
foo[vars] <- Map(pmax, foo[vars], bar[match(foo$id, bar$id), vars], na.rm=TRUE)
foo

#  id  x y z
#1  1 10 1 1
#2  2  9 2 2
#3  3 NA 3 3
#4  4  1 4 4
#5  5  3 5 5
#6  6  8 6 6
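
Since the data from the other answer is not reproduced here, the following is a self-contained sketch of the same Map/pmax assignment on hypothetical foo and bar data frames. match() lines up the rows of bar with the ids in foo, and pmax(..., na.rm = TRUE) keeps the larger of the two values while ignoring a single NA.

```r
# Hypothetical data: bar has no row for id 2 and lists the ids in a different order.
foo <- data.frame(id = 1:3, x = c(10, NA, 1), y = c(1, 2, 3))
bar <- data.frame(id = c(3, 1), x = c(5, 2), y = c(9, NA))

vars <- c("x", "y")

# bar[match(foo$id, bar$id), vars] reorders bar to foo's ids (all-NA row for id 2),
# then Map applies pmax column by column.
foo[vars] <- Map(pmax, foo[vars], bar[match(foo$id, bar$id), vars], na.rm = TRUE)

foo
```

For id 2, x stays NA because it is missing on both sides; na.rm = TRUE only helps when at least one of the two values is present.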

dplyr: Replace multiple values based on condition in a selection of columns

A dplyr solution:

library(dplyr)
dt %>%
  mutate(across(3:5, ~ ifelse(measure == "led", stringr::str_replace_all(
    as.character(.),
    c("2" = "X", "3" = "Y")
  ), .)))

Result:

    measure site space qty qty.exit cf
 1:     led    4     1   4        6  3
 2:    exit    4     2   1        4  6
 3:     cfl    1     4   6        2  3
 4:  linear    3     4   1        3  5
 5:     cfl    5     1   6        1  6
 6:    exit    4     3   2        6  4
 7:    exit    5     1   4        2  5
 8:    exit    1     4   3        6  4
 9:  linear    3     1   5        4  1
10:     led    4     1   1        1  1
11:    exit    5     4   3        5  2
12:     cfl    4     2   4        5  5
13:     led    4     X   Y        Y  4
...
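
One thing to keep in mind with this approach is that the touched columns end up as character, because ifelse mixes the replaced strings with the original numbers. A small sketch on a hypothetical two-row data frame (dt_small, with the same column order as the result above) makes that visible; it assumes the stringr package is installed.

```r
library(dplyr)

# Hypothetical two-row stand-in for `dt`.
dt_small <- data.frame(
  measure  = c("led", "exit"),
  site     = c(4, 4),
  space    = c(2, 2),
  qty      = c(3, 1),
  qty.exit = c(5, 3),
  cf       = c(4, 6),
  stringsAsFactors = FALSE
)

# Columns 3:5 are space, qty and qty.exit; only "led" rows get 2 -> X and 3 -> Y.
out <- dt_small %>%
  mutate(across(3:5, ~ ifelse(
    measure == "led",
    stringr::str_replace_all(as.character(.x), c("2" = "X", "3" = "Y")),
    .x
  )))

str(out)
```

The "exit" row keeps its 2 and 3, but they are now stored as the strings "2" and "3".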

How to replace NA values in any column with the next column's values in R

I guess you already have an answer to the first part of your question; here is an alternative way using replace. To drop columns that contain only NA values, you can use select with where.

library(dplyr)

df1 %>%
  mutate(across(everything(), ~replace(., . == '', 'N')),
         GID = sub('N', '', GID)) %>%
  select(-where(~all(is.na(.)))) %>%
  rename_with(~names(df1)[seq_along(.)])

#  GID ColA
#1   1    2
#2   2    4
#3   3    4
#4   4    5
#5   5    5
#6  G1    N
#7 MG2    1
#8 MG3    1
#9  G4    N

conditionally renaming cells based on their current value

Would something like this, using the tidyverse, work?

First, load the packages:

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)

Second, create the data:

df <- tribble(
  ~name, ~sub_name, ~level,
  "Food", "Food", "group",
  "Food", "Fruit and vegetables", "subgroup",
  "Food", "Meat, poultry and fish", "subgroup")
df
# A tibble: 3 x 3
  name  sub_name               level
  <chr> <chr>                  <chr>
1 Food  Food                   group
2 Food  Fruit and vegetables   subgroup
3 Food  Meat, poultry and fish subgroup

Third, recode using case_when:

df <- df %>% mutate(level = case_when(
  level == "group" ~ "primary",
  level == "subgroup" ~ "secondary",
  TRUE ~ "other"
))

Fourth, take a look at the recoded data:

df
# A tibble: 3 x 3
  name  sub_name               level
  <chr> <chr>                  <chr>
1 Food  Food                   primary
2 Food  Fruit and vegetables   secondary
3 Food  Meat, poultry and fish secondary

Fifth, use filter() to drop the primary rows:

df2 <- df %>% filter(level != "primary")

df2
# A tibble: 2 x 3
  name  sub_name               level
  <chr> <chr>                  <chr>
1 Food  Fruit and vegetables   secondary
2 Food  Meat, poultry and fish secondary
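
The example data never reaches the TRUE ~ "other" branch of the case_when step, so here is a short sketch of how that fallback behaves. The input vector levels_in is made up for illustration.

```r
library(dplyr)

# Hypothetical input with one unmatched value and one missing value.
levels_in <- c("group", "subgroup", "item", NA)

recoded <- case_when(
  levels_in == "group"    ~ "primary",
  levels_in == "subgroup" ~ "secondary",
  TRUE                    ~ "other"   # everything not matched above, including NA
)

recoded
```

Note that NA also falls through to "other"; add an explicit is.na(levels_in) ~ NA_character_ line before the fallback if missing values should stay missing.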

Replace value with the name of its respective column

The code below replaces every "true" value (character) with the name of its respective column.

##Replace every "true" value with its respective column name
w <- which(df=="true",arr.ind=TRUE)
df[w] <- names(df)[w[,"col"]]
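
To make the indexing explicit, here is the same idea on a hypothetical two-column data frame. which(..., arr.ind = TRUE) returns a matrix with one row per match and columns named row and col, which can then be used both to look up the column names and to index the data frame.

```r
# Hypothetical data with one "true" in each column.
df <- data.frame(a = c("true", "false"),
                 b = c("x", "true"),
                 stringsAsFactors = FALSE)

# Row/column positions of every cell equal to "true".
w <- which(df == "true", arr.ind = TRUE)

# Replace each matching cell with the name of the column it sits in.
df[w] <- names(df)[w[, "col"]]

df
```

Cells containing NA are skipped, because which() ignores NA comparisons.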

