Conditionally replace values of subset of rows with column name in R using only tidy
You can do a little tidyeval in your mutate_at function to get the column name, then an ifelse (or whatever other logic structure you might want) to replace certain values.
library(tidyverse)
wl %>%
  mutate_at(vars(starts_with("AB")), function(x) {
    # capture the unevaluated argument; mutate_at calls the function
    # with the bare column, so this quosure holds the column name
    x_var <- rlang::enquo(x)
    ifelse(x == "Y", rlang::quo_name(x_var), x)
  })
#> # A tibble: 4 x 5
#>       x multi ABC   ABD   ABE
#>   <int> <chr> <chr> <chr> <chr>
#> 1     1 Y     ""    ""    ABE
#> 2     2 Y     ABC   ""    ABE
#> 3     3 Y     ABC   ""    ""
#> 4     4 Y     ""    ""    ABE
Created on 2018-08-16 by the reprex package (v0.2.0).
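With dplyr 1.0.0 or later, the same idea can probably be written without tidyeval by using across() together with cur_column(), which returns the name of the column currently being processed. The following is only a sketch: wl_demo is a hypothetical stand-in for wl, built to resemble the input implied by the output above.

```r
library(dplyr)

# Hypothetical stand-in for `wl` (the original data is not shown)
wl_demo <- tibble(
  x     = 1:4,
  multi = "Y",
  ABC   = c("", "Y", "Y", ""),
  ABD   = c("", "", "", ""),
  ABE   = c("Y", "Y", "", "Y")
)

# Replace "Y" with the name of the column it sits in;
# cur_column() is only valid inside across()
wl_named <- wl_demo %>%
  mutate(across(starts_with("AB"), ~ ifelse(.x == "Y", cur_column(), .x)))

wl_named
```

Columns that do not start with "AB" (here x and multi) are left untouched.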
Using dplyr to conditionally replace values in a column
Assuming your data frame is dat and your column is var:
dat = dat %>% mutate(candy.flag = factor(ifelse(var == "Candy", "Candy", "Non-Candy")))
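One thing worth checking is how missing values behave: var == "Candy" yields NA when var is NA, so the flag is NA too. The sketch below uses a made-up dat_demo and a hypothetical second column, candy.flag.nona, to contrast this with %in%, which never returns NA.

```r
library(dplyr)

# Hypothetical data with one missing value in `var`
dat_demo <- data.frame(var = c("Candy", "Fruit", NA, "Candy"))

dat_demo <- dat_demo %>%
  mutate(
    # == propagates NA: a missing `var` gives a missing flag
    candy.flag = factor(ifelse(var == "Candy", "Candy", "Non-Candy")),
    # %in% returns FALSE for NA, so a missing `var` becomes "Non-Candy"
    candy.flag.nona = factor(ifelse(var %in% "Candy", "Candy", "Non-Candy"))
  )

dat_demo
```

Which behaviour is right depends on whether a missing value should count as "Non-Candy" in your data.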
Replace only some NA values for selected rows and for only a column in R
df$type[!df$Asked & is.na(df$type)] <- "Replies"
gets you to your desired table:
> type <-
+ c(NA, rep("Question",3), NA, NA, rep("Answer",4), rep(NA, 3), rep("Answer",2),
+ NA, "Question", NA, rep("Answer",2), NA,NA)
> Asked <- c(
+ T, rep(F, 9), T, rep(F, 4), T, rep(F, 4), T,F
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
   title comments     type Asked
1      1        1     <NA>  TRUE
2      2        2 Question FALSE
3      3        3 Question FALSE
4      4        4 Question FALSE
5      5        5  Replies FALSE
6      6        6  Replies FALSE
7      7        7   Answer FALSE
8      8        8   Answer FALSE
9      9        9   Answer FALSE
10    10       10   Answer FALSE
11    11       11     <NA>  TRUE
12    12       12  Replies FALSE
13    13       13  Replies FALSE
14    14       14   Answer FALSE
15    15       15   Answer FALSE
16    16       16     <NA>  TRUE
17    17       17 Question FALSE
18    18       18  Replies FALSE
19    19       19   Answer FALSE
20    20       20   Answer FALSE
21    21       21     <NA>  TRUE
22    22       22  Replies FALSE
dplyr mutate/replace several columns on a subset of rows
These solutions (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once:
1a) mutate_cond: Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition:
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}
DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)
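To see what mutate_cond does to the rows that do not match, here is a small check on a made-up data frame. DF_small and its values are hypothetical, and the function body is repeated only so that the snippet runs on its own.

```r
library(dplyr)

# Same definition as above, repeated so this snippet is self-contained
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# Hypothetical four-row input with the same columns the answer updates
DF_small <- data.frame(
  measure     = c("cfl", "exit", "led", "exit"),
  qty         = c(5, 7, 9, 11),
  qty.exit    = 0,
  cf          = c(0.2, 0.4, 0.6, 0.8),
  delta.watts = c(20, 30, 40, 50)
)

res <- DF_small %>%
  mutate_cond(measure == "exit", qty.exit = qty, cf = 0, delta.watts = 13)

res
```

Only rows 2 and 4 change; DF_small itself is not modified because the function works on a copy. Note that this sketch only updates columns that already exist in the input.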
1b) mutate_last: This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Note that TRUE > FALSE, so if group_by specifies a condition then mutate_last will only operate on rows satisfying that condition.
mutate_last <- function(.data, ...) {
  n <- n_groups(.data)
  # Relies on the 0-based "indices" attribute that grouped data frames
  # carried in older dplyr (before 0.8.0); newer versions expose 1-based
  # row indices through group_rows() instead
  indices <- attr(.data, "indices")[[n]] + 1
  .data[indices, ] <- .data[indices, ] %>% mutate(...)
  .data
}
DF %>%
  group_by(is.exit = measure == 'exit') %>%
  mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%
  ungroup() %>%
  select(-is.exit)
2) factor out condition: Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.
library(dplyr)
DF %>% mutate(is.exit = measure == 'exit',
              qty.exit = ifelse(is.exit, qty, qty.exit),
              cf = (!is.exit) * cf,
              delta.watts = replace(delta.watts, is.exit, 13)) %>%
  select(-is.exit)
3) sqldf: We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them; this may represent a bug in dplyr, see dplyr issue 1579). It may seem that we are undesirably modifying the input in this code due to the existence of the update, but in fact the update is acting on a copy of the input in the temporarily generated database and not on the actual input.
library(sqldf)
DF %>%
  do(sqldf(c("update '.'
    set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13
    where measure = 'exit'",
    "select * from '.'")))
4) row_case_when: Also check out row_case_when, defined in "Returning a tibble: how to vectorize with case_when?". It uses a syntax similar to case_when but applies to rows.
library(dplyr)
DF %>%
  row_case_when(
    measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),
    TRUE ~ data.frame(qty.exit, cf, delta.watts)
  )
Note 1: We used this as DF:
set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace=T),
                 space = sample(1:4, 50, replace=T),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,
                                  replace=T),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace=T),
                 cf = runif(50))
Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573 with 631 being the main thread and 1573 being a review of the answers here.
R: Conditionally replacing values based on column pre-fixes and suffixes
Another attempt which should essentially only be one assignment operation. Using @alistaire's data again:
vars <- c("x","y")
foo[vars] <- Map(pmax, foo[vars], bar[match(foo$id, bar$id), vars], na.rm=TRUE)
foo
#  id  x y z
#1  1 10 1 1
#2  2  9 2 2
#3  3 NA 3 3
#4  4  1 4 4
#5  5  3 5 5
#6  6  8 6 6
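Since the data from the other answer is not reproduced here, the following sketch uses two made-up data frames (foo_demo and bar_demo are hypothetical) to show what the Map/pmax line does: rows of the second table are aligned to the first by id, and each listed column takes the element-wise maximum, ignoring NA where the other side has a value.

```r
# Hypothetical inputs; bar_demo has no row for id 2 and is in a different order
foo_demo <- data.frame(id = 1:3, x = c(10, NA, 1), y = c(1, 2, NA))
bar_demo <- data.frame(id = c(3, 1), x = c(5, 2), y = c(7, 9))

vars <- c("x", "y")

# match() lines up bar_demo's rows with foo_demo's ids (NA where no match);
# Map() then applies pmax column by column, passing na.rm = TRUE through
foo_demo[vars] <- Map(pmax,
                      foo_demo[vars],
                      bar_demo[match(foo_demo$id, bar_demo$id), vars],
                      na.rm = TRUE)

foo_demo
```

A value stays NA only when it is missing on both sides, as for x in the row with id 2.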
dplyr: Replace multiple values based on condition in a selection of columns
A dplyr solution:
library(dplyr)
dt %>%
  mutate(across(3:5, ~ ifelse(measure == "led", stringr::str_replace_all(
    as.character(.),
    c("2" = "X", "3" = "Y")
  ), .)))
Result:
    measure site space qty qty.exit cf
 1:     led    4     1   4        6  3
 2:    exit    4     2   1        4  6
 3:     cfl    1     4   6        2  3
 4:  linear    3     4   1        3  5
 5:     cfl    5     1   6        1  6
 6:    exit    4     3   2        6  4
 7:    exit    5     1   4        2  5
 8:    exit    1     4   3        6  4
 9:  linear    3     1   5        4  1
10:     led    4     1   1        1  1
11:    exit    5     4   3        5  2
12:     cfl    4     2   4        5  5
13:     led    4     X   Y        Y  4
...
How to replace if the NA values in any column that should replace values by the next column's values in R programming
I guess you already have an answer to the first part of your question; here is an alternative way using replace. To drop columns that contain only NA values, you can use select with where.
library(dplyr)
df1 %>%
  mutate(across(.fns = ~replace(., . == '', 'N')),
         GID = sub('N', '', GID)) %>%
  select(-where(~all(is.na(.)))) %>%
  rename_with(~names(df1)[seq_along(.)])
#  GID ColA
#1   1    2
#2   2    4
#3   3    4
#4   4    5
#5   5    5
#6  G1    N
#7 MG2    1
#8 MG3    1
#9  G4    N
conditionally renaming cells based on their current value
Would something like this, using the tidyverse, work?
First, load the packages:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
Second, create the data (see other examples):
df <- tribble(
  ~name, ~sub_name, ~level,
  "Food", "Food", "group",
  "Food", "Fruit and vegetables", "subgroup",
  "Food", "Meat, poultry and fish", "subgroup")
df
# A tibble: 3 x 3
  name  sub_name               level
  <chr> <chr>                  <chr>
1 Food  Food                   group
2 Food  Fruit and vegetables   subgroup
3 Food  Meat, poultry and fish subgroup
Third, recode using case_when (see more examples):
df <- df %>% mutate(level = case_when(
  level == "group" ~ "primary",
  level == "subgroup" ~ "secondary",
  TRUE ~ "other"
))
Fourth, take a look at the recoded data:
df
# A tibble: 3 x 3
  name  sub_name               level
  <chr> <chr>                  <chr>
1 Food  Food                   primary
2 Food  Fruit and vegetables   secondary
3 Food  Meat, poultry and fish secondary
Fifth, filter() (see more filter options):
df2 <- df %>% filter(level != "primary")
df2
# A tibble: 2 x 3
  name  sub_name               level
  <chr> <chr>                  <chr>
1 Food  Fruit and vegetables   secondary
2 Food  Meat, poultry and fish secondary
Replace value with the name of its respective column
The code below replaces every "true" value (character) with the name of its respective column.
##Replace every "true" value with its respective column name
w <- which(df=="true",arr.ind=TRUE)
df[w] <- names(df)[w[,"col"]]
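As a quick illustration of how the matrix returned by which(..., arr.ind = TRUE) drives the replacement, here is the same idea on a made-up two-column data frame (df_demo is hypothetical). Each row of w is a (row, col) position of a "true" cell, and w[, "col"] picks the matching column name for that cell.

```r
# Hypothetical character data frame with one "true" per column
df_demo <- data.frame(a = c("true", "false"),
                      b = c("false", "true"),
                      stringsAsFactors = FALSE)

# Two-column matrix of (row, col) positions where the value is "true"
w <- which(df_demo == "true", arr.ind = TRUE)

# Index the data frame by that matrix and assign the column name of each hit
df_demo[w] <- names(df_demo)[w[, "col"]]

df_demo
```

The columns need to be character (not factor) for the assignment to store the names as plain strings.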