Best way to replace a lengthy ifelse structure in R
Come up with a matching table and merge.
I'll do only the first couple of statements for brevity; hopefully you get the point:
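library(data.table); setDT(df)
# Lookup table: one row per Code. Both columns must have the same length
# (14 codes here); the exact split between the target values is illustrative.
match_table <-
  data.table(Code = 89:102,
             tCode = c(rep(78, 10), 79, 79, 80, 80))
# Update join: add tCode to df by reference, matching on Code
df[match_table, tCode := i.tCode, on = "Code"]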
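The same lookup idea also works without data.table. Below is a minimal base R sketch using match(); the data frame and the lookup values are made up for illustration and are not the asker's real codes.

```r
# Hypothetical data and lookup table (illustrative values only)
df <- data.frame(Code = c(89, 98, 100, 102, 50))
lookup <- data.frame(Code  = c(89, 98, 100, 102),
                     tCode = c(78, 78, 79, 80))

# match() gives the row of `lookup` for each df$Code, or NA if the code is absent
df$tCode <- lookup$tCode[match(df$Code, lookup$Code)]
df
```

Codes that are missing from the lookup table (50 here) end up as NA instead of raising an error, which makes gaps in the table easy to spot.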
R: Simplifying long ifelse statement
Two alternative methods, both using merges/joins. One advantage of this approach is that it is much easier to maintain: you have well-structured, manageable tables of procedures instead of (potentially very long) lines of code in your ifelse statement. The comments suggesting %in% also reduce this problem, though you'll deal with manageable vectors instead of manageable frames.
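For reference, the %in% idea from the comments might look like the sketch below. The code vectors ab_codes and antibiotic_codes are hypothetical names, and the procedure codes mirror the fake data used in this answer.

```r
# Hypothetical procedure codes, mirroring the fake data in this answer
vet <- data.frame(ProcedureCode = c("6160", "2028", "2029"),
                  stringsAsFactors = FALSE)
ab_codes         <- c("6160", "2028")
antibiotic_codes <- c("2029")

# One logical column per procedure type
vet$ab         <- vet$ProcedureCode %in% ab_codes
vet$antibiotic <- vet$ProcedureCode %in% antibiotic_codes
vet
```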
Fake data:
library(dplyr)
library(tidyr)
vet <- tibble(ProcedureCode = c('6160', '2028', '2029'))  # data_frame() is deprecated
One frame per procedure type. This is manageable, but might be annoying if you have a lot of different types. Repeat the left_join for each type.
abs <- tibble(ab=TRUE, ProcedureCode = c('6160', '2028'))
antis <- tibble(antibiotic=TRUE, ProcedureCode = c('2029'))
vet %>%
  left_join(abs, by = "ProcedureCode") %>%
  left_join(antis, by = "ProcedureCode") %>%
  # across() replaces the deprecated mutate_at()/funs() idiom
  mutate(across(c(ab, antibiotic), ~ !is.na(.x)))
# # A tibble: 3 × 3
# ProcedureCode ab antibiotic
# <chr> <lgl> <lgl>
# 1 6160 TRUE FALSE
# 2 2028 TRUE FALSE
# 3 2029 FALSE TRUE
The use of ab=TRUE (etc.) is so that there is a column to merge. The rows that do not match will have an NA, which is why the !is.na() call is needed to convert T,NA,T to T,F,T.
You could even use vectors of procedure codes instead, something like:
vet %>%
  left_join(tibble(ab=TRUE, ProcedureCode=vector_of_abs), by = "ProcedureCode") %>%
  ...
Though that really only helps if you already have the codes as vectors; otherwise, it comes down to whichever is easier for you to maintain.
One frame with all procedures, requiring only a single frame for types and a single left_join.
procedures <- tibble::tribble(
~ProcedureCode, ~procedure,
'6160' , 'ab',
'2028' , 'ab',
'2029' , 'antibiotic'
)
left_join(vet, procedures, by = "ProcedureCode")
# # A tibble: 3 × 2
# ProcedureCode procedure
# <chr> <chr>
# 1 6160 ab
# 2 2028 ab
# 3 2029 antibiotic
You can either keep it as-is (if it makes sense to store it that way) or spread it to be like the others:
left_join(vet, procedures, by = "ProcedureCode") %>%
  mutate(ignore=TRUE) %>%
  spread(procedure, ignore) %>%
  # across() replaces the deprecated mutate_at()/funs() idiom
  mutate(across(c(ab, antibiotic), ~ !is.na(.x)))
# # A tibble: 3 × 3
# ProcedureCode ab antibiotic
# <chr> <lgl> <lgl>
# 1 2028 TRUE FALSE
# 2 2029 FALSE TRUE
# 3 6160 TRUE FALSE
(Order after the join/merge is different here, but the data remains the same.)
(I used logicals; it's easy enough to convert them to 1s and 0s, perhaps with mutate(ab=1L*ab) or mutate(ab=as.integer(ab)).)
Change length.out in ifelse function
As you mention, your approach leads to an NA in the first element of the vector returned by f. This first element has no previous element to compare against, so we would like to leave the first value unchanged.
A straightforward approach is to do just that. Apologies: this does not answer your title question, although it does solve your problem.
library(data.table)  # provides shift()
f <- function(x) {
  # storing the output of ifelse in a variable
  out <- ifelse(x == shift(x), x + 0.001 * sd(x, na.rm = TRUE), x)
  # changing the first element of `out` into first element of x
  out[1] <- x[1]
  # returning `out` -- in an R function,
  # the last thing evaluated is returned
  out
}
Note that this will not properly handle elements repeated more than twice (e.g. c(1,2,2,2,3)). Also, this will change all your elements the same way, so in c(1,2,2,1,2,2), all the second twos will be changed by the same amount. This may or may not be something you want.
You could hack something (a comment suggests ?rle), but I suggest changing the way you randomize your data, if this makes sense with your particular data.
Instead of adding 0.001*sd, maybe you could add Gaussian noise with this standard deviation? This depends on your application, obviously.
f <- function(x) {
  # adding gaussian noise with small sd to repeated values
  # storing the output in a variable `out`
  out <- ifelse(x == shift(x),
                x + rnorm(length(x), mean = 0,
                          sd = 0.01 * sd(x, na.rm = TRUE)),
                x)
  # changing the first element of `out` into first element of x
  out[1] <- x[1]
  # returning `out` -- in an R function,
  # the last thing evaluated is returned
  out
}
It depends on your purpose in getting rid of exactly duplicated values.
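For completeness, the rle idea mentioned above could be sketched roughly as follows. This is only one possible reading of that comment: the function name f_rle is hypothetical, and the choice to shift each repeat by an increasing multiple of 0.001*sd is an assumption, not something from the question.

```r
# Hypothetical helper: shift the k-th repeat within a run by k * 0.001 * sd(x)
f_rle <- function(x) {
  run_lengths <- rle(x)$lengths
  # position within each run of equal values: 0 for the first, 1 for the second, ...
  k <- sequence(run_lengths) - 1
  x + k * 0.001 * sd(x, na.rm = TRUE)
}

x <- c(1, 2, 2, 2, 3)
out <- f_rle(x)
out
```

The first element of each run is left untouched, so no special handling of out[1] is needed. Non-adjacent repeats (as in c(1,2,2,1,2,2)) still receive identical shifts, so this does not remove that limitation.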
Nested ifelse: improved syntax
With full respect to the OP's remarkable effort to improve nested ifelse(), I prefer a different approach, which I believe is easy to write, concise, maintainable and fast:
xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)
library(data.table)
# coerce to data.table, and set the default first
setDT(xx)[, c:= -b]
xx[a == 1L, c := b] # 1st special case
xx[a == 2L, c := 100L*b] # 2nd special case, note use of integer 100L
# xx[a == 3L, c := ...] # other cases
# xx[a == 4L, c := ...]
#...
xx
# a b c
#1: 1 1 1
#2: 2 2 200
#3: 1 3 3
#4: 3 4 -4
Note that for the 2nd special case, b is multiplied by the integer constant 100L to make sure that the right-hand sides are all of type integer, in order to avoid type conversion to double.
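A quick check of the type behaviour described here, using a standalone vector b rather than the xx table:

```r
b <- 1:4
int_result <- class(b * 100L)  # integer * integer stays integer
dbl_result <- class(b * 100)   # integer * double is promoted to double
int_result
dbl_result
```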
Edit 2: This can also be written in an even more concise (but still maintainable) way as a one-liner:
setDT(xx)[, c:= -b][a == 1L, c := b][a == 2L, c := 100L*b][]
data.table chaining works here because c is updated in place, so that subsequent expressions act on all rows of xx even if the previous expression was a selective update of a subset of rows.
Edit 1: This approach can be implemented with base R as well:
xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)
xx$c <- -xx$b
idx <- xx$a == 1L; xx$c[idx] <- xx$b[idx]
idx <- xx$a == 2L; xx$c[idx] <- 100L * xx$b[idx]  # 100L keeps c an integer column
xx
# a b c
#1 1 1 1
#2 2 2 200
#3 1 3 3
#4 3 4 -4
Replace using ifelse not reduce length of vector in R?
I think the following piece of code produces the desired output.
library(tidyverse)
set.seed(111)
level <- c("high","high","high","high","low","low","low")
val <- rnorm(7,35,6)
df <- data.frame(level, val)
df$level <- as.factor(as.character(df$level))
df$new.val <- df$val
df %>%
mutate(new.val = as.integer(case_when(
val > 35 & level == "high" ~ 0,
val > 30 & level == "low" ~ 0,
TRUE ~ new.val
)))
level val new.val
1 high 36.41132 0
2 high 33.01558 33
3 high 33.13026 33
4 high 21.18593 21
5 low 33.97474 0
6 low 35.84167 0
7 low 26.01544 26
Avoiding writing a long if-else statement in R
If there is a pattern in the if-else statements, we can create the set of expressions beforehand and use !!! to unquote and splice them into the arguments of case_when:
x_gt_cond <- rep(c(-Inf, 5, 10, 12, 20), 2)
x_le_cond <- rep(c(5, 10, 12, 20 ,30), 2)
y_gt_cond <- rep(c(-Inf, 100), each = 5)
y_le_cond <- rep(c(100, 1000), each = 5)
z <- 1:10
cases <- paste("x > ", x_gt_cond, "& x <= ", x_le_cond,
"& y > ", y_gt_cond, "& y <= ", y_le_cond, "~ ", z)
library(dplyr)
library(rlang)
df %>%
mutate(z = case_when(!!!parse_exprs(cases)))
The trick is to use -Inf and Inf for the lower and upper bounds so that you have balanced conditions for x and y. What's elegant about this solution is that you can add more conditions simply by altering the _cond vectors.
Output:
> cases
[1] "x > -Inf & x <= 5 & y > -Inf & y <= 100 ~ 1"
[2] "x > 5 & x <= 10 & y > -Inf & y <= 100 ~ 2"
[3] "x > 10 & x <= 12 & y > -Inf & y <= 100 ~ 3"
[4] "x > 12 & x <= 20 & y > -Inf & y <= 100 ~ 4"
[5] "x > 20 & x <= 30 & y > -Inf & y <= 100 ~ 5"
[6] "x > -Inf & x <= 5 & y > 100 & y <= 1000 ~ 6"
[7] "x > 5 & x <= 10 & y > 100 & y <= 1000 ~ 7"
[8] "x > 10 & x <= 12 & y > 100 & y <= 1000 ~ 8"
[9] "x > 12 & x <= 20 & y > 100 & y <= 1000 ~ 9"
[10] "x > 20 & x <= 30 & y > 100 & y <= 1000 ~ 10"
id x y z
1 1 13 8440 NA
2 2 3 1467 NA
3 3 5 2699 NA
4 4 24 5286 NA
5 5 5 2378 NA
6 6 16 268 9
7 7 19 2910 NA
8 8 19 706 9
9 9 24 6212 NA
10 10 7 6026 NA
...
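Because the conditions here are plain numeric intervals, the same numbering can also be reproduced in base R with cut(). This is a swapped-in technique rather than the case_when approach above, and it only works for this regular grid of bins; the three rows are copied from the output shown, and x_breaks and y_breaks are names introduced for the sketch.

```r
# Three rows copied from the output above
df <- data.frame(id = c(1, 6, 8), x = c(13, 16, 19), y = c(8440, 268, 706))

x_breaks <- c(-Inf, 5, 10, 12, 20, 30)
y_breaks <- c(-Inf, 100, 1000)

# cut() uses right-closed intervals (lo, hi] by default and returns NA
# for values that fall outside the breaks
x_bin <- cut(df$x, breaks = x_breaks, labels = FALSE)
y_bin <- cut(df$y, breaks = y_breaks, labels = FALSE)

# Five x bins per y bin, numbered in the same order as `cases`
df$z <- x_bin + (length(x_breaks) - 1L) * (y_bin - 1L)
df
```

As in the case_when output, rows whose y exceeds 1000 get NA because they match none of the intervals.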
How to conditionally replace values in r data frame using if/then statement
You can use ifelse, like this:
df$customer_id <- ifelse(df$customer %in% c('paramount', 'pixar'), 99, df$customer_id)
The syntax is simple:
ifelse(condition, result if TRUE, result if FALSE)
This is vectorized, so you can use it on a dataframe column.
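A small self-contained run of the line above; the data frame is hypothetical and the customer name "acme" is made up for illustration.

```r
# Hypothetical data frame
df <- data.frame(customer = c("paramount", "acme", "pixar"),
                 customer_id = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Matching customers get 99; all others keep their existing id
df$customer_id <- ifelse(df$customer %in% c("paramount", "pixar"), 99, df$customer_id)
df$customer_id
```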
R ifelse to replace values in a column
This should work, using the working example:
var <- c("Private", "Private", "?", "Private")
df <- data.frame(var)
df$var[which(df$var == "?")] = "Private"
This will replace the values of "?" with "Private".
The reason your replacement isn't working (I think) is that when the value in df$var isn't "?", it replaces that element of the vector with the whole df$var column rather than just reinserting the element you want.
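For comparison, an ifelse() version that picks values element by element might look like this. It is a sketch that assumes var is stored as character rather than factor (hence stringsAsFactors = FALSE); it is not the asker's original code.

```r
df <- data.frame(var = c("Private", "Private", "?", "Private"),
                 stringsAsFactors = FALSE)

# `no` has the same length as the test, so each non-"?" element keeps its own value
df$var <- ifelse(df$var == "?", "Private", df$var)
df$var
```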
Alternatives to nested ifelse statements in R
You can vectorize using max.col:
indx <- names(df)[max.col(df[-1], ties.method = "first") + 1L]
df$firstyear <- as.numeric(sub("in", "20", indx))
df
# id in05 in06 in07 in08 in09 firstyear
# 1 a 1 0 1 0 0 2005
# 2 b 0 0 1 1 0 2007
# 3 c 0 0 0 1 0 2008
# 4 d 1 1 1 1 1 2005
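A self-contained version of the above, with the data frame rebuilt from the printed output. The extra row "e" and the final guard are additions for illustration: with ties.method = "first", a row containing no 1 at all would otherwise be reported as the first column.

```r
df <- data.frame(id   = c("a", "b", "c", "d", "e"),
                 in05 = c(1, 0, 0, 1, 0),
                 in06 = c(0, 0, 0, 1, 0),
                 in07 = c(1, 1, 0, 1, 0),
                 in08 = c(0, 1, 1, 1, 0),
                 in09 = c(0, 0, 0, 1, 0))

m <- as.matrix(df[-1])
# Column index of the first maximum in each row, shifted by 1 to skip `id`
indx <- names(df)[max.col(m, ties.method = "first") + 1L]
df$firstyear <- as.numeric(sub("in", "20", indx))

# Guard (not part of the answer): rows without any 1 have no first year
df$firstyear[rowSums(m) == 0] <- NA
df
```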
Using ifelse() with R and text removal: how to handle NA values?
There's an issue with your test: it only returns a single value of FALSE. If you instead use grepl to test, you get your expected result:
test_df$median_playtime_hours <- ifelse(
#if the data has hours in it, then...
test = grepl("hours", as.character(test_df$median_playtime)),
#text removal if it contains hours
as.numeric(gsub(pattern = " hours", replacement = "", x = as.character(test_df$median_playtime))),
#otherwise, remove minutes text and divide by 60
as.numeric(gsub(pattern = " minutes", replacement = "", x = test_df$median_playtime)) / 60
)
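To see how this behaves with missing values, here is a small variant on made-up data. It strips the unit text once before converting, which is a slight rearrangement of the code above rather than the same code; has_hours and value are names introduced for the sketch.

```r
# Hypothetical data with one missing value
test_df <- data.frame(median_playtime = c("2 hours", "30 minutes", NA),
                      stringsAsFactors = FALSE)

# grepl() returns one logical per element, and FALSE (not NA) for a missing value
has_hours <- grepl("hours", test_df$median_playtime)

# Remove either unit, then convert; NA input stays NA
value <- as.numeric(gsub(" hours| minutes", "", test_df$median_playtime))

# Hours are kept as-is, minutes are divided by 60
test_df$median_playtime_hours <- ifelse(has_hours, value, value / 60)
test_df
```

The NA row falls into the "minutes" branch, but NA / 60 is still NA, so missing values pass through unchanged.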