several substitutions in one line R
By adding 2 to -1, 0, and 1, you could get indices into a vector of the desired outcomes:
c("no", "maybe", "yes")[dat + 2]
# [1] "no" "yes" "maybe" "yes" "yes" "no"
A related option could use the match function to work out the indexing:
c("no", "maybe", "yes")[match(dat, -1:1)]
# [1] "no" "yes" "maybe" "yes" "yes" "no"
Alternately, you could use a named vector for recoding:
unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)])
# [1] "no" "yes" "maybe" "yes" "yes" "no"
You could also use a nested ifelse:
ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes"))
# [1] "no" "yes" "maybe" "yes" "yes" "no"
If you don't mind loading a new package, the Recode function from the car package does this:
library(car)
Recode(dat, "-1='no'; 0='maybe'; 1='yes'")
# [1] "no" "yes" "maybe" "yes" "yes" "no"
Data:
dat <- c(-1, 1, 0, 1, 1, -1)
Note that all but the first option also work if dat is stored as a character vector; for the first you would need to use as.numeric(dat).
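To make the note about character input concrete, here is a minimal sketch. The names dat_chr and labels are introduced here for illustration only.

```r
# Same data as above, but stored as strings (illustrative name)
dat_chr <- c("-1", "1", "0", "1", "1", "-1")
labels <- c("no", "maybe", "yes")

# Arithmetic on a character vector is an error, so convert before adding 2
res_index <- labels[as.numeric(dat_chr) + 2]

# match() coerces both arguments to a common type, so no conversion is needed
res_match <- labels[match(dat_chr, -1:1)]

identical(res_index, res_match)
```

Both results should hold the same labels as the numeric examples above.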
If code clarity is your main objective, pick the one that you find easiest to understand. I would personally pick the second or the last.
If code speed is of interest, then you can benchmark the solutions. Here are benchmarks of the five options I've presented, together with the two other solutions currently posted as other answers, run on a random vector of length 100k:
set.seed(144)
dat <- sample(c(-1, 0, 1), replace=TRUE, 100000)
opt1 <- function(dat) c("no", "maybe", "yes")[dat + 2]
opt2 <- function(dat) c("no", "maybe", "yes")[match(dat, -1:1)]
opt3 <- function(dat) unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)])
opt4 <- function(dat) ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes"))
opt5 <- function(dat) Recode(dat, "-1='no'; 0='maybe'; 1='yes'")
AnandaMahto <- function(dat) factor(dat, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
hrbrmstr <- function(dat) sapply(as.character(dat), switch, `-1`="no", `0`="maybe", `1`="yes", USE.NAMES=FALSE)
library(microbenchmark)
microbenchmark(opt1(dat), opt2(dat), opt3(dat), opt4(dat), opt5(dat), AnandaMahto(dat), hrbrmstr(dat))
# Unit: milliseconds
# expr min lq mean median uq max neval
# opt1(dat) 1.513500 2.553022 2.763685 2.656010 2.837673 4.384149 100
# opt2(dat) 2.153438 3.013502 3.251850 3.117058 3.269230 5.851234 100
# opt3(dat) 59.716271 61.890470 64.978685 62.509046 63.723048 144.708757 100
# opt4(dat) 62.934734 64.715815 71.181477 65.652195 71.123384 123.840577 100
# opt5(dat) 82.976441 84.849147 89.071808 85.752429 88.473162 155.347273 100
# AnandaMahto(dat) 57.267227 58.643889 60.508402 59.065642 60.368913 80.852157 100
# hrbrmstr(dat) 137.883307 148.626496 158.051220 153.441243 162.594752 228.271336 100
The first two options appear to be more than an order of magnitude quicker than any of the other options, though either the vector would have to be pretty huge or you would need to be repeating the operation a number of times for any of this to make a difference.
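Before trusting any timing comparison, it is worth checking that the candidates return the same values. The following is a rough sketch restricted to the base R options, so that no packages are needed; the names lab, results and all_agree are introduced here and are not part of the original answers.

```r
set.seed(144)
dat <- sample(c(-1, 0, 1), 1000, replace = TRUE)
lab <- c("no", "maybe", "yes")

# One entry per base R approach (the factor result is converted back to character)
results <- list(
  index  = lab[dat + 2],
  match  = lab[match(dat, -1:1)],
  named  = unname(c("-1" = "no", "0" = "maybe", "1" = "yes")[as.character(dat)]),
  ifelse = ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes")),
  factor = as.character(factor(dat, levels = c(-1, 0, 1), labels = lab))
)

# Compare every result against the first one
all_agree <- all(vapply(results, identical, logical(1), results[[1]]))
all_agree
```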
As pointed out by @AnandaMahto, these results are qualitatively different if we have character input instead of numeric input:
set.seed(144)
dat <- sample(c("-1", "0", "1"), replace=TRUE, 100000)
opt1 <- function(dat) c("no", "maybe", "yes")[as.numeric(dat) + 2]
opt2 <- function(dat) c("no", "maybe", "yes")[match(dat, -1:1)]
opt3 <- function(dat) unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)])
opt4 <- function(dat) ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes"))
opt5 <- function(dat) Recode(dat, "-1='no'; 0='maybe'; 1='yes'")
AnandaMahto <- function(dat) factor(dat, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
hrbrmstr <- function(dat) sapply(dat, switch, `-1`="no", `0`="maybe", `1`="yes", USE.NAMES=FALSE)
library(microbenchmark)
microbenchmark(opt1(dat), opt2(dat), opt3(dat), opt4(dat), opt5(dat), AnandaMahto(dat), hrbrmstr(dat))
# Unit: milliseconds
# expr min lq mean median uq max neval
# opt1(dat) 8.397194 9.519075 10.784108 9.693706 10.163203 55.78417 100
# opt2(dat) 2.281438 3.091418 4.231162 3.210794 3.436038 49.39879 100
# opt3(dat) 3.606863 5.481115 6.466393 5.720282 6.344651 48.47924 100
# opt4(dat) 66.819638 69.996704 74.596960 71.290522 73.404043 127.52415 100
# opt5(dat) 32.897019 35.701401 38.488489 36.336489 38.950272 88.20915 100
# AnandaMahto(dat) 1.329443 2.114504 2.824306 2.275736 2.493907 46.19333 100
# hrbrmstr(dat) 81.898572 91.043729 154.331766 100.006203 141.425717 1594.17447 100
Now, the factor solution proposed by @AnandaMahto is the quickest, followed by vector indexing with match and named vector lookup. Again, all runtimes are fast enough that you would need a large vector or many runs for any of this to matter.
Replace multiple strings in one gsub() or chartr() statement in R?
You can use gsubfn
library(gsubfn)
gsubfn(".", list("'" = "", " " = "_"), x)
# [1] "ab_c"
Similarly, we can use mgsub, which allows multiple replacements for multiple search patterns:
mgsub::mgsub(x, c("'", " "), c("", "_"))
#[1] "ab_c"
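If adding a package is not an option, the same effect can be approximated in base R by applying gsub once per pattern. This is only a sketch: the helper name multi_gsub is hypothetical, and the input x is assumed to be "a'b c", which is consistent with the output shown above but is not given in the original answer.

```r
# Assumed input; the original answer does not show how x was defined
x <- "a'b c"

# Hypothetical helper: apply each pattern/replacement pair in turn
multi_gsub <- function(x, patterns, replacements) {
  Reduce(
    function(acc, i) gsub(patterns[i], replacements[i], acc, fixed = TRUE),
    seq_along(patterns),
    init = x
  )
}

res <- multi_gsub(x, c("'", " "), c("", "_"))
res
# [1] "ab_c"
```

Because the replacements are applied sequentially, a later pattern can match text produced by an earlier replacement, which a single-pass tool such as gsubfn avoids.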
How to replace multiple strings with the same in R
sub("blue|red", "colour", vec)
Use "|" (the regular-expression alternation operator, meaning OR) between the words you want to substitute.
Use sub to change only the first occurrence and gsub to change all occurrences within the same string.
Type ?gsub into the R console for more information.
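A small illustration of the difference between sub and gsub, using a made-up vector (vec_demo is not from the original question):

```r
# Illustrative input with two matches in the first element
vec_demo <- c("blue car and red car", "green")

first_only <- sub("blue|red", "colour", vec_demo)
all_hits   <- gsub("blue|red", "colour", vec_demo)

first_only
# [1] "colour car and red car" "green"
all_hits
# [1] "colour car and colour car" "green"
```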
Replace multiple strings in a column of a data frame
You can do the following to add as many pattern-replacement pairs as you want in one line.
library(stringr)
vec <- c("Absent", "Absent", "Present", "Present", "XX", "YY", "ZZ")
str_replace_all(vec, c("Absent" = "A", "Present" = "P"))
# [1] "A" "A" "P" "P" "XX" "YY" "ZZ"
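Since the question is about a data frame column, here is a sketch of one way to apply a set of replacements to a column using only base R. It assumes whole-value replacement (not substring replacement, which is what str_replace_all does); the names df, status and map are invented for the example.

```r
# Illustrative data frame
df <- data.frame(status = c("Absent", "Present", "XX"), stringsAsFactors = FALSE)

# Named vector: names are the old values, elements are the new values
map <- c(Absent = "A", Present = "P")

# Replace only the rows whose value has an entry in the map
hit <- df$status %in% names(map)
df$status[hit] <- unname(map[df$status[hit]])

df$status
# [1] "A"  "P"  "XX"
```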
Replace one symbol in an expression with multiple values
Here's a take on a splicing function
splice <- function(x, replacements) {
  if (is(x, "call")) {
    as.call(do.call("c", lapply(as.list(x), splice, replacements), quote=TRUE))
  } else if (is(x, "name")) {
    if (deparse(x) %in% names(replacements)) {
      return(replacements[[deparse(x)]])
    } else {
      list(x)
    }
  } else {
    list(x)
  }
}
It seems to work with the sample input
splice(quote(f(x, 5) ), list(x=list(a = 1, b = quote(sym), c = "char" )))
# f(a = 1, b = sym, c = "char", 5)
splice(quote(g(f(h(y)), z)) , list(y=list(1,2,3)))
# g(f(h(1, 2, 3)), z)
splice(quote(g(f(h(y), z), z)), list(z=list(4, quote(x))) )
# g(f(h(y), 4, x), 4, x)
Basically you just swap out the symbol names. It should also work with single-value replacements that aren't in a list.
splice(quote(f(x,5)), list(x=7))
# f(7, 5)
You basically need to re-write the call by manipulating it as a list. This is what the tidyverse functions are doing behind the scenes. They intercept the current call, re-write it, then evaluate the newly expanded call. substitute alone will not work here because you aren't just replacing one symbol with one value. You need to change the number of parameters you are passing to a function.
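The difference can be seen in a small sketch that uses only base R. The variable names (cl, one_for_one, parts, new_call) are introduced here for illustration.

```r
cl <- quote(f(x, 5))

# substitute() swaps one symbol for one value; the argument count stays at two
one_for_one <- substitute(f(x, 5), list(x = 7))
deparse(one_for_one)
# [1] "f(7, 5)"

# Treating the call as a list lets one argument expand into several
parts <- as.list(cl)   # function name, then each argument
new_call <- as.call(c(parts[1], list(a = 1, b = quote(sym)), parts[3]))
deparse(new_call)
# [1] "f(a = 1, b = sym, 5)"
```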
How to substitute multiple characters in a string in R?
One option is to use gsubfn:
library(gsubfn)
gsubfn("\\w\\s\\w", setNames(as.list(c), sapply(c, function(x) gsub("-", " ", x))), s)
#[1] "a-bc de fg-hij-klmn 123-45 789"
Explanation: We match \\w\\s\\w and replace the matches according to the list
setNames(as.list(c), sapply(c, function(x) gsub("-", " ", x)))
#$`a b`
#[1] "a-b"
#
#$`g h`
#[1] "g-h"
#
#$`j k`
#[1] "j-k"
#
#$`x z`
#[1] "x-z"
#
#$`y 5`
#[1] "y-5"
#
#$`3 4`
#[1] "3-4"
Or even shorter (thanks to @Wen-Ben)
gsubfn("\\w\\s\\w", setNames(as.list(c), gsub("-", " ", c)), s)
How to replace multiple values at once
A possible solution using match:
old <- 1:8
new <- c(2,4,6,8,1,3,5,7)
x[x %in% old] <- new[match(x, old, nomatch = 0)]
which gives:
> x
[1] 8 4 0 5 1 5 7 9
What this does:
- Create two vectors: old with the values that need to be replaced and new with the corresponding replacements.
- Use match to see where values from x occur in old. Use nomatch = 0 to remove the NA's. This results in an index vector of the positions in old for the x values.
- This index vector can then be used to index new.
- Only assign the values from new to the positions of x that are present in old: x[x %in% old]
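The steps can be traced with a concrete input. The answer does not show how x was defined, so the vector below is an assumption, chosen because it reproduces the output printed above; idx is a name introduced here.

```r
old <- 1:8
new <- c(2, 4, 6, 8, 1, 3, 5, 7)

# Assumed input: 0 and 9 do not occur in old and should be left alone
x <- c(4, 2, 0, 7, 5, 7, 8, 9)

# Position of each x value in old; 0 where there is no match
idx <- match(x, old, nomatch = 0)
idx
# [1] 4 2 0 7 5 7 8 0

# Zero indices are dropped when subsetting, so new[idx] has one value per matched element
x[x %in% old] <- new[idx]
x
# [1] 8 4 0 5 1 5 7 9
```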
R - using substitute within a nested function
Here is a general outline that should help you solve your problem:
Inner <- function(x) {
  # Quote the call here because the expression is re-used in the loop
  my.call <- quote(substitute(x))
  var.name <- eval(my.call)
  # Inner's own frame is skipped since the first level was already substituted above;
  # sys.frames() is in order of evaluation, so reverse it to walk outward
  for(i in rev(head(sys.frames(), -1L))) {
    my.call[[2]] <- var.name # re-use the call, modified to replace the variable
    var.name <- eval(my.call, i)
  }
  return(var.name)
}
Outer <- function(y) Inner(y)
Outer2 <- function(z) Outer(z)
Now let's run the functions:
Inner(1 + 1)
# 1 + 1
Outer(2 + 2)
# 2 + 2
Outer2(3 + 3)
# 3 + 3
Inner always returns the outermost expression (you don't see y or z ever, just the expression as typed in .GlobalEnv).
The trick here is to use sys.frames(), and repeatedly substitute until we get to the top level.
Note this assumes that all the "Outer" functions just forward their argument on to the next inner one. Things likely get a lot more complicated / impossible if you have something like:
Outer <- function(y) Inner(y + 1)
This code does not check for that type of issue, but you probably should in your code. Also, keep in mind that the assumption here is that your functions will only be called from the R command line. If someone wraps their functions around yours, you might get unexpected results.
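For contrast, here is a sketch of what a single substitute call returns when the argument is forwarded through a wrapper. The function names inner_plain and outer_plain are made up for this illustration.

```r
# A single substitute() only sees the expression passed by the immediate caller
inner_plain <- function(x) substitute(x)
outer_plain <- function(y) inner_plain(y)

direct  <- inner_plain(1 + 1)   # the expression as typed
wrapped <- outer_plain(2 + 2)   # the forwarding symbol, not 2 + 2

deparse(direct)
# [1] "1 + 1"
deparse(wrapped)
# [1] "y"
```

This is the behaviour that the frame-walking loop in Inner works around.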
Replace specific characters within strings
With a regular expression and the function gsub():
group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"
gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"
What gsub does here is replace each occurrence of "e" with an empty string "".
See ?regexp or ?gsub for more help.
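If only some occurrences should go, the pattern can be anchored. A brief sketch using the same group vector; the result names are introduced here.

```r
group <- c("12357e", "12575e", "197e18", "e18947")

# "e$" matches an "e" only at the end of the string
trailing_removed <- gsub("e$", "", group)
trailing_removed
# [1] "12357"  "12575"  "197e18" "e18947"

# "^e" matches an "e" only at the start of the string
leading_removed <- gsub("^e", "", group)
leading_removed
# [1] "12357e" "12575e" "197e18" "18947"
```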
Can perl replace multiple keywords with their own substitute word in one go?
perl -E 'my %h = qw(apple green foo bar); say "apple foo" =~ s/(apple|foo)/$h{$1}/rge;'