Match and Replace Multiple Strings in a Vector of Text Without Looping in R

1) gsubfn gsubfn in the gsubfn package is like gsub, except that the replacement can be a character string, list, function or proto object. If it is a list, each matched string is replaced with the component of the list whose name equals the matched string.

library(gsubfn)
gsubfn("\\S+", setNames(as.list(b), a), c)

giving:

[1] "i am going to the party" "he would go too"    

2) gsub For a solution with no packages try this loop:

cc <- c
for(i in seq_along(a)) cc <- gsub(a[i], b[i], cc, fixed = TRUE)

giving:

> cc
[1] "i am going to the party" "he would go too"
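For anyone who wants to run the base R approach without the original inputs, here is a minimal sketch. The vectors a, b and txt are hypothetical stand-ins (the inputs are not shown above), and Reduce() only folds the same gsub() call over the pattern/replacement pairs, so the work is still sequential under the hood.

```r
# Hypothetical inputs; the original a, b and c are not shown in this post
a   <- c("gng", "pty", "wld")          # strings to find
b   <- c("going", "party", "would")    # their replacements
txt <- c("i am gng to the pty", "he wld go too")

# Fold gsub() over the pairs: each step feeds its result to the next one
out <- Reduce(
  function(x, i) gsub(a[i], b[i], x, fixed = TRUE),
  seq_along(a),
  txt
)
print(out)
# [1] "i am going to the party" "he would go too"
```

fixed = TRUE treats each pattern as a literal string, which is usually what you want when the patterns come from a lookup table rather than hand-written regular expressions.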

R - How to replace a string from multiple matches (in a data frame)

Edit

Based on the input from Sri's comment I would suggest using:

library(gsubfn)
# words to be replaced
a <- c("Whats your","Whats your name", "name", "fro")
# their replacements
b <- c("What is yours","what is your name","names","froth")
# named list as an input for gsubfn
replacements <- setNames(as.list(b), a)
# the test string
input_string = "fro Whats your name and Where're name you from to and fro I Whats your"
# match entire words
gsubfn(paste(paste0("\\w*", names(replacements), "\\w*"), collapse = "|"), replacements, input_string)

Original

I would not say this is easier to read than your simple loop, but it might take better care of the overlapping replacements:

# define the sample dataset
input_string = "Whats your name and Where're you from"
matching <- data.frame(from_word=c("Whats your name", "name", "fro", "Where're", "Whats"),
to_word=c("what is your name","names","froth", "where are", "Whatsup"))

# load used library
library(gsubfn)

# make sure data is of class character
matching$from_word <- as.character(matching$from_word)
matching$to_word <- as.character(matching$to_word)

# extract the words in the sentence
test <- unlist(strsplit(input_string, " ", fixed = TRUE))  # base R; str_split would need stringr
# find where individual words from sentence match with the list of replaceable words
test2 <- sapply(paste0("\\b", test, "\\b"), grepl, matching$from_word)
# change rownames to see what is the format of output from the above sapply
rownames(test2) <- matching$from_word
# reorder the data so that largest replacement blocks are at the top
test3 <- test2[order(rowSums(test2), decreasing = TRUE),]
# where the word is already being replaced by larger chunk, do not replace again
test3[apply(test3, 2, cumsum) > 1] <- FALSE

# define the actual pairs of replacement
replacements <- setNames(as.list(as.character(matching[,2])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1]),
as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1])

# perform the replacement
gsubfn(paste(as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1], collapse = "|"),
replacements,input_string)

Replacing multiple strings in a character vector without using a loop in R

You could use stringr.

As mentioned in ?str_replace:

To perform multiple replacements in each element of string, pass a
named vector (c(pattern1 = replacement1)) to str_replace_all.

So in your case:

library(stringr)

str_replace_all(script, setNames(sapply(pairs, "[[", 2), sapply(pairs, "[[", 1)))
# [1] "This is Depression with a mean of 10.1"

Avoid for loop in string replacement?

I'll bet there's another way to do this, but my first thought was gsubfn:

my_repl <- function(x){
switch(x,a = "[this was an a]",
b = "[this was a b]",
c = "[this was a c]",
z = "[this was a z]")
}

library(gsubfn)
start_string <- sample(letters[1:10], 10)
gsubfn("a|b|c|z",my_repl,x = start_string)

If the patterns you are searching for are valid names for list elements, this will also work:

names(my_replacement) <- my_pattern
gsubfn("a|b|c|z",as.list(my_replacement),start_string)

Edit

But frankly, if I really had to do this a lot in my own code, I would probably just do the for loop thing, wrapped in a function. Here's a simple version using sub and gsub rather than the functions from stringr:

vsub <- function(pattern,replacement,x,all = TRUE,...){
FUN <- if (all) gsub else sub
for (i in seq_len(min(length(pattern),length(replacement)))){
x <- FUN(pattern = pattern[i],replacement = replacement[i],x,...)
}
x
}

vsub(my_pattern,my_replacement,start_string)

But of course, one of the reasons that there isn't a well-known built-in function for this is probably that sequential replacements like this can be pretty fragile, because they are so order dependent:

vsub(rev(my_pattern),rev(my_replacement),start_string)
[1] "i" "[this w[this was an a]s [this was an a] c]"
[3] "[this was an a]" "g"
[5] "j" "d"
[7] "f" "[this w[this was an a]s [this was an a] b]"
[9] "h" "e"
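The order dependence is easy to reproduce with a smaller case. The sketch below uses a hypothetical swap table (not from the answer above) and contrasts the sequential loop with a single-pass alternative built on base R's gregexpr() and regmatches(), which finds every match first and substitutes each one exactly once.

```r
x    <- "cat dog"
swap <- c(cat = "dog", dog = "cat")   # hypothetical swap table

# Sequential: the second pass rewrites what the first pass produced
seq_out <- x
for (p in names(swap)) seq_out <- gsub(p, swap[[p]], seq_out, fixed = TRUE)

# Single pass: locate all matches first, then look each one up in the table
one_pass <- x
m <- gregexpr(paste(names(swap), collapse = "|"), one_pass)
regmatches(one_pass, m) <- lapply(regmatches(one_pass, m),
                                  function(w) unname(swap[w]))

print(seq_out)   # [1] "cat cat"
print(one_pass)  # [1] "dog cat"
```

The single-pass version assumes the table names are safe to join with "|" as a regular expression; names containing regex metacharacters would need escaping first.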

R: pass a vector of strings to replace all instances within a string

We can use gsubfn if we need to replace with numbers.

library(gsubfn)
gsubfn("\\w+", as.list(setNames(1:3, numlist)), mystring)
#[1] "I have 1 cat, 2 dogs and 3 rabbits"

EDIT: I thought that we need to replace with numbers that correspond to the words in 'numlist'. But if we need to replace with the ##NUMBER## flag, one option is mgsub

library(qdap)
mgsub(numlist, "##NUMBER##", mystring)
#[1] "I have ##NUMBER## cat, ##NUMBER## dogs and ##NUMBER## rabbits"

Replace multiple words in multiple strings

library(stringi)

stri_replace_all_regex(my_words, "\\b" %s+% my_replace$original %s+% "\\b", my_replace$replacement, vectorize_all = FALSE)

[1] "example R" "example R" "example R" "anthoer R" "now a C" "and another C" "example R tributary"
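The "\\b" word boundaries are what keep the "r" inside "tributary" from being touched. A rough base R illustration of the same idea, using a hypothetical input vector rather than the original my_words:

```r
words <- c("example r", "tributary r")   # hypothetical input

# With word boundaries only a standalone "r" is replaced
whole_word <- gsub("\\br\\b", "R", words, perl = TRUE)

# Without them every "r" is replaced, including those inside other words
anywhere <- gsub("r", "R", words, fixed = TRUE)

print(whole_word)  # [1] "example R"   "tributary R"
print(anywhere)    # [1] "example R"   "tRibutaRy R"
```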

replace parts of a string with a vector

A possible base R solution using sprintf():

animals = c("chickens","ducks") 
frequency = c(35,12)

sprintf( "%s has frequency of %s", animals, frequency)

[1] "chickens has frequency of 35" "ducks has frequency of 12"

also,

tex = "%s has frequency of %s"
sprintf( tex, animals, frequency )

will give the same result.

Replace multiple strings/values based on separate list

Sometimes it helps to temporarily reshape the data. That way we can operate on all the X and Y values without iterating over them.

library(stringr)
library(tidyr)

## some data to work with
exd <- read.csv(text = "EVENT,ID,GROUP,YEAR,X.1,X.2,X.3,Y.1,Y.2,Y.3
1,1 John Smith,GROUP1,2015,19 John Smith,11 Adam Smith,9 Sam Smith,5 George Smith,13 Mike Smith,12 Luke Smith
2,2 John Smith,GROUP1,2015,1 George Smith,9 Luke Smith,19 Adam Smith,7 Sam Smith,17 Mike Smith,11 John Smith
3,3 John Smith,GROUP1,2015,5 George Smith,18 John Smith,12 Sam Smith,6 Luke Smith,2 Mike Smith,4 Adam Smith",
stringsAsFactors = FALSE)

## re-arrange to put X and Y columns into a single column
exd <- gather(exd, key = "var", value = "value", X.1, X.2, X.3, Y.1, Y.2, Y.3)

## find the X and Y values that contain the ID name
matches <- str_detect(exd$value, str_replace_all(exd$ID, "^\\d+ *", ""))

## replace X and Y values with the matching ID
exd[matches, "value"] <- exd$ID[matches]

## put it back in the original shape
exd <- spread(exd, key = "var", value = value)

exd
## EVENT ID GROUP YEAR X.1 X.2 X.3 Y.1 Y.2 Y.3
## 1 1 1 John Smith GROUP1 2015 1 John Smith 11 Adam Smith 9 Sam Smith 5 George Smith 13 Mike Smith 12 Luke Smith
## 2 2 2 John Smith GROUP1 2015 1 George Smith 9 Luke Smith 19 Adam Smith 7 Sam Smith 17 Mike Smith 2 John Smith
## 3 3 3 John Smith GROUP1 2015 5 George Smith 3 John Smith 12 Sam Smith 6 Luke Smith 2 Mike Smith 4 Adam Smith

