How to Replace the String Exactly Using Gsub()

How do I replace the string exactly using gsub()?

As @koshke noted, a very similar question has been answered before (by me). ...But that was grep and this is gsub, so I'll answer it again:

"\<" is an escape sequence for the beginning of a word, and ">" is the end. In R strings you need to double the backslashes, so:

txt <- "a patterned layer within a microelectronic pattern."
txt_replaced <- gsub("\\<pattern\\>","form",txt)
txt_replaced
# [1] "a patterned layer within a microelectronic form."

Or, you could use \b instead of \< and \>. \b matches a word boundary, so it can be used at both ends:

txt_replaced <- gsub("\\bpattern\\b","form",txt)

Also note that if you want to replace only the first occurrence, you should use sub instead of gsub.
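To make the difference between sub and gsub concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not from the original question.

```r
# Hypothetical sample text: two whole-word matches and one partial match
txt2 <- "pattern, patterned, pattern"

# sub() replaces only the first whole-word match
first_only <- sub("\\bpattern\\b", "form", txt2)
first_only
# [1] "form, patterned, pattern"

# gsub() replaces every whole-word match; "patterned" is left alone
all_matches <- gsub("\\bpattern\\b", "form", txt2)
all_matches
# [1] "form, patterned, form"
```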

Using gsub to replace string and following n words

You could use

gsub("18th legislative period page \\d+ of \\d+", "", string)
# or, to also collapse the leftover whitespace (including the newline symbol '\n')
gsub('\\s{2,}', ' ', gsub("18th legislative period page \\d+ of \\d+", "", string))
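The two steps can be tried on a small input. The string below is an invented stand-in, since the text from the question is not shown here, and the intermediate variable names are only illustrative.

```r
# Hypothetical input with a running page header between two lines of text
string <- "end of one page\n18th legislative period page 3 of 12\nstart of the next"

# Step 1: drop the running header, whatever the page numbers are
no_header <- gsub("18th legislative period page \\d+ of \\d+", "", string)

# Step 2: collapse runs of two or more whitespace characters into one space
cleaned <- gsub("\\s{2,}", " ", no_header)
cleaned
# [1] "end of one page start of the next"
```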

R - gsub - replace character exact match

Add a word boundary (\\b) before and after the word 'amp' so that it won't match any word that merely contains 'amp' as a substring:

description_clean_df$description_clean <- gsub("\\bamp\\b", "",
                                               description_clean_df$description_clean)
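A small sketch of the same idea on a plain character vector. The sample values are hypothetical. Note that deleting a word leaves its surrounding spaces behind, so a whitespace cleanup step is often wanted afterwards.

```r
# Hypothetical descriptions: only the first contains 'amp' as a whole word
desc <- c("guitar amp stand", "ample lamp camp")

# Remove the whole word 'amp'; 'ample', 'lamp' and 'camp' are untouched
no_amp <- gsub("\\bamp\\b", "", desc)
no_amp
# [1] "guitar  stand"   "ample lamp camp"

# Optional cleanup: squeeze repeated whitespace and trim the ends
tidy <- trimws(gsub("\\s+", " ", no_amp))
tidy
# [1] "guitar stand"    "ample lamp camp"
```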

How do I do an exact string match using gsub in R?

Use anchors instead here to match the entire string:

sub('^MOUNTAIN$', 'MOUNTAIN VIEW', raw, ignore.case = TRUE)
# [1] "MOUNTAIN VIEW" "MOUNTAIN VIEW"

If you desire, you can also use a capturing group and backreference it inside the replacement call:

sub('^(MOUNTAIN)$', '\\1 VIEW', raw, ignore.case = TRUE)
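The two variants behave slightly differently when ignore.case = TRUE. The sketch below uses an invented vector (named raw_demo so it is not confused with raw from the question): a fixed replacement always writes 'MOUNTAIN VIEW', while the backreference keeps whatever case was matched.

```r
# Hypothetical input: exact matches in two cases, plus two non-matches
raw_demo <- c("MOUNTAIN", "mountain", "MOUNTAIN VIEW", "MOUNTAINSIDE")

# Anchored pattern with a fixed replacement
fixed_repl <- sub("^MOUNTAIN$", "MOUNTAIN VIEW", raw_demo, ignore.case = TRUE)
fixed_repl
# [1] "MOUNTAIN VIEW" "MOUNTAIN VIEW" "MOUNTAIN VIEW" "MOUNTAINSIDE"

# Anchored pattern with a backreference: \\1 is the text as it was matched
backref_repl <- sub("^(MOUNTAIN)$", "\\1 VIEW", raw_demo, ignore.case = TRUE)
backref_repl
# [1] "MOUNTAIN VIEW" "mountain VIEW" "MOUNTAIN VIEW" "MOUNTAINSIDE"
```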

R: gsub of exact full string with fixed = T

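If you want to match full strings exactly, I don't think you really want to use regular expressions in this case. How about just using the match() function? (Here exact_orig and exact_change are assumed to be parallel vectors of the original strings and their replacements.)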

fixedTrue <- function(x) {
  m <- match(x, exact_orig)
  x[!is.na(m)] <- exact_change[m[!is.na(m)]]
  x
}

fixedTrue(c("32 oz","oz oz"))
# [1] "32 ct" "oz oz"
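For a version that can be run on its own, here is a sketch that passes the lookup vectors as arguments instead of reading them from the global environment. The function name replace_exact and the contents of the two lookup vectors are assumptions made for illustration; the question's actual vectors are not shown here.

```r
# Replace elements of x that are exactly equal to an entry of `from`
# with the entry of `to` at the same position; everything else is kept.
replace_exact <- function(x, from, to) {
  m <- match(x, from)      # position in `from`, or NA when there is no exact match
  hit <- !is.na(m)
  x[hit] <- to[m[hit]]
  x
}

# Hypothetical lookup vectors (parallel: same length, same order)
exact_orig   <- c("32 oz", "oz")
exact_change <- c("32 ct", "ounce")

result <- replace_exact(c("32 oz", "oz oz", "oz"), exact_orig, exact_change)
result
# [1] "32 ct" "oz oz" "ounce"
```

Because match() compares whole strings, "oz oz" is left alone even though it contains "oz".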

gsub replacing string with pattern matching code and not specific string variables

We can use a regex lookaround

sub("(?<=[0-9])(?=[A-Z])", "_", x, perl = TRUE)
#[1] "A12_SITE_1234_J_vvv.csv" "A12_SITA_1234_J_vvv.csv"
#[3] "A12_SITE_1678_H_vvv.csv" "A12_SITE_145_C_vvv.csv"

Or use capture groups ((...)) to capture the pattern as a group, and then use the backreferences (\\1, \\2) of the captured groups in the replacement:

sub("([0-9])([A-Z])", "\\1_\\2", x, perl = TRUE)

In the OP's code, the pattern .* (any characters) followed by a digit ([0-9]) and an uppercase letter ([A-Z]) is not captured, so the matched text is lost in the replacement. Also, if we use [0-9] in the replacement, it will be taken as a literal string.
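The three cases can be compared side by side. The file names below are invented inputs chosen to be consistent with the output shown above; the names x_demo, with_lookaround, with_groups and literal_repl are illustrative only.

```r
# Hypothetical file names with a digit directly followed by an uppercase letter
x_demo <- c("A12_SITE_1234J_vvv.csv", "A12_SITE_145C_vvv.csv")

# Lookaround: matches the empty position between a digit and an uppercase letter
with_lookaround <- sub("(?<=[0-9])(?=[A-Z])", "_", x_demo, perl = TRUE)

# Capture groups: match both characters and put them back via \\1 and \\2
with_groups <- sub("([0-9])([A-Z])", "\\1_\\2", x_demo)

with_groups
# [1] "A12_SITE_1234_J_vvv.csv" "A12_SITE_145_C_vvv.csv"

# A character class in the replacement is NOT a pattern; it is inserted literally
literal_repl <- sub("[0-9][A-Z]", "[0-9]_[A-Z]", x_demo[1])
literal_repl
# [1] "A12_SITE_123[0-9]_[A-Z]_vvv.csv"
```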

How to find the exact match in for loop using gsub?

Use word boundaries (\\b):

gsub("\\bjava\\b", "xx", c("my java is", "this javascript is"))
#[1] "my xx is" "this javascript is"

You probably want

ll <- as.list(data$word)
data$new <- data$description
for (i in seq_len(nrow(data))) {
  for (j in seq_along(ll)) {
    # ll[[j]] extracts the word itself; ll[j] would be a one-element list
    data$new[i] <- gsub(paste0("\\b", ll[[j]], "\\b"), "xx", data$new[i], ignore.case = TRUE)
  }
}
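Since gsub is vectorized over its input, the nested loop can possibly be avoided by joining all the words into one alternation pattern. This is a sketch under two assumptions: the data frame below is invented for illustration, and the words contain no regex metacharacters (otherwise they would need escaping first).

```r
# Hypothetical data in the same shape as in the question
data <- data.frame(
  word = c("java", "python"),
  description = c("my java is", "this javascript is Python"),
  stringsAsFactors = FALSE
)

# Build one pattern of the form \b(word1|word2|...)\b
pattern <- paste0("\\b(", paste(data$word, collapse = "|"), ")\\b")

# Replace every listed whole word in every description in a single call
data$new <- gsub(pattern, "xx", data$description, ignore.case = TRUE)
data$new
# [1] "my xx is"              "this javascript is xx"
```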

Use string.gsub to replace strings, but only whole words

I have a way of doing what I want now, but it's inelegant. Is there a better way?

There is a feature of Lua's pattern matching library called the frontier pattern (%f; undocumented in Lua 5.1, documented from Lua 5.2 onward), which will let you write something like this:

function replacetext(source, find, replace, wholeword)
  if wholeword then
    find = '%f[%a]'..find..'%f[%A]'
  end
  return (source:gsub(find,replace))
end

local source = 'test testing this test of testicular footest testimation test'
local find = 'test'
local replace = 'XXX'
print(replacetext(source, find, replace, false)) --> XXX XXXing this XXX of XXXicular fooXXX XXXimation XXX
print(replacetext(source, find, replace, true )) --> XXX testing this XXX of testicular footest testimation XXX

gsub back reference and replacement with empty string not identical

You want to extract the substring starting with file and ending with csv at the end of string.

Since gsub replaces the match, and you want to use it as an extraction function, you need to match all the text in the string.

Because the text your regex does not match is at the start of the string, you need to prepend .* to your pattern. With the TRE regex engine used by default in base R functions, .* matches any zero or more characters, as many as possible. With PCRE (perl=TRUE in base R functions) or ICU (stringr/stringi functions), it matches any zero or more characters other than line break characters:

vec = c("dir/file_version_1a.csv")
gsub(".*(file.*csv)$", "\\1", vec)

However, stringr::str_extract seems a more natural choice here:

stringr::str_extract(vec, "file.*csv$")
# or, in base R:
regmatches(vec, regexpr("file.*csv$", vec))
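One practical difference between the two base R approaches shows up when an element does not match at all. The sketch below uses an invented second element to show it; the names vec_demo, via_gsub and via_regmatches are illustrative only.

```r
# Hypothetical input: the second element has no "file...csv" part
vec_demo <- c("dir/file_version_1a.csv", "dir/other.txt")

# gsub() returns non-matching elements unchanged
via_gsub <- gsub(".*(file.*csv)$", "\\1", vec_demo)
via_gsub
# [1] "file_version_1a.csv" "dir/other.txt"

# regmatches() with regexpr() returns only the elements that matched
via_regmatches <- regmatches(vec_demo, regexpr("file.*csv$", vec_demo))
via_regmatches
# [1] "file_version_1a.csv"
```

So the extraction-by-replacement idiom keeps the vector length but can silently pass through unmatched input, while regmatches shortens the result.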



