How do I replace the string exactly using gsub()
As @koshke noted, a very similar question has been answered before (by me). ...But that was grep and this is gsub, so I'll answer it again:
"\<" is an escape sequence for the beginning of a word, and "\>" is the end. In R strings you need to double the backslashes, so:
txt <- "a patterned layer within a microelectronic pattern."
txt_replaced <- gsub("\\<pattern\\>","form",txt)
txt_replaced
# [1] "a patterned layer within a microelectronic form."
Or, you could use \b instead of \< and \>. \b matches a word boundary, so it can be used at both ends:
txt_replaced <- gsub("\\bpattern\\b","form",txt)
Also note that if you want to replace only the FIRST occurrence in each string, you should use sub instead of gsub.
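To make the sub/gsub difference concrete, here is a minimal sketch. The input string is made up for illustration (it puts the word "pattern" at both ends), so treat the variable names as assumptions rather than anything from the question.

```r
# Hypothetical input: "pattern" appears twice as a whole word, once inside "patterned"
txt <- "pattern: a patterned layer within a microelectronic pattern."

# sub() stops after the first whole-word match in each string
first_only <- sub("\\bpattern\\b", "form", txt)

# gsub() replaces every whole-word match
all_matches <- gsub("\\bpattern\\b", "form", txt)

print(first_only)
# [1] "form: a patterned layer within a microelectronic pattern."
print(all_matches)
# [1] "form: a patterned layer within a microelectronic form."
```

In both calls "patterned" is left alone, because there is no word boundary between "pattern" and "ed".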
Using gsub to replace string and following n words
You could use
gsub("18th legislative period page \\d+ of \\d+", "", string)
# or, to also collapse the leftover runs of whitespace (including '\n') into a single space
gsub('\\s{2,}', ' ', gsub("18th legislative period page \\d+ of \\d+", "", string))
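Since the original string is not shown, here is a small sketch with an invented input to show what the two steps do. The content of `string` is an assumption; only the two patterns come from the answer.

```r
# Hypothetical input: a page marker surrounded by blank lines
string <- "Some text.\n\n18th legislative period page 3 of 12\n\nMore text."

# Step 1: drop the page marker, whatever the page numbers are
step1 <- gsub("18th legislative period page \\d+ of \\d+", "", string)

# Step 2: collapse any run of two or more whitespace characters to one space
step2 <- gsub("\\s{2,}", " ", step1)

print(step2)
# [1] "Some text. More text."
```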
R - gsub - replace character exact match
Add the word boundary (\\b) before and after the word 'amp' so that it won't match any words with a substring 'amp' in it
description_clean_df$description_clean <- gsub("\\bamp\\b", "",
description_clean_df$description_clean)
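A quick check on toy data shows which values are touched. The data frame contents below are invented for illustration; the column name is taken from the snippet above.

```r
# Hypothetical data: only the first row contains 'amp' as a standalone word
description_clean_df <- data.frame(
  description_clean = c("salt amp pepper", "guitar amplifier", "camping lamp"),
  stringsAsFactors = FALSE
)

description_clean_df$description_clean <- gsub("\\bamp\\b", "",
                                               description_clean_df$description_clean)

print(description_clean_df$description_clean)
# [1] "salt  pepper"     "guitar amplifier" "camping lamp"
```

Note that removing the word leaves two spaces behind in the first row, so you may want a follow-up whitespace cleanup.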
How do I do an exact string match using gsub in R?
Use anchors instead here to match the entire string:
sub('^MOUNTAIN$', 'MOUNTAIN VIEW', raw, ignore.case = TRUE)
# [1] "MOUNTAIN VIEW" "MOUNTAIN VIEW"
If you desire, you can also use a capturing group and backreference it inside the replacement call:
sub('^(MOUNTAIN)$', '\\1 VIEW', raw, ignore.case = TRUE)
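To see why anchors and not word boundaries are the right tool here, compare both on a sample vector. The contents of `raw` below are an assumption made for the sketch (the question's data is not shown).

```r
# Hypothetical input: two bare values and two that already carry a second word
raw <- c("mountain", "MOUNTAIN", "Mountain View", "MOUNTAIN HOME")

# Anchors: only strings that consist of nothing but MOUNTAIN are replaced
anchored <- sub("^MOUNTAIN$", "MOUNTAIN VIEW", raw, ignore.case = TRUE)

# Word boundaries: every standalone word MOUNTAIN is replaced, wherever it sits
bounded <- gsub("\\bMOUNTAIN\\b", "MOUNTAIN VIEW", raw, ignore.case = TRUE)

print(anchored)
# [1] "MOUNTAIN VIEW" "MOUNTAIN VIEW" "Mountain View" "MOUNTAIN HOME"
print(bounded)
# [1] "MOUNTAIN VIEW"      "MOUNTAIN VIEW"      "MOUNTAIN VIEW View" "MOUNTAIN VIEW HOME"
```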
R: gsub of exact full string with fixed = T
If you want to exactly match full strings, I don't think you really want to use regular expressions in this case. How about just using the match() function:
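# exact_orig and exact_change are assumed to be parallel vectors
# (originals and their replacements) defined beforehand
fixedTrue <- function(x) {
  m <- match(x, exact_orig)                     # index of each exact match, NA if none
  x[!is.na(m)] <- exact_change[m[!is.na(m)]]    # swap in the replacement for matched elements only
  x
}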
fixedTrue(c("32 oz","oz oz"))
# [1] "32 ct" "oz oz"
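For a version you can run as is, here is a sketch that passes the lookup vectors as arguments. The vector contents and the name `replace_exact` are made up for illustration; the match() logic is the same as above.

```r
# Hypothetical lookup vectors: position i of exact_orig maps to position i of exact_change
exact_orig   <- c("32 oz", "16 oz")
exact_change <- c("32 ct", "16 ct")

# replace_exact is an illustrative name, not a base R function
replace_exact <- function(x, from, to) {
  m <- match(x, from)   # NA where x is not exactly equal to any element of 'from'
  hit <- !is.na(m)
  x[hit] <- to[m[hit]]
  x
}

result <- replace_exact(c("32 oz", "oz oz", "16 oz", "32 oz."), exact_orig, exact_change)
print(result)
# [1] "32 ct"  "oz oz"  "16 ct"  "32 oz."
```

Because match() compares whole strings, "32 oz." (with a trailing dot) is left untouched, and no regex escaping is needed.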
gsub replacing string with pattern matching code and not specific string variables
We can use a regex lookaround
sub("(?<=[0-9])(?=[A-Z])", "_", x, perl = TRUE)
#[1] "A12_SITE_1234_J_vvv.csv" "A12_SITA_1234_J_vvv.csv"
#[3] "A12_SITE_1678_H_vvv.csv" "A12_SITE_145_C_vvv.csv"
Or with capture groups ((..)) to capture the pattern as a group and then, in the replacement, use the backreferences (\\1, \\2) of the captured groups
sub("([0-9])([A-Z])", "\\1_\\2", x, perl = TRUE)
In the OP's code, the pattern .* (any characters) followed by a number ([0-9]) and a letter ([A-Z]) is not captured, so it gets lost in the replacement. Also, if we use [0-9] in the replacement, it will be taken as a literal string
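Here is a small sketch of all three behaviours side by side. The file names in `x` are guessed from the output shown above, and the "lossy" pattern is only an approximation of the kind of pattern described, so treat both as assumptions.

```r
# Hypothetical input, reconstructed from the expected output
x <- c("A12_SITE_1234J_vvv.csv", "A12_SITE_145C_vvv.csv")

# Zero-width lookarounds: nothing is consumed, "_" is inserted between digit and letter
with_lookaround <- sub("(?<=[0-9])(?=[A-Z])", "_", x, perl = TRUE)

# Capture groups: the digit and letter are consumed, then put back via \\1 and \\2
with_groups <- sub("([0-9])([A-Z])", "\\1_\\2", x)

# Without groups, everything the pattern matched is replaced and therefore lost
lossy <- sub(".*[0-9][A-Z]", "_", x)

print(with_lookaround)
# [1] "A12_SITE_1234_J_vvv.csv" "A12_SITE_145_C_vvv.csv"
print(lossy)
# [1] "__vvv.csv" "__vvv.csv"
```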
How to find the exact match in for loop using gsub?
Use word boundaries \\b
gsub("\\bjava\\b", "xx", c("my java is", "this javascript is"))
#[1] "my xx is" "this javascript is"
You probably want
ll <- as.list(data$word)
data$new <- data$description
for (i in seq_len(nrow(data))) {
  for (j in seq_along(ll)) {
    # ll[[j]] extracts the word itself; ll[j] would pass a one-element list to paste0()
    data$new[i] <- gsub(paste0("\\b", ll[[j]], "\\b"), "xx", data$new[i], ignore.case = TRUE)
  }
}
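Since gsub is vectorized over its input, the loop over rows is not strictly needed. The following is a sketch of that variant on invented data; it assumes the words contain no regex metacharacters.

```r
# Hypothetical data: each word should be masked in every description
data <- data.frame(
  word = c("java", "sql"),
  description = c("my Java is rusty", "this javascript uses SQL"),
  stringsAsFactors = FALSE
)

data$new <- data$description
for (w in data$word) {
  # One gsub() call per word handles the whole column at once
  data$new <- gsub(paste0("\\b", w, "\\b"), "xx", data$new, ignore.case = TRUE)
}

print(data$new)
# [1] "my xx is rusty"          "this javascript uses xx"
```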
Use string.gsub to replace strings, but only whole words
I have a way of doing what I want now, but it's inelegant. Is there a better way?
There is a little-known feature of Lua's pattern matching library called the Frontier Pattern (%f, undocumented in Lua 5.1 and documented from 5.2 onward), which will let you write something like this:
function replacetext(source, find, replace, wholeword)
  if wholeword then
    find = '%f[%a]'..find..'%f[%A]'
  end
  return (source:gsub(find,replace))
end
local source = 'test testing this test of testicular footest testimation test'
local find = 'test'
local replace = 'XXX'
print(replacetext(source, find, replace, false)) --> XXX XXXing this XXX of XXXicular fooXXX XXXimation XXX
print(replacetext(source, find, replace, true )) --> XXX testing this XXX of testicular footest testimation XXX
gsub back reference and replacement with empty string not identical
You want to extract the substring starting with file and ending with csv at the end of string.
Since gsub replaces the match, and you want to use it as an extraction function, you need to match all the text in the string.
As the text not matched with your regex is at the start of the string, you need to prepend your pattern with .* (this matches any zero or more chars, as many as possible, if you use TRE regex in base R functions, and any zero or more chars other than line break chars in PCRE/ICU regexps used in perl=TRUE powered base R functions and stringr/stringi functions):
vec = c("dir/file_version_1a.csv")
gsub(".*(file.*csv)$", "\\1", vec)
However, stringr::str_extract seems a more natural choice here (regmatches with regexpr is the base R equivalent):
stringr::str_extract(vec, "file.*csv$")
regmatches(vec, regexpr("file.*csv$",vec))
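One difference worth knowing shows up when an element does not match at all. Here is a base R sketch; the second element of `vec` is added for illustration only.

```r
# Hypothetical input: the second element contains no "file...csv" part
vec <- c("dir/file_version_1a.csv", "dir/notes.txt")

# gsub() returns non-matching strings unchanged
via_gsub <- gsub(".*(file.*csv)$", "\\1", vec)

# regmatches() with regexpr() silently drops non-matching elements
via_regmatches <- regmatches(vec, regexpr("file.*csv$", vec))

print(via_gsub)
# [1] "file_version_1a.csv" "dir/notes.txt"
print(via_regmatches)
# [1] "file_version_1a.csv"
```

So the gsub approach keeps the result the same length as the input, while the extraction approach can return a shorter vector.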