Making Gsub Only Replace Entire Words

Making gsub only replace entire words?

You are so close to getting this. You're already using paste to form the replacement string, why not use it to form the pattern string?

goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"
for (i in seq_along(goodwords.corpus)) {
  test <- gsub(paste0('\\<', goodwords.corpus[[i]], '\\>'), paste(goodwords.corpus[[i]], "1234"), test)
}
test
# [1] "I am having a good 1234 time goodnight"

(paste0 is merely paste(..., sep='').)

(I posted this at the same time as @MatthewLundberg, and his answer is also correct. I'm actually more familiar with \b than with \<, but I thought I'd stick with your code.)
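
For comparison, here is a minimal sketch of the same loop written with \b, the word-boundary form mentioned above. It assumes every entry in goodwords.corpus is a plain word with no regex metacharacters; the names word, pattern, and out are illustrative and do not come from the question.

```r
goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"

# Start from the original string and tag each whole-word match in turn.
out <- test
for (word in goodwords.corpus) {
  # \\b is a zero-width word boundary, so "goodnight" is not matched.
  pattern <- paste0("\\b", word, "\\b")
  out <- gsub(pattern, paste(word, "1234"), out, perl = TRUE)
}
out
# [1] "I am having a good 1234 time goodnight"
```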

Replace the whole word that starts with a pattern using gsub in R

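You can use a negative lookahead, which requires perl=TRUE. The pattern wasn(?!')t* matches "wasn" only when it is not followed by an apostrophe, so an already correct "wasn't" is left alone:

gsub("wasn(?!')t*", "wasn't", test, perl=TRUE)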
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't."

Or, without a lookahead:

gsub("wasn'*t*","wasn't",test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't."

For the second desired output:

gsub("wasn'*t*[.]?","wasn't",test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't"

After the edit to the question:

gsub("wasn[^. ]*","wasn't",test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't. this wasn't meant to be. it wasn't simple"
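
The value of test is not shown here, so the following sketch runs the four patterns on a short made-up string to show how they differ. The input x is hypothetical, not the one from the question, and the names r1 to r4 are only for illustration.

```r
# Hypothetical input covering three spellings: "wasnt", "wasn", "wasn't".
x <- "i wasnt aware. i wasn aware. just wasn't."

# Negative lookahead: skips "wasn" when an apostrophe follows (needs perl = TRUE).
r1 <- gsub("wasn(?!')t*", "wasn't", x, perl = TRUE)
# Optional apostrophe and optional t: rewrites every spelling, correct ones included.
r2 <- gsub("wasn'*t*", "wasn't", x)
# Same, but also swallows a period directly after the word.
r3 <- gsub("wasn'*t*[.]?", "wasn't", x)
# "wasn" plus everything up to the next period or space.
r4 <- gsub("wasn[^. ]*", "wasn't", x)

r1  # [1] "i wasn't aware. i wasn't aware. just wasn't."
r3  # [1] "i wasn't aware. i wasn't aware. just wasn't"
```

On this input r2 and r4 give the same string as r1; only r3 drops the final period.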

How to replace entire characters with a specific pattern using gsub

Assume your string is like this

blah <- "blah AC_p3_s01_s24, blah, AC_p3_c01_c24, blah"

Then doing this:

gsub("AC_p3\\S*", "11", blah)

Gives you this:

# [1] "blah 11 blah, 11 blah"
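
Because \S* matches any run of non-whitespace characters, it also consumes the comma directly after each identifier. If that comma should survive, one option is to match only word characters after the prefix. This is a sketch of an alternative, not part of the original answer; it assumes the identifiers contain only letters, digits, and underscores, and the name kept is illustrative.

```r
blah <- "blah AC_p3_s01_s24, blah, AC_p3_c01_c24, blah"

# \\w matches letters, digits and underscore, so the match stops before the comma.
kept <- gsub("AC_p3\\w*", "11", blah)
kept
# [1] "blah 11, blah, 11, blah"
```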

Replace a whole word containing a pattern - gsub and R

You need one of the following:

gsub("\\s*[[:alpha:]]*([[:alpha:]])\\1{2}[[:alpha:]]*", "", string)
gsub("\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "", string, perl=TRUE)
stringr::str_replace_all(string, "\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "")

See an R demo:

string <- "This is a baaaad unnnnecessary short word"
gsub("\\s*[[:alpha:]]*([[:alpha:]])\\1{2}[[:alpha:]]*", "", string)
gsub("\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "", string, perl=TRUE)
library(stringr)
str_replace_all(string, "\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "")

All three yield [1] "This is a short word".

Regex details:

  • \s* - zero or more whitespaces
  • \p{L}* / [[:alpha:]]* - zero or more letters
  • (\p{L}) - Capturing group 1: any single letter
  • \1{2} - two occurrences of the same value as in Group 1, so the same letter appears three times in a row
  • \p{L}* / [[:alpha:]]* - zero or more letters.

Use string.gsub to replace strings, but only whole words

I have a way of doing what I want now, but it's inelegant. Is there a better way?

Lua's pattern matching library has a feature called the frontier pattern (%f), which is undocumented in Lua 5.1 but described in the reference manual from Lua 5.2 onward. It lets you write something like this:

function replacetext(source, find, replace, wholeword)
  if wholeword then
    find = '%f[%a]'..find..'%f[%A]'
  end
  return (source:gsub(find,replace))
end

local source = 'test testing this test of testicular footest testimation test'
local find = 'test'
local replace = 'XXX'
print(replacetext(source, find, replace, false)) --> XXX XXXing this XXX of XXXicular fooXXX XXXimation XXX
print(replacetext(source, find, replace, true )) --> XXX testing this XXX of testicular footest testimation XXX

Substituting whole words only using Regex in Lua

One possible solution that works both ways:

str = str:gsub("%a+", {he = "she", she = "he"})

%a+ matches one or more letters, which is basically a word. Each match is replaced by the corresponding table entry, or left unchanged if the table has no entry for it.

There are other ways to do this, but this is probably the shortest way to get a solution that works in both directions.

Edit:

About the second param. I couldn't find any documentation about this.
Do you have any?

Not sure where you've been looking for documentation but the Lua Manual says:

string.gsub (s, pattern, repl [, n])

...

If repl is a table, then the table is queried for every match, using
the first capture as the key.

Because %a+ contains no captures, the whole match is used as the key.


