In R, Use Gsub to Remove All Punctuation Except Period

You can capture the characters you want to keep and put them back with a backreference:

gsub("([.-])|[[:punct:]]", "\\1", as.matrix(z))
     X..1. X..2.
[1,] "1"   "6"
[2,] "2"   "7.235"
[3,] "3"   "8"
[4,] "4"   "9"
[5,] "5"   "-10"

Here I keep "." and "-"; all other punctuation is dropped.

I guess the next step is to coerce your result to a numeric matrix, so here I combine the two steps:

matrix(as.numeric(gsub("([.-])|[[:punct:]]", "\\1", as.matrix(z))), ncol = 2)
     [,1]    [,2]
[1,]    1   6.000
[2,]    2   7.235
[3,]    3   8.000
[4,]    4   9.000
[5,]    5 -10.000
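
For a self-contained run, here is a minimal sketch. The data frame z is made up for illustration (the original z is not shown). It uses gsub, since sub would stop after the first match in each cell, and it converts via storage.mode, which is just one way to keep the matrix dimensions.

```r
# Hypothetical input: the original `z` is not shown, so these values are made up.
z <- data.frame(a = c("$1", "2,", "(3)", "4!", "#5#"),
                b = c("6;", "7.235", "8?", "9", "-10"),
                stringsAsFactors = FALSE)

# Group 1 captures "." or "-" and is written back via "\\1";
# any other punctuation matches the second branch, where group 1 is empty.
cleaned <- gsub("([.-])|[[:punct:]]", "\\1", as.matrix(z))

# Changing the storage mode converts to numeric while keeping the dimensions.
num <- cleaned
storage.mode(num) <- "numeric"

# sub() replaces only the first match, so the trailing "#" survives.
first_only <- sub("[[:punct:]]", "", "#5#")
```
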

Remove all punctuation except apostrophes in R


x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)

[1] "I like to chew gum but don't like bubble gum"

The above regex is much more straightforward. The caret at the start of the bracket expression negates it, so everything that is not alphanumeric, whitespace, or an apostrophe is replaced with an empty string.
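
The negated bracket expression generalizes to other "keep" characters. The sketch below wraps it in a helper; strip_except and its keep argument are hypothetical names, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): remove everything except
# letters, digits, whitespace, and the characters listed in `keep`.
strip_except <- function(x, keep = "'") {
  # `keep` is pasted into the bracket expression as-is, so characters that are
  # special there ("]", "^", "\\") would need extra care; "-" should come last.
  pattern <- paste0("[^[:alnum:][:space:]", keep, "]")
  gsub(pattern, "", x)
}

x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
r1 <- strip_except(x)                                  # keeps apostrophes
r2 <- strip_except("AT&W, enter.", keep = "&")         # keeps ampersands
r3 <- strip_except("v1.2-beta (draft)!", keep = ".-")  # keeps periods and hyphens
```
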

Remove punctuation from text (except the symbol &)

What about doing the inverse, i.e. replacing everything that is not a letter, a digit, whitespace, or an & with an empty string:

gsub("[^[:alnum:][:space:]&]", "", data)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"

R: Remove punctuations except squared brackets [ ] and question mark ?

Assuming you have "x" as:

x <- c("Oh nooo!!! I don't like lemons [sad]", "What do [you] think about it?!")

you can try:

gsub("[^\\[\\]\\?[:^punct:]]", "", x, perl = TRUE)
# [1] "Oh nooo I dont like lemons [sad]" "What do [you] think about it?"
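
The double negation can be hard to read. As an alternative sketch (an assumption, not the original answer's approach), a negative lookahead excludes the characters to keep before [[:punct:]] is allowed to match. Both forms need perl = TRUE.

```r
x <- c("Oh nooo!!! I don't like lemons [sad]", "What do [you] think about it?!")

# Double negation, as in the answer: match what is NOT "[", "]", "?" or non-punctuation.
a <- gsub("[^\\[\\]\\?[:^punct:]]", "", x, perl = TRUE)

# Alternative sketch: the negative lookahead rules out "[", "]" and "?"
# before [[:punct:]] gets a chance to match.
b <- gsub("(?![\\[\\]?])[[:punct:]]", "", x, perl = TRUE)

same <- identical(a, b)
```
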

How to remove punctuation excluding negations?

We can do it in two steps: remove all punctuation except "'", then remove "'s" using a fixed match:

gsub("'s", "", gsub("[^[:alnum:][:space:]']", "", s), fixed = TRUE)
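
Split into its two steps, the call looks like the sketch below. The input s is made up for illustration. Note that the fixed match on "'s" would also turn a contraction such as "it's" into "it".

```r
# Hypothetical input, made up for illustration.
s <- "John's dog doesn't bark; Mary's cat can't swim!"

# Step 1: drop every character that is not alphanumeric, whitespace, or "'".
step1 <- gsub("[^[:alnum:][:space:]']", "", s)

# Step 2: drop the literal string "'s"; fixed = TRUE disables regex interpretation.
step2 <- gsub("'s", "", step1, fixed = TRUE)
step2
# [1] "John dog doesn't bark Mary cat can't swim"
```
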

Remove punctuation in R but leave punctuation/ sentence markers ! , . , ? at the end of a sentence

Using stringr and a not-not-statement (thanks to Chris Ruehlemann's comment):

s <- "not funny; - i did not like the movie / film at all (since the actors were terrible). however, i really enjoyed the scenery!"

library(stringr)
str_remove_all(s, "[^[^[[:punct:]]]!|.|?]")
[1] "not funny i did not like the movie film at all since the actors were terrible. however i really enjoyed the scenery!"
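
If stringr is not available, a base R sketch along the same lines is possible. This is an assumption, not the answer's method: it uses the negated POSIX class [:^punct:] with perl = TRUE, then collapses the doubled spaces that removing "; -" and "/" leaves behind.

```r
s <- "not funny; - i did not like the movie / film at all (since the actors were terrible). however, i really enjoyed the scenery!"

# With perl = TRUE, [:^punct:] is a negated POSIX class, so this removes
# every punctuation character except "!", "." and "?".
out <- gsub("[^!.?[:^punct:]]", "", s, perl = TRUE)

# Collapse runs of two or more spaces into a single space.
out <- gsub(" {2,}", " ", out)
out
# [1] "not funny i did not like the movie film at all since the actors were terrible. however i really enjoyed the scenery!"
```
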

Remove all punctuation except underline between characters in R with POSIX character class

You can use

gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", test, perl=TRUE)

Details:

  • [^_[:^punct:]] - any punctuation except _
  • | - or
  • _+\b - one or more _ at the end of a word
  • | - or
  • \b_+ - one or more _ at the start of a word
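
To see the three alternatives at work, here is a small run on made-up input (the original test vector is not shown):

```r
# Hypothetical input, made up for illustration.
test <- c("_foo_bar_", "snake_case, ok?", "a_b; _c d_")

# Underscores between word characters survive; leading/trailing ones and all
# other punctuation are removed.
res <- gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", test, perl = TRUE)
res
# [1] "foo_bar"       "snake_case ok" "a_b c d"
```
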

Removing punctuation except for apostrophes AND intra-word dashes with gsub in R WITHOUT accidently concatenating two words

A single gsub call gets you most of the way, leaving only leading/trailing whitespace to trim:

gsub("[[:punct:]]* *(\\w+[&'-]\\w+)|[[:punct:]]+ *| {2,}", " \\1", x)
# [1] "Good luck SPRINT I like good deals I can't lie brand-new stuff excites me got to say yo At&t why a dash apostrophe's I can do all-day But preventing concatenating is a new ballgame but why not "

If you're able to use the qdapRegex package, you could do:

library(qdapRegex)
rm_default(x, pattern = "[^ a-zA-Z&'-]|[&'-]{2,}", replacement = " ")
# [1] "Good luck SPRINT I like good deals I can't lie brand-new stuff excites me got to say yo At&t why a dash apostrophe's I can do all-day But preventing concatenating is a new ballgame but why not"
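
A more verbose alternative, sketched here under the assumption that only apostrophes and dashes between letters or digits should survive, replaces unwanted characters with a space and squeezes whitespace afterwards. The input and the name pat are made up for illustration; this is not the pattern from the answers above.

```r
# Hypothetical input, made up for illustration.
x <- "Good luck!!!I can't lie -- brand-new stuff...yo."

# Replace with a space (not an empty string) so neighbouring words never fuse:
#   "'" or "-" not preceded by a letter/digit, or not followed by one,
#   or any other character that is not alphanumeric or whitespace.
pat <- "(?<![[:alnum:]])['-]|['-](?![[:alnum:]])|[^[:alnum:][:space:]'-]"
out <- gsub(pat, " ", x, perl = TRUE)

# Squeeze the runs of whitespace that the replacement leaves behind, then trim.
out <- trimws(gsub("\\s+", " ", out))
out
# [1] "Good luck I can't lie brand-new stuff yo"
```
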

