In R, use gsub to remove all punctuation except period
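You can capture the characters you want to keep and put them back with a backreference. Note that sub replaces only the first match in each element; use gsub if a value can contain more than one punctuation character: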
sub("([.-])|[[:punct:]]", "\\1", as.matrix(z))
     X..1. X..2.
[1,] "1"   "6"
[2,] "2"   "7.235"
[3,] "3"   "8"
[4,] "4"   "9"
[5,] "5"   "-10"
Here I am keeping the . and - characters.
I guess the next step is to coerce your result to a numeric matrix, so here I combine the two steps like this:
matrix(as.numeric(sub("([.-])|[[:punct:]]", "\\1", as.matrix(z))),ncol=2)
     [,1]    [,2]
[1,]    1   6.000
[2,]    2   7.235
[3,]    3   8.000
[4,]    4   9.000
[5,]    5 -10.000
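Since z is not shown in the answer, here is a minimal sketch with a made-up character vector v (an assumption, not the asker's data). It illustrates how sub and gsub differ with this pattern when a value holds more than one punctuation character:

```r
# Hypothetical input; the original z is not shown in the answer.
v <- c("(1,000)", "7.235", "-10", "8;")

pattern <- "([.-])|[[:punct:]]"

# sub() stops after the first match in each element,
# so "(1,000)" still contains "," and ")" afterwards.
first_only <- sub(pattern, "\\1", v)

# gsub() handles every match: "." and "-" are captured and put back,
# any other punctuation is replaced by the (empty) backreference.
cleaned <- gsub(pattern, "\\1", v)

# Only the fully cleaned strings convert without NA.
numbers <- as.numeric(cleaned)
```

With sub, the first element would turn into NA under as.numeric, which is why gsub is probably the safer choice for messy input.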
Remove all punctuation except apostrophes in R
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)
[1] "I like to chew gum but don't like bubble gum"
The above regex is much more straightforward. The leading caret negates the character class, so the call replaces everything that is not an alphanumeric character, whitespace or an apostrophe with an empty string.
Remove punctuation from text (except the symbol &)
What about doing the inverse, i.e. replacing everything that is not a letter, a digit, whitespace or & with an empty string:
gsub("[^[:alnum:][:space:]&]", "", data)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"
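The apostrophe and the & answers use the same negated character class and differ only in the character that is kept. A small sketch of that idea follows; strip_except is a hypothetical helper name, not something from the answers, and it assumes the kept characters are literal inside a bracket expression:

```r
# Hypothetical helper (not from the answers): remove everything except
# letters, digits, whitespace and the characters listed in `keep`.
# Assumption: `keep` holds only characters that are literal inside a
# bracket expression, so not "]", "^", "-" or a backslash.
strip_except <- function(x, keep = "") {
  pattern <- paste0("[^[:alnum:][:space:]", keep, "]")
  gsub(pattern, "", x)
}

a <- strip_except("I don't like|}{ bubble gum!?", keep = "'")
b <- strip_except("Type the command AT&W, then press enter.", keep = "&")
k <- strip_except("Keep both: AT&W and don't!", keep = "&'")
```

Because the class lists what to keep rather than what to remove, it does not depend on how a given regex engine defines [:punct:].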
R: Remove punctuation except square brackets [ ] and question mark ?
Assuming you have "x" as:
x <- c("Oh nooo!!! I don't like lemons [sad]", "What do [you] think about it?!")
you can try:
gsub("[^\\[\\]\\?[:^punct:]]", "", x, perl = TRUE)
# [1] "Oh nooo I dont like lemons [sad]" "What do [you] think about it?"
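The [:^punct:] trick relies on PCRE's negated POSIX classes. An alternative way to express the same thing, sketched here as a swapped-in technique rather than the answerer's method, is a negative lookahead in front of [[:punct:]]:

```r
x <- c("Oh nooo!!! I don't like lemons [sad]", "What do [you] think about it?!")

# (?![\[\]?]) is a negative lookahead: it fails if the next character
# is "[", "]" or "?". Otherwise [[:punct:]] matches one punctuation
# character, which is then removed.
res <- gsub("(?![\\[\\]?])[[:punct:]]", "", x, perl = TRUE)
res
```

Both forms need perl = TRUE, since R's default regex engine supports neither lookaheads nor [:^punct:].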
How to remove punctuation excluding negations?
We can do it in two steps: remove all punctuation except "'", then remove "'s" using a fixed match:
gsub("'s", "", gsub("[^[:alnum:][:space:]']", "", s), fixed = TRUE)
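The answer does not show s, so the following sketch uses a made-up sentence (an assumption) to trace the two steps separately:

```r
# Hypothetical input; the original s is not shown in the answer.
s <- "John's dog isn't barking; it can't, won't!"

# Step 1: drop everything except letters, digits, whitespace and "'"
no_punct <- gsub("[^[:alnum:][:space:]']", "", s)

# Step 2: drop the literal two-character sequence "'s";
# fixed = TRUE treats the pattern as plain text, not as a regex
result <- gsub("'s", "", no_punct, fixed = TRUE)
result
```

One caveat of this approach: the fixed match also strips the "'s" from contractions such as "it's", leaving "it".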
Remove punctuation in R but leave punctuation/ sentence markers ! , . , ? at the end of a sentence
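Using stringr and a not-not statement (thanks to Chris Ruehlemann's comment): the inner negated set matches non-punctuation, and the outer negation turns that back into "punctuation, minus the characters listed next to it". Inside the brackets | is a literal pipe rather than an alternation, so pipes are kept as well.
library(stringr)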
s <- "not funny; - i did not like the movie / film at all (since the actors were terrible). however, i really enjoyed the scenery!"
str_remove_all(s, "[^[^[[:punct:]]]!|.|?]")
[1] "not funny i did not like the movie film at all since the actors were terrible. however i really enjoyed the scenery!"
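Removing a punctuation mark that stands between two spaces, as in "funny; - i", leaves a run of spaces behind. Below is a base R sketch that keeps ! . ? and then squeezes those runs; it uses a negated character class instead of the stringr pattern above, so treat it as an alternative under that assumption and not as the answerer's code:

```r
s <- "not funny; - i did not like the movie / film at all (since the actors were terrible). however, i really enjoyed the scenery!"

# Keep letters, digits, whitespace and the sentence markers ! . ?
kept <- gsub("[^[:alnum:][:space:]!.?]", "", s)

# Collapse runs of two or more spaces left behind by removed marks
squeezed <- gsub(" {2,}", " ", kept)
squeezed
```

Like the stringr version, this keeps ! . ? anywhere in the string, not only at the end of a sentence.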
Remove all punctuation except underline between characters in R with POSIX character class
You can use
gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", test, perl=TRUE)
Details:
[^_[:^punct:]] - any punctuation except _
| - or
_+\b - one or more _ at the end of a word
| - or
\b_+ - one or more _ at the start of a word
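To see the three alternatives at work, here is a short sketch on a made-up string (test is not shown in the answer, so its value here is an assumption):

```r
# Hypothetical input; the original `test` is not shown in the answer.
test <- "_leading and trailing_ but keep snake_case, ok?!"

# Alternative 1 removes ",", "?" and "!".
# Alternative 2 removes the "_" at the end of "trailing_".
# Alternative 3 removes the "_" at the start of "_leading".
# The "_" inside "snake_case" has word characters on both sides,
# so there is no word boundary next to it and it stays.
res <- gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", test, perl = TRUE)
res
```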
Removing punctuation except for apostrophes AND intra-word dashes with gsub in R WITHOUT accidentally concatenating two words
You can go as far as leaving only leading/trailing whitespace with one function:
gsub("[[:punct:]]* *(\\w+[&'-]\\w+)|[[:punct:]]+ *| {2,}", " \\1", x)
# [1] "Good luck SPRINT I like good deals I can't lie brand-new stuff excites me got to say yo At&t why a dash apostrophe's I can do all-day But preventing concatenating is a new ballgame but why not "
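As the output shows, this pattern can leave a space at either end of the string. A minimal sketch on a shorter, made-up input (an assumption, since x is not shown in the answer) with trimws applied afterwards:

```r
# Hypothetical input; the original x is not shown in the answer.
x <- "Good luck! brand-new stuff...can't lie!!"

# Words joined by & ' or - are captured and put back after a single
# space; any other punctuation run is replaced by a single space.
spaced <- gsub("[[:punct:]]* *(\\w+[&'-]\\w+)|[[:punct:]]+ *| {2,}", " \\1", x)

# Remove the leftover leading/trailing whitespace
trimmed <- trimws(spaced)
trimmed
```

Replacing with a space instead of an empty string is what keeps "stuff...can't" from collapsing into one word.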
If you're able to use the qdapRegex package, you could do:
library(qdapRegex)
rm_default(x, pattern = "[^ a-zA-Z&'-]|[&'-]{2,}", replacement = " ")
# [1] "Good luck SPRINT I like good deals I can't lie brand-new stuff excites me got to say yo At&t why a dash apostrophe's I can do all-day But preventing concatenating is a new ballgame but why not"