Remove all special characters from a string in R?
You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want the str_replace_all function from the stringr package, though gsub from base R works just as well.
The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.
library(stringr)
x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" # or whatever
str_replace_all(x, "[[:punct:]]", " ")
(The base R equivalent is gsub("[[:punct:]]", " ", x).)
An alternative is to swap out all non-alphanumeric characters.
str_replace_all(x, "[^[:alnum:]]", " ")
Note that the definition of what constitutes a letter, a number, or a punctuation mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.
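If you want the characters gone rather than replaced by spaces, use an empty replacement. Here is a minimal base R sketch, reusing the same x; the variable names only_alnum and only_ascii are just for illustration. The explicit A-Za-z0-9 class is one way to sidestep the locale question when you only care about ASCII. As far as I know, stringr's ICU engine also treats [[:punct:]] more narrowly than base R does (symbols such as + or $ may be left in place), so check the result on your own data.

```r
x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-="

# Delete (rather than blank out) everything that is not a letter or digit.
only_alnum <- gsub("[^[:alnum:]]", "", x)
print(only_alnum)

# An explicit ASCII whitelist does not depend on the locale's idea of a letter.
only_ascii <- gsub("[^A-Za-z0-9]", "", x)
print(only_ascii)
```

Both calls should leave just "a1" for this input.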
R: Replace Special Characters
We can match one or more characters that are not alphabetic and replace them with "S"
df$Q2 <- sub("[^A-Za-z]+", "S", df$Q2)
df$Q2
#[1] "aSk" "aSk" "aSk"
Or we capture only the alphabetic characters as a group (([A-Za-z]*)) from the start (^) of the string, match the following non-alphabetic characters, and replace them with the backreference to the captured group (\\1) followed by "S"
sub("^([A-Za-z]*)[^A-Za-z]+", "\\1S", df$Q2)
#[1] "aSk" "aSk" "aSk"
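One thing worth checking is that sub only replaces the first match, while gsub replaces every match. A small sketch with made-up data (the vector q2 below is hypothetical, not the df$Q2 from the question):

```r
# Hypothetical input; the third element has two separate runs of special characters.
q2 <- c("a&k", "a#$k", "a&k&z")

# sub() stops after the first run of non-letters in each element.
first_only <- sub("[^A-Za-z]+", "S", q2)
print(first_only)
# [1] "aSk"   "aSk"   "aSk&z"

# gsub() replaces every run of non-letters.
every_run <- gsub("[^A-Za-z]+", "S", q2)
print(every_run)
# [1] "aSk"   "aSk"   "aSkSz"
```

So if a value can contain more than one group of special characters, gsub is probably what you want.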
tidyverse: replacing special characters in string
You can use -
stringr::str_replace_all(Test, pattern = "_", replacement = "\\\\_")
#[1] ".model" "sigma2" "log\\_lik" "AIC" "AICc" "BIC"
#[7] "ar\\_roots" "ma\\_roots"
When printing, each \ is escaped with another \, so you see two backslashes. To see the actual string, use cat
cat(stringr::str_replace_all(Test, pattern = "_", replacement = "\\\\_"))
#.model sigma2 log\_lik AIC AICc BIC ar\_roots ma\_roots
Or with gsub -
gsub("_", "\\\\_", Test)
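If you want to convince yourself that only one backslash is really added, you can count characters. A quick sketch; Test is rebuilt here from the printed output above, so treat its exact contents as an assumption:

```r
# Assumed input, reconstructed from the output shown above.
Test <- c(".model", "sigma2", "log_lik", "AIC", "AICc", "BIC", "ar_roots", "ma_roots")

escaped <- gsub("_", "\\\\_", Test)

# The escaped element is exactly one character longer than the original.
print(nchar(Test[3]))     # 7
print(nchar(escaped[3]))  # 8

# writeLines(), like cat(), shows the string without R's print escaping.
writeLines(escaped[3])    # log\_lik
```

The four backslashes in the replacement are needed because R's string parser halves them first, and the regex replacement then halves them again.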
R how to remove VERY special characters in strings?
So, I'm going to go ahead and make an answer, because I believe this is what you're looking for:
> s = "who are í ½í¸€ bringing?"
> rmSpec <- "í|½|€" # The "|" designates a logical OR in regular expressions.
> s.rem <- gsub(rmSpec, "", s) # gsub finds every match of rmSpec and replaces it with "".
> s.rem
[1] "who are ¸ bringing?"
Now, this does have the caveat that you have to manually define the special characters in the rmSpec variable. Not sure if you know what special characters to remove or if you're looking for a more general solution.
EDIT:
So it appears you almost had it with iconv, you were just missing the sub argument. See below:
> s
[1] "who are í ½í¸€ bringing?"
> s2 <- iconv(s, "UTF-8", "ASCII", sub = "")
> s2
[1] "who are bringing?"
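Here is the same idea as a self-contained sketch you can paste into a script. The input string is made up, and it is written with \u escapes so the result doesn't depend on how the file itself is encoded. Exact iconv behaviour can differ between platforms, so treat this as something to try rather than a guarantee:

```r
# Hypothetical input: "café €5", with the non-ASCII characters as \u escapes.
s <- "caf\u00e9 \u20ac5"

# sub = "" drops every byte that cannot be represented in ASCII.
s2 <- iconv(s, from = "UTF-8", to = "ASCII", sub = "")
print(s2)

# Sanity check: every remaining code point is in the ASCII range.
print(all(utf8ToInt(s2) < 128))
```

Without the sub argument, iconv returns NA for any string it cannot convert, which is likely what tripped you up.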
R: str_replace not replacing characters, including special characters (+), within string?
You want to replace based on a fixed string and not a regular expression. To prevent the second argument from being interpreted as a regular expression, wrap it in the fixed function.
str_replace(formula, fixed("d12$cig_tax + ")," ")
# [1] "d12$r_hosp_tp ~ d12$alc_tax + d12$air_temp + d12$x_67 + d12$x_t67 + d12$qs_67 + x_31 "
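If you'd rather stay in base R, sub and gsub take a fixed = TRUE argument that does the same job. A minimal sketch with a shortened, made-up formula string (f is hypothetical, not the full formula from the question):

```r
# Hypothetical, shortened version of the formula string.
f <- "y ~ d12$cig_tax + d12$alc_tax"

# As a regex, "$" is an end-of-string anchor and "+" a quantifier,
# so the pattern never matches and the string comes back unchanged.
as_regex <- sub("d12$cig_tax + ", "", f)
print(as_regex)

# With fixed = TRUE the pattern is compared character by character.
as_fixed <- sub("d12$cig_tax + ", "", f, fixed = TRUE)
print(as_fixed)
# [1] "y ~ d12$alc_tax"
```

The silent non-match in the first call is the same symptom as in the question: no error, just nothing replaced.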
Replacing a special character does not work with gsub
You have to escape the + symbol, as it is a regex quantifier.
> gsub("Ã<U\\+009F>", "REPLACED", "Testing string Ã<U+009F> ")
[1] "Testing string REPLACED "
> gsub("â<U\\+0080><U\\+0093>", "REPLACED", "Testing string â<U+0080><U+0093> ")
[1] "Testing string REPLACED "
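For reference, there are a few equivalent ways to match a literal + in base R. A short sketch on a made-up string (x and the result names are just for illustration):

```r
x <- "1 + 2 + 3"

# 1. Escape it with a double backslash.
a <- gsub("\\+", "plus", x)

# 2. Put it in a character class, where it loses its special meaning.
b <- gsub("[+]", "plus", x)

# 3. Turn off regex interpretation altogether.
d <- gsub("+", "plus", x, fixed = TRUE)

print(a)
# [1] "1 plus 2 plus 3"
print(identical(a, b) && identical(b, d))
# [1] TRUE
```

Which one you pick is mostly a matter of taste; fixed = TRUE is the least error-prone when the whole pattern is meant literally.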