R: Replacing Foreign Characters in a String

Remove all special characters from a string in R?

You need to use regular expressions to identify the unwanted characters. For the most readable code, use str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

library(stringr)
x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" # or whatever
str_replace_all(x, "[[:punct:]]", " ")

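(The base R counterpart is gsub("[[:punct:]]", " ", x). The two can give different results: stringr uses the ICU regex engine, where [[:punct:]] does not match symbol characters such as $, +, <, =, >, ^ and ~, whereas gsub treats them as punctuation.)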

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter, a number, or a punctuation mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.
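As a rough sketch of how the two patterns behave, here is a base R version. The sample string is made up for illustration, and the final tidy-up step is an optional extra rather than part of the answer above.

```r
# Hypothetical input, chosen only to illustrate the two patterns.
x <- "Hello, World! (2024)"

# Replace every punctuation character with a space.
no_punct <- gsub("[[:punct:]]", " ", x)

# Replace everything that is not a letter or a digit with a space.
only_alnum <- gsub("[^[:alnum:]]", " ", x)

# Optionally collapse runs of spaces and trim both ends.
tidy <- trimws(gsub(" +", " ", only_alnum))
print(tidy)
```

Replacing with a space rather than "" keeps words from running together, at the cost of needing the clean-up step.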

R: Replace Special Characters

We can match the first run of one or more non-alphabetic characters and replace it with "S":

df$Q2 <- sub("[^A-Za-z]+", "S", df$Q2)
df$Q2
#[1] "aSk" "aSk" "aSk"

Or we can capture the leading alphabetic characters as a group (([A-Za-z]*)) anchored at the start (^) of the string, match the non-alphabetic characters that follow, and replace the whole match with the backreference to the captured group (\\1) followed by "S":

sub("^([A-Za-z]*)[^A-Za-z]+", "\\1S", df$Q2)
#[1] "aSk" "aSk" "aSk"
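Note that sub only touches the first match in each string. A small sketch of the difference from gsub, using made-up values because the original df$Q2 is not shown in the thread:

```r
# Hypothetical column values; the real df$Q2 is not shown in the thread.
q2 <- c("a-k", "a--k", "a-k-z")

# sub() replaces only the first run of non-letters in each string.
first_only <- sub("[^A-Za-z]+", "S", q2)

# gsub() replaces every run of non-letters.
every_run <- gsub("[^A-Za-z]+", "S", q2)

print(first_only)
print(every_run)
```

If a string can contain more than one run of special characters, gsub is probably the one you want.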

tidyverse: replacing special characters in string

You can use:

stringr::str_replace_all(Test, pattern = "_", replacement = "\\\\_")

#[1] ".model" "sigma2" "log\\_lik" "AIC" "AICc" "BIC"
#[7] "ar\\_roots" "ma\\_roots"

When a string is printed, each \ is shown escaped with another \, so you see two backslashes. To see the actual string, use cat:

cat(stringr::str_replace_all(Test, pattern = "_", replacement = "\\\\_"))

#.model sigma2 log\_lik AIC AICc BIC ar\_roots ma\_roots

Or with gsub:

gsub("_", "\\\\_", Test)
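To check that only a single backslash was actually inserted, you can compare string lengths. This is a minimal sketch with a made-up input vector standing in for Test:

```r
# Hypothetical input resembling the names in the answer above.
test <- c("log_lik", "ar_roots")

# In a gsub replacement, "\\\\" yields one literal backslash.
escaped <- gsub("_", "\\\\_", test)

print(escaped)       # printed form: the backslash appears doubled
writeLines(escaped)  # raw form: one backslash, one string per line

# Each result is exactly one character longer than its input.
extra_chars <- nchar(escaped) - nchar(test)
print(extra_chars)
```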

R how to remove VERY special characters in strings?

So, I'm going to go ahead and make an answer, because I believe this is what you're looking for:

> s = "who are í ½í¸€ bringing?"
> rmSpec <- "í|½|€" # The "|" designates a logical OR in regular expressions.
> s.rem <- gsub(rmSpec, "", s) # gsub replaces every match of rmSpec with "".
> s.rem
[1] "who are ¸ bringing?"

Now, this does have the caveat that you have to manually define the special characters in the rmSpec variable. I'm not sure whether you know which special characters to remove or are looking for a more general solution.

EDIT:

It appears you almost had it with iconv; you were just missing the sub argument. See below:

> s
[1] "who are í ½í¸€ bringing?"
> s2 <- iconv(s, "UTF-8", "ASCII", sub = "")
> s2
[1] "who are bringing?"
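Here is a small self-contained sketch of what the sub argument changes. The input string is an assumption for illustration, written with \u escapes so it does not depend on the encoding of your script file.

```r
# Hypothetical input with two accented letters (e-acute and i-diaeresis).
s <- "caf\u00e9 na\u00efve"

# With sub = "", bytes that cannot be converted to ASCII are dropped.
ascii_only <- iconv(s, from = "UTF-8", to = "ASCII", sub = "")
print(ascii_only)

# Without sub, a string that cannot be converted comes back as NA.
no_sub <- iconv(s, from = "UTF-8", to = "ASCII")
print(no_sub)
```

Keep in mind that this drops the characters outright rather than transliterating them.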

R: str_replace not replacing characters, including special characters (+), within string?

You want to replace a fixed string, not a regular expression. To prevent the pattern (the second argument) from being interpreted as a regular expression, wrap it in the fixed function.

str_replace(formula, fixed("d12$cig_tax + "), " ")
# [1] "d12$r_hosp_tp ~ d12$alc_tax + d12$air_temp + d12$x_67 + d12$x_t67 + d12$qs_67 + x_31 "
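If you would rather stay in base R, sub and gsub take a fixed = TRUE argument that does the same job. A minimal sketch, using a shortened, made-up version of the formula string:

```r
# Hypothetical, shortened version of the formula string.
f <- "y ~ d12$alc_tax + d12$cig_tax + d12$air_temp"

# As a regex, "$" is an end-of-string anchor and "+" is a quantifier,
# so this pattern never matches the literal text.
as_regex <- sub("d12$cig_tax + ", "", f)

# With fixed = TRUE the pattern is matched character for character.
as_fixed <- sub("d12$cig_tax + ", "", f, fixed = TRUE)

print(as_regex)
print(as_fixed)
```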

Replacing a special character does not work with gsub

You have to escape the + symbol, because it is a regex metacharacter (a quantifier meaning "one or more").

> gsub("Ã<U\\+009F>", "REPLACED", "Testing string Ã<U+009F> ")
[1] "Testing string REPLACED "

> gsub("â<U\\+0080><U\\+0093>", "REPLACED", "Testing string â<U+0080><U+0093> ")
[1] "Testing string REPLACED "
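For comparison, here are the usual ways to match a literal + in base R, sketched on a short made-up string:

```r
# Hypothetical input containing a literal plus sign.
x <- "aa+b"

# Unescaped: "+" is a quantifier, so "a+" matches the run "aa".
as_quantifier <- gsub("a+", "X", x)

# Escaped with a double backslash: matches the literal text "a+".
escaped <- gsub("a\\+", "X", x)

# Inside a character class, "+" is also taken literally.
in_class <- gsub("a[+]", "X", x)

# fixed = TRUE turns off regex interpretation entirely.
as_fixed <- gsub("a+", "X", x, fixed = TRUE)

print(c(as_quantifier, escaped, in_class, as_fixed))
```

The last three calls give the same result, so which one to use is mostly a matter of taste; fixed = TRUE is the simplest when the whole pattern is literal.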

