How to Remove Specific Special Characters in R

Remove all special characters from a string in R?

You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" #or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter, a number, or a punctuation mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.

How to remove specific special characters in R

gsub("[^[:alnum:][:blank:]+?&/\\-]", "", c)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"

Remove special characters from entire dataframe in R

Another solution is to convert the data frame to a matrix first, run gsub on that, and then convert back to a data frame, as follows:

as.data.frame(gsub("[[:punct:]]", "", as.matrix(df))) 
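A minimal sketch on a made-up data frame (note that every column comes back as character, or factor on older R versions, so re-convert column types afterwards if needed):

df <- data.frame(a = c("x!1", "y?2"), b = c("3#", "4$"), stringsAsFactors = FALSE) # hypothetical data
as.data.frame(gsub("[[:punct:]]", "", as.matrix(df)))
#    a b
# 1 x1 3
# 2 y2 4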

R how to remove VERY special characters in strings?

So, I'm going to go ahead and make an answer, because I believe this is what you're looking for:

> s = "who are í ½í¸€ bringing?"
> rmSpec <- "í|½|€" # The "|" designates a logical OR in regular expressions.
> s.rem <- gsub(rmSpec, "", s) # gsub finds any matches of rmSpec and replaces them with "".
> s.rem
[1] "who are ¸ bringing?"

Now, this does have the caveat that you have to manually define the special characters in the rmSpec variable. Not sure if you know which special characters to remove or if you're looking for a more general solution.

EDIT:

So it appears you almost had it with iconv; you were just missing the sub argument. See below:

> s
[1] "who are í ½í¸€ bringing?"
> s2 <- iconv(s, "UTF-8", "ASCII", sub = "")
> s2
[1] "who are bringing?"

R: Replace Special Characters

We can match one or more characters that are not alphabetic and replace them with "S":

df$Q2 <- sub("[^A-Za-z]+", "S", df$Q2)
df$Q2
#[1] "aSk" "aSk" "aSk"

Or we capture only the alphabetic characters as a group (([A-Za-z]*)) from the start (^) of the string, match the following non-alphabetic characters, and replace with the backreference to the captured group (\\1) followed by "S":

sub("^([A-Za-z]*)[^A-Za-z]+", "\\1S", df$Q2)
#[1] "aSk" "aSk" "aSk"

Remove special characters and numbers from column R

Remove "X", "." and digits with stringr:

library(stringr)
str_remove_all(df$c, "[X.]|[:digit:]")
#> [1] "Int" "BI" "Int" "BI" "Int"

Or inside mutate() (with dplyr loaded):

df %>% 
  mutate(c = str_remove_all(c, "[X.]|[:digit:]"))
#> c d
#> 1 Int 4
#> 2 BI 1
#> 3 Int 2
#> 4 BI 3
#> 5 Int 5
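data

Again, the question's df isn't shown; a hypothetical frame like this reproduces the output:

df <- data.frame(c = c("Int.1", "X.BI.2", "Int.3", "X.BI.4", "Int.5"),
                 d = c(4, 1, 2, 3, 5),
                 stringsAsFactors = FALSE)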

Remove special characters in R from .docx

There are several things that make this hard:

  1. You want to replace characters with something that is roughly equivalent, not just convert the encoding. In your example, "<e1><b8><9d>" does not stand for an "e", it stands for a more complicated version of an "e", meaning R won't just change it. But there are functions to do that.
  2. It looks like qdap.transcript tries to be helpful. At least from what you show here (and your results are consistent with this), these are not special characters but literally the text "<e1><b8><9d>". So if you try to remove special characters, gsub happily complies and removes the "<" and ">", leaving "e1" and so forth alone.

To solve your problem, I think you want to convert back to the special characters, and then use stri_trans_general from the stringi package. I'm sure there are other similar functions out there, but this one works for me. It turns out converting back to the special characters is the hard part, but I've got some working code:

library(stringi)
mystring <- 'If anyone knows how to simply change these special characters (i.e <e1><b8><9d> to e), again please feel free to update!'

# Find every run of "<xx>" hexadecimal byte codes
pos <- gregexpr('(<[A-Fa-f0-9]{2}>)+', mystring)[[1]]

# Turn each "<e1><b8><9d>" run into the literal string "\xe1\xb8\x9d" and let R parse it
replace <- substring(mystring, pos, pos + attr(pos, 'match.length') - 1)
replace <- sapply(replace, function(r) {
  eval(parse(text = paste0('\'', gsub('>', '', gsub('<', '\\\\x', r)), '\'')))
})

# Substitute the decoded characters back into the string
for (i in seq_along(replace)) {
  mystring <- sub('(<[A-Fa-f0-9]{2}>)+', replace[i], mystring)
}

# Finally, transliterate the accented characters to plain ASCII
mystring <- stri_trans_general(mystring, 'latin-ascii')

We first extract everything that looks like hexadecimal byte codes between "<" and ">", then convert each run to a literal string such as "\xe1\xb8\x9d", ask R to interpret that, and substitute the decoded characters back in place of the old values.

Only in the last line do we replace the special characters with (in this example) "e".
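For reference, the bytes e1 b8 9d are UTF-8 for "ḝ", which the Latin-ASCII transliterator reduces to a plain "e", so the final string should print roughly as:

mystring
# [1] "If anyone knows how to simply change these special characters (i.e e to e), again please feel free to update!"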

Removing special characters from a dataframe in R

We can loop over the columns, use gsub to match characters that are not -, /, . or digits and replace them with blanks (""), assign the result back to the dataset, and convert the second column to numeric:

df1[] <- lapply(df1, function(x) gsub("[^-0-9/.]+", "", x))
df1[,2] <- as.numeric(df1[,2])
df1
# Date NAV
#1 03/08/2017 209.0537
#2 02/08/2017 208.7831
#3 01/08/2017 208.7373

If this needs to be converted to xts

library(xts)
xts(df1[-1], order.by = as.Date(df1$Date, "%m/%d/%Y"))
# NAV
#2017-01-08 208.7373
#2017-02-08 208.7831
#2017-03-08 209.0537

data

df1 <- structure(list(Date = structure(c(3L, 2L, 1L), .Label = c("=\"01/08/2017\"", 
"=\"02/08/2017\"", "=\"03/08/2017\""), class = "factor"), NAV = structure(c(3L,
2L, 1L), .Label = c("=\"€208.7373\"", "=\"€208.7831\"",
"=\"€209.0537\""
), class = "factor")), .Names = c("Date", "NAV"), row.names = c(NA,
-3L), class = "data.frame")

Removing Special Characters in a Text File in R

gsub("[@#]([a-zA-Z]+)[@#]", "\\1", x)

