How to Remove + (Plus Sign) from String in R

How to remove + (plus sign) from string in R?

Try

test<- "sandwich=bread-mustard+ketchup"
test<-gsub("\\+","_",test)
test
[1] "sandwich=bread-mustard_ketchup"

+ is a special character in regular expressions, so you need to escape it, just as you would escape . (the dot). Any regular-expression reference lists these special characters; + means "one or more of the preceding expression". R's own ?regex help page covers special characters and regular expressions in R.
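As a small illustration of the difference (the sample strings are made up for this sketch), compare an unescaped and an escaped special character:

```r
# Unescaped "." matches any character, so every character is replaced.
dot_any <- gsub(".", "_", "a.b")          # "___"
# Escaped "\\." matches only a literal dot.
dot_lit <- gsub("\\.", "_", "a.b")        # "a_b"
# Unescaped "+" is a quantifier: "a+" means one or more "a".
plus_q <- gsub("a+", "_", "baaad+")       # "b_d+"
# Escaped "\\+" matches the literal plus sign.
plus_lit <- gsub("\\+", "_", "baaad+")    # "baaad_"
```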

On a more general note, your above code could be written more efficiently by using:

test<- "sandwich=bread-mustard+ketchup"
test<-gsub("[-=+]","_",test)
test
[1] "sandwich_bread_mustard_ketchup"

Here I have used a character class, which can be read as [any one of these characters]. Inside the brackets + needs no escaping and no | separator is needed; a | placed there would be matched as a literal character.
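A quick way to check how | behaves inside and outside square brackets; the test string below is invented for this sketch:

```r
s <- "a|b=c-d+e"
# Inside [...] the pipe is an ordinary character, so it is replaced too.
in_class <- gsub("[-|=|\\+]", "_", s)    # "a_b_c_d_e"
# With no | in the class, the pipe is left alone.
no_pipe <- gsub("[-=+]", "_", s)         # "a|b_c_d_e"
# Outside brackets, | means "or" (and + must be escaped).
alternation <- gsub("-|=|\\+", "_", s)   # "a|b_c_d_e"
```

So if the data may contain a literal |, keeping it out of the bracket expression avoids replacing it by accident.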

Remove all special characters from a string in R?

You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

library(stringr)
x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" # or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter, a number, or a punctuation mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.
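To see how the two character classes differ, here is a minimal base R sketch on a made-up ASCII string. It deletes the matches rather than replacing them with a space, which makes the difference easier to spot:

```r
x <- "Hello, World! 100%"
# [[:punct:]] removes punctuation only; spaces survive.
no_punct <- gsub("[[:punct:]]", "", x)       # "Hello World 100"
# [^[:alnum:]] removes everything that is not a letter or digit, spaces included.
only_alnum <- gsub("[^[:alnum:]]", "", x)    # "HelloWorld100"
```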

Remove plus sign (+) from string

Although the original answer below does achieve the intended effect, it is not the most efficient way to do this simple task in PHP. For replacing a literal character, str_replace() is preferred.

$variation = str_replace("+", "", $variation);

ORIGINAL ANSWER:

This works to remove only a plus sign:

$variation = preg_replace('/[+]/', "", $variation);

You can see it work here: http://www.phpliveregex.com/p/1Fb (be sure you select the preg_replace function)

how can I remove two consecutive pluses (+) from a formula/string?

Something like

as.formula(gsub("\\+\\s*\\+", "+", deparse(f)))

where f is your formula.
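A minimal sketch of the same idea, starting from formula text rather than a formula object (the string and variable names are hypothetical):

```r
txt <- "y ~ a + + b"                       # hypothetical formula text with a doubled plus
# Collapse "+", optional whitespace, "+" into a single "+".
clean <- gsub("\\+\\s*\\+", "+", txt)      # "y ~ a + b"
f <- as.formula(clean)
vars <- all.vars(f)                        # "y" "a" "b"
```

Note that deparse() can return several strings for a long formula, so paste(deparse(f), collapse = " ") may be needed before calling gsub.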

How to replace '+' using gsub() function in R

Simply replace it with fixed = TRUE (no need to use a regular expression) but you have to do the replacement for each "column" of the data.frame by specifying the column name:

txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf

gives

          job
1  government
2 poli+tician
3  parliament

Now replace the "+":

txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf

The result is:

         job
1 government
2 politician
3 parliament
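If several columns need the same treatment, one possible approach is to loop over the character columns. This is only a sketch; the data frame and its column names are made up:

```r
# Hypothetical data frame with two text columns and one numeric column.
df <- data.frame(job = c("poli+tician", "clerk"),
                 dept = c("R+D", "HR"),
                 n = c(1, 2),
                 stringsAsFactors = FALSE)
# Find the character columns, then strip the literal "+" from each one.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(col) gsub("+", "", col, fixed = TRUE))
```

The numeric column is left untouched because only columns flagged by is.character are rewritten.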

Remove part of string after .

You just need to escape the period:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)
[1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
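The pattern above removes everything from the first period onward. If a string can contain several periods and only the last suffix should go, a variant anchored at the end of the string may be closer to what you want. A small sketch, where the second ID is invented to show the difference:

```r
ids <- c("NM_020506.1", "a.b.c")
# Everything from the first period onward is dropped.
first_dot <- gsub("\\..*", "", ids)       # "NM_020506" "a"
# Only the last period and what follows it is dropped.
last_dot <- sub("\\.[^.]*$", "", ids)     # "NM_020506" "a.b"
```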

Split a string by a plus sign (+) character

Use

strsplit("(1)+(2)", "\\+")

or

strsplit("(1)+(2)", "+", fixed = TRUE)

Using strsplit("(1)+(2)", "+") doesn't work because, unless specified otherwise, the split argument is treated as a regular expression, and the + character is special in regex. Other characters that also need extra care are

  • ?
  • *
  • .
  • ^
  • $
  • \
  • |
  • { }
  • [ ]
  • ( )
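With fixed = TRUE none of these characters needs escaping. A short sketch that checks this for each one; the "left"/"right" strings are placeholders chosen for the example:

```r
# One representative character from each entry in the list above, plus "+".
specials <- c("?", "*", ".", "^", "$", "\\", "|", "{", "[", "(", "+")
results <- lapply(specials, function(ch) {
  s <- paste0("left", ch, "right")
  # fixed = TRUE treats the split argument as a literal string.
  strsplit(s, ch, fixed = TRUE)[[1]]
})
# Every split should give the same two pieces.
all_ok <- all(vapply(results, identical, logical(1), c("left", "right")))
```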

Split string in parts by minus and plus in R

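We can provide a regular expression in strsplit, where we use a lookahead (?=...) to find the plus or minus sign and split just before it. The lookbehind (?<=.) requires at least one preceding character, so no split happens at the very start of the string. Because lookarounds consume no characters, the sign itself is retained rather than being dropped in the split.

x <- "-1x^2+3x^3-x^8+1-x"  # input reconstructed from the output shown below
strsplit(x, "(?<=.)(?=[+])|(?<=.)(?=[-])",perl = TRUE)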

# [1] "-1x^2" "+3x^3" "-x^8" "+1" "-x"
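An alternative that avoids lookarounds is to extract the terms instead of splitting. This is a sketch only; the input is assumed from the output above, and the second string is made up to show a term without a leading sign:

```r
x <- "-1x^2+3x^3-x^8+1-x"   # assumed input, taken from the output above
# Each term: an optional sign followed by one or more non-sign characters.
terms <- regmatches(x, gregexpr("[+-]?[^+-]+", x))[[1]]
# "-1x^2" "+3x^3" "-x^8" "+1" "-x"

y <- "x^2-1"                # hypothetical input without a leading sign
terms_y <- regmatches(y, gregexpr("[+-]?[^+-]+", y))[[1]]
# "x^2" "-1"
```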

Is there a way to keep only defined characters in a string from a whitelist?

What about this:

string <- "opiqr8929348t89hr289r01++r42+3525"
gsub("[^0-9+]", "", string)
# [1] "89293488928901++42+3525"

This replaces everything that's not a 0-9 or plus with "".
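The same negated character class works for other whitelists. A small sketch with an invented input that also keeps the minus sign; putting - last in the brackets makes it a literal rather than a range:

```r
phone <- "tel: +49-30-1234 ext"
# Keep digits, plus and minus; drop everything else.
kept <- gsub("[^0-9+-]", "", phone)   # "+49-30-1234"
```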


