Remove Unwanted Symbols from Expression Function - R

If this is about dynamically generating/formatting an output string, you can also use stringr::str_interp:

# Sample data
set.seed(2017)
df <- data.frame(x = 1:100)
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

# Fit
m <- lm(y ~ x, df)

# Extract coefficients and generate string
a <- coef(m)[1]
b <- coef(m)[2]
r2 <- summary(m)$r.squared
stringr::str_interp("y = $[2.0f]{a} + $[2.0f]{b} x, R2 = $[4.3f]{r2}")
#[1] "y =  9 +  3 x, R2 = 0.793"

Or use sprintf:

sprintf("y = %2.0f + %2.0f x, R2 = %4.3f", a, b, r2)
#[1] "y =  9 +  3 x, R2 = 0.793"
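The width in a format such as %2.0f pads a single-digit number with a leading space. If you don't want that padding, you can omit the width. This sketch uses made-up coefficient values (not the fitted ones) so the result is reproducible:

```r
# Hypothetical values, chosen only to illustrate the formatting
a <- 9.2
b <- 3.04
r2 <- 0.7931

# Width 2 pads "9" and "3" to " 9" and " 3"
padded <- sprintf("y = %2.0f + %2.0f x, R2 = %4.3f", a, b, r2)

# No width: numbers take only as many characters as they need
compact <- sprintf("y = %.0f + %.0f x, R2 = %.3f", a, b, r2)

print(padded)
print(compact)
```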

Remove all special characters from a string in R?

You need to use regular expressions to identify the unwanted characters. For the most readable code, use str_replace_all() from the stringr package, though gsub() from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

library(stringr)

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" #or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter, a number or a punctuation mark varies slightly depending on your locale, so you may need to experiment a little to get exactly what you want.
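To see how the two character classes differ, here is a small base-R sketch on a made-up string. The [[:punct:]] version keeps spaces, while the [^[:alnum:]] version removes them too. Since the answer above replaces with " " rather than "", the last line shows one way (an addition, not part of the original answer) to collapse the resulting runs of spaces:

```r
x <- "Hello, World! 2023"  # hypothetical example string

# Only punctuation is removed; letters, digits and spaces survive
no_punct <- gsub("[[:punct:]]", "", x)

# Everything that is not a letter or digit is removed, including spaces
alnum_only <- gsub("[^[:alnum:]]", "", x)

# Replace punctuation with a space, then squeeze repeated whitespace
tidy <- trimws(gsub("[[:space:]]+", " ", gsub("[[:punct:]]", " ", x)))

print(no_punct)
print(alnum_only)
print(tidy)
```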

How to remove specific special characters in R

gsub("[^[:alnum:][:blank:]+?&/\\-]", "", c)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"
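The pattern is a negated bracket expression: everything except letters, digits, blanks and the listed symbols is removed. In R's default regex engine, a hyphen placed last inside the brackets is already literal, so the backslash is not required. The input string below is hypothetical, made up to reproduce the output above, and it is named s rather than c to avoid confusion with the base function c():

```r
# Hypothetical input chosen to match the output shown above
s <- "In Acid-base reaction (page4), why does it create water and not H+?"

# Keep letters, digits, blanks and + ? & / -; drop everything else.
# "-" is last in the bracket expression, so it is a literal hyphen.
cleaned <- gsub("[^[:alnum:][:blank:]+?&/-]", "", s)
print(cleaned)
```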

How can I remove non-numeric characters from strings using gsub in R?

Simply use

gsub("[^0-9.-]", "", x)

If a string can contain more than one - or ., you can deal with that in a second regular expression. If you struggle with it, open a new question.

(If your data use a comma as the decimal separator, replace . with , in the pattern.)
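A short sketch of the usual next step: strip the non-numeric characters and convert the result with as.numeric(). The input vector is hypothetical. A string such as "v1.2.3" would become "1.2.3", which as.numeric() turns into NA with a warning, which is why extra dots or minus signs may need the second regex mentioned above.

```r
x <- c("$1,234.50", "-12 kg", "abc3.7")  # hypothetical messy input

digits <- gsub("[^0-9.-]", "", x)  # keep only digits, "." and "-"
values <- as.numeric(digits)       # convert the cleaned strings to numbers

print(digits)
print(values)
```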

Remove unwanted text from string

This should do it.

library(stringr)

x <- "yada yada.useful text here. googletag.cmd.push(function() { googletag.display('div-gpt-ad-447281037690072557-2'); });useful text here. yada yada"

x %>% str_remove("googletag.*\\}\\)")

Explanation

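The regex looks for "googletag" (where your unwanted string starts),

.* then matches any number of characters; because it is greedy, it extends to the last }) in the string,

\\}\\) which is the closing }).

The backslashes are doubled because the pattern is written as an R string: \\ in the string becomes a single \ in the regular expression, which escapes } and ). Most other regex tools, where patterns are not written as string literals, would use a single backslash.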
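Greediness matters when the text contains more than one snippet. Here is a base-R sketch on a hypothetical string with two snippets and useful text between them. The lazy variant .*? and the optional trailing ;? are additions, not part of the answer above. Note also that str_remove() removes only the first match, while str_remove_all() (or gsub()) removes all of them.

```r
# Hypothetical string with two ad snippets and useful text between them
x <- "keep1 googletag.a(function() { b(); });keep2 googletag.c(function() { d(); });keep3"

# Greedy: .* runs to the LAST "})", swallowing "keep2"
greedy <- gsub("googletag.*\\}\\)", "", x, perl = TRUE)

# Lazy: .*? stops at the FIRST "})"; ";?" also drops an optional trailing semicolon
lazy <- gsub("googletag.*?\\}\\);?", "", x, perl = TRUE)

print(greedy)
print(lazy)
```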

remove unwanted text with function and for loop

Here is an approach using base functions. I agree that you should just collapse the unwanted strings into a single character string. I think the for loop is unnecessary here and is causing some of your return problems, as pointed out by @wusel.

# create food list
food <- list("apples, watermelon and peaches", "onions and broccoli", "peaches, nectarines", "rutabega")

# hypothetical example of the foods to drop; use your own vector here
unwanted_foods <- c("onions", "rutabega")
unwanted_foods <- paste(unwanted_foods, collapse = "|") # collapse into a single pattern, e.g. "onions|rutabega"

foods <- lapply(food, function(x) {
  out <- x[!grepl(pattern = unwanted_foods, x = x)] # subset each list item
  return(out)
})
# returns a list of the same length as the input; removed items are now character(0)
# Filter out the zero-length elements
foods <- Filter(length, foods)
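Because each list element here holds a single string, the lapply() and Filter() steps can also be written as one vectorized grepl() over unlist(food). This is a sketch with hypothetical unwanted foods. The \\b word boundaries are an addition that stops a short name such as "pea" from matching inside "peaches":

```r
food <- list("apples, watermelon and peaches", "onions and broccoli",
             "peaches, nectarines", "rutabega")
unwanted_foods <- c("onions", "rutabega")  # hypothetical; use your own vector

# Build "\\b(onions|rutabega)\\b" so only whole words match
pattern <- paste0("\\b(", paste(unwanted_foods, collapse = "|"), ")\\b")

food_vec <- unlist(food)                                  # list -> character vector
kept <- food_vec[!grepl(pattern, food_vec, perl = TRUE)]  # drop matching items
print(kept)
```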

How to remove characters in R

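You can use the function sub(). The backslash is needed because $ is a special regular expression character (it matches the end of the string), so it must be escaped; it is doubled because in an R string \\ stands for a single backslash.

sub("x\\$", replacement = "", x = "x$var1")
#[1] "var1"

Alternatively, use fixed=TRUE, which treats the pattern as a literal string, so no escaping is needed:

sub("x$", replacement = "", x = "x$var1", fixed=TRUE)
#[1] "var1"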
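The two versions are not always equivalent. fixed = TRUE matches the literal "x$" anywhere in the string, whereas a regex can be anchored with ^ so only a leading "x$" is removed. A sketch on hypothetical variable names:

```r
vars <- c("x$var1", "x$var2", "box$x")  # hypothetical names

# Anchored regex: only strip "x$" at the very start of each string
anchored <- sub("^x\\$", "", vars)

# fixed = TRUE matches the literal "x$" anywhere, so "box$x" loses its "x$" too
literal <- sub("x$", "", vars, fixed = TRUE)

print(anchored)
print(literal)
```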

Remove part of a string

Use regular expressions. In this case, you can use gsub:

gsub("^.*?_","_","ATGAS_1121")
[1] "_1121"

This regular expression matches the beginning of the string (^), any character (.) repeated zero or more times (*), and an underscore (_). The ? makes the match "lazy" so that it only matches as far as the first underscore. That match is replaced with just an underscore. See ?regex for more details and references.
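The effect of the ? only shows when there is more than one underscore. A sketch on a hypothetical ID; perl = TRUE is used here as a safe choice for the lazy quantifier, and because the pattern is anchored with ^, sub() and gsub() give the same result:

```r
s <- "ATGAS_1121_B7"  # hypothetical ID with two underscores

lazy <- sub("^.*?_", "_", s, perl = TRUE)   # match up to the FIRST "_"
greedy <- sub("^.*_", "_", s, perl = TRUE)  # match up to the LAST "_"

print(lazy)
print(greedy)
```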

How to remove last n characters from every element in the R vector

Here is an example of what I would do. I hope it's what you're looking for.

char_array = c("foo_bar","bar_foo","apple","beer")
a = data.frame("data"=char_array,"data2"=1:4, stringsAsFactors = FALSE) # keep strings as character (the default only since R 4.0.0)
a$data = substr(a$data,1,nchar(a$data)-3)

a should now contain:

  data data2
1 foo_     1
2 bar_     2
3   ap     3
4    b     4
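A regex alternative is sub(".{3}$", "", x). This sketch compares it with substr() on a vector that also contains a hypothetical string shorter than three characters: substr() returns "" there (its stop position falls below 1), while the regex finds no match and leaves the string unchanged.

```r
# Hypothetical helper: drop the last n characters of each string
remove_last_n <- function(x, n) {
  substr(x, 1, nchar(x) - n)  # returns "" when nchar(x) - n is below 1
}

words <- c("foo_bar", "bar_foo", "apple", "beer", "ab")
by_substr <- remove_last_n(words, 3)
by_regex <- sub(".{3}$", "", words)  # strings shorter than 3 are left untouched

print(by_substr)
print(by_regex)
```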

Special characters in function in R

In order to find the correct regular expression, you need to know exactly what you are systematically looking for in your strings. From your post I assume that you want to extract the ELA- string and the number at the end of the strings. You could do it like this:

strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3", "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

gsub(".*(ELA\\-).*(\\d$)", "\\1\\2", strings)

[1] "ELA-3" "ELA-1"

I will briefly explain the components of the pattern:

  • .* matches zero or more arbitrary characters
  • ELA\\- matches 'ELA-'
  • \\d$ matches a digit at the end of the string

The parentheses form capture groups, which can be "backreferenced" with \\1 (first capture group) and \\2 (second capture group). gsub() takes each entire string and replaces it with what the two backreferences captured. As I do not know exactly what systematic pattern you are looking for, the pattern might still need adjustments to fit your needs.
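If you want the captured pieces separately rather than pasted together, regexec() records where each capture group matched, and regmatches() returns the full match followed by each group. A base-R sketch using the same two strings (with a plain, unescaped hyphen, which is literal outside brackets):

```r
strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3",
             "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

# Each element of m is c(full match, group 1, group 2)
m <- regmatches(strings, regexec(".*(ELA-).*(\\d)$", strings))

# One row per string: column 1 = "ELA-", column 2 = final digit
groups <- t(sapply(m, function(g) g[2:3]))
print(groups)
```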

If you are interested in the first digit only you can get it with

library(stringr)
strings %>% str_extract("\\d")
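If stringr is not available, base R can do the same with regexpr() (first match) or gregexpr() (all matches). One difference worth knowing: regmatches() silently drops strings with no match, whereas str_extract() returns NA for them.

```r
strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3",
             "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

# First digit in each string
first_digit <- regmatches(strings, regexpr("\\d", strings))

# Every run of digits in each string (a list, one element per string)
all_numbers <- regmatches(strings, gregexpr("\\d+", strings))

print(first_digit)
print(all_numbers)
```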