Using Grepl in R to Search for an Asterisk

Try this:

p <- c("Hello", "H*llo")
grepl("\\*", p)

[1] FALSE TRUE

This works because the asterisk * has a special meaning in a regular expression. Specifically, * means find zero or more of the preceding element.

Thus you have to escape the asterisk using \\*. The double backslash is necessary because \ is already the escape character in R string literals, so the string "\\*" passes the two characters \* to the regex engine.
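As a small sketch of the point above, the following compares three common ways to match a literal asterisk with base R. The variable names by_escape, by_class and by_fixed are made up for illustration; the input vector is the one from the answer.

```r
p <- c("Hello", "H*llo")

# The R string "\\*" holds two characters: a backslash and an asterisk.
pattern_length <- nchar("\\*")

by_escape <- grepl("\\*", p)              # escaped metacharacter
by_class  <- grepl("[*]", p)              # an asterisk inside a character class is literal
by_fixed  <- grepl("*", p, fixed = TRUE)  # fixed = TRUE turns off regex interpretation

print(by_escape)
```

All three calls should flag only the second element, so the choice is mostly about readability.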

Searching for a special character using grepl?

* is a metacharacter; use the escape character \ (a backslash, written as \\ inside an R string) to search for it. A forward slash / does not escape anything.

grepl('\\*', '***')
[1] TRUE
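A quick check, sketched with made-up inputs, of why the escape character has to be a backslash rather than a forward slash: the pattern '/*' means "zero or more forward slashes", which is satisfied by any string, so a TRUE result from it says nothing about asterisks.

```r
inputs <- c("***", "abc")

# "/*" matches zero or more "/" characters, so it matches both inputs.
slash_pattern <- grepl("/*", inputs)

# "\\*" matches a literal asterisk, so it matches only the first input.
escaped_pattern <- grepl("\\*", inputs)

print(slash_pattern)
print(escaped_pattern)
```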

Match all elements with punctuation mark except asterisk in R

Maybe you can try grep like below

grep("\\*", grep("[[:punct:]]", vec, value = TRUE), value = TRUE, invert = TRUE) # nested `grep`s for double filtering

or

grep("[^\\*[:^punct:]]", vec, perl = TRUE, value = TRUE) # but this will fail for a case like `abc*01|` (thanks for feedback from @Tim Biegeleisen)

which gives

[1] "a,"   "abc-" "abc|"
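The answer does not show vec, so the vector below is a hypothetical input chosen to be consistent with the output above, plus the problem case abc*01| from the comment. The sketch repeats the nested grep and adds an equivalent two-condition filter with grepl; the names nested and combined are illustrative only.

```r
# Hypothetical input; the original `vec` is not shown in the answer.
vec <- c("a,", "abc-", "abc|", "abc*", "abc", "abc*01|")

# Keep elements with punctuation, then drop those that contain an asterisk.
nested <- grep("\\*", grep("[[:punct:]]", vec, value = TRUE),
               value = TRUE, invert = TRUE)

# Same filter written as two logical conditions.
combined <- vec[grepl("[[:punct:]]", vec) & !grepl("*", vec, fixed = TRUE)]

print(nested)
```

Both versions drop "abc*01|" because they test for the asterisk separately instead of relying on a single character class.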

Using the star sign in grep

The asterisk is just a repetition operator, but you need to tell it what you repeat. /*abc*/ matches a string containing ab and zero or more c's (because the second * is on the c; the first is meaningless because there's nothing for it to repeat). If you want to match anything, you need to say .* -- the dot means any character (within certain guidelines). If you want to just match abc, you could just say grep 'abc' myFile. For your more complex match, you need to use .* -- grep 'abc.*def' myFile will match a string that contains abc followed by def with something optionally in between.

Update based on a comment:

* in a regular expression is not exactly the same as * in the console. In the console, * is part of a glob construct, and just acts as a wildcard (for instance ls *.log will list all files that end in .log). However, in regular expressions, * is a modifier, meaning that it only applies to the character or group preceding it. If you want * in regular expressions to act as a wildcard, you need to use .* as previously mentioned -- the dot is a wildcard character, and the star, when modifying the dot, means find zero or more dots; i.e. find zero or more of any character.
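The same distinction can be sketched in R rather than at the shell; the test strings are made up for illustration. The last two lines use glob2rx from the utils package, which converts a glob pattern into a regular expression.

```r
# "abc*" is "ab" followed by zero or more "c"; the star binds only to "c".
star_only <- grepl("abc*", c("ab", "abccc", "a"))

# "abc.*def" is "abc", then zero or more of any character, then "def".
dot_star <- grepl("abc.*def", c("abcdef", "abc123def", "abdef"))

# Translate the shell glob "*.log" into a regex and apply it to file names.
log_regex <- utils::glob2rx("*.log")
is_log <- grepl(log_regex, c("run.log", "run.txt"))

print(star_only)
print(dot_star)
print(is_log)
```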

How to grep a line start with #* using grep in R

To avoid the problem of figuring out how many backslashes are needed, use [*] to match a star.

grep("^#[*]", x, value = TRUE)

Another approach, not using any regular expressions at all, is:

x[ substr(x, 1, 2) == "#*" ]

or

x[ startsWith(x, "#*") ]
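To see that the three approaches agree, here is a minimal sketch on a hypothetical vector x (the question's actual data is not shown here).

```r
# Hypothetical input lines.
x <- c("#* header", "# comment", "data", "#*")

by_regex  <- grep("^#[*]", x, value = TRUE)  # anchored regex, star in a character class
by_substr <- x[substr(x, 1, 2) == "#*"]      # compare the first two characters
by_starts <- x[startsWith(x, "#*")]          # plain prefix test, no regex

print(by_regex)
```

The non-regex versions avoid escaping questions entirely, which may be preferable when the prefix is a fixed string.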

How to remove a character (asterisk) in column values in r?

The stringr package has some very handy functions for vectorized string manipulation.

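In the following code I replace the * with ''. Note that in an R string, a regex metacharacter that should be matched literally has to be preceded by a double backslash \\ instead of the usual single backslash \. Also note that str_replace only replaces the first match in each string; use str_replace_all to remove every asterisk.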

library(stringr) 
LocationID <- c('*Yukon','*Lewis Rich', '*Kodiak', 'Kodiak', '*Rays')
AWC <- c(333, 485, 76, 666, 54)
df <- data.frame(LocationID, AWC)

df$location_clean <- stringr::str_replace(df$LocationID, '\\*', '')

Resulting in:

   LocationID AWC location_clean
1      *Yukon 333          Yukon
2 *Lewis Rich 485     Lewis Rich
3     *Kodiak  76         Kodiak
4      Kodiak 666         Kodiak
5       *Rays  54           Rays
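If adding a package is not desired, base R can do the same job. This is a sketch of an alternative, not the answer's own method: sub removes the first asterisk in each string and gsub removes all of them, and fixed = TRUE avoids the escaping question altogether. The names first_only and all_stars are illustrative.

```r
LocationID <- c("*Yukon", "*Lewis Rich", "*Kodiak", "Kodiak", "*Rays")

# Remove the first literal asterisk in each element.
first_only <- sub("*", "", LocationID, fixed = TRUE)

# Remove every literal asterisk in a string.
all_stars <- gsub("*", "", "*a*b*", fixed = TRUE)

print(first_only)
print(all_stars)
```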

Unable to delete rows with character *

If we use grep, * is a regex metacharacter (a quantifier meaning zero or more of the preceding element), so it is not matched literally. We can either use fixed = TRUE or escape it (\\*) to match the literal character

xx[!Reduce(`|`, lapply(xx, function(x) grepl("*", x, fixed = TRUE))),]

Another option is to compare with == to find the cells that are exactly "*", count the matches in each row with rowSums, and subset

xx[!rowSums(xx == "*"),]
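The two approaches are not quite interchangeable, which a small sketch on a hypothetical data frame xx makes visible: grepl with fixed = TRUE drops rows where any cell contains an asterisk, while == drops only rows where a cell is exactly "*".

```r
# Hypothetical data; the original `xx` is not shown in the answer.
xx <- data.frame(a = c("1", "*", "3", "4*"),
                 b = c("x", "y", "*", "z"),
                 stringsAsFactors = FALSE)

# Drop rows where any cell contains an asterisk.
contains_star <- Reduce(`|`, lapply(xx, function(x) grepl("*", x, fixed = TRUE)))
no_star_anywhere <- xx[!contains_star, ]

# Drop rows where any cell is exactly "*"; the row with "4*" is kept.
no_exact_star <- xx[!rowSums(xx == "*"), ]

print(no_star_anywhere)
print(no_exact_star)
```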

