Grepl for a Period "." in R


Compare the results of these two examples:

> grepl("\\.", "Hello.")
[1] TRUE
> grepl("\\.", "Hello")
[1] FALSE

The . means "any character", as pointed out by SimonO101. If you want to match a literal ., you have to escape it as \\., which means "look for a period".

The R documentation on regular expressions is extensive (see ?regex) and explains how the dot is used.
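To make the difference concrete, here is a minimal sketch comparing an unescaped dot, an escaped dot, and a bracket expression. The vector x and the result names are made up for illustration.

```r
# Hypothetical test strings: one with a period, one without, one empty
x <- c("Hello.", "Hello", "")

# Unescaped dot: matches any single character, so only "" fails
any_char <- grepl(".", x)
print(any_char)
# [1]  TRUE  TRUE FALSE

# Escaped dot: matches a literal period only
escaped <- grepl("\\.", x)
print(escaped)
# [1]  TRUE FALSE FALSE

# Inside a bracket expression the dot is also taken literally
bracketed <- grepl("[.]", x)
print(bracketed)
# [1]  TRUE FALSE FALSE
```

The bracket form avoids the double backslash, which some people find easier to read.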

grepl: check that multiple patterns each occur at least once in R

If you really want to use a single call to grepl, then we can try using the following single regex pattern:

\bTAX\b.*\bGAP\b|\bGAP\b.*\bTAX\b

This pattern uses an alternation to check for both orders in which TAX and GAP might occur. Note also that TAX and GAP are surrounded by word boundary markers (\b) on each side, to make sure that we don't accidentally match e.g. TAX when it happens to occur in a substring of a larger word like TAXES.

grepl("\\bTAX\\b.*\\bGAP\\b|\\bGAP\\b.*\\bTAX\\b", K)
[1] FALSE TRUE TRUE TRUE FALSE
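The vector K is not shown in the answer, so the following sketch uses a made-up K to illustrate the pattern. It also compares the single regex with two separate grepl calls combined with &, which should agree for single-line strings like these.

```r
# Hypothetical input; the original K is not shown in the answer
K <- c("TAX GAP", "GAP in TAX", "TAXES GAP", "no match", "the TAX and the GAP")

# Both orders in one regex, each word delimited by word boundaries
pattern <- "\\bTAX\\b.*\\bGAP\\b|\\bGAP\\b.*\\bTAX\\b"
single_call <- grepl(pattern, K)
print(single_call)
# [1]  TRUE  TRUE FALSE FALSE  TRUE

# Alternative: one grepl call per word, combined with a logical AND
two_calls <- grepl("\\bTAX\\b", K) & grepl("\\bGAP\\b", K)
print(two_calls)
# [1]  TRUE  TRUE FALSE FALSE  TRUE
```

"TAXES GAP" is rejected in both versions because the word boundary after TAX fails inside TAXES.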

Matching a single character within a string in R

We need either fixed = TRUE:

grepl("test.txt", pattern = ".", fixed = TRUE)
#[1] TRUE

NOTE: pattern is the first argument of grep/grepl. If we pass the arguments in a different order, we have to name them.

Alternatively, escape the . as \\., because an unescaped . is a metacharacter that matches any character.
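A short sketch of why the argument order matters; the inputs are the ones from the answer plus a made-up string without a period.

```r
# Positional call: pattern first, then the string to search
positional <- grepl(".", "test.txt", fixed = TRUE)
print(positional)   # TRUE

# Named call: the order no longer matters
named <- grepl(x = "test.txt", pattern = ".", fixed = TRUE)
print(named)        # TRUE

# With fixed = TRUE the dot is literal, so a string without one fails
no_period <- grepl(".", "testtxt", fixed = TRUE)
print(no_period)    # FALSE

# Swapped and unnamed: this looks for "test.txt" inside "." and fails
swapped <- grepl("test.txt", ".", fixed = TRUE)
print(swapped)      # FALSE
```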

R grepl: combine conditions

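Try this regex. It is anchored at both ends (^ and $) and only accepts strings made up entirely of spaces, periods, and digits.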

str <- c("0. 365", "S12")
grepl("^[ \\.[:digit:]]*$", str)
#[1] TRUE FALSE
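The sketch below runs the same pattern on a few extra, made-up strings. Note that because of the * quantifier an empty string also matches; the + variant shown after it is an assumption about what one might want instead, not part of the original answer.

```r
# Hypothetical inputs, including an empty string
strs <- c("0. 365", "S12", "3.14", "1a", "")

# Original pattern: zero or more spaces, periods, or digits
zero_or_more <- grepl("^[ \\.[:digit:]]*$", strs)
print(zero_or_more)
# [1]  TRUE FALSE  TRUE FALSE  TRUE

# Variant with + to require at least one character
# (the dot needs no escaping inside a bracket expression)
one_or_more <- grepl("^[ .[:digit:]]+$", strs)
print(one_or_more)
# [1]  TRUE FALSE  TRUE FALSE FALSE
```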

Pattern match with grepl() function in R

The result is correct.

grepl is looking for the pattern of xx-xx-xx, where x is a digit, and that does appear in the first query. If you want to query starting from the beginning of the string, you can use the ^ symbol.

For example, if you were to run grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2}", "2010-04-09"), you'd get FALSE, but grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", "2010-04-09") would return TRUE.

PS: On the opposite end, $ indicates the end of the string.
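The effect of the anchors can be checked directly. This is a small sketch; the second input with a time suffix is made up for illustration.

```r
d <- "2010-04-09"

# No anchor: "10-04-09" is found inside the string
unanchored <- grepl("[0-9]{2}-[0-9]{2}-[0-9]{2}", d)
print(unanchored)     # TRUE

# Anchored at the start: the string begins with four digits, not two
start_two <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2}", d)
print(start_two)      # FALSE

# Anchored at both ends: the whole string must be a yyyy-mm-dd shape
full_match <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", d)
print(full_match)     # TRUE

# The $ anchor rejects trailing text (hypothetical input)
with_suffix <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", "2010-04-09 12:00")
print(with_suffix)    # FALSE
```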

grepl for any string allowed

An option would be to just use . (as it is a metacharacter that matches any character) as the default value of the descr parameter.

I added one more parameter, colNm, to generalize the function a bit more.

If the column contains blanks ("") and you want to match those too, it may be better to use a default such as .*, which also matches the empty string.

library(dplyr)

# Keep the rows of data whose column colNm matches the regex descr
some_sub <- function(data, colNm, descr = ".") {
  colNm <- enquo(colNm)
  data %>%
    filter(grepl(descr, !!colNm))
}

some_sub(iris, Species, "setosa")
some_sub(iris, Species)

Using the OP's data:

some_sub(data, description, "Cabbage")
#  description weight
#1     Cabbage     12
#2     Cabbage      9
#3     Cabbage     15

some_sub(data, description)
#  description weight
#1     Cabbage     12
#2     Cabbage      9
#3      Carrot      7
#4     Cabbage     15
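How the default pattern treats blank strings can be checked in base R without the dplyr wrapper. This is a minimal sketch on a made-up vector, with .* shown as one pattern that also matches the empty string.

```r
# Hypothetical column values, including a blank
descriptions <- c("Cabbage", "Carrot", "")

# "." needs at least one character, so the blank is dropped
dot_default <- grepl(".", descriptions)
print(dot_default)
# [1]  TRUE  TRUE FALSE

# ".*" also matches the empty string, so every element is kept
dot_star_default <- grepl(".*", descriptions)
print(dot_star_default)
# [1] TRUE TRUE TRUE

# An explicit description still narrows the match
cabbage_only <- grepl("Cabbage", descriptions)
print(cabbage_only)
# [1]  TRUE FALSE FALSE
```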

Escaped Periods In R Regular Expressions

My R-Fu is weak to the point of being non-existent but I think I know what's up.

The string handling part of the R processor has to peek inside the strings to convert \n and related escape sequences into their character equivalents. R doesn't know what \. means so it complains. You want to get the escaped dot down into the regex engine so you need to get a single \ past the string mangler. The usual way of doing that sort of thing is to escape the escape:

grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02))

Embedding one language (regular expressions) inside another language (R) is usually a bit messy and more so when both languages use the same escaping syntax.
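The two layers of escaping can be made visible with a short sketch. The column names in cols are invented for illustration, since names(web02) is not shown in the question.

```r
pattern <- "Processor\\.[0-9]+\\..*Processor\\.Time"

# writeLines shows the string after R has processed its own escapes,
# i.e. what the regex engine actually receives
writeLines(pattern)
# Processor\.[0-9]+\..*Processor\.Time

# "\\." is two characters: one backslash and one dot
escaped_len <- nchar("\\.")
print(escaped_len)   # 2

# Hypothetical column names standing in for names(web02)
cols <- c("Processor.0..Processor.Time",
          "Processor.Total",
          "ProcessorX0XProcessorXTime")
matches <- grepl(pattern, cols)
print(matches)
# [1]  TRUE FALSE FALSE
```

The last name fails because the escaped dots only match literal periods, not arbitrary characters.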


