grepl for a period . in R?
See the difference in these examples:
> grepl("\\.", "Hello.")
[1] TRUE
> grepl("\\.", "Hello")
[1] FALSE
The . matches any character, as pointed out by SimonO101. If you want to look for a literal ., you have to escape it with \\., which means "look for a period".
The R documentation on regular expressions (?regex) is extensive and also explains the use of the dot.
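The sketch below contrasts the unescaped dot with two ways of matching a literal period. The first is escaping it. The second is placing it inside a bracket expression, where . loses its special meaning. The vector x is made up for illustration.

```r
x <- c("Hello.", "Hello")

# Unescaped: "." matches any single character, so both strings match
any_char <- grepl(".", x)

# Escaped: "\\." in R source is the regex \. , i.e. a literal period
escaped <- grepl("\\.", x)

# Bracket expression: inside [ ], the dot is taken literally
bracketed <- grepl("[.]", x)

print(any_char)   # TRUE TRUE
print(escaped)    # TRUE FALSE
print(bracketed)  # TRUE FALSE
```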
grepl: match strings containing multiple patterns, each at least once, in R
If you really want to use a single call to grepl, then we can try the following regex pattern:
\bTAX\b.*\bGAP\b|\bGAP\b.*\bTAX\b
This pattern uses an alternation to check for both orders in which TAX and GAP might occur. Note also that TAX and GAP are surrounded by word boundary markers (\b) on each side, to make sure that we don't accidentally match e.g. TAX when it occurs as a substring of a larger word like TAXES.
grepl("\\bTAX\\b.*\\bGAP\\b|\\bGAP\\b.*\\bTAX\\b", K)
[1] FALSE TRUE TRUE TRUE FALSE
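K above is the OP's data, which is not shown here. The sketch below therefore runs the same pattern on a made-up vector. It also compares the result with an alternative that uses two grepl calls combined with &, which checks for each word independently and so does not need to spell out both orders.

```r
# Made-up example vector, standing in for the OP's K
x <- c("TAX only", "GAP then TAX", "TAX and GAP", "TAXES and GAP", "GAP")

# Single regex: the alternation covers both orders; \\b is a word boundary
one_call <- grepl("\\bTAX\\b.*\\bGAP\\b|\\bGAP\\b.*\\bTAX\\b", x)

# Two calls combined with &: each word must appear somewhere, in any order
two_calls <- grepl("\\bTAX\\b", x) & grepl("\\bGAP\\b", x)

print(one_call)                 # FALSE TRUE TRUE FALSE FALSE
identical(one_call, two_calls)  # TRUE
```

The "TAXES and GAP" element shows the word boundaries at work: TAX inside TAXES is not counted.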
Matching a single character within a string in R
We need either fixed = TRUE
grepl("test.txt", pattern = ".", fixed = TRUE)
#[1] TRUE
NOTE: pattern is the first argument of grep/grepl. If we pass the arguments in a different order, make sure to name the parameters.
Or escape the . (as \\.), since . is a metacharacter that matches any character.
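The sketch below shows the options side by side on two made-up file names. It also shows what happens when the arguments are passed positionally in the wrong order: the file name becomes the pattern.

```r
files <- c("test.txt", "testtxt")

unescaped <- grepl(".", files)                # TRUE TRUE: "." is a metacharacter here
fixed_lit <- grepl(".", files, fixed = TRUE)  # TRUE FALSE: "." taken literally
escaped   <- grepl("\\.", files)              # TRUE FALSE: escaped dot

# Positional arguments: pattern comes first, so this searches for the
# regex "test.txt" inside the one-character string "." and returns FALSE
swapped <- grepl("test.txt", ".")

# Naming the arguments makes their order irrelevant
named <- grepl(x = "test.txt", pattern = ".", fixed = TRUE)  # TRUE
```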
R grepl: combine conditions
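Try this regex, which checks that the whole string (anchored with ^ and $) consists only of spaces, periods and digits: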
str <- c("0. 365", "S12")
grepl("^[ \\.[:digit:]]*$", str)
#[1] TRUE FALSE
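The sketch below illustrates two details about that pattern. First, inside a bracket expression the dot is already literal, so it does not need escaping there. Second, * allows zero repetitions, so the empty string also passes; switching to + requires at least one character. The extra "" element is added for illustration.

```r
vals <- c("0. 365", "S12", "")

# Inside [ ] the dot is literal, so no backslash is needed
star <- grepl("^[ .[:digit:]]*$", vals)  # TRUE FALSE TRUE: "" matches zero characters
plus <- grepl("^[ .[:digit:]]+$", vals)  # TRUE FALSE FALSE: at least one character required
```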
Pattern match with grepl() function in R
The result is correct.
grepl looks for the pattern xx-xx-xx, where x is a digit, anywhere in the string, and that pattern does appear in the first query. If you want the match to start at the beginning of the string, anchor the pattern with the ^ symbol.
For example, if you were to run grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2}", "2010-04-09"), you'd get FALSE, but grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", "2010-04-09") would return TRUE.
PS: On the opposite end, $ indicates the end of the string.
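The effect of each anchor on the same date string can be seen in the sketch below. The last line uses a made-up string with trailing text to show what $ adds.

```r
d <- "2010-04-09"

unanchored <- grepl("[0-9]{2}-[0-9]{2}-[0-9]{2}", d)    # TRUE: "10-04-09" occurs inside d
start_only <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2}", d)   # FALSE: d does not start with two digits and "-"
whole      <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", d)  # TRUE: the entire string is yyyy-mm-dd

# $ rejects anything after the date
trailing <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", "2010-04-09 12:00")  # FALSE
```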
grepl for any string allowed
An option would be to just use . (a metacharacter that matches any character) as the default value of the descr parameter.
I also added one more parameter, colNm, to generalize it a bit more.
If there are blank values ("") and you want to match those too, it may be better to have .* as the default, since . requires at least one character while .* also matches the empty string.
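library(dplyr)  # provides %>%, filter() and enquo()

# descr defaults to ".", which matches any non-empty value
some_sub <- function(data, colNm, descr = ".") {
  colNm <- enquo(colNm)  # capture the bare column name
  data %>%
    filter(grepl(descr, !!colNm))  # unquote the column inside filter()
}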
some_sub(iris, Species, "setosa")
some_sub(iris, Species)
Using the OP's data:
some_sub(data, description, "Cabbage")
# description weight
#1 Cabbage 12
#2 Cabbage 9
#3 Cabbage 15
some_sub(data, description)
# description weight
#1 Cabbage 12
#2 Cabbage 9
#3 Carrot 7
#4 Cabbage 15
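The sketch below is a base-R version of the same idea, showing how the default pattern affects blank values. The function name some_sub_base and the data frame veg are hypothetical. veg mirrors the rows printed above, plus one made-up row with an empty description. With "." the blank row is dropped; with ".*", which can match zero characters, it is kept.

```r
# Hypothetical data: the rows shown above plus one blank description
veg <- data.frame(
  description = c("Cabbage", "Cabbage", "Carrot", "Cabbage", ""),
  weight = c(12, 9, 7, 15, 3)
)

# Hypothetical base-R counterpart of some_sub(); colNm is a column name string
some_sub_base <- function(data, colNm, descr = ".*") {
  data[grepl(descr, data[[colNm]]), , drop = FALSE]
}

all_rows  <- some_sub_base(veg, "description")               # 5 rows: ".*" also matches ""
non_blank <- some_sub_base(veg, "description", descr = ".")  # 4 rows: "." needs one character
cabbage   <- some_sub_base(veg, "description", "Cabbage")    # 3 rows
```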
Escaped Periods In R Regular Expressions
My R-Fu is weak to the point of being non-existent, but I think I know what's up.
The string-handling part of the R parser has to peek inside string literals to convert \n and related escape sequences into their character equivalents. R doesn't know what \. means, so it complains. You want to get the escaped dot down to the regex engine, so you need to get a single \ past the string mangler. The usual way of doing that sort of thing is to escape the escape:
grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02))
Embedding one language (regular expressions) inside another language (R) is usually a bit messy and more so when both languages use the same escaping syntax.
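The sketch below shows that "\\." really is a two-character string (one backslash, one dot). It also shows that, since R 4.0.0, a raw string literal such as r"(...)" can hold the regex without doubling the backslashes. The column names are hypothetical, since the OP's names(web02) are not shown.

```r
# The same regex written two ways
pat_escaped <- "Processor\\.[0-9]+\\..*Processor\\.Time"
pat_raw     <- r"(Processor\.[0-9]+\..*Processor\.Time)"  # raw string, R >= 4.0.0

same <- identical(pat_escaped, pat_raw)  # TRUE: both hold single backslashes
n    <- nchar("\\.")                     # 2: one backslash plus one dot

# Hypothetical column names, for illustration only
cols <- c("Processor.0.Processor.Time", "Processor.Total")
hits <- grepl(pat_raw, cols)             # TRUE FALSE
```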