Multiple Strings with str_detect in R

str_detect for multiple patterns

Join the alternatives with the regex alternation operator |, all inside a single quoted pattern string.

> library(dplyr)
> library(stringr)
> words <- c("quantity", "single", "double", "triple", "awful")
> set.seed(1234)
> df <- tibble(col = sample(words, 10, replace = TRUE))
> df
# A tibble: 10 x 1
col
<chr>
1 triple
2 single
3 awful
4 triple
5 quantity
6 awful
7 triple
8 single
9 single
10 triple

> df %>% filter(str_detect(col, "quantity|single"))
# A tibble: 4 x 1
col
<chr>
1 single
2 quantity
3 single
4 single
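
As a minimal sketch of the same idea on a plain character vector (the vector x below is made up for illustration and is not the sampled data above), the alternation can be checked directly:

```r
library(stringr)

# Hypothetical input vector, used only for illustration
x <- c("quantity", "single", "double", "triple", "awful")

# One pattern string; | means "match either alternative"
hits <- str_detect(x, "quantity|single")
hits
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Spaces around | become part of the alternatives, so nothing matches here
spaced <- str_detect(x, "quantity | single")
spaced
# [1] FALSE FALSE FALSE FALSE FALSE
```

The second call is a reminder that everything inside the quotes is part of the pattern, including whitespace next to the | operator.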

Ignore case with multiple strings using str_detect in R

One option is to convert Class to lower case and match the result against the pattern ag|field.

dat %>%
  mutate(Class_2 = case_when(
    str_detect(string = str_to_lower(Class),
               pattern = "ag|field") ~ "Agricultural",
    TRUE ~ Class
  ))

# A tibble: 3 × 2
  Class              Class_2
  <chr>              <chr>
1 ag                 Agricultural
2 Agricultural--misc Agricultural
3 old field          Agricultural
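
An alternative that leaves the input untouched is to make the pattern itself case-insensitive with stringr's regex(ignore_case = TRUE). The sketch below assumes a made-up vector of class labels; only the pattern ag|field comes from the answer above.

```r
library(stringr)

# Hypothetical class labels in mixed case
classes <- c("AG", "Agricultural--misc", "Old Field", "forest")

# Case-insensitive match without lower-casing the data first
is_agri <- str_detect(classes, regex("ag|field", ignore_case = TRUE))
is_agri
# [1]  TRUE  TRUE  TRUE FALSE
```

Both approaches should give the same result for plain ASCII text; the regex() form may be handier when the original casing has to be kept for display.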

How to use str_detect within across when searching multiple columns for several search strings

Combine across with Reduce to keep the rows in which any of the selected columns matches the pattern.

library(dplyr)
library(stringr)

pat <- paste(search_string, collapse = "|")

raw_df %>%
  filter(Reduce(`|`, across(c(cust_name, other_desc),
                            ~str_detect(., regex(pat, ignore_case = TRUE)))))

However, I think if_any is more suitable here, as it was built to handle such cases:

raw_df %>%
  filter(if_any(c(cust_name, other_desc),
                ~str_detect(., regex(pat, ignore_case = TRUE))))

#  cust_name other_desc trans   val
#  <chr>     <chr>      <chr> <int>
#1 Cisco     nothing    a       100
#2 bad_cs    cisCo      s       101
#3 Ibm       nothing    d       102
#4 bad_ib    ibM        f       102
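
The snippets above do not define raw_df or search_string, so here is a self-contained sketch of the if_any approach. The data and search terms are assumptions chosen to resemble the output shown (with one extra non-matching row), and if_any requires dplyr 1.0.4 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical search terms and data, modelled loosely on the output above
search_string <- c("cisco", "ibm")
raw_df <- tibble(
  cust_name  = c("Cisco", "bad_cs", "Ibm", "bad_ib", "Acme"),
  other_desc = c("nothing", "cisCo", "nothing", "ibM", "nothing"),
  val        = c(100L, 101L, 102L, 102L, 103L)
)

# Collapse the terms into a single alternation pattern
pat <- paste(search_string, collapse = "|")

# Keep rows where at least one of the two columns matches, ignoring case
matched <- raw_df %>%
  filter(if_any(c(cust_name, other_desc),
                ~ str_detect(.x, regex(pat, ignore_case = TRUE))))

nrow(matched)
# [1] 4
```

Swapping if_any for if_all would instead keep only the rows where every selected column matches.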

str_detect with multiple strings (and not or) of the same kind using R

You may use -

library(stringr)

str_detect(find.variable, '\\bdetect\\b.*\\bdetect\\b')
#[1] TRUE FALSE FALSE TRUE FALSE

If occurrences of 'detect' that directly follow one another, with no word boundary between them, should also count, drop the \\b anchors:

str_detect(find.variable, 'detect.*detect')

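You can also use str_count to count the number of occurrences of 'detect' in a string. Note that == 2 tests for exactly two occurrences, whereas the patterns above match two or more.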

str_count(find.variable, 'detect') == 2
#[1] TRUE FALSE FALSE TRUE TRUE

Note that the last value is TRUE with str_count, because it counts every occurrence of 'detect' regardless of word boundaries.
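
The three variants can be compared side by side. The vector below is hypothetical (the original find.variable is not shown); it was chosen so that the first five results follow the same pattern as the output above, plus one string containing three occurrences.

```r
library(stringr)

# Hypothetical test strings, not the original find.variable
x <- c("detect and detect", "detect", "nothing",
       "detect x detect", "detectdetect", "detect detect detect")

# Two or more occurrences as whole words
whole_words <- str_detect(x, "\\bdetect\\b.*\\bdetect\\b")
whole_words
# [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE

# Two or more occurrences anywhere, including back to back
anywhere <- str_detect(x, "detect.*detect")
anywhere
# [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE

# Exactly two occurrences, word boundaries ignored
exactly_two <- str_count(x, "detect") == 2
exactly_two
# [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
```

Using str_count(x, "detect") >= 2 instead of == 2 should reproduce the result of the second pattern.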

Detect multiple strings with dplyr and stringr

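str_detect is vectorised over both string and pattern: a pattern vector is paired with string element by element, so it does not test every pattern against every string. Either collapse the patterns into one regex with paste(..., collapse = '|'), or test each pattern separately and combine the results with any: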

sapply(test.data$item, function(x) any(sapply(fruit, str_detect, string = x)))
#      Apple       Bear     Orange       Pear Two Apples
#       TRUE      FALSE       TRUE       TRUE       TRUE

str_detect(test.data$item, paste(fruit, collapse = '|'))
# [1] TRUE FALSE TRUE TRUE TRUE
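
When it matters which pattern matched, one possible variation is to build a logical matrix with one column per pattern and reduce it afterwards. The vectors below are assumptions (the original fruit vector is not shown), and fixed() is used here so that each pattern is treated as literal text rather than as a regex.

```r
library(stringr)

# Hypothetical inputs; the item values mirror the names in the output above
items <- c("Apple", "Bear", "Orange", "Pear", "Two Apples")
fruit <- c("Apple", "Orange", "Pear")

# One column per pattern, one row per item
hits <- sapply(fruit, function(p) str_detect(items, fixed(p)))
rownames(hits) <- items

# TRUE where at least one pattern matched the item
any_hit <- rowSums(hits) > 0
any_hit
```

The matrix hits can also be inspected column by column, for example hits[, "Apple"], to see which items matched a given pattern.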

Filter by multiple patterns with filter() and str_detect()

The correct syntax to accomplish this with filter() and str_detect() would be

df %>%
  filter(
    str_detect(letters, "a|f|o")
  )
#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o
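
For single-character alternatives like these, a character class is an equivalent way to write the pattern. The data frame below is a small made-up stand-in for the df in the question.

```r
library(stringr)

# Hypothetical stand-in for the question's data frame
df <- data.frame(numbers = 1:6,
                 letters = c("a", "b", "f", "o", "x", "a"))

# Alternation and character class select the same rows
alt <- str_detect(df$letters, "a|f|o")
cls <- str_detect(df$letters, "[afo]")
identical(alt, cls)
# [1] TRUE

df[alt, ]
```

Both patterns match anywhere in the string, so a longer value such as "banana" would also be kept.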

R exact match for multiple patterns

We could use the word boundary (\\b) to avoid unwanted partial matches:

str_detect(myfile,paste0("\\b(", paste(toMatch, collapse="|"), ")\\b"))
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE

Because each element here has to equal one of the patterns exactly, the same result can be obtained with %in%, which tests for whole-string equality:

myfile %in% toMatch
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
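
The two approaches agree only when every element is a single word. A small sketch with made-up values for myfile and toMatch shows where they differ, and adds an anchored pattern as a regex equivalent of %in%:

```r
library(stringr)

# Hypothetical inputs, not the original data
myfile  <- c("apple", "pineapple", "apple pie", "pear")
toMatch <- c("apple", "pear")

alternatives <- paste(toMatch, collapse = "|")

# Whole-word match: a matching word may appear inside a longer string
word_match <- str_detect(myfile, paste0("\\b(", alternatives, ")\\b"))
word_match
# [1]  TRUE FALSE  TRUE  TRUE

# Whole-string match: the element must equal one of the patterns
exact_in <- myfile %in% toMatch
exact_in
# [1]  TRUE FALSE FALSE  TRUE

# Anchoring with ^ and $ gives the same result as %in% here
exact_regex <- str_detect(myfile, paste0("^(", alternatives, ")$"))
```

Which one is appropriate depends on whether "apple pie" should count as a match for "apple".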

summarise to sum cells containing multiple strings

For an AND condition, call str_detect once per pattern and combine the results with &.

library(dplyr)
library(stringr)

summarise("Total" = n(),
          "CD8:PD-1" = sum(str_detect(ordered, "PD-1") &
                             str_detect(ordered, "CD8"), na.rm = TRUE),
          "CD8:PD:1:FoxP3" = sum(str_detect(ordered, "PD-1") &
                                   str_detect(ordered, "CD8") &
                                   str_detect(ordered, "FoxP3"), na.rm = TRUE))
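
The summarise call above is shown without its input, so here is a self-contained sketch on made-up data. Only the column name ordered comes from the answer; the tibble cells and the output column names are assumptions for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical data; NA is included to show the effect of na.rm = TRUE
cells <- tibble(
  ordered = c("CD8:PD-1", "CD8", "PD-1:CD8:FoxP3", NA, "FoxP3")
)

res <- cells %>%
  summarise(
    Total = n(),
    # Rows containing both markers, in any order
    cd8_pd1 = sum(str_detect(ordered, "CD8") &
                    str_detect(ordered, "PD-1"), na.rm = TRUE),
    # Rows containing all three markers
    cd8_pd1_foxp3 = sum(str_detect(ordered, "CD8") &
                          str_detect(ordered, "PD-1") &
                          str_detect(ordered, "FoxP3"), na.rm = TRUE)
  )

res
```

Separate str_detect calls joined by & do not depend on the order of the markers within the string, which a single pattern such as "CD8.*PD-1" would.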

