Filtering Multiple Columns with str_detect

Filtering multiple columns with str_detect

You can use filter_at:

Dataframe %>%
  filter_at(.vars = vars(names, Jobs),
            .vars_predicate = any_vars(str_detect(., paste0("^(", paste(Filter_list, collapse = "|"), ")"))))

If you want to apply the filter to all variables, you can use filter_all instead.
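As a rough sketch of the filter_all variant, here is a self-contained version. It reuses the sample Dataframe and Filter_list from the "data" section further down this page; the variable names pattern and kept are only illustrative.

```r
library(dplyr)
library(stringr)

# Sample data copied from the "data" section further down this page
Dataframe <- data.frame(names = c("John", "Jill", "Joe", "Mark"),
                        Jobs  = c("Mailman", "Jockey", "Jobhunter", "Nojob"),
                        stringsAsFactors = FALSE)
Filter_list <- c("Jo")

# Builds "^(Jo)"; with several prefixes it would become "^(Jo|Ma|...)"
pattern <- paste0("^(", paste(Filter_list, collapse = "|"), ")")

# Keep rows where at least one column starts with one of the prefixes
kept <- Dataframe %>%
  filter_all(any_vars(str_detect(., pattern)))

kept$names
```

With this data only the Mark row should be dropped, since neither "Mark" nor "Nojob" starts with "Jo".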

How to use str_detect within across when searching multiple columns for several search strings

Combine across with Reduce to select rows that have any occurrence of the pattern.

library(dplyr)
library(stringr)

pat <- paste(search_string, collapse = "|")

raw_df %>%
  filter(Reduce(`|`, across(c(cust_name, other_desc),
                            ~str_detect(., regex(pat, ignore_case = TRUE)))))

However, I think if_any is more suitable here, as it was built to handle such cases:

raw_df %>%
  filter(if_any(c(cust_name, other_desc),
                ~str_detect(., regex(pat, ignore_case = TRUE))))

#   cust_name other_desc trans   val
#   <chr>     <chr>      <chr> <int>
#1  Cisco     nothing    a       100
#2  bad_cs    cisCo      s       101
#3  Ibm       nothing    d       102
#4  bad_ib    ibM        f       102
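The answer above does not show how raw_df and search_string were defined, so here is a minimal reproducible sketch with made-up stand-ins. The data, the search terms, and the extra "Dell" row are assumptions for illustration only, not the asker's real input.

```r
library(dplyr)
library(stringr)

# Hypothetical input; the real raw_df is not shown in the original post
raw_df <- tibble(
  cust_name  = c("Cisco", "bad_cs", "Ibm", "bad_ib", "Dell"),
  other_desc = c("nothing", "cisCo", "nothing", "ibM", "nothing"),
  val        = c(100L, 101L, 102L, 102L, 103L)
)

# Hypothetical search terms, collapsed into one alternation: "cisco|ibm"
search_string <- c("cisco", "ibm")
pat <- paste(search_string, collapse = "|")

# Keep a row if either column matches any term, ignoring case
res <- raw_df %>%
  filter(if_any(c(cust_name, other_desc),
                ~ str_detect(.x, regex(pat, ignore_case = TRUE))))

res
```

The "Dell" row matches neither term in either column, so it should be the only one filtered out.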

Reverse filtering multiple columns with str_detect

As of dplyr 1.0.4, we can use if_any within filter:

library(dplyr)
library(stringr)
Dataframe %>%
  filter(!if_any(c(names, Jobs),
                 ~ str_detect(., str_c("^(", str_c(Filter_list, collapse = "|"), ")"))))
#   names  Jobs
#1   Mark Nojob

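"Nojob" is not matched because the pattern is anchored with ^, so it only checks whether the string starts with "Jo". The match is also case-sensitive, so the lowercase "jo" inside "Nojob" would not match "Jo" in any case.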


In older versions, we can negate (!) inside all_vars:

Dataframe %>%
  filter_at(.vars = vars(names, Jobs),
            .vars_predicate = all_vars(!str_detect(., paste0("^(", paste(Filter_list, collapse = "|"), ")"))))
#   names  Jobs
#1   Mark Nojob

The reason any_vars with ! didn't work is that it keeps a row when any column fails to match the string. So if one column in a row has no match while the other does, the row is still returned. With all_vars and negation, a row is returned only when none of the columns specified in vars match.

In older versions, we cannot put ! in front of any_vars, because any_vars only marks a predicate for filter_at to evaluate. if_any, by contrast, returns a logical vector that is passed directly to filter, so it can be negated like any other condition.

NOTE: The function that corresponds to all_vars in the current version is if_all.
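To make the any/all distinction concrete, here is a small sketch using the same sample data as in the "data" section. It relies on the usual logical identity that "no column matches" is the same as "every column fails to match"; the names no_match_any and no_match_all are just illustrative.

```r
library(dplyr)
library(stringr)

# Sample data copied from the "data" section of this answer
Dataframe <- data.frame(names = c("John", "Jill", "Joe", "Mark"),
                        Jobs  = c("Mailman", "Jockey", "Jobhunter", "Nojob"),
                        stringsAsFactors = FALSE)
pattern <- "^(Jo)"

# Negate the combined result: drop rows where ANY column matches
no_match_any <- Dataframe %>%
  filter(!if_any(c(names, Jobs), ~ str_detect(.x, pattern)))

# Negate each column's test: keep rows where ALL columns fail to match
no_match_all <- Dataframe %>%
  filter(if_all(c(names, Jobs), ~ !str_detect(.x, pattern)))
```

Both calls should leave only the Mark / Nojob row. Moving the ! inside if_any instead would ask "does any column fail to match?", which is the same mistake as any_vars(!...) described above.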

data

Dataframe <- data.frame("names" = c('John','Jill','Joe','Mark'), "Jobs" = c('Mailman','Jockey','Jobhunter',"Nojob"))

Filter_list <- c('Jo')

How do I elegantly str_detect across multiple columns and populate new columns conditionally

Here's a way to simplify this and reduce repetition:

library(dplyr)
library(stringr)  # needed for str_detect()

regex_list <- list(date = '(^20[1,2][0-9]\\-)|(\\/20[1,2][0-9]$)',
                   numericScientificNotation = '\\d\\.\\d{3}[eE][+-]\\d{2}+',
                   batches = '(^[a-zA-Z][0-9]{2}\\/2[0-1]{1}$)|(^[A-Z]{1,2}\\-\\d.*[a-zA-Z]*$)|(^[a-zA-Z][0-9]{2})|(^[A-Z][0-9]$)',
                   integers = '^-?\\d+$')

# For each named pattern: blank out non-matching cells in every column,
# then take the first non-NA value per row as the new column.
purrr::imap_dfc(regex_list, function(x, y)
  df %>%
    mutate(across(everything(), ~ifelse(str_detect(.x, x), .x, NA))) %>%
    transmute(!!y := do.call(coalesce, .)))

#   date       numericScientificNotation batches integers
#   <chr>      <chr>                     <chr>   <chr>
#1  NA         NA                        W-7     9996155
#2  NA         NA                        W-8     4001096
#3  2020-01-23 NA                        W-9     4001525
#4  2019-12-23 NA                        W-2     4000590
#5  2020-01-23 NA                        W-1     NA
#6  2019-12-23 3.408E+20                 W-1     NA
#7  2020-01-20 3.527E+20                 NA      4000461
#8  2019-12-08 3.498E+20                 NA      4000311

Filter_at selected columns with multiple str_detect patterns

You can loop over the columns that have "Pair" in their name, check whether the required pattern is present in each, build a matrix of logical vectors, and select the rows that have no occurrence of the pattern.

cols <- grep("Pair", names(df))
df[rowSums(sapply(df[cols], function(x) grepl("quinoaquinoa|lupinelupine", x))) == 0, ]
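Since the original df is not shown, here is a small base R sketch of the same steps on invented data. The column names and values below are assumptions made purely for illustration.

```r
# Hypothetical data; only the "Pair" columns are searched
df <- data.frame(id    = 1:4,
                 Pair1 = c("quinoaquinoa", "lupinepea", "peaquinoa", "oatoat"),
                 Pair2 = c("peapea", "lupinelupine", "oatpea", "quinoalupine"),
                 stringsAsFactors = FALSE)

# Positions of the columns whose name contains "Pair"
cols <- grep("Pair", names(df))

# One logical column per searched column: TRUE where the pattern occurs
hits <- sapply(df[cols], function(x) grepl("quinoaquinoa|lupinelupine", x))

# Keep rows where no searched column contains the pattern
kept <- df[rowSums(hits) == 0, ]
kept$id
```

Rows 1 and 2 each contain one of the doubled names, so only ids 3 and 4 should remain. Note that "quinoalupine" is kept because it matches neither alternative in full.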

How to use filter across and str_detect together to filter conditional on multiple columns

We can use if_any here. Inside filter, across combines the per-column results with &, i.e. all columns must meet the condition for a row to be kept:

library(dplyr)
library(stringr)
df %>%
  filter(if_any(everything(), ~str_detect(., "^A")))

-output

  col1 col2 col3
1   Z2   Z2   A2
2   A2   Z2   B2
3   B2   A2   C2
4   A2   C2   E2
5   F2   A2   G2

According to ?across

if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector: if_any() is TRUE when the predicate is TRUE for any of the selected columns, if_all() is TRUE when the predicate is TRUE for all selected columns.

across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

if_any() and if_all() are not part of the scoped variants.
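A quick side-by-side may help show the difference described in the documentation. The data frame below is invented for illustration and is not the asker's data.

```r
library(dplyr)
library(stringr)

# Hypothetical data: row 1 matches in every column, row 2 in one, row 3 in none
df <- tibble(col1 = c("A1", "A2", "B3"),
             col2 = c("A1", "Z2", "B3"),
             col3 = c("A1", "C2", "B3"))

# TRUE when at least one selected column starts with "A"
any_rows <- df %>% filter(if_any(everything(), ~ str_detect(.x, "^A")))

# TRUE only when every selected column starts with "A"
all_rows <- df %>% filter(if_all(everything(), ~ str_detect(.x, "^A")))

nrow(any_rows)
nrow(all_rows)
```

Here if_any should keep the first two rows, while if_all should keep only the first.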

Filter by multiple patterns with filter() and str_detect()

The correct syntax to accomplish this with filter() and str_detect() would be:

df %>%
  filter(
    str_detect(letters, "a|f|o")
  )
#   numbers letters
#1        1       a
#2        6       f
#3       15       o
#4       27       a
#5       32       f
#6       41       o
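If the patterns live in a vector rather than a hand-written string, the same alternation can be built with paste. This is only a sketch: the data and the wanted vector are made up, and the column names simply mirror the output above.

```r
library(dplyr)
library(stringr)

# Hypothetical data mirroring the column names in the output above
df <- tibble(numbers = 1:5,
             letters = c("a", "b", "f", "o", "z"))

# Collapse the search terms into a single regex: "a|f|o"
wanted <- c("a", "f", "o")
pat <- paste(wanted, collapse = "|")

res <- df %>% filter(str_detect(letters, pat))
res$numbers
```

Keep in mind that "a|f|o" matches anywhere in the string, so a value like "bat" would also pass. For exact matches on whole values, letters %in% wanted is usually the simpler choice.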

Pass on multiple columns to function within dplyr

You can use across:

library(dplyr)
library(stringr)

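# everything() is spelled out because newer dplyr versions expect an explicit column selection
df %>% filter(Reduce(`|`, across(everything(), ~str_detect(., "plate"))))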

#   col1      col2            col3
#   <chr>     <chr>           <chr>
#1  plate_ABC text            text
#2  text      this is plate B text
#3  text      text            C-plate

Or rowwise:

df %>%
  rowwise() %>%
  filter(any(str_detect(c_across(everything()), 'plate')))

If you have an older version of dplyr (<1.0.0), you can use filter_all/filter_at:

df %>% filter_all(any_vars(str_detect(., 'plate')))

