Filtering multiple columns with str_detect
You can use filter_at:
Dataframe %>% filter_at(.vars = vars(names, Jobs),
.vars_predicate = any_vars(str_detect(. , paste0("^(", paste(Filter_list, collapse = "|"), ")"))))
If you want to apply the filter to all variables, you can use filter_all instead.
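As a minimal sketch of the filter_all variant, assuming the small Dataframe and Filter_list used elsewhere on this page (they are recreated here so the snippet runs on its own; the names pattern and res_all are illustrative only):

```r
library(dplyr)
library(stringr)

# Example data recreated from the "data" block further down this page
Dataframe <- data.frame(
  names = c("John", "Jill", "Joe", "Mark"),
  Jobs  = c("Mailman", "Jockey", "Jobhunter", "Nojob"),
  stringsAsFactors = FALSE
)
Filter_list <- c("Jo")

# Anchored alternation, here "^(Jo)"
pattern <- paste0("^(", paste(Filter_list, collapse = "|"), ")")

# Keep rows where at least one column starts with a string from Filter_list
res_all <- Dataframe %>%
  filter_all(any_vars(str_detect(., pattern)))
```

Note that filter_all and filter_at are superseded in recent dplyr releases, so this form is mainly useful on older installations.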
How to use str_detect within across when searching multiple columns for several search strings
Combine across with Reduce to select rows that have any occurrence of the pattern.
library(dplyr)
library(stringr)
pat <- paste(search_string, collapse = "|")
raw_df %>%
filter(Reduce(`|`, across(c(cust_name, other_desc),
~str_detect(., regex(pat, ignore_case = TRUE)))))
However, I think if_any is more suitable here, as it was built to handle such cases:
raw_df %>%
filter(if_any(c(cust_name, other_desc),
~str_detect(., regex(pat, ignore_case = TRUE))))
# cust_name other_desc trans val
# <chr> <chr> <chr> <int>
#1 Cisco nothing a 100
#2 bad_cs cisCo s 101
#3 Ibm nothing d 102
#4 bad_ib ibM f 102
Reverse filtering multiple columns with str_detect
In newer versions of dplyr (1.0.4 and later), we can use if_any within filter:
library(dplyr)
library(stringr)
Dataframe %>%
filter(!if_any(c(names, Jobs),
~ str_detect(., str_c("^(", str_c(Filter_list, collapse="|"), ")"))))
# names Jobs
#1 Mark Nojob
"Nojob" is not matched because we are checking whether the string starts (^) with "Jo" (the case is also different).
In older versions, we can negate (!) inside all_vars:
Dataframe %>%
filter_at(.vars = vars(names, Jobs),
.vars_predicate = all_vars(!str_detect(. , paste0("^(", paste(Filter_list, collapse = "|"), ")"))))
# names Jobs
#1 Mark Nojob
The reason any_vars with ! didn't work is that it looks for any column that does not match the string. So if one column in a row lacks a match while the other has one, the row is still returned. With all_vars and negation, a row is returned only when none of the columns specified in vars match.
In the previous version, we cannot put a negation (!) in front of any_vars, whereas we can with if_any. This is because if_any returns a logical vector that is passed directly to filter, while any_vars is applied indirectly through filter_at.
NOTE: In the current version, the function that corresponds to all_vars is if_all.
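The difference between negating "any" and negating each column can be checked directly. The sketch below assumes dplyr 1.0.4 or later and recreates the example data so it runs on its own; the result names none_match, all_fail and any_fail are illustrative only.

```r
library(dplyr)
library(stringr)

Dataframe <- data.frame(
  names = c("John", "Jill", "Joe", "Mark"),
  Jobs  = c("Mailman", "Jockey", "Jobhunter", "Nojob"),
  stringsAsFactors = FALSE
)
pattern <- "^(Jo)"

# No column matches: negate the combined if_any() result
none_match <- Dataframe %>%
  filter(!if_any(c(names, Jobs), ~ str_detect(.x, pattern)))

# Same rows, written as "every column fails to match"
all_fail <- Dataframe %>%
  filter(if_all(c(names, Jobs), ~ !str_detect(.x, pattern)))

# Different question: at least one column fails to match
any_fail <- Dataframe %>%
  filter(if_any(c(names, Jobs), ~ !str_detect(.x, pattern)))
```

Here none_match and all_fail both contain only the Mark row, while any_fail also keeps John and Jill, because one of their two columns does not start with "Jo".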
data
Dataframe <- data.frame("names" = c('John','Jill','Joe','Mark'), "Jobs" = c('Mailman','Jockey','Jobhunter',"Nojob"))
Filter_list <- c('Jo')
How do I elegantly str_detect across multiple columns and populating new columns conditionally
Here's a way to simplify this and reduce repetition:
library(dplyr)
library(stringr)  # needed for str_detect() below
regex_list <- list(date = '(^20[1,2][0-9]\\-)|(\\/20[1,2][0-9]$)',
numericScientificNotation = '\\d\\.\\d{3}[eE][+-]\\d{2}+',
batches = '(^[a-zA-Z][0-9]{2}\\/2[0-1]{1}$)|(^[A-Z]{1,2}\\-\\d.*[a-zA-Z]*$)|(^[a-zA-Z][0-9]{2})|(^[A-Z][0-9]$)',
integers = '^-?\\d+$')
purrr::imap_dfc(regex_list, function(x, y)
df %>%
mutate(across(everything(), ~ifelse(str_detect(.x, x), .x, NA))) %>%
transmute(!!y := do.call(coalesce, .)))
# date numericScientificNotation batches integers
# <chr> <chr> <chr> <chr>
#1 NA NA W-7 9996155
#2 NA NA W-8 4001096
#3 2020-01-23 NA W-9 4001525
#4 2019-12-23 NA W-2 4000590
#5 2020-01-23 NA W-1 NA
#6 2019-12-23 3.408E+20 W-1 NA
#7 2020-01-20 3.527E+20 NA 4000461
#8 2019-12-08 3.498E+20 NA 4000311
Filter_at selected columns with multiple str_detect patterns
You can loop over the columns whose names contain "Pair", check whether the required pattern is present in each, build a logical matrix from the results, and select the rows that have no occurrence of the pattern.
cols <- grep("Pair", names(df))
df[rowSums(sapply(df[cols],function(x) grepl("quinoaquinoa|lupinelupine", x)))== 0, ]
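To see the intermediate logical matrix, here is a small self-contained sketch. The data frame is hypothetical (the id column and the Pair1/Pair2 values are assumptions made for illustration), and hits and kept are illustrative names.

```r
# Hypothetical data: column names and values are made up for this sketch
df <- data.frame(
  id    = 1:4,
  Pair1 = c("quinoaquinoa", "quinoalupine", "lupinequinoa", "oatoat"),
  Pair2 = c("oatlupine", "lupinelupine", "oatquinoa", "quinoaoat"),
  stringsAsFactors = FALSE
)

# Positions of the columns whose names contain "Pair"
cols <- grep("Pair", names(df))

# Logical matrix: one row per data row, one column per "Pair" column
hits <- sapply(df[cols], function(x) grepl("quinoaquinoa|lupinelupine", x))

# Keep rows with zero matches across the selected columns
kept <- df[rowSums(hits) == 0, ]
```

One caveat: with a single-row data frame, sapply returns a plain vector rather than a matrix, so rowSums would fail; this sketch assumes at least two rows.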
How to use filter across and str_detect together to filter conditional on multiple columns
We can use if_any, because across applies an & condition, i.e. all columns must meet the condition for a particular row to be kept by filter.
library(dplyr)
library(stringr)
df %>%
filter(if_any(everything(), ~str_detect(., "^A")))
Output:
col1 col2 col3
1 Z2 Z2 A2
2 A2 Z2 B2
3 B2 A2 C2
4 A2 C2 E2
5 F2 A2 G2
According to ?across
if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector: if_any() is TRUE when the predicate is TRUE for any of the selected columns, if_all() is TRUE when the predicate is TRUE for all selected columns.
across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().
if_any and if_all are not part of the scoped variants.
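A short sketch of the difference between the two, using a hypothetical two-column data frame (the data and the names any_a and all_a are assumptions for illustration; dplyr 1.0.4 or later is assumed):

```r
library(dplyr)
library(stringr)

# Hypothetical data for illustration
df <- data.frame(
  col1 = c("A1", "A2", "B3"),
  col2 = c("A4", "B5", "B6"),
  stringsAsFactors = FALSE
)

# TRUE for a row when any column starts with "A"
any_a <- df %>% filter(if_any(everything(), ~ str_detect(.x, "^A")))

# TRUE for a row only when every column starts with "A"
all_a <- df %>% filter(if_all(everything(), ~ str_detect(.x, "^A")))
```

With this data, any_a keeps the first two rows and all_a keeps only the first.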
Filter by multiple patterns with filter() and str_detect()
The correct syntax to accomplish this with filter() and str_detect() would be
df %>%
filter(
str_detect(letters, "a|f|o")
)
# numbers letters
#1 1 a
#2 6 f
#3 15 o
#4 27 a
#5 32 f
#6 41 o
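If the search strings live in a vector rather than a hand-written pattern, the alternation can be built with str_c, as other answers on this page do with paste. A minimal sketch, with a hypothetical data frame and the illustrative names targets, pat and matched:

```r
library(dplyr)
library(stringr)

# Hypothetical data for illustration
df <- data.frame(
  numbers = c(1, 2, 6, 15),
  letters = c("a", "b", "f", "o"),
  stringsAsFactors = FALSE
)

# Collapse the search strings into a single alternation pattern
targets <- c("a", "f", "o")
pat <- str_c(targets, collapse = "|")

# Keep rows whose letters column matches any of the search strings
matched <- df %>% filter(str_detect(letters, pat))
```

This assumes the search strings contain no regex metacharacters; if they might, they would need escaping before being collapsed.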
pass on multiple columns to function within dplyr
You can use across:
library(dplyr)
library(stringr)
df %>% filter(Reduce(`|`, across(everything(), ~str_detect(., "plate"))))
# col1 col2 col3
# <chr> <chr> <chr>
#1 plate_ABC text text
#2 text this is plate B text
#3 text text C-plate
Or rowwise:
df %>%
rowwise() %>%
filter(any(str_detect(c_across(everything()), 'plate')))
If you have an older version of dplyr (<1.0.0), you can use filter_all/filter_at:
df %>% filter_all(any_vars(str_detect(., 'plate')))