Find Multiple Strings Using Str_Extract_All

You could create a single regex:

library(stringr)

tofind <- paste(c("aaa","bbb","ccc","ddd"), collapse="|")  # "aaa|bbb|ccc|ddd"

str_extract_all(n, tofind)  # n is the character vector to search

[[1]]
[1] "aaa" "bbb"

[[2]]
[1] "aaa"

[[3]]
[1] "aaa" "ccc" "ddd"

[[4]]
character(0)
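
As a self-contained sketch of the same idea: the vector n below is an assumption made up for illustration (the question's actual input is not shown), chosen so that it gives the same matches as the output above.

```r
library(stringr)

# Hypothetical input; the question's actual `n` is not shown
n <- c("aaa bbb", "xx aaa", "aaa ccc ddd", "zzz")

# Join the search terms into one alternation pattern: "aaa|bbb|ccc|ddd"
tofind <- paste(c("aaa", "bbb", "ccc", "ddd"), collapse = "|")

# One list element per input string; no match gives character(0)
res <- str_extract_all(n, tofind)
lengths(res)
# [1] 2 1 3 0
```

Note that the terms are pasted in as regex, so a term containing characters such as . or ( would need escaping first.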

How to identify and retrieve multiple patterns from multiple texts?

We can collapse all the Code values into one pattern, use str_extract_all to extract every code that appears in Text, and combine the matches for each row into one comma-separated string.

main_df$extract_string <- sapply(stringr::str_extract_all(main_df$Text, 
             paste0('\\b', auxiliary_df$Code, '\\b', collapse = '|')), toString)
main_df

#                  Title                                     Text extract_string
#1    School Performance         Students A1, A6 and A7 are great     A1, A6, A7
#2 Groceries Performance                   Students A9, A3 are ok         A9, A3
#3     Fruit Performance    A5 and A7 will be great fruit pickers         A5, A7
#4      Jedi Performance           A3, A6, A5 will be great Jedis     A3, A6, A5
#5      Sith Performance No one is very good. We should be happy.

Word boundaries (\\b) are added to the pattern so that A1 does not match inside A11 or A110 when A1 itself is not present in the Text.
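
To see what the word boundaries change, here is a minimal sketch. The sentence and the codes are made up for illustration and are not taken from main_df or auxiliary_df.

```r
library(stringr)

text  <- "Students A11 and A1 are great"  # made-up sentence
codes <- c("A1", "A6")                     # made-up codes

# Without boundaries, "A1" also matches the first two characters of "A11"
loose <- str_extract_all(text, paste0(codes, collapse = "|"))[[1]]
loose
# [1] "A1" "A1"

# With boundaries, only the standalone code matches
strict <- str_extract_all(text, paste0("\\b", codes, "\\b", collapse = "|"))[[1]]
strict
# [1] "A1"

# toString() on an empty match gives "", which is why row 5 above is blank
empty <- toString(character(0))
```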

Extract multiple matches using a pattern from a String

We may also need to add : to the character class, so that quoted strings containing a colon are matched.

library(stringr)
str_extract_all("'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd', 
       wdwdw'wdwd:364:ssfd', 3434", "'[A-Za-z0-9:# ]+'")[[1]]

-output

[1] "'abcd:3343'"         "'rgjrkgj4252:sfsfd'" "'wdwd:364:ssfd'"

Alternatively, match a ' followed by one or more characters that are not ' ([^']+) and then the closing '.

str_extract_all("'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd', 
       wdwdw'wdwd:364:ssfd', 3434", "'[^']+'")[[1]]
[1] "'abcd:3343'"         "'rgjrkgj4252:sfsfd'" "'wdwd:364:ssfd'"
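
The two patterns differ when a quoted string contains a character outside the explicit class. A small sketch, using a made-up input string that is not from the question:

```r
library(stringr)

x <- "'a-b', 'c:d'"  # made-up input; "-" is not in the explicit class

# Explicit character class: 'a-b' is skipped because of the "-"
explicit <- str_extract_all(x, "'[A-Za-z0-9:# ]+'")[[1]]
explicit
# [1] "'c:d'"

# Negated class: anything between two quotes is accepted
negated <- str_extract_all(x, "'[^']+'")[[1]]
negated
# [1] "'a-b'" "'c:d'"
```

The negated class is more permissive, so it is the safer choice only when the quotes in the input are known to be balanced.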

Extracting a pattern considering different patterns

Your approach was correct, but you should extract the pattern you want instead of removing what you don't want.

library(stringr)
str_extract(vec, str_c(to_match, collapse = "|"))
#[1] "FOO" NA    "FEE" "FOO" NA
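
A self-contained sketch of the same call. The values of vec and to_match here are assumptions made up for illustration (the question's data is not shown), picked so that the result has the same shape as the output above. It also shows how str_extract differs from str_extract_all.

```r
library(stringr)

# Hypothetical data; the question's actual vec and to_match are not shown
vec      <- c("FOO bar", "none", "FEE x", "a FOO FEE", "zzz")
to_match <- c("FOO", "FEE")
pattern  <- str_c(to_match, collapse = "|")  # "FOO|FEE"

# str_extract returns only the first match per string, NA when there is none
first <- str_extract(vec, pattern)
first
# [1] "FOO" NA    "FEE" "FOO" NA

# str_extract_all returns every match, as a list
all_matches <- str_extract_all(vec, pattern)
all_matches[[4]]
# [1] "FOO" "FEE"
```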

How to extract multiple overlapping strings from a string using stringr?

We can do that using positive lookahead since it does not consume the string when matched.

string <- "AAAAAAAAAAAAAAAXAAAAAAAAAXBAAAAAAAAA"
stringr::str_match_all(string, "(?=(.{5}X.{5}))")[[1]][, 2]
#[1] "AAAAAXAAAAA" "AAAAAXBAAAA"
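
A shorter sketch makes the overlap easier to see. The string "ABABA" is made up for illustration: a plain pattern consumes the text it matches, while a capture group inside a lookahead reports every starting position.

```r
library(stringr)

s <- "ABABA"  # made-up string with two overlapping "ABA" occurrences

# A plain match consumes "ABA", so the second, overlapping one is missed
plain <- str_extract_all(s, "ABA")[[1]]
plain
# [1] "ABA"

# The lookahead itself matches an empty string; the capture group
# (column 2 of the match matrix) holds the text that follows
overlapping <- str_match_all(s, "(?=(ABA))")[[1]][, 2]
overlapping
# [1] "ABA" "ABA"
```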

str_extract_all - find only exact strings from a list

You can just add \b to your individual terms to make sure they match at a word boundary.

pattern <- paste0("\\b", paste(fire_match , collapse="\\b|\\b"), "\\b")
str_extract_all(CAUSE_TEXT, regex(pattern, ignore_case = TRUE))
# [[1]]
# [1] "fire"
# [[2]]
# character(0)
# [[3]]
# [1] "Fire"
# [[4]]
# [1] "Injury"
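
A self-contained sketch of this answer. The values of fire_match and CAUSE_TEXT below are assumptions made up for illustration (the question's data is not shown), picked so that the result has the same shape as the output above.

```r
library(stringr)

# Hypothetical data; the question's actual vectors are not shown
fire_match <- c("fire", "injury")
CAUSE_TEXT <- c("There was a fire", "Firefighter called",
                "Fire started", "Injury reported")

# "\\bfire\\b|\\binjury\\b"
pattern <- paste0("\\b", paste(fire_match, collapse = "\\b|\\b"), "\\b")

# ignore_case = TRUE lets "fire" match "Fire"; the boundaries
# keep it from matching inside "Firefighter"
res <- str_extract_all(CAUSE_TEXT, regex(pattern, ignore_case = TRUE))
```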

Find the original strings from the results of str_extract_all

Your code already does what you want. You just need to create an extra column to store the output of str_extract_all, like the following:

Since str_extract_all() returns a list, we need to unnest the list so that each match becomes its own row.

The final line of the code re-creates a consecutive index: "banana" has no match and is dropped by unnest, so index 2 would otherwise be missing.

library(tidyverse)

fruit %>% 
  mutate(pattern = str_extract_all(Fruit, "(.)\\1")) %>% 
  unnest(pattern) %>%
  mutate(index = as.numeric(as.factor(index)))

# A tibble: 5 × 3
  index Fruit       pattern
  <dbl> <chr>       <chr>  
1     1 apple       pp     
2     2 strawberry  rr     
3     3 pineapple   pp     
4     4 bell pepper ll     
5     4 bell pepper pp
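
For a version that runs on its own, here is a sketch with the input built by hand. The fruit tibble is an assumption reconstructed from the output above and from the remark about "banana"; the question's actual data is not shown.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical input, reconstructed from the output shown above
fruit <- tibble(
  index = 1:5,
  Fruit = c("apple", "banana", "strawberry", "pineapple", "bell pepper")
)

res <- fruit %>%
  # list-column: every doubled letter found in each fruit name
  mutate(pattern = str_extract_all(Fruit, "(.)\\1")) %>%
  # one row per match; rows with no match ("banana") are dropped
  unnest(pattern) %>%
  # renumber the surviving index values as 1, 2, 3, ...
  mutate(index = as.numeric(as.factor(index)))
```

If the rows without a match should be kept, unnest(pattern, keep_empty = TRUE) retains them with NA in the pattern column.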
