Find Multiple Strings Using str_extract_all

You could combine the search terms into a single regex, joining them with the alternation operator |:

tofind <- paste(c("aaa","bbb","ccc","ddd"), collapse="|")

str_extract_all(n, tofind)
[[1]]
[1] "aaa" "bbb"

[[2]]
[1] "aaa"

[[3]]
[1] "aaa" "ccc" "ddd"

[[4]]
character(0)
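
As a self-contained illustration of the single-regex approach, here is a minimal sketch. The original n vector is not shown, so the input x below is made up for this example and should be read as an assumption.

```r
library(stringr)

# Hypothetical input; stands in for the `n` vector, which is not shown above.
x <- c("aaa bbb", "xaaay", "nothing here")

# Join the search terms with "|" so that any one of them can match.
terms <- c("aaa", "bbb", "ccc", "ddd")
tofind <- paste(terms, collapse = "|")

# One list element per input string; character(0) where nothing matched.
matches <- str_extract_all(x, tofind)
matches
```

If the terms can contain regex metacharacters such as . or (, they would need to be escaped before being pasted together.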

How to identify and retrieve multiple patterns from multiple texts?

We can collapse all the Code values into one pattern, use str_extract_all to extract every code that appears in Text, and combine the matches for each row into one comma-separated string.

main_df$extract_string <- sapply(stringr::str_extract_all(main_df$Text, 
paste0('\\b', auxiliary_df$Code, '\\b', collapse = '|')), toString)
main_df

#   Title                 Text                                     extract_string
#1  School Performance    Students A1, A6 and A7 are great         A1, A6, A7
#2  Groceries Performance Students A9, A3 are ok                   A9, A3
#3  Fruit Performance     A5 and A7 will be great fruit pickers    A5, A7
#4  Jedi Performance      A3, A6, A5 will be great Jedis           A3, A6, A5
#5  Sith Performance      No one is very good. We should be happy.

Word boundaries (\\b) are added to the pattern so that A1 does not match inside A11 or A110 when A1 itself is not present in the Text.
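
To see what the word boundaries change, here is a small sketch. The sentence is a made-up example, not data from the question.

```r
library(stringr)

text <- "A11 and A1 did well"  # hypothetical sentence

# Without boundaries, "A1" is also found inside "A11".
loose <- str_extract_all(text, "A1")[[1]]

# With \\b on both sides, only the standalone code matches.
strict <- str_extract_all(text, "\\bA1\\b")[[1]]

loose
strict
```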

Extract multiple matches using a pattern from a String

We may also need to add : to the character class:

library(stringr)
str_extract_all("'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd',
wdwdw'wdwd:364:ssfd', 3434", "'[A-Za-z0-9:# ]+'")[[1]]

-output

[1] "'abcd:3343'"         "'rgjrkgj4252:sfsfd'" "'wdwd:364:ssfd'"    

Alternatively, match a ', followed by one or more characters that are not ' ([^']+), followed by the closing ':

str_extract_all("'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd', 
wdwdw'wdwd:364:ssfd', 3434", "'[^']+'")[[1]]
[1] "'abcd:3343'" "'rgjrkgj4252:sfsfd'" "'wdwd:364:ssfd'"
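
If the surrounding quotes are not wanted in the result, one possible variation (an assumption about the goal, not part of the original answer) is to wrap the inner part in a capture group and read it from str_match_all:

```r
library(stringr)

x <- "'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd'"

# Column 1 of the match matrix holds the full match (with quotes),
# column 2 holds the first capture group (without quotes).
inner <- str_match_all(x, "'([^']+)'")[[1]][, 2]
inner
```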

Extracting a pattern considering different patterns

Your approach was correct, but you should extract the pattern you want instead of removing what you don't want.

library(stringr)
str_extract(vec, str_c(to_match, collapse = "|"))
#[1] "FOO" NA "FEE" "FOO" NA
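
Note that str_extract returns only the first match per string, while str_extract_all returns every match. A minimal sketch of the difference, using made-up stand-ins for vec and to_match (the originals are not shown):

```r
library(stringr)

# Hypothetical stand-ins for the `vec` and `to_match` objects.
vec <- c("xxFOOxx", "nothing", "FEE then FOO")
to_match <- c("FOO", "FEE")
pattern <- str_c(to_match, collapse = "|")

# First match only: a character vector with NA where nothing matched.
first_only <- str_extract(vec, pattern)

# Every match: a list with character(0) where nothing matched.
all_hits <- str_extract_all(vec, pattern)

first_only
all_hits
```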

How to extract multiple overlapping strings from a string using stringr?

We can do that using a positive lookahead, since it does not consume any characters when it matches.

string <- "AAAAAAAAAAAAAAAXAAAAAAAAAXBAAAAAAAAA"
stringr::str_match_all(string, "(?=(.{5}X.{5}))")[[1]][, 2]
#[1] "AAAAAXAAAAA" "AAAAAXBAAAA"
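
The same idea on a shorter, made-up string makes the effect easier to follow. This is only a sketch: a plain pattern consumes what it matches, whereas the lookahead matches at each position and the capture group carries the text.

```r
library(stringr)

s <- "abcde"  # hypothetical input

# A plain pattern consumes its match, so the windows cannot overlap.
non_overlapping <- str_extract_all(s, ".{3}")[[1]]

# The lookahead itself is zero-width; the text comes from capture group 1,
# which is column 2 of the match matrix.
overlapping <- str_match_all(s, "(?=(.{3}))")[[1]][, 2]

non_overlapping
overlapping
```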

str_extract_all - find only exact strings from a list

You can just add \b to your individual terms to make sure they match at a word boundary.

pattern <- paste0("\\b", paste(fire_match , collapse="\\b|\\b"), "\\b")
str_extract_all(CAUSE_TEXT, regex(pattern, ignore_case = TRUE))
# [[1]]
# [1] "fire"
# [[2]]
# character(0)
# [[3]]
# [1] "Fire"
# [[4]]
# [1] "Injury"
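
A runnable sketch of the same pattern construction follows. The fire_match and CAUSE_TEXT values here are invented for illustration, since the originals are not shown.

```r
library(stringr)

# Hypothetical stand-ins for `fire_match` and `CAUSE_TEXT`.
fire_match <- c("fire", "injury")
CAUSE_TEXT <- c("Fire in the attic", "firefighter on duty", "minor INJURY")

# Wrap every term in \\b so that "fire" does not match inside "firefighter".
pattern <- paste0("\\b", paste(fire_match, collapse = "\\b|\\b"), "\\b")

# ignore_case = TRUE lets "Fire" and "INJURY" match the lowercase terms.
hits <- str_extract_all(CAUSE_TEXT, regex(pattern, ignore_case = TRUE))
hits
```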

Find the original strings from the results of str_extract_all

Your code already does what you want. You just need to create an extra column to store the output of str_extract_all.

Since str_extract_all() returns a list, we'll need to unnest the list so that each match becomes its own row.

The final line of the code recreates a consecutive index (since "banana" has no match and is dropped, index 2 would otherwise be missing).

library(tidyverse)

fruit %>%
  mutate(pattern = str_extract_all(Fruit, "(.)\\1")) %>%
  unnest(pattern) %>%
  mutate(index = as.numeric(as.factor(index)))

# A tibble: 5 × 3
  index Fruit       pattern
  <dbl> <chr>       <chr>
1     1 apple       pp
2     2 strawberry  rr
3     3 pineapple   pp
4     4 bell pepper ll
5     4 bell pepper pp
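
For a version that can be run from scratch, here is a minimal sketch with a made-up three-row table. The name fruit_df and its contents are assumptions for illustration; the original fruit data frame is not shown.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical input table with an existing index column.
fruit_df <- tibble(
  index = 1:3,
  Fruit = c("apple", "banana", "pineapple")
)

result <- fruit_df %>%
  # List-column: all doubled letters found in each fruit name.
  mutate(pattern = str_extract_all(Fruit, "(.)\\1")) %>%
  # One row per match; rows with no match ("banana") are dropped.
  unnest(pattern) %>%
  # Renumber so the remaining index values are consecutive again.
  mutate(index = as.numeric(as.factor(index)))

result
```

By default unnest() drops rows whose list element is empty; passing keep_empty = TRUE would keep "banana" with an NA pattern instead.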

