find multiple strings using str_extract_all
You could create a single regex:
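library(stringr)
# One alternation pattern: "aaa|bbb|ccc|ddd"
tofind <- paste(c("aaa","bbb","ccc","ddd"), collapse="|")
# n is the character vector from the question
str_extract_all(n, tofind)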
[[1]]
[1] "aaa" "bbb"
[[2]]
[1] "aaa"
[[3]]
[1] "aaa" "ccc" "ddd"
[[4]]
character(0)
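A self-contained sketch of the same idea, using a hypothetical vector n (the question's actual n is not shown). Each element gets its own character vector of matches, and elements with no match get character(0).

```r
library(stringr)

# Hypothetical input; the question's actual `n` is not shown
n <- c("aaa bbb", "aaa", "xaaa ccc ddd", "zzz")

# One alternation pattern: "aaa|bbb|ccc|ddd"
tofind <- paste(c("aaa", "bbb", "ccc", "ddd"), collapse = "|")
res <- str_extract_all(n, tofind)

# Number of matches found in each element
lengths(res)
# [1] 2 1 3 0
```

Because the pattern is a regex, any search term containing metacharacters such as . or + has to be escaped before collapsing, otherwise it will not be matched literally.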
How to identify and retrieve multiple patterns from multiple texts?
We can collapse all the values in Code into one pattern, use str_extract_all to extract every code that appears in Text, and combine the matches for each row into one comma-separated string.
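# Pattern of the form \\bCODE\\b|\\bCODE\\b|... built from auxiliary_df$Code;
# toString() joins each row's matches with ", " (and gives "" for no match)
main_df$extract_string <- sapply(stringr::str_extract_all(main_df$Text,
            paste0('\\b', auxiliary_df$Code, '\\b', collapse = '|')), toString)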
main_df
#                  Title                                     Text extract_string
#1    School Performance         Students A1, A6 and A7 are great     A1, A6, A7
#2 Groceries Performance                   Students A9, A3 are ok         A9, A3
#3     Fruit Performance    A5 and A7 will be great fruit pickers         A5, A7
#4      Jedi Performance           A3, A6, A5 will be great Jedis     A3, A6, A5
#5      Sith Performance No one is very good. We should be happy.
Word boundaries (\\b) were added to the pattern so that A1 is not matched inside A11 or A110 when A1 itself is not present in Text.
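A small sketch of why the boundaries matter, using hypothetical codes and text (not the question's data). Without \\b, the alternative "A1" matches at the start of both "A11" and "A110".

```r
library(stringr)

# Hypothetical codes and text: A1 itself does not appear, only A11 and A110
codes <- c("A1", "A11")
text  <- "Students A11 and A110 are ok"

# Without boundaries, "A1" is found at the start of "A11" and of "A110"
plain <- str_extract_all(text, paste(codes, collapse = "|"))[[1]]

# With \\b around each code, only whole tokens are returned
bounded <- str_extract_all(text, paste0("\\b", codes, "\\b", collapse = "|"))[[1]]

plain    # [1] "A1" "A1"
bounded  # [1] "A11"
```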
Extract multiple matches using a pattern from a String
We may need to add : to the character class as well, since the quoted values contain colons:
library(stringr)
str_extract_all("'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd',
wdwdw'wdwd:364:ssfd', 3434", "'[A-Za-z0-9:# ]+'")[[1]]
-output
[1] "'abcd:3343'" "'rgjrkgj4252:sfsfd'" "'wdwd:364:ssfd'"
Alternatively, match a ' followed by one or more characters that are not ' ([^']+) and then the closing ':
str_extract_all("'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd',
wdwdw'wdwd:364:ssfd', 3434", "'[^']+'")[[1]]
[1] "'abcd:3343'" "'rgjrkgj4252:sfsfd'" "'wdwd:364:ssfd'"
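If the surrounding quotes are not wanted in the result, one option (an extension, not part of the answer above) is to wrap [^']+ in a capture group and use str_match_all. A lookaround version such as (?<=')[^']+(?=') would not work here: since the quotes are not consumed, it would also return the text between two quoted values, such as ", sdgshdg374 ".

```r
library(stringr)

# Same text as above, written on one line for simplicity
x <- "'abcd:3343', sdgshdg374 'rgjrkgj4252:sfsfd', wdwdw'wdwd:364:ssfd', 3434"

# Column 1 of the match matrix is the full match (with quotes),
# column 2 is the capture group (without quotes)
inner <- str_match_all(x, "'([^']+)'")[[1]][, 2]
inner
# [1] "abcd:3343" "rgjrkgj4252:sfsfd" "wdwd:364:ssfd"
```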
Extracting a pattern considering different patterns
Your approach was on the right track, but you should extract the pattern you want instead of removing what you don't want.
library(stringr)
str_extract(vec, str_c(to_match, collapse = "|"))
#[1] "FOO" NA "FEE" "FOO" NA
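A runnable sketch with hypothetical vec and to_match values (the question's data is not shown). It also contrasts str_extract, which keeps only the first match per element and gives NA when there is none, with str_extract_all, which keeps every match.

```r
library(stringr)

# Hypothetical data; the question's vec and to_match are not shown
to_match <- c("FOO", "FEE")
vec <- c("xFOOy", "bar", "FEE1", "FOO FEE", "baz")

pattern <- str_c(to_match, collapse = "|")

# First match only, NA when nothing matches
first_hit <- str_extract(vec, pattern)
# [1] "FOO" NA    "FEE" "FOO" NA

# Every match per element (a list)
all_hits <- str_extract_all(vec, pattern)
all_hits[[4]]
# [1] "FOO" "FEE"
```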
how to extract multiple overlapping strings from a string using stringr?
We can do this with a positive lookahead: it does not consume characters when it matches, so consecutive matches are allowed to overlap.
string <- "AAAAAAAAAAAAAAAXAAAAAAAAAXBAAAAAAAAA"
stringr::str_match_all(string, "(?=(.{5}X.{5}))")[[1]][, 2]
#[1] "AAAAAXAAAAA" "AAAAAXBAAAA"
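For comparison, a sketch of what an ordinary consuming pattern returns on the same string. The second window starts on a character already consumed by the first match, so str_extract_all never finds it; the zero-width lookahead tests every start position and captures the window in group 1.

```r
library(stringr)

string <- "AAAAAAAAAAAAAAAXAAAAAAAAAXBAAAAAAAAA"

# Consuming match: scanning resumes after the first window, so the
# overlapping second window is missed
non_overlap <- str_extract_all(string, ".{5}X.{5}")[[1]]
non_overlap
# [1] "AAAAAXAAAAA"

# Zero-width lookahead with a capture group: overlapping windows are kept
overlap <- str_match_all(string, "(?=(.{5}X.{5}))")[[1]][, 2]
overlap
# [1] "AAAAAXAAAAA" "AAAAAXBAAAA"
```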
str_extract_all - find only exact strings from a list
You can just wrap each individual term in \b so that it only matches at a word boundary.
library(stringr)
# Builds "\\bterm1\\b|\\bterm2\\b|..." from the terms in fire_match
pattern <- paste0("\\b", paste(fire_match , collapse="\\b|\\b"), "\\b")
str_extract_all(CAUSE_TEXT, regex(pattern, ignore_case = TRUE))
# [[1]]
# [1] "fire"
# [[2]]
# character(0)
# [[3]]
# [1] "Fire"
# [[4]]
# [1] "Injury"
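A self-contained sketch with hypothetical fire_match and CAUSE_TEXT values (the question's data is not shown). "firearm" is not matched because "fire" is followed by a word character. The second pattern, one non-capturing group between a single pair of \\b, is an equivalent way to write the same thing.

```r
library(stringr)

# Hypothetical inputs; the question's fire_match and CAUSE_TEXT are not shown
fire_match <- c("fire", "injury")
CAUSE_TEXT <- c("fire in kitchen", "firearm discharged", "Fire", "Injury")

# Each term wrapped in \\b: "\\bfire\\b|\\binjury\\b"
pattern <- paste0("\\b", paste(fire_match, collapse = "\\b|\\b"), "\\b")

# Equivalent form: "\\b(?:fire|injury)\\b"
pattern2 <- paste0("\\b(?:", paste(fire_match, collapse = "|"), ")\\b")

res  <- str_extract_all(CAUSE_TEXT, regex(pattern,  ignore_case = TRUE))
res2 <- str_extract_all(CAUSE_TEXT, regex(pattern2, ignore_case = TRUE))
```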
Find the original strings from the results of str_extract_all
Your code already does what you want; you just need an extra column to store the output of str_extract_all, as follows.
Since str_extract_all() returns a list, we need to unnest that list column into rows.
The final line re-creates a consecutive index: "banana" has no match, so its row is dropped by unnest(), and without renumbering index 2 would be missing.
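library(tidyverse)
# `fruit` is the data frame from the question (columns index and Fruit)
fruit %>%
  mutate(pattern = str_extract_all(Fruit, "(.)\\1")) %>%  # every doubled character
  unnest(pattern) %>%                                       # one row per match
  mutate(index = as.numeric(as.factor(index)))              # renumber consecutively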
# A tibble: 5 × 3
  index Fruit       pattern
  <dbl> <chr>       <chr>
1     1 apple       pp
2     2 strawberry  rr
3     3 pineapple   pp
4     4 bell pepper ll
5     4 bell pepper pp
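A self-contained sketch, assuming an input reconstructed from the output above plus "banana" at index 2. The data frame is named fruit_df (a hypothetical name) so it is not confused with the fruit character vector that ships with stringr. It uses dplyr's dense_rank() as an alternative way to renumber the index.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical input, reconstructed from the output above
fruit_df <- tibble(
  index = 1:5,
  Fruit = c("apple", "banana", "strawberry", "pineapple", "bell pepper")
)

out <- fruit_df %>%
  # list column: every doubled character in each fruit name
  mutate(pattern = str_extract_all(Fruit, "(.)\\1")) %>%
  # one row per match; rows with no match (banana) are dropped
  unnest(pattern) %>%
  # dense_rank() gives 1, 2, 3, ... over the remaining index values
  mutate(index = dense_rank(index))
```

dense_rank() and as.numeric(as.factor(index)) give the same consecutive numbering here; dense_rank() keeps the column an integer.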