Str_Extract_All: Return All Patterns Found in String Concatenated as Vector

Instead of cat, we can use paste. With the tidyverse, we can also use map_chr together with str_c (the stringr counterpart of paste):

library(tidyverse)
data %>%
mutate(age_new = map_chr(str_extract_all(x, "[^a_]+"), ~ str_c(.x, collapse="")))

Using the OP's code:

data %>%
mutate(age_new = sapply(str_extract_all(x,"[^a_]"),
function(x) paste(x,collapse="")))

If the intention is to get the numbers, parse_number from readr returns the first number found in each string:

library(readr)
data %>%
mutate(age_new = parse_number(x))
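
To see the extract-and-collapse step on concrete input, here is a minimal sketch. The vector x and its values are hypothetical stand-ins for the OP's column, which is not shown, and vapply is used in place of map_chr only to avoid loading purrr.

```r
library(stringr)

# Hypothetical input; the OP's actual data is not shown
x <- c("a_23", "1_a_5")

# One list element per input string, holding every run of
# characters that are neither "a" nor "_"
matches <- str_extract_all(x, "[^a_]+")

# Collapse each element's matches into a single string
age_new <- vapply(matches, str_c, character(1), collapse = "")
age_new
```

For the second string the two pieces "1" and "5" are joined into "15", which is where this approach differs from taking only the first number.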

Extract digits and next string after from a character vector in R

Use a pattern that matches one or more digits (\\d+) followed by one or more whitespace characters (\\s+) and a word (\\w+):

library(stringr)
str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]
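
A small sketch of that pattern on made-up input may help; the value of my_text below is an assumption, since the original string is not shown.

```r
library(stringr)

# Hypothetical input; the original my_text is not shown
my_text <- "I bought 3 apples and 12 oranges"

# [[1]] pulls the matches for the first (and only) string out of the list
pairs <- str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]
pairs
```

Each match holds the digits, the whitespace, and the word that follows them.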

Why does `str_extract_all` return NA for non-matches?

If the vector contains NA, stringr::str_extract_all returns NA for that element:

sample[1] <- NA
unlist(stringr::str_extract_all(sample, "\\b\\w+well\\w+\\b"))
#[1] NA "swellings" "jewellery" "jewellers"

To drop the NA, filter the input with is.na first:

unlist(stringr::str_extract_all(sample[!is.na(sample)], "\\b\\w+well\\w+\\b"))
#[1] "swellings" "jewellery" "jewellers"

Or use gregexpr and regmatches from base R, which return no match for NA elements:

unlist(regmatches(sample, gregexpr("\\b\\w+well\\w+\\b", sample)))
#[1] "swellings" "jewellery" "jewellers"
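
The three variants can be compared side by side in a self-contained sketch. The vector words is a hypothetical replacement for the original sample data, which is not shown, with an NA placed in the first position.

```r
library(stringr)

# Hypothetical input with a missing value in the first position
words <- c(NA, "swellings and jewellery", "the jewellers")
pat <- "\\b\\w+well\\w+\\b"

# stringr keeps an NA for the NA input
with_na <- unlist(str_extract_all(words, pat))

# Filtering the input first removes it
no_na <- unlist(str_extract_all(words[!is.na(words)], pat))

# Base R yields an empty match for the NA input, so unlist() drops it
base_r <- unlist(regmatches(words, gregexpr(pat, words)))
```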

How do I retrieve all numbers in a string and combine them into one number using regex?

str_extract_all returns a list, so we need to convert it to a vector before pasting. We extract the list element with [[; since there is only a single element, mynumbers[[1]] gives the vector. Then apply paste with collapse and convert with as.numeric.

as.numeric(paste(mynumbers[[1]],collapse=""))
#[1] 77500

We can also match one or more non-digit characters (\\D+), replace them with "" using gsub, and convert the result to numeric.

as.numeric(gsub("\\D+", "", input))
#[1] 77500
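
Both routes can be run end to end as below. The value of input and the pattern used to build mynumbers are assumptions, because the question's original string and extraction call are not shown here.

```r
library(stringr)

# Hypothetical input; the original string is not shown
input <- "$77,500 per year"

# Assumed extraction step: every run of digits, returned as a list
mynumbers <- str_extract_all(input, "[0-9]+")

# Route 1: take the vector out of the list, paste, convert
via_paste <- as.numeric(paste(mynumbers[[1]], collapse = ""))

# Route 2: delete everything that is not a digit, convert
via_gsub <- as.numeric(gsub("\\D+", "", input))
```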

R duplicate rows based on the elements in a string column

You can use str_extract_all to extract all the states and unnest to duplicate the rows so that each state ends up in a separate row. R has a built-in constant, state.name, which holds the names of the US states and can be used here to build the pattern.

library(dplyr)
pat <- paste0("\\b", state.name, "\\b", collapse = "|")

df %>%
mutate(states = stringr::str_extract_all(location, pat)) %>%
tidyr::unnest(states)

# A tibble: 11 x 3
# location year states
# <chr> <int> <chr>
# 1 North Dakota, Minnesota, Michigan 2011 North Dakota
# 2 North Dakota, Minnesota, Michigan 2011 Minnesota
# 3 North Dakota, Minnesota, Michigan 2011 Michigan
# 4 California, Tennessee 2012 California
# 5 California, Tennessee 2012 Tennessee
# 6 Bastrop County (Texas) 2013 Texas
# 7 Dallas (Texas) 2014 Texas
# 8 Shasta (California) 2015 California
# 9 California, Oregon, Washington 2011 California
#10 California, Oregon, Washington 2011 Oregon
#11 California, Oregon, Washington 2011 Washington

data

df <- structure(list(location = c("North Dakota, Minnesota, Michigan", 
"California, Tennessee", "Bastrop County (Texas)", "Dallas (Texas)",
"Shasta (California)", "California, Oregon, Washington"), year = c(2011L,
2012L, 2013L, 2014L, 2015L, 2011L)), class = "data.frame", row.names = c(NA, -6L))

Extract all matches to a new column using regex in R

We can use str_extract_all instead of str_extract, because str_extract matches only the first instance, whereas the _all variant is global and extracts all the instances into a list, which can be converted to two columns with unnest_wider:

library(dplyr)
library(tidyr)
library(stringr)
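d %>%
mutate(out = str_extract_all(x, "\\d{2}")) %>%
# names_sep yields out_1, out_2; recent tidyr versions require it for unnamed list elements
unnest_wider(out, names_sep = "_") %>%
rename_at(-1, ~ c('y', 'z')) %>%
type.convert(as.is = TRUE)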
# A tibble: 3 x 3
# x y z
# <chr> <int> <int>
#1 i am 10 and she is 50 10 50
#2 he is 32 and i am 22 32 22
#3 he may be 70 and she may be 99 70 99

If we need a single string column with "," as the separator, then after extracting into a list, loop over the list with map_chr and concatenate the elements of each entry into one string with toString (a wrapper for paste(., collapse=", ")):

library(purrr)
d %>%
mutate(y = str_extract_all(x, "\\b\\d{2}\\b") %>%
map_chr(toString))
# A tibble: 3 x 2
# x y
# <chr> <chr>
#1 i am 10 and she is 50 10, 50
#2 he is 32 and i am 22 32, 22
#3 he may be 70 and she may be 99 70, 99
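
The definition of d is not shown, so the sketch below rebuilds it from the printed output and repeats the toString step without dplyr or purrr. The n_matches column is an addition for illustration: it checks how many matches each row produced, which matters because spreading into fixed y and z columns assumes exactly two per row.

```r
library(stringr)

# Input reconstructed from the printed output above
d <- data.frame(x = c("i am 10 and she is 50",
                      "he is 32 and i am 22",
                      "he may be 70 and she may be 99"))

# List with one character vector of two-digit matches per row
matches <- str_extract_all(d$x, "\\b\\d{2}\\b")

# Comma-separated string per row
d$y <- vapply(matches, toString, character(1))

# Number of matches per row (illustrative check)
d$n_matches <- lengths(matches)
d
```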

Extracting numbers from vectors of strings

How about

# pattern is by finding a set of numbers in the start and capturing them
as.numeric(gsub("([0-9]+).*$", "\\1", years))

or

# pattern is to just remove " years old"
as.numeric(gsub(" years old", "", years))

or

# split by space, get the element in first index
as.numeric(sapply(strsplit(years, " "), "[[", 1))
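
A quick way to confirm the three approaches agree is to run them on the same vector. The values of years below are hypothetical, chosen to match the "N years old" format the answer implies.

```r
# Hypothetical input in the "N years old" format
years <- c("25 years old", "30 years old")

# Capture the leading digits and discard the rest
by_capture <- as.numeric(gsub("([0-9]+).*$", "\\1", years))

# Remove the fixed suffix
by_removal <- as.numeric(gsub(" years old", "", years))

# Split on spaces and keep the first piece
by_split <- as.numeric(sapply(strsplit(years, " "), "[[", 1))
```

All three give the same result on well-formed input; they only diverge when a string departs from the expected format, in which case as.numeric returns NA with a warning for whatever non-numeric text is left.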

