str_extract_all: return all patterns found in string concatenated as vector
Instead of `cat` (which prints to the console and returns NULL), we can use `paste`. Also, with tidyverse, we can make use of `map` and `str_c` (from stringr, in place of `paste`):
library(tidyverse)
data %>%
mutate(age_new = map_chr(str_extract_all(x, "[^a_]+"), ~ str_c(.x, collapse="")))
Using the OP's code:
data %>%
mutate(age_new = sapply(str_extract_all(x,"[^a_]"),
function(x) paste(x,collapse="")))
If the intention is to get the numbers
library(readr)
data %>%
mutate(age_new = parse_number(x))
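As a quick check of the extract-and-paste approach above, here is a minimal sketch on a made-up vector (the values in `x` are an assumption, not the OP's data). It shows that `str_extract_all` returns one character vector per input string, and that pasting with `collapse = ""` joins the pieces of each one.

```r
library(stringr)

# Hypothetical input; the OP's actual data is not shown in the thread
x <- c("a_12_3", "a_45")

# One list element per input string, holding every run of
# characters that is neither "a" nor "_"
pieces <- str_extract_all(x, "[^a_]+")

# Collapse each element into a single string
joined <- vapply(pieces, paste, character(1), collapse = "")
joined
```

`vapply` is used here only to keep the sketch free of purrr; `map_chr` from the answer does the same job.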
Extract digits and next string after from a character vector in R
Use a pattern that matches one or more digits (`\\d+`) followed by one or more spaces (`\\s+`) and a word (`\\w+`):
library(stringr)
str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]
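For illustration, a small sketch of that pattern applied to an invented sentence (the content of `my_text` is an assumption, since the original text is not shown):

```r
library(stringr)

# Hypothetical input string
my_text <- "I bought 3 apples and 12 oranges"

# Digits, then whitespace, then the next word
found <- str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]
found
```

The `[[1]]` pulls the character vector out of the list, since only one string was passed in.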
Why does `str_extract_all` return NA for non-matches?
If the vector contains NA, `stringr::str_extract_all` returns NA for that element:
sample[1] <- NA
unlist(stringr::str_extract_all(sample, "\\b\\w+well\\w+\\b"))
#[1] NA "swellings" "jewellery" "jewellers"
To get rid of the NA, you can drop the missing elements first using `is.na`:
unlist(stringr::str_extract_all(sample[!is.na(sample)], "\\b\\w+well\\w+\\b"))
#[1] "swellings" "jewellery" "jewellers"
Or use `gregexpr` and `regmatches` from base R:
unlist(regmatches(sample, gregexpr("\\b\\w+well\\w+\\b", sample)))
#[1] "swellings" "jewellery" "jewellers"
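It may help to look at the list before `unlist` is applied. The sketch below uses an invented vector `txt` (an assumption, not the asker's data) to show that a missing input gives NA, while a string with no match gives an empty character vector, which then disappears in `unlist`.

```r
library(stringr)

# Hypothetical input: a missing value, a non-matching string, a matching string
txt <- c(NA, "nothing to see", "swellings")
pat <- "\\b\\w+well\\w+\\b"

res <- str_extract_all(txt, pat)
# res[[1]] is NA            (input was NA)
# res[[2]] is character(0)  (input had no match)
# res[[3]] is "swellings"

flat <- unlist(res)                                    # keeps the NA
clean <- unlist(str_extract_all(txt[!is.na(txt)], pat)) # NA removed up front
```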
How do I retrieve all numbers in a string and combine them into one number using regex?
`str_extract_all` returns a list, so we need to convert it to a vector before pasting. To extract a list element we use `[[`, and as there is only a single element, `mynumbers[[1]]` gets the vector. Then paste with `collapse` and convert with `as.numeric`:
as.numeric(paste(mynumbers[[1]],collapse=""))
#[1] 77500
We can also match one or more non-digit characters (`\\D+`), replace them with `""` in `gsub`, and convert the result to numeric:
as.numeric(gsub("\\D+", "", input))
#[1] 77500
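Both routes can be compared side by side. This is a sketch only: the `input` string and the `[0-9]+` extraction pattern are assumptions, since the question's original values are not reproduced here.

```r
library(stringr)

# Hypothetical input containing two separate runs of digits
input <- "abc 77 def 500"

# Route 1: extract the digit runs, then paste them together
mynumbers <- str_extract_all(input, "[0-9]+")   # list of length 1
via_extract <- as.numeric(paste(mynumbers[[1]], collapse = ""))

# Route 2: delete everything that is not a digit
via_gsub <- as.numeric(gsub("\\D+", "", input))
```

Both give the same number; the `gsub` version avoids the list step entirely.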
R duplicate rows based on the elements in a string column
You can use `str_extract_all` to extract all the states and `unnest` to duplicate rows so that each state is in a separate row. R has a built-in constant, `state.name`, which holds the names of the US states and can be used here to build the pattern.
library(dplyr)
pat <- paste0("\\b", state.name, "\\b", collapse = "|")
df %>%
mutate(states = stringr::str_extract_all(location, pat)) %>%
tidyr::unnest(states)
# A tibble: 11 x 3
# location year states
# <chr> <int> <chr>
# 1 North Dakota, Minnesota, Michigan 2011 North Dakota
# 2 North Dakota, Minnesota, Michigan 2011 Minnesota
# 3 North Dakota, Minnesota, Michigan 2011 Michigan
# 4 California, Tennessee 2012 California
# 5 California, Tennessee 2012 Tennessee
# 6 Bastrop County (Texas) 2013 Texas
# 7 Dallas (Texas) 2014 Texas
# 8 Shasta (California) 2015 California
# 9 California, Oregon, Washington 2011 California
#10 California, Oregon, Washington 2011 Oregon
#11 California, Oregon, Washington 2011 Washington
data
df <- structure(list(location = c("North Dakota, Minnesota, Michigan",
"California, Tennessee", "Bastrop County (Texas)", "Dallas (Texas)",
"Shasta (California)", "California, Oregon, Washington"), year = c(2011L,
2012L, 2013L, 2014L, 2015L, 2011L)), class = "data.frame", row.names = c(NA, -6L))
Extract all matches to a new column using regex in R
We can use `str_extract_all` instead of `str_extract`, because `str_extract` matches only the first instance, whereas the `_all` variant is global and extracts all the instances into a list, which can be converted to two columns with `unnest_wider`:
library(dplyr)
library(tidyr)
library(stringr)
d %>%
mutate(out = str_extract_all(x, "\\d{2}")) %>%
unnest_wider(out, names_sep = "_") %>% # names_sep: recent tidyr versions require it for unnamed elements
rename_at(-1, ~ c('y', 'z')) %>%
type.convert(as.is = TRUE)
# A tibble: 3 x 3
# x y z
# <chr> <int> <int>
#1 i am 10 and she is 50 10 50
#2 he is 32 and i am 22 32 22
#3 he may be 70 and she may be 99 70 99
If we need a single string column with `,` as the separator, then after extracting into a list, loop over the list with `map` and concatenate the elements of each entry into one string with `toString` (a wrapper for `paste(., collapse=", ")`):
library(purrr)
d %>%
mutate(y = str_extract_all(x, "\\b\\d{2}\\b") %>%
map_chr(toString))
# A tibble: 3 x 2
# x y
# <chr> <chr>
#1 i am 10 and she is 50 10, 50
#2 he is 32 and i am 22 32, 22
#3 he may be 70 and she may be 99 70, 99
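The thread never shows how `d` was built, so the following sketch rebuilds the `x` column from the printed output above (treat that reconstruction as an assumption). It reproduces the comma-separated column without purrr, and also shows the `simplify = TRUE` argument of `str_extract_all`, which returns a character matrix instead of a list.

```r
library(stringr)

# Input reconstructed from the printed tibble above (assumed, not the OP's object)
x <- c("i am 10 and she is 50",
       "he is 32 and i am 22",
       "he may be 70 and she may be 99")

# List of matches, one element per string
matches <- str_extract_all(x, "\\b\\d{2}\\b")

# Join each element with ", "
y <- vapply(matches, toString, character(1))

# Alternative: get a character matrix with one column per match
m <- str_extract_all(x, "\\d{2}", simplify = TRUE)
```

The matrix form can be convenient when every string has the same number of matches, as it does here.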
Extracting numbers from vectors of strings
How about
# pattern is by finding a set of numbers in the start and capturing them
as.numeric(gsub("([0-9]+).*$", "\\1", years))
or
# pattern is to just remove the " years old" suffix
as.numeric(gsub(" years old", "", years))
or
# split by space, get the element in first index
as.numeric(sapply(strsplit(years, " "), "[[", 1))
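To see that the three options agree, here is a sketch on an invented `years` vector (the values are an assumption; the question's data is not shown):

```r
# Hypothetical input in the "<number> years old" shape the answers assume
years <- c("10 years old", "25 years old")

# 1. Capture the leading digits and drop the rest
a <- as.numeric(gsub("([0-9]+).*$", "\\1", years))

# 2. Remove the fixed suffix
b <- as.numeric(gsub(" years old", "", years))

# 3. Split on spaces and take the first piece
cc <- as.numeric(sapply(strsplit(years, " "), "[[", 1))
```

The first option is the most tolerant of variation in the trailing text; the second and third depend on the exact wording and spacing.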