Extracting numbers from vectors of strings
How about
# pattern: capture the first run of digits and drop everything after it
as.numeric(gsub("([0-9]+).*$", "\\1", years))
or
# pattern: just remove the literal text " years old"
as.numeric(gsub(" years old", "", years))
or
# split on spaces and take the first element of each piece
as.numeric(sapply(strsplit(years, " "), "[[", 1))
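To compare the three one-liners side by side, here is a small runnable check. The `years` vector is hypothetical, built in the "<n> years old" shape the patterns above assume, and the variable names `via_regex`, `via_suffix` and `via_split` are illustrative only.

```r
# Hypothetical input in the "<n> years old" shape the patterns above expect
years <- c("20 years old", "1 years old", "105 years old")

# 1. capture the leading run of digits and drop everything after it
via_regex <- as.numeric(gsub("([0-9]+).*$", "\\1", years))

# 2. remove the literal " years old" suffix
via_suffix <- as.numeric(gsub(" years old", "", years))

# 3. split on spaces and keep the first token of each string
via_split <- as.numeric(sapply(strsplit(years, " "), "[[", 1))

# on well-formed input all three give the same numeric vector
print(via_regex)
```

On this well-formed input the three agree; they only start to differ once the strings stop following the "<n> years old" pattern.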
Extract first digit from each element of a numeric vector in R
One numerical approach here is to divide each input number by 10 raised to the floor of its base-10 logarithm. For example, an input of 123 is divided by 100 to yield 1.23; taking the floor of that gives the first digit, 1.
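getFirstDigit <- function(x) {
  # divide by the largest power of 10 not exceeding x, then drop the fractional part;
  # assumes x > 0 (log10(0) is -Inf and log10 of a negative number is NaN)
  floor(x / (10 ^ floor(log10(x))))
}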
d <- c(123, 2, 45)
getFirstDigit(d)
[1] 1 2 4
A more brute-force way would be to convert the input vector to character, take the first character, and convert that back to a number, but I doubt it would outperform the approach above.
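A minimal sketch of that brute-force string version, next to the numeric one, so the two can be checked against each other. `getFirstDigitChar` is a hypothetical name; like `getFirstDigit`, it assumes positive inputs of at least 1 (a negative number would start with "-", and 0.5 would start with "0").

```r
# numeric approach from the answer above
getFirstDigit <- function(x) {
  floor(x / (10 ^ floor(log10(x))))
}

# hypothetical helper: the character-based alternative described above
getFirstDigitChar <- function(x) {
  # as.character() works element by element; even "1e+05"-style output
  # still starts with the first significant digit for x >= 1
  as.numeric(substr(as.character(x), 1, 1))
}

d <- c(123, 2, 45)
print(getFirstDigitChar(d))
print(getFirstDigit(d))
```

Both return the same digits here; which one is faster on large vectors would need an actual benchmark rather than an assumption.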
how to extract specific digits in R (including NA)
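Assuming we need to extract, for each row, the first value that starts with six digits:
# for each row, keep the first cell matching the pattern (NA if there is none)
out <- data.frame(v7 = apply(df, 1, function(x) grep("^\\d{6}", x,
                                                     value = TRUE)[1]))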
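Since the question's `df` is not shown here, the sketch below runs the `apply()` approach on a small hypothetical data frame. Note that "^\\d{6}" is only anchored at the start, so a value such as "1234567" would also match; "^\\d{6}$" restricts it to exactly six digits.

```r
# Hypothetical df: a six-digit code sits in a different column on each row
df <- data.frame(
  v1 = c("123456", NA, "abc"),
  v2 = c(NA, "654321", "12"),
  v3 = c("99", "7", NA),
  stringsAsFactors = FALSE
)

# apply() hands each row over as a character vector; grep(value = TRUE)
# keeps the matching cells and [1] takes the first, giving NA if none matched
out <- data.frame(
  v7 = apply(df, 1, function(x) grep("^\\d{6}", x, value = TRUE)[1]),
  stringsAsFactors = FALSE
)
print(out)
```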
Another option is coalesce() after replacing every value that is not exactly six digits with NA:
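library(dplyr)
library(stringr)
df %>%
  # convert every column to character and set anything that is not exactly 6 digits to NA
  # (mutate_all() is superseded by mutate(across(everything(), ...)) but still works)
  mutate_all(~ replace(as.character(.),
                       str_detect(., "^\\d{6}$", negate = TRUE), NA)) %>%
  # for each row, take the first non-NA value across the columns
  transmute(v7 = coalesce(!!! .))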
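For readers without dplyr installed, the same two-step idea can be sketched in base R. This is a substitute for dplyr's coalesce(), not its implementation: Reduce() with ifelse() keeps the first non-NA value column by column. The `df` is the same hypothetical stand-in for the question's data.

```r
# Hypothetical df standing in for the question's data
df <- data.frame(
  v1 = c("123456", NA, "abc"),
  v2 = c(NA, "654321", "12"),
  v3 = c("99", "7", NA),
  stringsAsFactors = FALSE
)

# step 1: in every column, set anything that is not exactly six digits to NA
cleaned <- lapply(df, function(col) {
  col <- as.character(col)
  col[!grepl("^\\d{6}$", col)] <- NA
  col
})

# step 2: coalesce left to right, keeping the first non-NA value in each row
v7 <- Reduce(function(acc, col) ifelse(is.na(acc), col, acc), cleaned)
print(v7)
```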
Extract just the number from string
sapply(strsplit(last_run, " "), function(x) na.omit(as.numeric(x)))
strsplit
It parses last_run and returns a list in which each element is a character vector holding one sentence split into words:
> strsplit(last_run, " ")
[[1]]
[1] "Last" "run" "15" "days" "ago"
[[2]]
[1] "1st" "up" "after" "126" "days"
[[3]]
[1] "Last" "run" "21" "days" "ago"
[[4]]
[1] "Last" "run" "22" "days" "ago"
[[5]]
[1] "1st" "up" "after" "177" "days"
[[6]]
[1] "1st" "up" "after" "364" "days"
as.numeric
It tries to convert each word to a number and returns NA (with a "NAs introduced by coercion" warning) wherever that is not possible:
> as.numeric(strsplit(last_run, " ")[[1]])
[1] NA NA 15 NA NA
na.omit
It removes the NA values from a vector:
> na.omit(as.numeric(strsplit(last_run, " ")[[1]]))[[1]]
[1] 15
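na.omit returns the same vector with its NA values dropped, plus an "na.action" attribute recording which positions were removed (it does not return a list). The trailing [[1]] extracts the first element as a plain number without that attribute; here it is the only number in the sentence.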
sapply
It applies the function to each element of the list returned by strsplit and, because every result here has length 1, simplifies the results into a numeric vector.
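Putting the pieces together, here is a runnable version with last_run rebuilt from the strsplit output shown above. suppressWarnings() is an addition to hide the coercion warnings. The second approach is a swapped-in alternative, not part of the answer: a whole-word regex via regexpr() and regmatches(), so that "1st" is not read as 1. Two caveats: sapply only simplifies to a vector while every sentence contains exactly one number, and regmatches() silently drops strings with no match.

```r
# Rebuilt from the strsplit() output shown above
last_run <- c("Last run 15 days ago", "1st up after 126 days",
              "Last run 21 days ago", "Last run 22 days ago",
              "1st up after 177 days", "1st up after 364 days")

# word-by-word pipeline from the answer; suppressWarnings() hides the
# "NAs introduced by coercion" warnings raised for non-numeric words
days <- sapply(strsplit(last_run, " "),
               function(x) suppressWarnings(na.omit(as.numeric(x))))
print(days)

# alternative (not from the answer): match only whole-word digit runs;
# perl = TRUE is used for the \b word boundary
days_re <- as.numeric(regmatches(last_run,
                                 regexpr("\\b[0-9]+\\b", last_run, perl = TRUE)))
print(days_re)
```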