Extract Digit from Numeric in R

Extracting numbers from vectors of strings

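How about one of the following (assuming years is a character vector whose elements look like "20 years old"):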

# capture the run of digits at the start of each string and drop the rest
as.numeric(gsub("^([0-9]+).*$", "\\1", years))

or

# simply remove the literal text " years old"
as.numeric(gsub(" years old", "", years))

or

# split on spaces and take the first element of each result
as.numeric(sapply(strsplit(years, " "), "[[", 1))
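As a quick check that the three variants agree, here is a small sketch. The years vector below is made-up sample data (the actual input is not shown in the question), and it assumes every element has the form "<number> years old".

```r
# Hypothetical sample input; assumed format "<number> years old"
years <- c("20 years old", "1 years old", "35 years old")

# 1. keep only the run of digits at the start of each string
by_capture <- as.numeric(gsub("^([0-9]+).*$", "\\1", years))

# 2. strip the fixed suffix
by_suffix <- as.numeric(gsub(" years old", "", years))

# 3. split on spaces and take the first piece
by_split <- as.numeric(sapply(strsplit(years, " "), "[[", 1))

print(by_capture)

# all three approaches should give the same numeric vector
stopifnot(identical(by_capture, by_suffix), identical(by_suffix, by_split))
```

The first variant is the most tolerant of the three: it does not care what follows the number, whereas the second depends on the exact suffix and the third on a space right after the number.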

Extract first digit from each element of a numeric vector in R

One numerical approach is to divide each input number by 10 raised to the floor of its base-10 logarithm. For example, an input of 123 is divided by 10^2 = 100 to yield 1.23, and taking the floor of that gives the first digit, 1. This assumes positive inputs, since log10 is not finite for zero or negative numbers.

getFirstDigit <- function(x) {
  floor(x / (10 ^ floor(log10(x))))
}

d <- c(123, 2, 45)
getFirstDigit(d)

[1] 1 2 4

The more brute-force way would be to cast the input vector to character, take the first character, and then cast it back to a number, but I doubt that would outperform the numerical version above.
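For comparison, the character-based alternative could look like the sketch below. The name getFirstDigitChar is invented for this illustration, and the sketch assumes positive whole numbers; zero, negative values, and NA are not handled by either version.

```r
# Numerical version from the answer
getFirstDigit <- function(x) {
  floor(x / (10 ^ floor(log10(x))))
}

# Hypothetical helper: cast to character, take the first character, cast back
getFirstDigitChar <- function(x) {
  as.numeric(substr(as.character(x), 1, 1))
}

d <- c(123, 2, 45)
print(getFirstDigitChar(d))

# both versions should agree on positive whole numbers
stopifnot(identical(getFirstDigit(d), getFirstDigitChar(d)))
```

Which of the two is faster depends on the input size and R version, so it is worth timing both on real data before choosing.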

How to extract specific digits in R (including NA)

Assuming that we need to extract, from each row, the first value that starts with six digits:

out <- data.frame(v7 = apply(df, 1, function(x)
  grep("^\\d{6}", x, value = TRUE)[1]))

Another option is to replace every value that is not exactly six digits with NA and then use coalesce to pick the first non-NA value in each row

library(dplyr)
library(stringr)
df %>%
  mutate_all(~ replace(as.character(.),
    str_detect(., "^\\d{6}$", negate = TRUE), NA)) %>%
  transmute(v7 = coalesce(!!! .))
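To see what the row-wise extraction returns, here is a self-contained base R sketch. The data frame and its column names are invented for illustration (the original df is not shown), and all columns are assumed to be character.

```r
# Hypothetical input: three character columns, some entries NA or not six digits
df <- data.frame(
  v1 = c("123456", "abc", NA),
  v2 = c("x", "654321", "12"),
  v3 = c("999999", NA, "12345"),
  stringsAsFactors = FALSE
)

# For each row, keep the first value that starts with six digits;
# indexing an empty result with [1] yields NA when nothing matches
first_six <- apply(df, 1, function(x) grep("^\\d{6}", x, value = TRUE)[1])

out <- data.frame(v7 = first_six, stringsAsFactors = FALSE)
print(out)
```

Note that "^\\d{6}" also accepts longer values that merely start with six digits; adding a trailing $ to the pattern restricts the match to exactly six digits.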

Extract just the number from string

sapply(strsplit(last_run, " "), function(x) na.omit(as.numeric(x)))


strsplit

strsplit parses last_run and returns a list in which each element is a character vector holding the words of one sentence

> strsplit(last_run, " ")
[[1]]
[1] "Last" "run" "15" "days" "ago"

[[2]]
[1] "1st" "up" "after" "126" "days"

[[3]]
[1] "Last" "run" "21" "days" "ago"

[[4]]
[1] "Last" "run" "22" "days" "ago"

[[5]]
[1] "1st" "up" "after" "177" "days"

[[6]]
[1] "1st" "up" "after" "364" "days"


as.numeric

as.numeric tries to convert each word to a number and returns NA (with a coercion warning) where that is not possible

> as.numeric(strsplit(last_run, " ")[[1]])
[1] NA NA 15 NA NA


na.omit

na.omit removes the NA values from a vector

> na.omit(as.numeric(strsplit(last_run, " ")[[1]]))[[1]]
[1] 15

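For a vector, na.omit returns the vector without the NA values, with an extra na.action attribute that records which positions were dropped. Indexing with [[1]] takes the first remaining element and leaves that attribute behind.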



sapply

sapply applies a function to each element of a list and, when every result has length one, simplifies the results to a vector
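Putting the four pieces together, a runnable sketch could look like this. The last_run vector is reconstructed from the strsplit output shown above; wrapping the call in suppressWarnings is a choice made for this sketch, since as.numeric warns for every non-numeric word.

```r
# Sample input reconstructed from the strsplit() output shown above
last_run <- c("Last run 15 days ago", "1st up after 126 days",
              "Last run 21 days ago", "Last run 22 days ago",
              "1st up after 177 days", "1st up after 364 days")

# Split each sentence into words, convert the words to numbers,
# drop the NA values, and keep the first number that remains.
# The coercion warnings from as.numeric() are silenced on purpose.
days <- suppressWarnings(
  sapply(strsplit(last_run, " "), function(x) na.omit(as.numeric(x))[[1]])
)

print(days)
```

This relies on each sentence containing at least one standalone number; a sentence with none would make the [[1]] index fail.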


