How to Determine If a Character Vector Is a Valid Numeric or Integer Vector


Checking whether as.numeric returns NA values is a simple way to test whether a character string contains numeric data. You can then do something like:

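# Hypothetical input, chosen to match the str() output below
myDF <- data.frame(w = c("1", "2"), x.y = c("0.1", "0.2"),
                   x.z = c("cat", "dog"), stringsAsFactors = TRUE)

myDF2 <- lapply(myDF, function(col) {
  # Convert a column only if every element parses as a number
  if (suppressWarnings(all(!is.na(as.numeric(as.character(col)))))) {
    as.numeric(as.character(col))
  } else {
    col
  }
})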
str(myDF2)
# List of 3
# $ w : num [1:2] 1 2
# $ x.y: num [1:2] 0.1 0.2
# $ x.z: Factor w/ 2 levels "cat","dog": 1 2
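The same check can be wrapped in a small helper that also reports whether the values would fit an integer vector, which is the distinction in the title. This is a sketch under assumptions: classify_chr() is a hypothetical name, missing values are simply skipped, and "integer" here means every parsed value is a whole number within R's 32-bit integer range.

```r
# Hypothetical helper (not from any package): classifies a character vector
# as "integer", "numeric" or "character".
classify_chr <- function(x) {
  x <- as.character(x)                    # also handles factor columns
  num <- suppressWarnings(as.numeric(x))  # unparseable elements become NA
  # A non-missing input that failed to parse means the vector is not numeric
  if (any(is.na(num) & !is.na(x))) return("character")
  vals <- num[!is.na(num)]
  # Whole numbers within R's 32-bit integer range can be stored as integer
  if (all(vals %% 1 == 0) && all(abs(vals) <= .Machine$integer.max)) {
    "integer"
  } else {
    "numeric"
  }
}

classify_chr(c("1", "2", "3"))  # "integer"
classify_chr(c("0.1", "0.2"))   # "numeric"
classify_chr(c("cat", "dog"))   # "character"
classify_chr(c("1", NA, "3"))   # "integer" (the NA is ignored)
```

Applied per column, for example with sapply(myDF, classify_chr), it returns one label per column.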

Test for numeric elements in a character string

Unless other parts of your data are more complicated in a way that would break this, my first thought is:

!is.na(as.numeric(x))
# [1] TRUE TRUE TRUE TRUE FALSE FALSE

As Josh O'Brien noted, this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you need to include those as "plausibly numeric", one route is to pick them out with a regex first:

x <- c("1.2", "1e4", "1.2.3", "5L")
x
# [1] "1.2" "1e4" "1.2.3" "5L"
grepl("^[[:digit:]]+L$", x)
# [1] FALSE FALSE FALSE TRUE

...and then strip the "L" from just those elements using gsub and indexing.
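A minimal sketch of that last step, assuming only plain digit-plus-L literals such as "5L" need rescuing: strip the L with gsub() on the matching elements only, then convert everything with as.numeric().

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# Integer literals like "5L": digits followed by a single trailing L
is_int_literal <- grepl("^[[:digit:]]+L$", x)
x[is_int_literal] <- gsub("L$", "", x[is_int_literal])

# Elements that still do not parse become NA (coercion warning suppressed)
num <- suppressWarnings(as.numeric(x))
!is.na(num)  # TRUE TRUE FALSE TRUE
```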

How to test if an R object is a named numeric vector?

The only problem I see with your first solution (the if clause) is that the names of the vector could be incomplete, and I do not know whether that's acceptable in your case.

In the second solution, a non-numeric named vector would pass the check, right? That does not seem to be what you want.

If you could provide more details about what exactly you want to do with this check, I could help a little more.

It is also simpler to write

is.vector(f, mode = "numeric")

This checks both conditions (vector and numeric) at once; see ?is.vector for details. Note that it does not check the names.
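To cover the incomplete-names concern as well, the is.vector() test can be combined with a check on names(). This is a sketch: is_named_numeric() is a hypothetical name, and treating empty or NA names as "incomplete" is an assumption about what the asker wants.

```r
# Hypothetical helper: TRUE only for a numeric vector (integer or double)
# whose every element has a non-empty, non-NA name.
is_named_numeric <- function(f) {
  is.vector(f, mode = "numeric") &&  # vector with no attributes beyond names
    !is.null(names(f)) &&
    !anyNA(names(f)) &&
    all(names(f) != "")
}

is_named_numeric(c(a = 1, b = 2))      # TRUE
is_named_numeric(c(1, 2))              # FALSE (no names)
is_named_numeric(c(a = 1, 2))          # FALSE (second name is "")
is_named_numeric(c(a = "x", b = "y"))  # FALSE (not numeric)
```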

Getting an integer atomic vector (vs. numeric)

From reading the type.convert() docs, I'm surprised it does not produce integers when all the data could be represented as integer. Am I misreading that?

I think you may be.

In some contexts, converting a number written as 123.0 to 123 does change its meaning: the trailing zero in 123.0 can be intended to indicate that it represents a value measured to a higher degree of precision (e.g. to the nearest tenth) than 123 (which may only have been measured to the nearest integral value). (See Wikipedia's article on significant figures for a fuller explanation.) So type.convert() takes the conservative approach of treating 123.0 (and indeed 123.) as representing numeric rather than integer values.

As a solution, how about something like this?

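type.convert2 <- function(x) {
  # Drop a trailing ".", ".0", ".00", ... after a run of digits
  x <- sub("(^\\d+)\\.0*$", "\\1", x)
  # as.is = FALSE keeps the pre-R-4.0 default (non-numeric input becomes a
  # factor, as in the output below); recent R warns if as.is is omitted
  type.convert(x, as.is = FALSE)
}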

class(type.convert2("123.1"))
# [1] "numeric"
class(type.convert2("123.0"))
# [1] "integer"
class(type.convert2("123."))
# [1] "integer"

class(type.convert2("hello.0"))
# [1] "factor"
type.convert2("hello.0")
# [1] hello.0
# Levels: hello.0
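The function is vectorized, so it works on a whole column at once. The pattern above only matches unsigned numbers; the sketch below is a variant (the name type_convert_int() and the optional leading "-" are assumptions added here) that also catches negative whole numbers and keeps non-numeric input as character via as.is = TRUE.

```r
# Variant of type.convert2(): "-?" additionally allows a leading minus sign
type_convert_int <- function(x) {
  x <- sub("^(-?\\d+)\\.0*$", "\\1", x)
  type.convert(x, as.is = TRUE)
}

class(type_convert_int(c("1.0", "2", "3.")))  # "integer"
class(type_convert_int(c("-4.00", "5")))      # "integer"
class(type_convert_int(c("1.0", "2.5")))      # "numeric"
```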

Check if the number is integer

Another alternative is to check the fractional part:

x %% 1 == 0

or, if you want to check within a certain tolerance tol (the second term catches values just below a whole number, such as 2.9999999):

pmin(abs(x %% 1), abs(x %% 1 - 1)) < tol
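A short demonstration of why the tolerance version is useful, using made-up values with tiny floating-point offsets and an arbitrarily chosen tol of 1e-8:

```r
x <- c(1, 2.5, 3 + 1e-10, 3 - 1e-10, -4)
tol <- 1e-8

# Exact test: fails for values that are only approximately whole
exact <- x %% 1 == 0
# Tolerant test: distance to the nearest whole number below or above
near_whole <- pmin(abs(x %% 1), abs(x %% 1 - 1)) < tol

exact       # TRUE FALSE FALSE FALSE TRUE
near_whole  # TRUE FALSE TRUE TRUE TRUE
```

An equivalent formulation is abs(x - round(x)) < tol.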

Entire character vector saved as a single string in R

You can try:

library(tidyverse)

eval(parse(text = "c(\"1963-09-16\", \"1969-07-16\")"))
#> [1] "1963-09-16" "1969-07-16"

Or, starting from a data frame:

df <- data.frame(dates = rep("c(\"1963-09-16\", \"1969-07-16\")", 5))

# dplyr >= 1.1.0 prefers reframe() when a summary returns more than one row
summarise(df, dates = map(dates, function(x) eval(parse(text = x))) %>%
            reduce(c))
#> dates
#> 1 1963-09-16
#> 2 1969-07-16
#> 3 1963-09-16
#> 4 1969-07-16
#> 5 1963-09-16
#> 6 1969-07-16
#> 7 1963-09-16
#> 8 1969-07-16
#> 9 1963-09-16
#> 10 1969-07-16

Created on 2021-12-07 by the reprex package (v2.0.1)
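Because eval(parse()) runs whatever code the string contains, a regex extraction may be preferable when the strings come from an untrusted source. This is a swapped-in technique, not the approach above, and a sketch that assumes the elements are double-quoted and contain no escaped quotes; extract_quoted() is a hypothetical name.

```r
# Hypothetical helper: pulls the double-quoted pieces out of a string such as
# 'c("1963-09-16", "1969-07-16")' without evaluating it as R code.
extract_quoted <- function(s) {
  m <- regmatches(s, gregexpr('"[^"]*"', s))[[1]]  # each match keeps its quotes
  gsub('"', "", m, fixed = TRUE)                   # strip the quotes
}

s <- "c(\"1963-09-16\", \"1969-07-16\")"
extract_quoted(s)  # "1963-09-16" "1969-07-16"

# Applied to every row of a data frame column, then flattened into one vector
df <- data.frame(dates = rep(s, 5), stringsAsFactors = FALSE)
all_dates <- as.Date(unlist(lapply(df$dates, extract_quoted)))
length(all_dates)  # 10
```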

Check if string contains ONLY NUMBERS or ONLY CHARACTERS (R)

You need to anchor your regex with ^ and $ so that it has to match the whole string:

all_num <- "123"
all_letters <- "abc"
mixed <- "123abc"

grepl("^[A-Za-z]+$", all_num, perl = TRUE)      # FALSE
grepl("^[A-Za-z]+$", all_letters, perl = TRUE)  # TRUE
grepl("^[A-Za-z]+$", mixed, perl = TRUE)        # FALSE
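The "only numbers" half of the question uses the same anchoring with a digit class. A sketch on a few sample strings; note that "^[0-9]+$" accepts only unsigned whole numbers, so signs or decimal points would need a wider pattern.

```r
x <- c("123", "abc", "123abc", "")

only_digits <- grepl("^[0-9]+$", x)      # TRUE FALSE FALSE FALSE
only_letters <- grepl("^[A-Za-z]+$", x)  # FALSE TRUE FALSE FALSE
```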

Using R, How to use a character vector to search for matches in a very large character vector

grep() and its relatives accept only a single pattern per call, but Vectorize() can help with this:

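# Hypothetical inputs, chosen to match the output below
Cities <- c("New York", "San Francisco", "Austin")
Locations <- c("San Antonio/TX", "Austin/TX", "Boston/MA")
out <- Vectorize(grepl, vectorize.args = "pattern")(Cities, Locations)
rownames(out) <- Locations
out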
# New York San Francisco Austin
# San Antonio/TX FALSE FALSE FALSE
# Austin/TX FALSE FALSE TRUE
# Boston/MA FALSE FALSE FALSE

(I added rownames(.) purely to label the rows with the source data; the column names come from Cities automatically.)

With this, if you want to know which index matches where, you can do:

apply(out, 1, function(z) which(z)[1])
# San Antonio/TX Austin/TX Boston/MA
# NA 3 NA
apply(out, 2, function(z) which(z)[1])
# New York San Francisco Austin
# NA NA 2

The first gives, for each location, the index within Cities that matches it. The second gives, for each city, the index within Locations that matches it. Both assume at most one match per row or column; if there are more, which(z)[1] hides the second and subsequent matches, which is likely not what you want.
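If a location can match more than one city, one way to keep every match is to ask which() for array indices instead of taking the first hit. A sketch: Cities and Locations are assumed inputs chosen to reproduce the example, and fixed = TRUE (treating city names as literal strings rather than regexes) is an added assumption.

```r
# Hypothetical inputs matching the example output above
Cities <- c("New York", "San Francisco", "Austin")
Locations <- c("San Antonio/TX", "Austin/TX", "Boston/MA")

# fixed = TRUE is passed through unchanged to every grepl() call
out <- Vectorize(grepl, vectorize.args = "pattern")(Cities, Locations, fixed = TRUE)
rownames(out) <- Locations

# One row per (location, city) match, so multiple matches are all kept
hits <- which(out, arr.ind = TRUE)
matches <- data.frame(location = rownames(out)[hits[, "row"]],
                      city = colnames(out)[hits[, "col"]],
                      stringsAsFactors = FALSE)
matches
```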

Avoiding type conflicts with dplyr::case_when

As said in ?case_when:

All RHSs must evaluate to the same type of vector.

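You have two options. (Since dplyr 1.1.0, case_when() casts the right-hand sides to a common type, so the mismatch mainly bites on older versions; making the types explicit works everywhere.)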

1) Create new as a numeric vector

library(dplyr)
df <- data.frame(old = 1:3)  # hypothetical input matching the str() output below

df <- df %>% mutate(new = case_when(old == 1 ~ 5,
                                    old == 2 ~ NA_real_,
                                    TRUE ~ as.numeric(old)))

Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original data frame.

You get:

str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3

2) Create new as an integer vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
                                    old == 2 ~ NA_integer_,
                                    TRUE ~ old))

Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.

So this time new is integer:

str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3

