How to determine if a character vector is a valid numeric or integer vector
As discussed here, checking whether as.numeric returns NA values is a simple way to check whether a character string contains numeric data. Now you can do something like:
myDF2 <- lapply(myDF, function(col) {
  # as.character() first, so factor columns are converted by their labels, not their level codes
  if (suppressWarnings(all(!is.na(as.numeric(as.character(col)))))) {
    as.numeric(as.character(col))
  } else {
    col
  }
})
str(myDF2)
# List of 3
# $ w : num [1:2] 1 2
# $ x.y: num [1:2] 0.1 0.2
# $ x.z: Factor w/ 2 levels "cat","dog": 1 2
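A self-contained sketch of the same idea, assuming a small hypothetical myDF whose column names are taken from the str() output above (the values are made up to match it). The helper name convert_if_numeric is also hypothetical. Assigning through myDF2[] is a small variation that keeps the result a data frame instead of the plain list that lapply() returns.

```r
# Hypothetical example data: every column starts out as a factor
myDF <- data.frame(
  w   = c("1", "2"),
  x.y = c("0.1", "0.2"),
  x.z = c("cat", "dog"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: convert a column to numeric only if every value parses as a number
convert_if_numeric <- function(col) {
  num <- suppressWarnings(as.numeric(as.character(col)))
  if (all(!is.na(num))) num else col
}

# Assigning into myDF2[] keeps the data.frame class
myDF2 <- myDF
myDF2[] <- lapply(myDF, convert_if_numeric)
str(myDF2)
```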
Test for numeric elements in a character string
Maybe some other pieces of your data are more complicated in a way that would break this, but my first thought is:
> !is.na(as.numeric(x))
[1] TRUE TRUE TRUE TRUE FALSE FALSE
As noted below by Josh O'Brien, this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you need to include those as "plausibly numeric", one route would be to pick them out with a regex first:
> x <- c("1.2","1e4","1.2.3","5L")
> x
[1] "1.2" "1e4" "1.2.3" "5L"
> grepl("^[[:digit:]]+L$", x)
[1] FALSE FALSE FALSE TRUE
...and then strip the "L" from just those elements using gsub and indexing.
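A minimal sketch of the strip-then-test step, reusing the x from above. It uses sub() with an end-anchored pattern rather than gsub(), since at most one trailing "L" needs removing per element; the variable names is_int_literal and plausibly_numeric are illustrative only.

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# Flag elements written as integer literals such as "5L"
is_int_literal <- grepl("^[[:digit:]]+L$", x)

# Strip the trailing "L" only from those elements
x[is_int_literal] <- sub("L$", "", x[is_int_literal])

# as.numeric() now accepts "5"; "1.2.3" still becomes NA (warning suppressed)
plausibly_numeric <- !is.na(suppressWarnings(as.numeric(x)))
x
plausibly_numeric
```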
How to test if an R object is a named numeric vector?
The only problem I see in your first solution (the if clause) is that the names of the vector could be incomplete, and I do not know if that's acceptable in your case.
In the second solution, a non-numeric named vector would pass the verification, right? That does not seem to be what you want.
If you could provide more details about what exactly you want to do with this verification, I could help you a little more.
It is also simpler to use
is.vector(f, mode = "numeric")
which verifies both conditions (vector and numeric) at once. See the is.vector help page.
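A sketch that combines is.vector() with a check that every element carries a name, addressing the "incomplete names" concern above. The function name is_named_numeric is hypothetical, and treating empty or NA names as "not named" is an assumption about what the question wants.

```r
# Hypothetical helper: a plain numeric vector whose elements all carry a non-empty name
is_named_numeric <- function(f) {
  nms <- names(f)
  is.vector(f, mode = "numeric") &&   # numeric, and no attributes other than names
    !is.null(nms) &&                  # names are present at all
    !anyNA(nms) && all(nzchar(nms))   # and none of them is NA or ""
}

is_named_numeric(c(a = 1, b = 2))      # TRUE
is_named_numeric(c(1, 2))              # FALSE: no names
is_named_numeric(c(a = 1, 2))          # FALSE: second name is ""
is_named_numeric(c(a = "x", b = "y"))  # FALSE: not numeric
```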
Getting an integer atomic vector (vs. numeric)
From reading the type.convert() docs, I'm surprised it does not produce integers when all the data could be represented as integer. Am I misreading that?
I think you may be.
In some contexts, converting a number written as 123.0 to 123 does change its meaning: the trailing zero in 123.0 can be intended to indicate that it represents a value measured to a higher degree of precision (e.g. to the nearest tenth) than 123 (which may only have been measured to the nearest integral value). (See Wikipedia's article on significant figures for a fuller explanation.) So type.convert() takes the appropriate, conservative approach of treating 123.0 (and indeed 123.) as representing numeric rather than integer values.
As a solution, how about something like this?
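type.convert2 <- function(x) {
  # Drop a trailing "." or ".0", ".00", ... after a run of digits, e.g. "123.0" -> "123"
  x <- sub("(^\\d+)\\.0*$", "\\1", x)
  # as.is = FALSE turns non-numeric input into a factor, as in the output below; recent R versions warn if as.is is omitted
  type.convert(x, as.is = FALSE)
}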
class(type.convert2("123.1"))
# [1] "numeric"
class(type.convert2("123.0"))
# [1] "integer"
class(type.convert2("123."))
# [1] "integer"
class(type.convert2("hello.0"))
# [1] "factor"
type.convert2("hello.0")
# [1] hello.0
# Levels: hello.0
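A variant sketch, assuming you also want negative numbers such as "-12.0" to become integers and prefer non-numeric strings to stay character rather than factor. The name type_convert_int is hypothetical; the only changes from type.convert2() are the optional leading minus in the pattern and as.is = TRUE.

```r
# Hypothetical variant of type.convert2()
type_convert_int <- function(x) {
  # Drop a trailing ".", ".0", ".00", ... after an optionally negative run of digits
  x <- sub("^(-?[0-9]+)\\.0*$", "\\1", x)
  # as.is = TRUE keeps non-numeric strings as character instead of factor
  type.convert(x, as.is = TRUE)
}

class(type_convert_int(c("-12.0", "7.", "3")))  # "integer"
class(type_convert_int(c("-12.5", "7")))        # "numeric"
class(type_convert_int("hello.0"))              # "character"
```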
Check if the number is integer
Another alternative is to check the fractional part:
x %% 1 == 0
or, if you want to check a single number x within a certain tolerance tol:
min(abs(c(x %% 1, x %% 1 - 1))) < tol
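The tolerance check above collapses its input through a single min(), so it only makes sense for one number at a time. A vectorised sketch of the same idea uses pmin() to get the distance to the nearest whole number per element. The name is_whole is hypothetical, and the default tol = sqrt(.Machine$double.eps) is an assumption, chosen to match the magnitude used by the is.wholenumber example in ?is.integer.

```r
# Hypothetical helper: element-wise "is within tol of a whole number"
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  frac <- x %% 1               # fractional part, in [0, 1) even for negative x
  pmin(frac, 1 - frac) < tol   # distance to the nearest whole number
}

x <- c(3, 3.5, 5 - 1e-10, -4, -4.25)
x %% 1 == 0   # exact check:     TRUE FALSE FALSE TRUE FALSE
is_whole(x)   # tolerant check:  TRUE FALSE TRUE  TRUE FALSE
```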
Entire character vector saved as a single string in R
You can try:
library(tidyverse)
eval(parse(text = "c(\"1963-09-16\", \"1969-07-16\")") )
#> [1] "1963-09-16" "1969-07-16"
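Or, starting from a data frame (note that from dplyr 1.1.0 onwards, summarise() returning more than one row is deprecated in favour of reframe()):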
df <- data.frame(dates = rep("c(\"1963-09-16\", \"1969-07-16\")", 5))
summarise(df, dates = map(dates, function(x) eval(parse(text = x))) %>%
reduce(c))
#> dates
#> 1 1963-09-16
#> 2 1969-07-16
#> 3 1963-09-16
#> 4 1969-07-16
#> 5 1963-09-16
#> 6 1969-07-16
#> 7 1963-09-16
#> 8 1969-07-16
#> 9 1963-09-16
#> 10 1969-07-16
Created on 2021-12-07 by the reprex package (v2.0.1)
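eval(parse()) will run whatever code the string contains. If the column only ever holds literals like c("...", "..."), a base-R sketch that extracts the quoted substrings avoids evaluating anything. It assumes the values themselves contain no escaped quote characters.

```r
df <- data.frame(
  dates = rep("c(\"1963-09-16\", \"1969-07-16\")", 5),
  stringsAsFactors = FALSE
)

# Find every "..." substring in each row (assumes no escaped quotes inside values)
quoted <- regmatches(df$dates, gregexpr('"[^"]*"', df$dates))

# Flatten to one character vector and drop the surrounding quote marks
dates <- gsub('"', "", unlist(quoted), fixed = TRUE)
dates

# Optionally convert to Date
as.Date(dates)
```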
Check if string contains ONLY NUMBERS or ONLY CHARACTERS (R)
You need to anchor your regex: ^ and $ force the pattern to match the whole string rather than just part of it.
all_num <- "123"
all_letters <- "abc"
mixed <- "123abc"
grepl("^[A-Za-z]+$", all_num, perl = TRUE)      # will be FALSE
grepl("^[A-Za-z]+$", all_letters, perl = TRUE)  # will be TRUE
grepl("^[A-Za-z]+$", mixed, perl = TRUE)        # will be FALSE
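Because grepl() is vectorised, both checks (only numbers, only letters) can be run over a whole character vector at once. A minimal sketch, assuming "numbers" means ASCII digits 0-9 and that an empty string should count as neither:

```r
x <- c("123", "abc", "123abc", "")

# ^ and $ anchor each pattern to the whole string; + requires at least one character
only_digits  <- grepl("^[0-9]+$", x)
only_letters <- grepl("^[A-Za-z]+$", x)

data.frame(x, only_digits, only_letters)
```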
Using R, How to use a character vector to search for matches in a very large character vector
grep and family only allow a single pattern= in their call, but one can use Vectorize to help with this:
out <- Vectorize(grepl, vectorize.args = "pattern")(Cities, Locations)
rownames(out) <- Locations
out
# New York San Francisco Austin
# San Antonio/TX FALSE FALSE FALSE
# Austin/TX FALSE FALSE TRUE
# Boston/MA FALSE FALSE FALSE
(I added rownames(.) purely to identify columns/rows from the source data.)
With this, if you want to know which index points where, you can do:
apply(out, 1, function(z) which(z)[1])
# San Antonio/TX Austin/TX Boston/MA
# NA 3 NA
apply(out, 2, function(z) which(z)[1])
# New York San Francisco Austin
# NA NA 2
The first indicates the index within Cities that applies to each specific location. The second indicates the index within Locations that applies to each of Cities. Both of these methods assume that there is at most a 1-to-1 matching; if there are ever more, the which(z)[1] will hide the 2nd and subsequent matches, which is likely not a good thing.
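If one-to-many matches are possible, a sketch that keeps every match is to ask which() for array indices. The Cities and Locations vectors below are hypothetical, rebuilt from the row and column names printed above. Passing fixed = TRUE is an added assumption: it is forwarded to each grepl() call so city names are matched literally rather than as regular expressions.

```r
# Hypothetical inputs matching the row and column names shown above
Cities    <- c("New York", "San Francisco", "Austin")
Locations <- c("San Antonio/TX", "Austin/TX", "Boston/MA")

# One grepl() call per city; fixed = TRUE is passed unchanged to every call
out <- Vectorize(grepl, vectorize.args = "pattern")(Cities, Locations, fixed = TRUE)
rownames(out) <- Locations

# Every (location, city) pair that matched, not just the first per row/column
hits <- which(out, arr.ind = TRUE)
data.frame(location = Locations[hits[, "row"]], city = Cities[hits[, "col"]])
```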
Avoiding type conflicts with dplyr::case_when
As stated in ?case_when:
All RHSs must evaluate to the same type of vector.
You actually have two possibilities:
1) Create new as a numeric vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5,
old == 2 ~ NA_real_,
TRUE ~ as.numeric(old)))
Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original data frame.
You get:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3
2) Create new as an integer vector
df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
old == 2 ~ NA_integer_,
TRUE ~ old))
Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.
So this time new is integer:
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3
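To see why each right-hand side had to be chosen so carefully, it helps to check the base R type of each one. This short sketch assumes old is 1:3, as in the str() output above. A bare NA is logical and a plain 5 is double, which is why the typed NA_real_/NA_integer_ constants and the 5L literal are needed.

```r
# typeof() of each right-hand side used in the two options above
rhs_types <- c(
  "5"           = typeof(5),
  "5L"          = typeof(5L),
  "NA"          = typeof(NA),
  "NA_real_"    = typeof(NA_real_),
  "NA_integer_" = typeof(NA_integer_),
  "old (1:3)"   = typeof(1:3)
)
rhs_types
```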