Finding Non-Numeric Data in a Data Frame or Vector


df <- data.frame(x = c(1,2,3,4,"five",6,7,8,"nine",10))

The trick is knowing that converting to numeric via as.numeric(as.character(.)) will convert non-numbers to NA.

which(is.na(as.numeric(as.character(df[[1]]))))
## 5 9

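(If the column is a factor, which data.frame() creates from strings by default before R 4.0.0, just using as.numeric(df[[1]]) doesn't work: it drops the levels and returns the underlying integer codes instead of the values. Going through as.character() first avoids this.)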
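A minimal sketch of the factor pitfall, using a small made-up factor (the values are illustrative only). It compares the raw level codes with the as.character() route and with the as.numeric(levels(f))[f] idiom from R FAQ 7.10.

```r
# Made-up factor; levels are sorted as strings: "10", "5", "x"
f <- factor(c("10", "5", "x"))

codes  <- as.numeric(f)                                  # level codes, not values
values <- suppressWarnings(as.numeric(as.character(f)))  # "x" becomes NA
faq    <- suppressWarnings(as.numeric(levels(f))[f])     # same values, converts each level once

codes
values
which(is.na(values))  # position of the non-numeric entry
```

The levels()-based form converts each distinct level only once, which can matter for long factors with few levels.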

You might choose to suppress the warnings:

which.nonnum <- function(x) {
  which(is.na(suppressWarnings(as.numeric(as.character(x)))))
}
which.nonnum(df[[1]])

To be more careful, you should also check that the values weren't NA before conversion:

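which.nonnum <- function(x) {
  # TRUE wherever conversion fails, including entries that were already NA
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  # report only the failures that were not NA to begin with
  which(badNum & !is.na(x))
}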

lapply(df, which.nonnum) will report the positions of the 'bad' values in each column of the data frame.
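A small, made-up data frame to see the NA-aware which.nonnum() and the lapply() call together (the column values are illustrative only). The pre-existing NA in x is not reported; only the text entries are.

```r
which.nonnum <- function(x) {
  # TRUE where conversion fails (including entries that were already NA)
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  # keep only failures that were not NA to begin with
  which(badNum & !is.na(x))
}

# Illustrative data: one genuine NA and two text entries
df2 <- data.frame(
  x = c("1", "two", NA, "4"),
  y = c("5", "6", "seven", "8"),
  stringsAsFactors = FALSE
)

bad <- lapply(df2, which.nonnum)
bad
```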

Find non-numeric entries in a column that is supposed to contain numbers using R

You could try

which(!grepl('^[0-9]',grades))

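to check which entries do not start with a digit. Note that '^[0-9]' only looks at the first character; to flag every entry that is not made up entirely of digits, anchor both ends with '^[0-9]+$'. For the data in the question, the command above outputs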

2 5 7 9
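Since the question's grades vector isn't shown, here is a made-up one to compare the two patterns, plus the as.numeric() test from the first answer. Keep in mind that the regex checks characters, not numbers: '^[0-9]+$' also flags valid decimals such as "3.5", which as.numeric() accepts.

```r
# Hypothetical data; the real 'grades' vector from the question is not shown
grades <- c("85", "A", "90", "7b", "100", "-", "3.5")

# Entries that do not START with a digit
starts_digit <- which(!grepl("^[0-9]", grades))

# Entries that are not made up ENTIRELY of digits
only_digits <- which(!grepl("^[0-9]+$", grades))

# Entries that R cannot parse as a number at all
not_numeric <- which(is.na(suppressWarnings(as.numeric(grades))))

starts_digit
only_digits
not_numeric
```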

Hope this helps!

Checking all non-numerical entries in a data.frame column and delete or substitute

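I would use a loop and readline to create the new vector like this (note that readline() only prompts in an interactive session; run non-interactively, it returns "" without waiting for input):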

df <- data.frame(list(A=c(1, 2, 3, 4, 5, 6, 7, 8, 9), B=c("40g", "< 2", "thx", "about 1", "1-2", "1/2", 3, 2.3, "two")))
df$B <- as.character(df$B)

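myscan <- function(x) {
  # collect the answers as text, then convert them all at once
  new <- character(length(x))
  for (i in seq_along(x)) {
    new[i] <- readline(sprintf("Non numeric entry '%s' new value to set: ", x[i]))
  }
  # anything that still isn't a number (e.g. "na") becomes NA, with a warning
  as.numeric(new)
}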

# get the entries
notNum <- is.na( as.numeric(df$B) )
# Loop and ask for updates
df$B[notNum] <- myscan(df$B[notNum])

When run it gives:

> df$B[notNum] <- myscan(df$B[notNum])
Non numeric entry '40g' new value to set: 0.4
Non numeric entry '< 2' new value to set: na
Non numeric entry 'thx' new value to set: ba
Non numeric entry 'about 1' new value to set: 1
Non numeric entry '1-2' new value to set: 1.5
Non numeric entry '1/2' new value to set: na
Non numeric entry 'two' new value to set: 2

Assigning the new values into the character column stores them as text, so we then return the column to numeric:

df$B <- as.numeric(df$B)

And we get the new data frame:

> df
  A   B
1 1 0.4
2 2  NA
3 3  NA
4 4 1.0
5 5 1.5
6 6  NA
7 7 3.0
8 8 2.3
9 9 2.0
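If you already know the replacements, a non-interactive variant avoids readline() entirely: a named lookup vector maps each bad entry to its new value, and entries without a mapping become NA. This is a swapped-in technique rather than the loop above; the fixes vector is hypothetical and simply mirrors the answers typed in that session.

```r
df <- data.frame(
  A = 1:9,
  B = c("40g", "< 2", "thx", "about 1", "1-2", "1/2", "3", "2.3", "two"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the bad entries, values the replacements
fixes <- c("40g" = 0.4, "about 1" = 1, "1-2" = 1.5, "two" = 2)

num <- suppressWarnings(as.numeric(df$B))  # NA wherever B is not a number
notNum <- is.na(num)

# Indexing by a name that isn't in 'fixes' yields NA, so unmapped entries stay NA
num[notNum] <- unname(fixes[df$B[notNum]])
df$B <- num
df
```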

How to use OR between two non-numeric values?

Rather than chaining comparisons with | (OR), you can use %in%, which tests whether a value matches any element of a set:

m.v <- c("A", "AGG", "A", "G", "GA")
count <- 0
for (i in seq_along(m.v)) {
  if (m.v[i] %in% c("A", "G")) {
    count <- count + 1
  }
}
count
[1] 3
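The loop isn't strictly needed: %in% is vectorized, so it can test the whole vector at once, and sum() counts the TRUE values. The second form shows the equivalent explicit OR; each side of | has to be a complete comparison, since m.v == "A" | "G" is an error.

```r
m.v <- c("A", "AGG", "A", "G", "GA")

# Vectorized membership test: TRUE where an element equals "A" or "G"
count_in <- sum(m.v %in% c("A", "G"))

# Same thing with an explicit OR; both sides must be full comparisons
count_or <- sum(m.v == "A" | m.v == "G")

count_in
count_or
```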

How to convert all non numeric cells in data frame to NA

Based on your edit, you have vectors which should be numeric, but due to some erroneous data introduced during the reading-in process, the data have been converted to another format (likely character or factor).

Here is an example of that case. The chained assignment mydf1 <- mydf2 <- mydf3 <- data.frame(...) just creates three data.frames with the same data.

# I'm going to show three approaches
mydf1 <- mydf2 <- mydf3 <- data.frame(
A = c(1, 2, "x", 4),
B = c("y", 3, 4, "-")
)

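str(mydf1)
# (Output from R < 4.0.0. Since R 4.0.0, data.frame() keeps strings as character
# by default, so A and B are listed as chr rather than Factor.)
# 'data.frame': 4 obs. of 2 variables:
# $ A: Factor w/ 4 levels "1","2","4","x": 1 2 4 3
# $ B: Factor w/ 4 levels "-","3","4","y": 4 2 3 1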

One way to do this is to just let R coerce any values that cannot be converted to numeric to NA:

## You WILL get warnings
mydf1[] <- lapply(mydf1, function(x) as.numeric(as.character(x)))
# Warning messages:
# 1: In FUN(X[[i]], ...) : NAs introduced by coercion
# 2: In FUN(X[[i]], ...) : NAs introduced by coercion

str(mydf1)
# 'data.frame': 4 obs. of 2 variables:
# $ A: num 1 2 NA 4
# $ B: num NA 3 4 NA

Another option is to use makemeNA from my SOfun package:

library(SOfun)
makemeNA(mydf2, "[^0-9]", FALSE)
#    A  B
# 1  1 NA
# 2  2  3
# 3 NA  4
# 4  4 NA

str(.Last.value)
# 'data.frame': 4 obs. of 2 variables:
# $ A: int 1 2 NA 4
# $ B: int NA 3 4 NA

This function is a bit different in that it uses type.convert to do the conversion, and can handle more specific rules for conversion to NA (just like you can use a vector for na.strings when reading data into R).
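makemeNA() itself isn't reproduced here. As a rough base-R sketch of the same idea, the hypothetical helper to_na() below collects the values matching a regex and passes them as na.strings to type.convert(), which then picks a suitable type (integer for this data). It is an illustration under those assumptions, not SOfun's implementation.

```r
# Hypothetical helper, not SOfun's makemeNA()
to_na <- function(x, pattern) {
  x <- as.character(x)                 # handles factor columns too
  bad <- unique(x[grepl(pattern, x)])  # values to treat as missing
  type.convert(x, na.strings = c("NA", bad), as.is = TRUE)
}

mydf2 <- data.frame(
  A = c(1, 2, "x", 4),
  B = c("y", 3, 4, "-"),
  stringsAsFactors = FALSE
)

mydf2[] <- lapply(mydf2, to_na, pattern = "[^0-9]")
str(mydf2)
```

With the pattern "[^0-9]", values such as "-3" or "2.5" would also be turned into NA, so the pattern needs to match what counts as "bad" in your data.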


As for your error, I believe you called as.numeric() on the whole data.frame, which is what produces the error you showed.

Example:

# Your error...
as.numeric(mydf3)
# Error: (list) object cannot be coerced to type 'double'

You won't get that error with a matrix, though (but you'll still get the warning):

# You'll get a warning
as.numeric(as.matrix(mydf3))
# [1] 1 2 NA 4 NA 3 4 NA
# Warning message:
# NAs introduced by coercion

Why don't we need to explicitly use as.character? as.matrix does that for you:

str(as.matrix(mydf3))
# chr [1:4, 1:2] "1" "2" "x" "4" "y" "3" "4" "-"
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "A" "B"

How can you use that information?

mydf3[] <- as.numeric(as.matrix(mydf3))
# Warning message:
# NAs introduced by coercion

str(mydf3)
# 'data.frame': 4 obs. of 2 variables:
# $ A: num 1 2 NA 4
# $ B: num NA 3 4 NA

R: How to find the mean of a column in a data frame, that has non-numeric (specifically, dashes '-') as well as numeric numbers

Try this, assuming your data is called dat:

dat[dat == "-"] <- NA

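# the column is still character (or factor) after this, so convert it before averaging
mean(as.numeric(as.character(dat$Population_and_People)), na.rm = TRUE)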
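A small made-up example (the column name comes from the question; the values are invented) showing why the conversion step matters: after the dashes become NA the column is still text, and mean() on text returns NA with a warning, even with na.rm = TRUE.

```r
# Invented values; only the column name comes from the question
dat <- data.frame(
  Population_and_People = c("120", "-", "80", "-", "100"),
  stringsAsFactors = FALSE
)

dat[dat == "-"] <- NA

# Still character, so mean() returns NA (and warns) despite na.rm = TRUE
still_text <- suppressWarnings(mean(dat$Population_and_People, na.rm = TRUE))

# Convert first, then average the remaining values
avg <- mean(as.numeric(dat$Population_and_People), na.rm = TRUE)
avg
```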

