Converting a Factor to Numeric Without Losing Information in R (as.numeric() Doesn't Seem to Work)

How to convert a factor to integer/numeric without loss of information?

See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The FAQ on R has similar advice.


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
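A minimal sketch of the equivalence described above, using a made-up factor. The data and the names via_levels and via_character are illustrative assumptions, not part of the original answer. Both routes should give the same numbers; the first only has to parse the distinct levels.

```r
# Hypothetical data: more elements than distinct levels
f <- factor(c("10.5", "3", "10.5", "42", "3", "3"))

# Parse the nlevels(f) level strings once, then index by the integer codes
via_levels <- as.numeric(levels(f))[f]

# Expand to length(f) strings first, then parse every one of them
via_character <- as.numeric(as.character(f))

identical(via_levels, via_character)  # the two routes agree
nlevels(f)  # number of strings parsed by the first route
length(f)   # number of strings parsed by the second route
```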


Some timings

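library(microbenchmark)

# Assumed setup: the original answer does not show how f and x were built,
# so the timings below come from the author's own (unknown) data.
x <- c(3.5, 1.2, 3.5, 7.0, 1.2)  # hypothetical short numeric vector
f <- factor(x)                   # the same data stored as a factor

microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)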
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05

Converting a factor to numeric without losing information in R (as.numeric() doesn't seem to work)

First, a factor consists of integer indices and levels. Keeping this in mind is essential whenever a factor behaves unexpectedly.

For example,

> z <- factor(letters[c(3, 2, 3, 4)])

# human-friendly display, but internal structure is invisible
> z
[1] c b c d
Levels: b c d

# internal structure of factor
> unclass(z)
[1] 2 1 2 3
attr(,"levels")
[1] "b" "c" "d"

Here, z has 4 elements.

The indices are 2, 1, 2, 3, in that order.

Each index is associated with a level: 1 -> b, 2 -> c, 3 -> d.

as.numeric simply converts the index part of the factor to numeric; it ignores the levels.

as.character uses both the indices and the levels, and returns a character vector containing the level of each element.

?as.numeric says that Factors are handled by the default method.
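The distinction matters most when the levels look like numbers. A small sketch with a hypothetical factor g (not from the original answer) puts the three conversions side by side:

```r
# Hypothetical factor whose levels are numeric strings
g <- factor(c("30", "10", "30", "20"))

as.integer(g)                # the integer indices into the levels
as.character(g)              # the level each index points to, as text
as.numeric(as.character(g))  # the numbers those labels represent
```

Only the last line recovers 30, 10, 30, 20; the first returns the positions of those values among the sorted levels.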

What's wrong with as.numeric in R?

Your vector is a factor. This question has been asked quite a few times. To convert a factor to numeric, you have to convert it to character first. Try:

as.numeric(as.character(my_vec))

Strange behaviour of as.numeric() with factor variable - gives completely different numbers to those supplied

This comes down to how R stores a factor. Count = 1 is the smallest value, so its factor level is 1 and Countnum = 1. Count = 3 is the second smallest value, so its factor level is 2 and Countnum = 2, and so on. In effect, your first as.numeric takes the factor level and returns that level as a number. Countnum_char instead takes the character value (e.g. Count = 8 is factor level 5, and Count = 5 is factor level 3) and converts the value itself, not the factor level, to a number.
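To make the description above concrete, here is a sketch with made-up data. The original question's data frame is not shown, so the values below (in particular the 6) are assumptions chosen only to match the levels mentioned.

```r
# Hypothetical data: levels sort to 1, 3, 5, 6, 8
df <- data.frame(Count = factor(c(8, 1, 5, 3, 6)))

# Factor level: position of each value among the sorted levels
df$Countnum <- as.numeric(df$Count)

# Original value: go through the character label first
df$Countnum_char <- as.numeric(as.character(df$Count))

df
```

With this data, Count = 8 gets Countnum = 5 and Count = 5 gets Countnum = 3, while Countnum_char keeps the original numbers.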

How to convert a factor variable to numeric while preserving the numbers in R

dv$ICPSR <- as.numeric(as.character(dv$ICPSR))

Transform your factor to a character vector before transforming it into a numeric vector.

convert factor to original numeric value

The easiest solution would be to change how you specify the call to factor such that it can work with any number of numeric levels.

fact <- factor(c(1, 1, 0, 1, 0, 1, 2),
               levels = c(0, 1, 2),
               labels = c("no", "yes", "maybe"))
# The internal codes are 1, 2, 3 for the levels 0, 1, 2, so subtract 1
as.numeric(fact) - 1
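Subtracting 1 works here only because the original values happen to be 0, 1, 2, that is, the internal codes shifted by one. A more general sketch, assuming you still have the vector that was passed as levels (the name orig_levels is illustrative), is to index that vector by the factor's codes:

```r
orig_levels <- c(0, 1, 2)
fact <- factor(c(1, 1, 0, 1, 0, 1, 2),
               levels = orig_levels,
               labels = c("no", "yes", "maybe"))

# The integer codes index into the original levels, whatever their values
orig_levels[as.integer(fact)]
```

This keeps working if the original values are, say, 10, 20, 30, where a fixed offset would not.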

Converting Character to Numeric without NA Coercion in R

As Anando pointed out, the problem is somewhere in your data, and we can't really help you much without a reproducible example. That said, here's a code snippet to help you pin down the records in your data that are causing you problems:

test = as.character(c(1,2,3,4,'M'))
v = as.numeric(test) # NAs introduced by coercion
ix.na = is.na(v)
which(ix.na) # row index of our problem = 5
test[ix.na] # shows the problematic record, "M"

Instead of guessing as to why NAs are being introduced, pull out the records that are causing the problem and address them directly/individually until the NAs go away.
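The same check can be wrapped in a small helper so it is easy to rerun after each cleaning step. This is only a sketch: find_non_numeric is a hypothetical name, not an existing function. It reports values that turn into NA during conversion while ignoring entries that were already missing.

```r
# Return the positions and values that become NA under as.numeric()
find_non_numeric <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))  # silence the coercion warning
  bad <- is.na(parsed) & !is.na(x)           # NAs introduced by the conversion
  data.frame(index = which(bad), value = x[bad], stringsAsFactors = FALSE)
}

find_non_numeric(c("1", "2", "3", "4", "M", NA))
```

An empty result means every non-missing value converted cleanly.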

UPDATE: Looks like the problem is in your call to str_replace_all. I don't know the stringr library, but I think you can accomplish the same thing with gsub like this:

v2 = c("1.00","2.00","3.00")
gsub("\\.00", "", v2)

[1] "1" "2" "3"

I'm not entirely sure what this accomplishes though:

sum(as.numeric(v2)!=as.numeric(gsub("\\.00", "", v2))) # Illustrate that vectors are equivalent.

[1] 0

Unless this achieves some specific purpose for you, I'd suggest dropping this step from your preprocessing entirely, as it doesn't appear necessary and seems to be giving you problems.


