Convert Factor to Integer

Convert factor to integer

You can combine the two functions: coerce to character first, then to numeric:

> fac <- factor(c("1","2","1","2"))
> as.numeric(as.character(fac))
[1] 1 2 1 2
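The example above uses the labels "1" and "2", which happen to match the factor's internal codes, so it hides why the as.character() step matters. A minimal sketch with hypothetical labels that do not match the codes:

```r
# Hypothetical factor whose labels are not 1, 2, ...
fac <- factor(c("10", "20", "10", "30"))

# Direct conversion returns the internal level codes, not the labels
codes <- as.numeric(fac)

# Going through character recovers the labels as numbers
values <- as.numeric(as.character(fac))

# If you specifically want an integer vector, as.integer() works the same way
ints <- as.integer(as.character(fac))

print(codes)   # codes:  1 2 1 3
print(values)  # labels: 10 20 10 30
print(ints)
```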

How to convert a factor to integer/numeric without loss of information?

See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The R FAQ (question 7.10) has similar advice.
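A minimal sketch of the idiom recommended in ?factor, on a hypothetical factor with numeric labels. levels(f) holds one label per level, and indexing it with the factor f uses the integer codes, so each element picks up the value of its own level:

```r
f <- factor(c(3.5, 1.25, 3.5, 10))

levels(f)  # "1.25" "3.5" "10" (one entry per level)

# Convert the (few) level labels once, then index by the factor's codes
recommended <- as.numeric(levels(f))[f]

# Same result via character, which converts every element
via_character <- as.numeric(as.character(f))

print(recommended)
```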


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
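To make the equivalence concrete, here is a sketch on a long hypothetical factor with only three levels. All three expressions give identical results, but only the first calls as.numeric() on nlevels(f) strings rather than length(f) strings:

```r
set.seed(42)
# Long vector, few distinct values: the case where the difference shows most
f <- factor(sample(c("0.5", "1.5", "2.5"), 1e5, replace = TRUE))

via_levels <- as.numeric(levels(f))[f]      # converts nlevels(f) = 3 strings
via_index  <- as.numeric(levels(f)[f])      # builds length(f) strings, then converts
via_char   <- as.numeric(as.character(f))   # same amount of work as via_index

c(nlevels = nlevels(f), length = length(f))
```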


Some timings

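library(microbenchmark)
## Hypothetical setup: the data behind the timings below was not shown,
## so f and x here are illustrative only (x is the raw vector, f its factor)
x <- sample(1:10, 100, replace = TRUE)
f <- factor(x)
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),   ## plain character conversion, for comparison
  paste(x),
  times = 1e5
)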
## Unit: microseconds
##                        expr   min    lq      mean median     uq      max neval
##    as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##    as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                   paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                    paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
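If microbenchmark is not installed, a rougher comparison can be sketched with base R's system.time() on a long hypothetical factor with few levels. The absolute numbers depend on your machine, so none are shown here:

```r
set.seed(1)
# A million elements, only five distinct levels
f <- factor(sample(1:5, 1e6, replace = TRUE))

# Time each approach once; the assignments inside system.time() keep the results
t_levels <- system.time(r1 <- as.numeric(levels(f))[f])
t_char   <- system.time(r2 <- as.numeric(as.character(f)))

# Elapsed (wall-clock) seconds for each approach
c(levels_first = t_levels[["elapsed"]], via_character = t_char[["elapsed"]])
```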

How to convert factor levels to integer in R

We can use match() against the unique elements of each column:

library(dplyr)
## across() replaces the superseded mutate_all(funs(...)) form (dplyr >= 1.0.0)
dat %>%
  mutate(across(everything(), ~ match(.x, unique(.x))))
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
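A base R sketch of the same idea on a small hypothetical data frame (column names chosen to echo the output above). match(x, unique(x)) numbers the values in order of first appearance, whereas a factor's codes follow its sorted levels:

```r
dat <- data.frame(
  Season  = c("Winter", "Winter", "Spring", "Spring"),
  Weekday = c("Mon", "Tue", "Mon", "Wed")
)

# Order of first appearance: Winter = 1, Spring = 2
by_appearance <- match(dat$Season, unique(dat$Season))

# Factor codes follow the alphabetically sorted levels: Spring = 1, Winter = 2
by_level <- as.integer(factor(dat$Season))

# Apply the match() approach to every column while keeping a data frame
dat_int <- dat
dat_int[] <- lapply(dat, function(x) match(x, unique(x)))
dat_int
```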

Convert factor to integer in a data frame

With anna.table (it is a data frame by the way, a table is something else!), the easiest way will be to just do:

anna.table2 <- data.matrix(anna.table)

because data.matrix() replaces factors with their underlying integer codes. This works for a data frame containing numeric, integer, logical, or factor columns, or anything else that can be coerced to numeric. In current versions of R, character columns are first converted to factors and then to their integer codes as well, so (unlike as.matrix()) they do not turn the result into a character matrix.

If you want anna.table2 to be a data frame rather than a matrix, you can subsequently do:

anna.table2 <- data.frame(anna.table2)
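A quick sketch of what data.matrix() does with a hypothetical mix of column types. Per its documentation, factors become their integer codes and character columns are converted to factors first, so the result stays numeric:

```r
df <- data.frame(
  f = factor(c("lo", "hi", "lo")),  # levels sorted: "hi" = 1, "lo" = 2
  s = c("b", "a", "b"),             # character column
  n = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

m <- data.matrix(df)
str(m)

# Back to a data frame if needed
df2 <- data.frame(m)
```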

Another option is to coerce only the factor variables to their integer codes. Here is an example of that:

## dummy data
set.seed(1)
dat <- data.frame(a = factor(sample(letters[1:3], 10, replace = TRUE)),
                  b = runif(10))

## sapply over `dat`, converting factor to numeric
dat2 <- sapply(dat, function(x) if (is.factor(x)) {
    as.numeric(x)
} else {
    x
})
dat2 <- data.frame(dat2) ## convert to a data frame

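Which gives the following (this output was produced before R 3.6.0; later versions use a different default algorithm in sample(), so the values you see will differ):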

> str(dat)
'data.frame': 10 obs. of 2 variables:
$ a: Factor w/ 3 levels "a","b","c": 1 2 2 3 1 3 3 2 2 1
$ b: num 0.206 0.177 0.687 0.384 0.77 ...
> str(dat2)
'data.frame': 10 obs. of 2 variables:
$ a: num 1 2 2 3 1 3 3 2 2 1
$ b: num 0.206 0.177 0.687 0.384 0.77 ...
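An alternative sketch to the sapply() round trip: assigning into dat2[] with lapply() keeps the data frame and each column's own type, whereas sapply() first builds a matrix, which would coerce every column to character if the data also had a character column. The extra column s below is hypothetical, added only to show that:

```r
set.seed(1)
dat <- data.frame(a = factor(sample(letters[1:3], 10, replace = TRUE)),
                  b = runif(10),
                  s = sample(c("x", "y"), 10, replace = TRUE),  # hypothetical character column
                  stringsAsFactors = FALSE)

dat2 <- dat
# Replace only factor columns with their integer codes; leave the others untouched
dat2[] <- lapply(dat, function(x) if (is.factor(x)) as.integer(x) else x)

str(dat2)
```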

However, do note that the above will work only if you want the underlying numeric representation. If your factor has essentially numeric levels, then we need to be a bit cleverer in how we convert the factor to a numeric whilst preserving the "numeric" information coded in the levels. Here is an example:

## dummy data
set.seed(1)
dat3 <- data.frame(a = factor(sample(1:3, 10, replace = TRUE), levels = 3:1),
                   b = runif(10))

## sapply over `dat3`, converting factor to numeric
dat4 <- sapply(dat3, function(x) if (is.factor(x)) {
    as.numeric(as.character(x))
} else {
    x
})
dat4 <- data.frame(dat4) ## convert to a data frame

Note how we need to call as.character(x) before as.numeric(). The extra call replaces each internal code with its level label, and it is that label we then convert to numeric. To see why this matters, look at dat3$a:

> dat3$a
[1] 1 2 2 3 1 3 3 2 2 1
Levels: 3 2 1

If we just convert that to numeric, we get the wrong data, because R returns the underlying level codes:

> as.numeric(dat3$a)
[1] 3 2 2 1 3 1 1 2 2 3

If we coerce the factor to a character vector first and then to a numeric one, we preserve the original values rather than R's internal representation:

> as.numeric(as.character(dat3$a))
[1] 1 2 2 3 1 3 3 2 2 1

If your data are like this second example, you can't use the simple data.matrix() trick: it is equivalent to applying as.numeric() directly to the factor, which, as this example shows, doesn't preserve the original information.
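To keep the numeric values encoded in the levels across a whole data frame, the levels-indexing idiom from ?factor can be wrapped in a small helper. factor_to_numeric() is a hypothetical name, and the helper assumes every factor's levels parse as numbers (otherwise as.numeric() returns NA with a warning):

```r
# Hypothetical helper: recover the numeric values stored in a factor's levels
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Small stand-in for dat3: levels deliberately in reverse order
dat3 <- data.frame(a = factor(c(1, 2, 2, 3, 1), levels = 3:1),
                   b = c(0.2, 0.4, 0.6, 0.8, 1.0))

dat4 <- dat3
dat4[] <- lapply(dat3, function(x) if (is.factor(x)) factor_to_numeric(x) else x)

dat4$a              # values from the levels: 1 2 2 3 1
as.numeric(dat3$a)  # internal codes:         3 2 2 1 3
```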


