How to convert a factor to integer\numeric without loss of information?
See the Warning section of ?factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
The FAQ on R has similar advice.
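A minimal sketch of the difference the warning describes, using a made-up factor f whose labels look like numbers (the data are purely illustrative):

```r
# Hypothetical example data: a factor whose labels look like numbers
f <- factor(c("10", "2.5", "10", "300"))

# as.numeric() on a factor returns the internal integer codes,
# which follow the sort order of the levels, not the labels themselves
codes <- as.numeric(f)

# Convert the levels once, then index by the factor to recover the values
values <- as.numeric(levels(f))[f]

print(levels(f))
print(codes)
print(values)
```

Here codes holds the positions of each element in levels(f), while values holds the numbers the labels actually spell out.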
Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?
as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
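To make the length-versus-nlevels point concrete, here is a small sketch with a hypothetical long factor that has only three levels. It checks that both routes give the same result and shows how many strings each one has to parse; it is not a benchmark.

```r
# Hypothetical data: a long factor with only three distinct levels
set.seed(1)
f <- factor(sample(c("1.5", "20", "300"), 1e5, replace = TRUE))

# as.character(f) performs the same lookup as levels(f)[f]
same_lookup <- identical(as.character(f), levels(f)[f])

# Parses only nlevels(f) strings, then indexes the numeric result
by_levels <- as.numeric(levels(f))[f]

# Parses all length(f) strings
by_elements <- as.numeric(as.character(f))

print(c(nlevels = nlevels(f), length = length(f)))
print(same_lookup)
print(identical(by_levels, by_elements))
```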
Some timings
library(microbenchmark)
microbenchmark(
as.numeric(levels(f))[f],
as.numeric(levels(f)[f]),
as.numeric(as.character(f)),
paste0(x),
paste(x),
times = 1e5
)
## Unit: microseconds
##                        expr   min    lq      mean median     uq      max neval
##    as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##    as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                   paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                    paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
Converting a factor to numeric without losing information R (as.numeric() doesn't seem to work)
First, a factor consists of indices and levels. This fact is very important when you are struggling with factors.
For example,
> z <- factor(letters[c(3, 2, 3, 4)])
# human-friendly display, but internal structure is invisible
> z
[1] c b c d
Levels: b c d
# internal structure of factor
> unclass(z)
[1] 2 1 2 3
attr(,"levels")
[1] "b" "c" "d"
Here, z has 4 elements. The indices are 2, 1, 2, 3, in that order. Each index is associated with a level: 1 -> b, 2 -> c, 3 -> d.
as.numeric simply converts the index part of the factor into numeric values. as.character uses both the indices and the levels, and generates a character vector made up of the corresponding level labels.
?as.numeric says that factors are handled by the default method.
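The index-plus-levels structure can be checked directly. The sketch below reuses the factor z from the example above and rebuilds the character vector by hand from its two parts:

```r
# Same factor as in the example above
z <- factor(letters[c(3, 2, 3, 4)])

# Index part: one integer code per element
idx <- as.integer(z)

# Levels part: the label attached to each code
lev <- levels(z)

# Looking up each code in the levels reproduces as.character(z)
rebuilt <- lev[idx]

print(idx)
print(lev)
print(identical(rebuilt, as.character(z)))
```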
how to convert factor levels to integer in r
We can use match with unique elements:
library(dplyr)
# funs() is deprecated in current dplyr; across() with a lambda does the same job
dat %>%
  mutate(across(everything(), ~ match(.x, unique(.x))))
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
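As a base R sketch of what match with unique does on a single column, consider the hypothetical vector below (the season names are made up for illustration). Codes are assigned in order of first appearance, which differs from the alphabetical order a default factor would use.

```r
# Hypothetical column values, for illustration only
x <- c("Winter", "Winter", "Summer", "Summer")

# Codes in order of first appearance
codes <- match(x, unique(x))

# A default factor sorts its levels, so the codes come out differently
by_factor <- as.integer(factor(x))

# Fixing the level order to first appearance reproduces the match() result
by_ordered_factor <- as.integer(factor(x, levels = unique(x)))

print(codes)
print(by_factor)
print(by_ordered_factor)
```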
Convert factor to integer
You can combine the two functions; coerce to characters thence to numerics:
> fac <- factor(c("1","2","1","2"))
> as.numeric(as.character(fac))
[1] 1 2 1 2
Losing information after converting from factor to numeric in R
You have values with commas (',') which turn into NA when converted to numeric. Remove them before converting to numeric:
xdate1$Amount.in.doc..curr. <- as.numeric(gsub(',', '', xdate1$Amount.in.doc..curr.))
Or use parse_number from readr:
xdate1$Amount.in.doc..curr. <- readr::parse_number(as.character(xdate1$Amount.in.doc..curr.))
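A small base R sketch of why the commas matter, using a made-up factor named amount rather than the original xdate1 column:

```r
# Hypothetical data: amounts stored as a factor with thousands separators
amount <- factor(c("1,234.50", "99", "2,000"))

# Labels containing commas cannot be parsed and become NA (with a warning)
naive <- suppressWarnings(as.numeric(as.character(amount)))

# Stripping the commas first lets every label parse
cleaned <- as.numeric(gsub(",", "", as.character(amount), fixed = TRUE))

print(naive)
print(cleaned)
```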