How to Convert a Factor Column That Contains Decimal Numbers to Numeric

How to convert factors with decimal points into numeric values

Let's break this down.

First, because gdp is a data frame, levels(gdp) returns NULL. You are probably looking for the levels of each column of gdp, in which case you want something like lapply.

levels(gdp)
# NULL
lapply(gdp, levels)
# this output will make sense
as.numeric(levels(gdp))[gdp]
# this will make no sense

The error is stating that you cannot use a list (gdp) to subscript a vector.

To iterate through the columns of gdp, you will need something like lapply to work on each component.

gdp <- data.frame(lapply(gdp, function(x) {
  if (!is.factor(x)) x
  else as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}))
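
As a minimal sketch of the conversion above, here it is applied to a small made-up data frame. The name gdp_demo, its column names, and its values are assumptions for illustration, not the asker's data.

```r
# Hypothetical stand-in for gdp: one factor column with thousands
# separators and decimals, plus one column that is already numeric.
gdp_demo <- data.frame(
  y2010 = factor(c("1,234.5", "987.25", "1,234.5")),
  y2011 = c(1.5, 2.5, 3.5)
)

gdp_demo <- data.frame(lapply(gdp_demo, function(x) {
  # Leave non-factor columns untouched
  if (!is.factor(x)) return(x)
  # Strip the commas from each level once, convert the levels to numeric,
  # then index by the factor to get one value per row
  as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}))

str(gdp_demo)
```

Both columns should come back as numeric, with the decimals in y2010 preserved.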

Possibly your data set would be better served as a matrix, since it appears to be entirely numeric. In that case:

gdp <- as.matrix(gdp)

How to convert a factor to integer\numeric without loss of information?

See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The FAQ on R has similar advice.
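
To make the warning concrete, here is a small sketch using a made-up factor (the values are an assumption chosen for illustration). It contrasts the integer codes returned by as.numeric(f) with the two conversions quoted above.

```r
# Made-up factor whose labels are decimal numbers
f <- factor(c("3.5", "10.25", "3.5", "0.75"))

# Internal integer codes (positions in levels(f)), not the labels
codes <- as.numeric(f)

# Recommended: convert the levels once, then index by the factor
via_levels <- as.numeric(levels(f))[f]

# Alternative: convert every element through its character label
via_character <- as.numeric(as.character(f))

print(codes)
print(via_levels)
print(via_character)
```

The last two vectors should match the original labels, while the first only reflects the order of the levels.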


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so it converts length(f) values to numeric, whereas as.numeric(levels(f))[f] converts only nlevels(f) values and then indexes the result. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
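
A rough sketch of why the two routes differ in work done, using a hypothetical long factor with only three levels (f_long and its contents are assumptions for illustration):

```r
# Hypothetical long factor with only three distinct levels
set.seed(1)
f_long <- factor(sample(c("1.5", "2.5", "3.5"), 1e5, replace = TRUE))

# Strings parsed by as.numeric(levels(f_long))[f_long]
n_parsed_levels <- nlevels(f_long)

# Strings parsed by as.numeric(as.character(f_long))
n_parsed_elements <- length(f_long)

# Both routes give the same numeric vector
same_result <- identical(
  as.numeric(levels(f_long))[f_long],
  as.numeric(as.character(f_long))
)

print(c(levels = n_parsed_levels, elements = n_parsed_elements))
print(same_result)
```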


Some timings

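library(microbenchmark)
# f (a factor) and x are assumed to exist already;
# the source does not show how they were defined
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)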
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05

How to convert data.frame column from Factor to numeric

breast$class <- as.numeric(as.character(breast$class))

If you have many columns to convert to numeric:

indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))

Another option is to pass stringsAsFactors=FALSE when reading the file with read.table or read.csv, so the columns are never turned into factors in the first place.
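
A small sketch of that option, with inline CSV text standing in for a file on disk. The data and the name breast_demo are made up for illustration. Note that in R 4.0.0 and later, stringsAsFactors already defaults to FALSE.

```r
# Inline CSV text stands in for a real file (made-up data);
# the "n/a" entry keeps the column from being parsed as numeric
csv_text <- "id,class\n1,2.5\n2,n/a\n3,4.0"

breast_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The column arrives as character, not factor
type_on_read <- class(breast_demo$class)

# Direct conversion works on character; entries that cannot be parsed
# become NA (a warning is raised, suppressed here for the demo)
breast_demo$class <- suppressWarnings(as.numeric(breast_demo$class))

print(type_on_read)
print(breast_demo)
```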

Just in case, here are other ways to create or change the column:

 breast[,'class'] <- as.numeric(as.character(breast[,'class']))

or

 breast <- transform(breast, class=as.numeric(as.character(class)))

How to convert factor format to numeric format in R without changing the values?

Replace the commas with dots, since R uses a dot as the decimal mark. Otherwise as.numeric cannot parse the string and returns NA.

Then, to extract values:

as.numeric(levels(df1[,2])[df1[,2]])

(thanks @SimonO101 for the correction)
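
The two steps can be sketched together as follows. The data frame df1_demo and its values are assumptions for illustration; sub() with fixed = TRUE is just one way to swap the decimal mark.

```r
# Made-up factor column that uses a comma as the decimal mark
df1_demo <- data.frame(id = 1:3, value = factor(c("1,5", "2,25", "1,5")))

# Fix the labels once per level and convert them to numeric
lev_num <- as.numeric(sub(",", ".", levels(df1_demo$value), fixed = TRUE))

# Expand back to one value per row by indexing with the factor
df1_demo$value_num <- lev_num[df1_demo$value]

print(df1_demo)
```

If the data comes from a file, the dec = "," argument of read.table may avoid the problem at import time.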

Convert a factor column with numbers in k format into numeric without losing any data

First, detect which records contain a "k".

df$is_k <- grepl("k", df$Likes)

Strip the "k", and then convert to numeric. If the record had a "k", multiply by 1000; otherwise multiply by 1.

df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)
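
As a quick check of those two lines, here is a sketch on a made-up Likes column (df_demo and its values are assumptions for illustration):

```r
# Made-up data: two values with a "k" suffix and one without
df_demo <- data.frame(Likes = factor(c("99k", "997", "15.5k")))

# Flag the rows that carry a "k" suffix
df_demo$is_k <- grepl("k", df_demo$Likes)

# Strip the suffix, convert to numeric, and scale the flagged rows
df_demo$Likes_num <- as.numeric(gsub("k", "", df_demo$Likes)) *
  ifelse(df_demo$is_k, 1000, 1)

print(df_demo)
```

Values containing a thousands separator, such as "2,400", would come out as NA with this simple version, because as.numeric cannot parse the comma.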


Edit

For multiple units, I adapted something I had elsewhere for a more complex problem. This shows the steps and is simple enough, though I am not sure how robust it is.

Function

convert_units <- function(x) {

  # already numeric: nothing to convert
  if (is.numeric(x)) return(x)

  # named vector of scalings (you can add to this)
  unit_scale <- c("k" = 1e3, "m" = 1e6)

  # clean up some potential nuisances with the input
  x_str <- gsub(",", "", trimws(tolower(as.character(x))))

  # extract out the letters
  unit_char <- gsub("[^a-z]", "", x_str)

  # extract out the numbers and convert to numeric
  x_num <- as.numeric(gsub("[a-z]", "", x_str))

  # develop a vector of multipliers (unname drops the unit labels)
  multiplier <- unname(unit_scale[match(unit_char, names(unit_scale))])
  multiplier[is.na(multiplier)] <- 1

  # multiply
  x_num * multiplier
}

Application

df$Likes2 <- convert_units(df$Likes)

Sample Result

  ID Likes Likes2
1  1   99k  99000
2  2   997    997
3  3 15.5k  15500
4  4 9.25k   9250
5  5   575    575
6  6   800    800
7  7  8.5k   8500
8  8 2,400   2400

