How to convert factors with decimal points into numeric values
Let's break this down.
First, because gdp is a data frame, levels will return NULL. You may be looking for the output of levels on each column of gdp, in which case you'd want to use something like lapply.
levels(gdp)
# NULL
lapply(gdp, levels)
# returns the levels of each column (NULL for columns that are not factors)
as.numeric(levels(gdp))[gdp]
# Error: invalid subscript type 'list'
The error states that you cannot use a list (gdp) to subscript a vector.
To iterate through the columns of gdp, you will need something like lapply to work on each component.
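gdp <- data.frame(lapply(gdp, function(x) {
  # leave non-factor columns unchanged
  if (!is.factor(x)) return(x)
  # strip thousands separators from the levels, convert them, then index by the factor codes
  as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}))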
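A minimal worked example of the lapply approach, assuming a small hypothetical gdp data frame whose year columns are factors that use "," as a thousands separator (the column names and values are invented for illustration). This variant assigns into gdp[] instead of wrapping the result in data.frame(), which keeps the existing column names as they are (data.frame() re-checks names, so a column named 2010 would become X2010).

```r
# Hypothetical input: factor columns with "," as a thousands separator
gdp <- data.frame(
  country = c("A", "B", "C"),
  y2010 = factor(c("1,200.5", "950", "1,200.5")),
  y2011 = factor(c("1,310", "1,001.25", "870")),
  stringsAsFactors = FALSE
)

# Convert only the factor columns; gdp[] <- keeps the data.frame structure
gdp[] <- lapply(gdp, function(x) {
  if (!is.factor(x)) return(x)
  # strip "," from the levels, convert the levels, then index by the factor codes
  as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
})

str(gdp)
```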
Possibly your data set would be better served as a matrix, since it appears to be entirely numeric. In that case:
gdp <- as.matrix(gdp)
How to convert a factor to integer/numeric without loss of information?
See the Warning section of ?factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
The R FAQ has similar advice.
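To see why the Warning matters, here is a small hypothetical factor whose labels look like numbers. as.numeric(f) returns the internal integer codes (positions in the sorted levels), not the values the labels represent, while the recommended idiom recovers the values.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "5", "10", "2.5"))

levels(f)                     # "10"  "2.5" "5"   (sorted as character strings)
as.numeric(f)                 # 1 3 1 2  -> the internal codes, not the values
as.numeric(levels(f))[f]      # 10.0  5.0 10.0  2.5
as.numeric(as.character(f))   # same values, via a full character vector
```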
Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?
as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels; if the values are mostly unique, there won't be much difference. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
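A sketch of the length-versus-levels point, assuming a long hypothetical factor with only three distinct labels. Both expressions return identical numbers; the difference is that as.numeric(levels(f))[f] parses 3 strings while as.numeric(as.character(f)) parses a million. The system.time() figures depend on the machine, so none are quoted here.

```r
set.seed(1)
# Hypothetical input: 1e6 values drawn from only 3 distinct labels
f <- factor(sample(c("0.5", "1.25", "3"), 1e6, replace = TRUE))

nlevels(f)  # number of strings parsed by as.numeric(levels(f))[f]
length(f)   # number of strings parsed by as.numeric(as.character(f))

# as.character(f) builds the same full-length character vector as levels(f)[f]
stopifnot(identical(as.character(f), levels(f)[f]))

a <- as.numeric(levels(f))[f]
b <- as.numeric(as.character(f))
stopifnot(identical(a, b))

system.time(as.numeric(levels(f))[f])
system.time(as.numeric(as.character(f)))
```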
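Some timings:
library(microbenchmark)
# f and x are not defined in the original; these hypothetical inputs only make
# the call runnable, and the timings below come from the original, unshown data
f <- factor(c("1.5", "2", "3.25", "2", "1.5"))
x <- f
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)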
## Unit: microseconds
##                        expr   min    lq      mean median     uq      max neval
##    as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##    as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                   paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                    paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
How to convert data.frame column from Factor to numeric
breast$class <- as.numeric(as.character(breast$class))
If you have many columns to convert to numeric:
indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))
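Another option is to use stringsAsFactors = FALSE when reading the file with read.table or read.csv, so that text columns are not turned into factors in the first place (this has been the default since R 4.0.0).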
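A minimal sketch of the import-time fix, using a hypothetical three-row CSV passed through read.csv(text = ...). In data like this, a single non-numeric entry such as "?" is enough to make R read the whole column as text; with stringsAsFactors = FALSE it arrives as character and can be converted directly, and na.strings = "?" avoids the conversion step altogether.

```r
# Hypothetical CSV: "?" marks a missing value in the class column
csv_text <- "id,class\n1,2\n2,4\n3,?"

# The column is read as character (not factor), so as.character() is not needed
breast <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(breast$class)  # "character"
breast$class <- suppressWarnings(as.numeric(breast$class))  # "?" becomes NA

# Alternatively, declare "?" as missing so the column is parsed as numbers
breast2 <- read.csv(text = csv_text, na.strings = "?")
class(breast2$class)  # "integer"
```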
Other ways to create or change the column:
breast[,'class'] <- as.numeric(as.character(breast[,'class']))
or
breast <- transform(breast, class = as.numeric(as.character(class)))
How to convert factor format to numeric format in R without changing the values?
Replace the commas with dots, since R uses the dot as its decimal separator. Otherwise as.numeric() cannot parse the value and coerces it to NA.
Then, to extract the values:
as.numeric(levels(df1[,2]))[df1[,2]]
(thanks @SimonO101 for the correction)
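Putting the two steps together, here is a sketch assuming a hypothetical data frame df1 whose second column is a factor with "," as the decimal separator. The replacement is done on the levels only, and the factor codes then index the converted levels.

```r
# Hypothetical input: second column uses "," as the decimal mark
df1 <- data.frame(id = 1:3, value = factor(c("1,5", "2,25", "1,5")))

# Replace "," with "." in the levels, convert them, then index by the codes
lv <- as.numeric(gsub(",", ".", levels(df1[, 2]), fixed = TRUE))
df1[, 2] <- lv[df1[, 2]]

df1[, 2]  # 1.50 2.25 1.50
```

When the data come from a file, read.csv2 (or the dec = "," argument of read.table and read.csv) parses comma decimals at import time.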
Convert a factor column with numbers in k format into numeric without losing any data
First, detect which records contain a "k":
df$is_k <- grepl("k", df$Likes)
Strip the "k" and then convert to numeric. If the record had a "k", multiply by 1000; otherwise multiply by 1.
df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)
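A quick check of the two steps on a small hypothetical Likes factor (invented values, not the original data set). grepl() and gsub() coerce the factor to character themselves, so no explicit as.character() is needed.

```r
# Hypothetical input with a mix of plain and "k" values
df <- data.frame(Likes = factor(c("99k", "997", "15.5k", "575")))

df$is_k <- grepl("k", df$Likes)
df$Likes_num <- as.numeric(gsub("k", "", df$Likes)) * ifelse(df$is_k, 1000, 1)

df
```

A value such as "2,400" would become NA with this approach, because the comma is not removed; the convert_units() function in the Edit strips commas first.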
Edit
For multiple units, I adapted something I had elsewhere for a more complex problem. This shows the steps and is simple enough, though I am not sure how robust it is.
Function
convert_units <- function(x) {
  # already numeric (double or integer): nothing to do
  if (is.numeric(x)) return(x)
  # named vector of scalings (you can add to this)
  unit_scale <- c("k" = 1e3, "m" = 1e6)
  # clean up some potential nuisances with the input
  x_str <- gsub(",", "", trimws(tolower(as.character(x))))
  # extract out the letters
  unit_char <- gsub("[^a-z]", "", x_str)
  # extract out the numbers and convert to numeric
  x_num <- as.numeric(gsub("[a-z]", "", x_str))
  # develop a vector of multipliers
  multiplier <- unit_scale[match(unit_char, names(unit_scale))]
  multiplier[is.na(multiplier)] <- 1
  # multiply (unname drops the unit labels carried over from unit_scale)
  unname(x_num * multiplier)
}
Application
df$Likes2 <- convert_units(df$Likes)
Sample Result
  ID Likes Likes2
1  1   99k  99000
2  2   997    997
3  3 15.5k  15500
4  4 9.25k   9250
5  5   575    575
6  6   800    800
7  7  8.5k   8500
8  8 2,400   2400