How to convert class of several variables at once
Try mutate_each together with (as per @Frank's comment) the %<>% operator from the magrittr package, which modifies the data frame in place:
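library(dplyr)    # provides mutate_each(), funs() and starts_with()
library(magrittr) # provides the %<>% assignment pipe
df %<>% mutate_each(funs(as.numeric), starts_with("sect1"))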
str(df)
# 'data.frame': 5 obs. of 3 variables:
# $ sect1q1: num 1 2 3 4 5
# $ sect1q2: num 2 3 4 7 8
# $ id : num 22 33 44 55 66
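mutate_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same conversion with across(), assuming dplyr 1.0.0 or later; the small df built here is made-up sample data, not the asker's:

```r
library(dplyr)

# Hypothetical sample data: two "sect1" columns stored as character
df <- data.frame(
  sect1q1 = c("1", "2", "3"),
  sect1q2 = c("2", "3", "4"),
  id      = c(22, 33, 44),
  stringsAsFactors = FALSE
)

# Convert every column whose name starts with "sect1" to numeric
df <- df %>% mutate(across(starts_with("sect1"), as.numeric))

sapply(df, class)
```

Here the result is assigned back with <- rather than %<>%, so magrittr does not need to be attached.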
Alternatively, using the data.table package, you could modify your data in place using the := operator:
library(data.table)
indx <- grep("^sect1", names(df), value = TRUE)
setDT(df)[, (indx) := lapply(.SD, as.numeric), .SDcols = indx]
Change class of multiple columns in data frame without for loop
OK, I worked it out while writing the question, but figured it might as well go up in case it's of use to anyone in future:
mydf[,2:3] <- lapply(mydf[,2:3], as.factor)
Change the class from factor to numeric of many columns in a data frame
Further to Ramnath's answer, the behaviour you are experiencing is due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.
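A small illustration of the difference, using a made-up factor f (not data from the question):

```r
# A factor whose levels happen to look like numbers
f <- factor(c("10", "20", "10", "5"))

# Levels are sorted as strings: "10", "20", "5"
levels(f)

# Internal integer codes, NOT the original numbers
codes <- as.numeric(f)

# Going through character recovers the numbers the levels represent
values <- as.numeric(as.character(f))

# Equivalent alternative: convert the levels once, then index by the factor
values2 <- as.numeric(levels(f))[f]

codes
values
```

The last form converts each level only once, which can matter for long factors with few levels.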
Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:
stats[,i] <- as.numeric(stats[,i])
to read
stats[,i] <- as.numeric(as.character(stats[,i]))
This is FAQ 7.10 in the R FAQ.
HTH
Convert type of multiple columns of a dataframe at once
Edit: See this related question for some simplifications and extensions on this basic idea.
My comment to Brandon's answer, using switch:
convert.magic <- function(obj, types) {
  # seq_along() is safe even if obj has zero columns
  for (i in seq_along(obj)) {
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
    obj[, i] <- FUN(obj[, i])
  }
  obj
}
out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
For truly large data frames you may want to use lapply instead of the for loop:
convert.magic1 <- function(obj, types) {
  out <- lapply(seq_along(obj), FUN = function(i) {
    FUN1 <- switch(types[i], character = as.character,
                   numeric = as.numeric, factor = as.factor)
    FUN1(obj[, i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}
When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric often involves as.numeric(as.character(...)). Also, be aware that data.frame() and as.data.frame() convert character to factor by default in R versions before 4.0.0.
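As one way to guard against the factor-to-numeric pitfall, here is a sketch of a variant that routes factors through as.character() first. The names convert_safe and to_numeric, and the sample foo, are hypothetical and not part of the original answer:

```r
# Hypothetical helper: factor-aware numeric conversion
to_numeric <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
}

convert_safe <- function(obj, types) {
  stopifnot(length(types) == ncol(obj))
  converters <- list(character = as.character,
                     numeric   = to_numeric,
                     factor    = as.factor)
  # obj[] <- ... replaces the columns but keeps the data frame structure
  obj[] <- Map(function(col, type) converters[[type]](col), obj, types)
  obj
}

# Made-up example data
foo <- data.frame(x = factor(c("10", "20", "5")),
                  y = c("red", "blue", "red"),
                  z = 1:3,
                  stringsAsFactors = FALSE)

out <- convert_safe(foo, c("numeric", "factor", "character"))
str(out)
```

An unknown type name fails with a subscript error from converters[[type]], which may be preferable to switch() silently returning NULL.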
converting multiple columns from character to numeric format in r
You could try
DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6],
stringsAsFactors = FALSE)
# Check columns classes
sapply(DF, class)
# a b c
# "character" "character" "character"
cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)
# a b c
# "numeric" "numeric" "character"
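Note that as.numeric() turns strings it cannot parse into NA (with a warning). A minimal sketch for counting how many values would be lost before overwriting the columns; the data and the names converted and new_na are made up for illustration:

```r
# Made-up data: column "a" contains one value that is not a number
DF <- data.frame(a = c("1", "2", "x"),
                 b = c("0.1", "1.1", "2.1"),
                 stringsAsFactors = FALSE)
cols.num <- c("a", "b")

# Convert into a separate list first; suppress the coercion warning on purpose
converted <- lapply(DF[cols.num], function(x) suppressWarnings(as.numeric(x)))

# Count values that were not NA before but are NA after conversion
new_na <- vapply(cols.num,
                 function(n) sum(is.na(converted[[n]]) & !is.na(DF[[n]])),
                 integer(1))
new_na

# Only overwrite the columns once the counts look acceptable
DF[cols.num] <- converted
```

Using lapply() here instead of sapply() keeps the result a list of columns, so it does not depend on sapply() simplifying to a matrix.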
Coerce multiple columns to factors at once
Choose some columns to coerce to factors:
cols <- c("A", "C", "D", "H")
Use lapply() to coerce and replace the chosen columns:
data[cols] <- lapply(data[cols], factor) ## as.factor() could also be used
Check the result:
sapply(data, class)
# A B C D E F G
# "factor" "integer" "factor" "factor" "integer" "integer" "integer"
# H I J
# "factor" "integer" "integer"
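If the columns are better chosen by their current class than by name, the same lapply() pattern works with a logical index. A sketch on made-up data (the name is_chr is an assumption for illustration):

```r
# Made-up data with a mix of character and integer columns
data <- data.frame(A = c("x", "y", "x"),
                   B = 1:3,
                   C = c("u", "u", "v"),
                   stringsAsFactors = FALSE)

# TRUE for every column that is currently character
is_chr <- vapply(data, is.character, logical(1))

# Coerce only those columns to factor
data[is_chr] <- lapply(data[is_chr], factor)

sapply(data, class)
```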
Set multiple column classes from a vector in data.table
Same idea as @RonakShah's answer but assuming the OP has explicitly named the columns rather than passing by position:
# different input format
cc <- setNames(col_classes, names(dtnew))
# usage
res = lapply(setNames(, names(cc)), function(n)
match.fun(sprintf("as.%s", cc[[n]]))(dtnew[[n]])
)
setDT(res)[]
Some other ways the problem might be solved:
If reading the data in, use the colClasses= argument to fread() or a similar function.
Maybe also consider type.convert, which will automatically guess and apply a class to each column. It cannot return a mix of character and factor columns, however.
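Both suggestions can be sketched in base R. The example below uses read.csv() as the "similar function" (it also accepts colClasses) and made-up inline data:

```r
# 1) Declare classes while reading: keep "id" as character so the
#    leading zeros survive; other columns are guessed as usual
csv_text <- "id,score,label\n001,1.5,a\n002,2.5,b"
read_in <- read.csv(text = csv_text, colClasses = c(id = "character"))
sapply(read_in, class)

# 2) Let type.convert() guess classes for an all-character data frame;
#    as.is = TRUE keeps non-numeric columns as character instead of factor
raw <- data.frame(id    = c("1", "2", "3"),
                  score = c("1.5", "2.5", "3.5"),
                  label = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
guessed <- type.convert(raw, as.is = TRUE)
sapply(guessed, class)
```

The as.is argument applies to every column at once, which is why type.convert cannot produce a mix of character and factor columns.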
Writing a custom function to convert class of variables in a dataframe based on another table
Here's an approach leveraging across and cur_column:
library(dplyr) #version >= 1.0.0
df_1 %>%
mutate(across(any_of(df_2$var_name),
~get(paste0("as.",df_2[df_2$var_name == cur_column(),"var_class"]))(.x)))
# A tibble: 10 x 6
name height weight age gender preferred_pet
<chr> <dbl> <dbl> <chr> <fct> <fct>
1 john 161 100 38 female frog
2 jack 192 67 87 female dog
3 mary 193 52 24 male rabbit
4 matt 166 95 92 male dog
5 elizabeth 160 89 82 female cat
6 richard 199 75 57 male dog
7 carlos 195 85 37 female rabbit
8 george 159 86 62 male rabbit
9 ferdinand 177 71 78 female cat
10 william 197 80 89 female rabbit
The any_of selection helper ensures that you only try to mutate columns of df_1 that are listed in df_2, and it silently skips names in df_2 that do not exist in df_1.
The second argument is the function that is applied to the selected columns. You can use cur_column() to get the name of the column that is currently being mutated. From there, we just look up that column name in df_2 and return the var_class you want. Then use get() from base R to return the appropriate function and apply it to the column with (.x).
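For comparison, the same lookup logic can be sketched in base R without dplyr. This is not the answer's method: it swaps in a plain loop and match.fun() in place of across() and get(), and the small df_1 and df_2 below are made-up stand-ins for the asker's tables:

```r
# Made-up data: everything starts out as character
df_1 <- data.frame(name   = c("john", "mary"),
                   height = c("161", "193"),
                   gender = c("female", "male"),
                   stringsAsFactors = FALSE)

# Lookup table; "missing_col" does not exist in df_1 and must be skipped
df_2 <- data.frame(var_name  = c("height", "gender", "missing_col"),
                   var_class = c("numeric", "factor", "character"),
                   stringsAsFactors = FALSE)

# Keep only lookup rows whose column is present (the role any_of() plays)
present <- df_2[df_2$var_name %in% names(df_1), ]

for (i in seq_len(nrow(present))) {
  col <- present$var_name[i]
  # Build e.g. "as.numeric" and fetch the function with that name
  convert <- match.fun(paste0("as.", present$var_class[i]))
  df_1[[col]] <- convert(df_1[[col]])
}

sapply(df_1, class)
```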
If you wanted to define a function and pass the column names unquoted, as you would with other tidyverse functions, you could use rlang::enquo:
library(rlang)
change_class_by_table <- function(data,data_ref,column_name,column_class){
data %>%
mutate(across(any_of(pull(data_ref,!!enquo(column_name))),
~get(paste0("as.",filter(data_ref, !!enquo(column_name) == cur_column()) %>%
pull(!!enquo(column_class))))(.x)))
}
change_class_by_table(df_1,df_2,var_name,var_class)
# A tibble: 10 x 6
# name height weight age gender preferred_pet
# <chr> <dbl> <dbl> <chr> <fct> <fct>
# 1 john 161 100 38 female frog
# 2 jack 192 67 87 female dog
# 3 mary 193 52 24 male rabbit
# ...
Convert column classes in data.table
For a single column:
dtnew <- dt[, Quarter:=as.character(Quarter)]
str(dtnew)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : num -0.838 0.146 -1.059 -1.197 0.282 ...
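Keep in mind that := modifies dt by reference, so after the line above dt itself has also changed and dtnew is just another name for the same table. If the original should stay untouched, one option is to work on a copy(). A sketch with made-up data:

```r
library(data.table)

# Made-up data in the same shape as the example
dt <- data.table(ID = c("A", "A", "B"), Quarter = 1:3, value = c(0.5, 1.5, 2.5))

# copy() makes an independent table, so := below does not touch dt
dtnew <- copy(dt)[, Quarter := as.character(Quarter)]

class(dt$Quarter)
class(dtnew$Quarter)
```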
Using lapply and as.character:
dtnew <- dt[, lapply(.SD, as.character), by=ID]
str(dtnew)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : chr "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...