How to Convert Class of Several Variables at Once

How to convert class of several variables at once

Try mutate_each and (as per @Frank's comment) the %<>% operator from the magrittr package to modify the data frame in place:

library(dplyr)     # provides mutate_each(), funs() and starts_with()
library(magrittr)  # provides %<>%
df %<>% mutate_each(funs(as.numeric), starts_with("sect1"))
str(df)
# 'data.frame': 5 obs. of 3 variables:
# $ sect1q1: num 1 2 3 4 5
# $ sect1q2: num 2 3 4 7 8
# $ id : num 22 33 44 55 66
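
In current dplyr releases mutate_each() and funs() are deprecated in favour of across(). A minimal sketch of the same conversion, assuming dplyr >= 1.0.0 and a made-up df whose columns mirror the str() output above (the starting values are hypothetical):

```r
library(dplyr)

# Hypothetical data: the sect1* columns arrive as character
df <- data.frame(
  sect1q1 = c("1", "2", "3", "4", "5"),
  sect1q2 = c("2", "3", "4", "7", "8"),
  id      = c(22, 33, 44, 55, 66),
  stringsAsFactors = FALSE
)

# Convert every column whose name starts with "sect1"
df <- df %>% mutate(across(starts_with("sect1"), as.numeric))
str(df)
```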

Alternatively, using the data.table package, you can modify your data in place with the := operator:

library(data.table)
indx <- grep("^sect1", names(df), value = TRUE)
setDT(df)[, (indx) := lapply(.SD, as.numeric), .SDcols = indx]

Change class of multiple columns in data frame without for loop

OK, I worked it out while writing the question, but figured it might as well go up in case it's of use to anyone in future:

mydf[,2:3] <- lapply(mydf[,2:3], as.factor)
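
A minimal sketch of that one-liner on made-up data (the mydf below is hypothetical). lapply() returns one converted vector per column, and the assignment writes them back into the same positions:

```r
# Hypothetical data frame; columns 2 and 3 hold category codes
mydf <- data.frame(
  id    = 1:4,
  group = c("a", "b", "a", "c"),
  level = c(1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Convert columns 2 and 3 to factor, leaving column 1 alone
mydf[, 2:3] <- lapply(mydf[, 2:3], as.factor)

sapply(mydf, class)
```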

Change the class from factor to numeric of many columns in a data frame

Further to Ramnath's answer, the behaviour you are experiencing is due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.

Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:

stats[,i] <- as.numeric(stats[,i])

to read

stats[,i] <- as.numeric(as.character(stats[,i]))

This is FAQ 7.10 in the R FAQ.
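
To make the difference concrete, here is a small sketch on a made-up factor. The third form is the one suggested in ?factor; it converts each level once and then indexes by the factor:

```r
# A factor whose levels happen to look like numbers
f <- factor(c("10", "20", "20", "5"))

codes   <- as.numeric(f)                # internal integer codes, not the values
values  <- as.numeric(as.character(f))  # the numbers the labels spell out
values2 <- as.numeric(levels(f))[f]     # same result, one conversion per level

codes
values
values2
```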

HTH

Convert type of multiple columns of a dataframe at once

Edit: See this related question for some simplifications and extensions on this basic idea.

My comment to Brandon's answer using switch:

convert.magic <- function(obj, types) {
  for (i in 1:length(obj)) {
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
    obj[, i] <- FUN(obj[, i])
  }
  obj
}

out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...

For truly large data frames you may want to use lapply instead of the for loop:

convert.magic1 <- function(obj, types) {
  out <- lapply(1:length(obj), FUN = function(i) {
    FUN1 <- switch(types[i], character = as.character,
                   numeric = as.numeric, factor = as.factor)
    FUN1(obj[, i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}

When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric usually requires as.numeric(as.character(...)). Also be aware that before R 4.0.0, data.frame() and as.data.frame() converted character columns to factor by default (stringsAsFactors = TRUE); from R 4.0.0 onward the default is FALSE.
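
One way to guard against the factor-to-numeric pitfall inside a converter like this is to route factors through as.character() before applying the target conversion. A rough sketch; convert_cols and the sample foo are hypothetical and not part of the original answer:

```r
convert_cols <- function(obj, types) {
  stopifnot(length(types) == ncol(obj))
  for (i in seq_along(obj)) {
    col <- obj[[i]]
    # Factors go through character first so the labels, not the codes, survive
    if (is.factor(col)) col <- as.character(col)
    obj[[i]] <- switch(types[i],
      character = as.character(col),
      numeric   = as.numeric(col),
      factor    = as.factor(col),
      stop("unsupported type: ", types[i])
    )
  }
  obj
}

# Hypothetical input where both columns start out as factors
foo <- data.frame(x = c("3", "1", "2"),
                  y = c("red", "blue", "red"),
                  stringsAsFactors = TRUE)

out <- convert_cols(foo, c("numeric", "character"))
str(out)
```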

Converting multiple columns from character to numeric format in R

You could try

DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6],
stringsAsFactors = FALSE)

# Check columns classes
sapply(DF, class)

# a b c
# "character" "character" "character"

cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)

# a b c
# "numeric" "numeric" "character"
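
Note that as.numeric() turns any string it cannot parse into NA, with a warning. The sketch below uses made-up data with one bad value and counts the NAs introduced before writing the columns back. It uses lapply() rather than sapply() so the result stays a list of columns regardless of the number of rows:

```r
DF <- data.frame(a = c("0", "1", "2"),
                 b = c("0.1", "1.1", "oops"),   # one value is not a number
                 c = letters[1:3],
                 stringsAsFactors = FALSE)

cols.num <- c("a", "b")

# lapply() returns a list with one converted vector per column
converted <- lapply(DF[cols.num], function(x) suppressWarnings(as.numeric(x)))

# Count values that were present before but became NA during conversion
new_na <- mapply(function(old, new) sum(is.na(new) & !is.na(old)),
                 DF[cols.num], converted)
new_na

DF[cols.num] <- converted
sapply(DF, class)
```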

Coerce multiple columns to factors at once

Choose some columns to coerce to factors:

cols <- c("A", "C", "D", "H")

Use lapply() to coerce and replace the chosen columns:

data[cols] <- lapply(data[cols], factor)  ## as.factor() could also be used

Check the result:

sapply(data, class)
# A B C D E F G
# "factor" "integer" "factor" "factor" "integer" "integer" "integer"
# H I J
# "factor" "integer" "integer"

Set multiple column classes from a vector in data.table

Same idea as @RonakShah's answer but assuming the OP has explicitly named the columns rather than passing by position:

# different input format
cc <- setNames(col_classes, names(dtnew))

# usage
res = lapply(setNames(, names(cc)), function(n)
match.fun(sprintf("as.%s", cc[[n]]))(dtnew[[n]])
)
setDT(res)[]

Some other ways the problem might be solved:

  • If reading the data in, use the colClasses= argument to fread() or a similar function.

  • Maybe also consider type.convert which will automatically guess and apply a class to each column. It cannot return a mix of character and factor columns, however.
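
Both alternatives can be sketched in base R. This example uses read.csv() as a stand-in for fread(), since it accepts a colClasses argument too; the inline CSV text is made up for illustration:

```r
csv_text <- "id,score,label
1,2.5,a
2,3.5,b
3,4.0,c"

# Option 1: declare the classes while reading
df1 <- read.csv(text = csv_text,
                colClasses = c(id = "character",
                               score = "numeric",
                               label = "factor"))

# Option 2: read everything as character, then let type.convert() guess
raw <- read.csv(text = csv_text, colClasses = "character")
df2 <- type.convert(raw, as.is = TRUE)   # as.is = TRUE keeps strings as character

sapply(df1, class)
sapply(df2, class)
```

With as.is = TRUE the guessed result contains no factors; with as.is = FALSE every non-numeric column would become a factor instead, which is the "no mix" limitation mentioned above.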

Writing a custom function to convert class of variables in a dataframe based on another table

Here's an approach leveraging across and cur_column:

library(dplyr) #version >= 1.0.0
df_1 %>%
  mutate(across(any_of(df_2$var_name),
                ~ get(paste0("as.", df_2[df_2$var_name == cur_column(), "var_class"]))(.x)))
# A tibble: 10 x 6
name height weight age gender preferred_pet
<chr> <dbl> <dbl> <chr> <fct> <fct>
1 john 161 100 38 female frog
2 jack 192 67 87 female dog
3 mary 193 52 24 male rabbit
4 matt 166 95 92 male dog
5 elizabeth 160 89 82 female cat
6 richard 199 75 57 male dog
7 carlos 195 85 37 female rabbit
8 george 159 86 62 male rabbit
9 ferdinand 177 71 78 female cat
10 william 197 80 89 female rabbit

The any_of() selection helper ensures that you only mutate columns named in df_2, and that any names listed there but missing from df_1 are skipped silently rather than raising an error.

The second argument is the function that is applied to the columns that are present. You can use cur_column() to have access to the name of the column that is being mutated. From there, we just look up that column name in df_2 and return the var_class you want. Then use get() from base R to return the appropriate function and apply that to the column with (.x).
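
The lookup itself does not depend on dplyr. Here is a base R sketch of the same idea, using match.fun() in place of get() and intersect() in place of any_of(); the small df_1 and df_2 below are hypothetical stand-ins for the question's data:

```r
# Hypothetical inputs: the data, plus a table naming the target class per column
df_1 <- data.frame(name   = c("john", "mary"),
                   height = c("161", "193"),
                   gender = c("female", "male"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(var_name  = c("height", "gender", "not_in_df_1"),
                   var_class = c("numeric", "factor", "character"),
                   stringsAsFactors = FALSE)

# Only touch columns that appear in both places
targets <- intersect(df_2$var_name, names(df_1))

for (col in targets) {
  cls <- df_2$var_class[df_2$var_name == col]
  convert <- match.fun(paste0("as.", cls))   # e.g. "as.numeric" -> as.numeric
  df_1[[col]] <- convert(df_1[[col]])
}

sapply(df_1, class)
```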

If you wanted to define a function, and pass the column names unquoted as you would with other tidyverse functions, you could use rlang::enquo:

library(rlang)
change_class_by_table <- function(data, data_ref, column_name, column_class) {
  data %>%
    mutate(across(any_of(pull(data_ref, !!enquo(column_name))),
                  ~ get(paste0("as.", filter(data_ref, !!enquo(column_name) == cur_column()) %>%
                                 pull(!!enquo(column_class))))(.x)))
}
change_class_by_table(df_1,df_2,var_name,var_class)
## A tibble: 10 x 6
# name height weight age gender preferred_pet
# <chr> <dbl> <dbl> <chr> <fct> <fct>
# 1 john 161 100 38 female frog
# 2 jack 192 67 87 female dog
# 3 mary 193 52 24 male rabbit
# ...

Convert column classes in data.table

For a single column:

dtnew <- dt[, Quarter:=as.character(Quarter)]
str(dtnew)

Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : num -0.838 0.146 -1.059 -1.197 0.282 ...

Using lapply and as.character:

dtnew <- dt[, lapply(.SD, as.character), by=ID]
str(dtnew)

Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : chr "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...
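
Keep in mind that := updates a data.table by reference, so in the single-column example dtnew and dt end up referring to the same modified table. A sketch of converting several named columns while leaving the original untouched, assuming a small made-up dt shaped like the one above:

```r
library(data.table)

# Hypothetical table with the same three columns as the output above
dt <- data.table(ID      = factor(rep(c("A", "B"), each = 5)),
                 Quarter = rep(1:5, 2),
                 value   = seq(0.5, 5, by = 0.5))

cols <- c("Quarter", "value")

# copy() first, so the by-reference update does not also change dt
dtnew <- copy(dt)[, (cols) := lapply(.SD, as.character), .SDcols = cols]

sapply(dt, class)      # original classes
sapply(dtnew, class)   # Quarter and value converted to character
```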

