Convert All Data Frame Character Columns to Factors

Convert all data frame character columns to factors


DF <- data.frame(x=letters[1:5], y=1:5, stringsAsFactors=FALSE)

str(DF)
#'data.frame': 5 obs. of 2 variables:
# $ x: chr "a" "b" "c" "d" ...
# $ y: int 1 2 3 4 5

You can use as.data.frame to turn all character columns into factor columns:

DF <- as.data.frame(unclass(DF),stringsAsFactors=TRUE)
str(DF)
#'data.frame': 5 obs. of 2 variables:
# $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# $ y: int 1 2 3 4 5
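
The unclass() step matters here. As a minimal sketch (the names `unchanged` and `converted` are only illustrative): calling as.data.frame directly on an object that is already a data frame returns it as-is, so stringsAsFactors has no effect, whereas unclass() first reduces it to a plain list whose character elements are then converted.

```r
DF <- data.frame(x = letters[1:5], y = 1:5, stringsAsFactors = FALSE)

# Already a data.frame: the argument is ignored and x stays character
unchanged <- as.data.frame(DF, stringsAsFactors = TRUE)
class(unchanged$x)
# "character"

# Plain list: character elements are converted while the frame is rebuilt
converted <- as.data.frame(unclass(DF), stringsAsFactors = TRUE)
class(converted$x)
# "factor"
class(converted$y)
# "integer"
```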

How to convert only character variables to factor in R without dplyr?

With dplyr, it is easier to use mutate_if:

library(dplyr)

df %>%
  mutate_if(is.character, factor)

The OP's code used sapply, which simplifies its result to a matrix, and a matrix can hold only a single type. Without dplyr, it is better to use lapply, which returns a list and so keeps each column's class:

i1 <- sapply(df, is.character)
df[i1] <- lapply(df[i1], factor)
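
To see the difference, here is a small sketch on a made-up data frame (the columns `id` and `grp` are assumptions for illustration, not the OP's data). sapply collapses mixed columns into a single character matrix, while the lapply assignment changes only the selected columns.

```r
df <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

# sapply simplifies to a matrix, so every value is coerced to one type
m <- sapply(df, function(col) col)
is.matrix(m)
# TRUE
typeof(m)
# "character"

# lapply returns a list; assigning into df[i1] keeps the data frame intact
i1 <- sapply(df, is.character)
df[i1] <- lapply(df[i1], factor)
sapply(df, class)
#        id       grp
# "integer"  "factor"
```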

Convert data.frame column format from character to factor

Hi, welcome to the world of R.

mtcars  #look at this built in data set
str(mtcars) #allows you to see the classes of the variables (all numeric)

#one approach is to index with the $ sign and use the as.factor function
mtcars$am <- as.factor(mtcars$am)
#another approach
mtcars[, 'cyl'] <- as.factor(mtcars[, 'cyl'])
str(mtcars) # now look at the classes

The same pattern works for other classes by swapping in the matching coercion function, for example as.character, as.Date or as.integer.
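
As a rough illustration of that, the sketch below uses a small invented data frame `d` (not part of the original answer) and applies three different coercion functions with the same indexing styles.

```r
d <- data.frame(
  id   = c("1", "2", "3"),
  when = c("2020-01-01", "2020-02-01", "2020-03-01"),
  n    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

d$id <- as.integer(d$id)            # character -> integer
d$when <- as.Date(d$when)           # character -> Date (ISO format strings)
d[, "n"] <- as.character(d[, "n"])  # numeric -> character

sapply(d, class)
#          id        when           n
#   "integer"      "Date" "character"
```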

Since you're new to R I'd suggest you have a look at these two websites:

R reference manuals:
http://cran.r-project.org/manuals.html

R Reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf

Coerce multiple columns to factors at once

Choose some columns to coerce to factors:

cols <- c("A", "C", "D", "H")

Use lapply() to coerce and replace the chosen columns:

data[cols] <- lapply(data[cols], factor)  ## as.factor() could also be used

Check the result:

sapply(data, class)
# A B C D E F G
# "factor" "integer" "factor" "factor" "integer" "integer" "integer"
# H I J
# "factor" "integer" "integer"

Converting all and only suitable character columns to numeric in data.table

My first thought was to use type.convert, but that either converts character and Date to factor, or with as.is=TRUE it converts factor to character.

str(DT[, lapply(.SD, type.convert)])
# Classes 'data.table' and 'data.frame': 100 obs. of 13 variables:
# $ panelID : int 4 39 1 34 23 43 14 18 33 21 ...
# $ Country : Factor w/ 3 levels "Albania","Belarus",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ some_NA : int 0 2 4 1 5 3 0 2 4 1 ...
# $ some_NA_factor: int 3 2 0 5 1 4 3 2 0 5 ...
# $ Group : int 1 1 1 1 1 1 1 1 1 1 ...
# $ Time : Factor w/ 20 levels "2010-01-02","2010-02-02",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ wt : num 0.15 0.3 0.15 0.9 1.35 1.2 1.2 0.75 0.6 1.2 ...
# $ Income : num -4.4 -6.41 2.28 -3.85 -0.02 ...
# $ Happiness : int 3 10 6 9 5 7 4 1 2 8 ...
# $ Sex : num 0.61 1.18 0.55 0.69 0.63 0.65 0.67 0.9 0.7 0.6 ...
# $ Age : int 15 2 65 67 73 17 84 5 41 91 ...
# $ Educ : num 0.54 1.04 1.29 0.43 0.76 0.63 0.6 0.44 0.48 1.13 ...
# $ uniqueID : int 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, ".internal.selfref")=<externalptr>
str(DT[, lapply(.SD, type.convert, as.is = TRUE)])
# Classes 'data.table' and 'data.frame': 100 obs. of 13 variables:
# $ panelID : int 4 39 1 34 23 43 14 18 33 21 ...
# $ Country : chr "Albania" "Albania" "Albania" "Albania" ...
# $ some_NA : int 0 2 4 1 5 3 0 2 4 1 ...
# $ some_NA_factor: int 3 2 0 5 1 4 3 2 0 5 ...
# $ Group : int 1 1 1 1 1 1 1 1 1 1 ...
# $ Time : chr "2010-01-02" "2010-02-02" "2010-03-02" "2010-04-02" ...
# $ wt : num 0.15 0.3 0.15 0.9 1.35 1.2 1.2 0.75 0.6 1.2 ...
# $ Income : num -4.4 -6.41 2.28 -3.85 -0.02 ...
# $ Happiness : int 3 10 6 9 5 7 4 1 2 8 ...
# $ Sex : num 0.61 1.18 0.55 0.69 0.63 0.65 0.67 0.9 0.7 0.6 ...
# $ Age : int 15 2 65 67 73 17 84 5 41 91 ...
# $ Educ : num 0.54 1.04 1.29 0.43 0.76 0.63 0.6 0.44 0.48 1.13 ...
# $ uniqueID : int 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, ".internal.selfref")=<externalptr>

So I think we need our own function with similar intentions.

mytype <- function(z) if (is.character(z) && all(grepl("^-?[\\d.]+(?:e-?\\d+)?$", z, perl = TRUE))) as.numeric(z) else z
str(DT[, lapply(.SD, mytype)])
# Classes 'data.table' and 'data.frame': 100 obs. of 13 variables:
# $ panelID : int 4 39 1 34 23 43 14 18 33 21 ...
# $ Country : chr "Albania" "Albania" "Albania" "Albania" ...
# $ some_NA : int 0 2 4 1 5 3 0 2 4 1 ...
# $ some_NA_factor: Factor w/ 6 levels "0","1","2","3",..: 4 3 1 6 2 5 4 3 1 6 ...
# $ Group : num 1 1 1 1 1 1 1 1 1 1 ...
# $ Time : Date, format: "2010-01-02" "2010-02-02" "2010-03-02" ...
# $ wt : num 0.15 0.3 0.15 0.9 1.35 1.2 1.2 0.75 0.6 1.2 ...
# $ Income : num -4.4 -6.41 2.28 -3.85 -0.02 ...
# $ Happiness : int 3 10 6 9 5 7 4 1 2 8 ...
# $ Sex : num 0.61 1.18 0.55 0.69 0.63 0.65 0.67 0.9 0.7 0.6 ...
# $ Age : int 15 2 65 67 73 17 84 5 41 91 ...
# $ Educ : num 0.54 1.04 1.29 0.43 0.76 0.63 0.6 0.44 0.48 1.13 ...
# $ uniqueID : int 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, ".internal.selfref")=<externalptr>

With larger data, you may prefer to break the grepl condition out so that you define which columns to work on:

mytypetest <- function(z) is.character(z) && all(grepl("^-?[\\d.]+(?:e-?\\d+)?$", z, perl = TRUE))
cols <- which(sapply(DT, mytypetest))
cols
# Group
# 5
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
str(DT)
# Classes 'data.table' and 'data.frame': 100 obs. of 13 variables:
# $ panelID : int 4 39 1 34 23 43 14 18 33 21 ...
# $ Country : chr "Albania" "Albania" "Albania" "Albania" ...
# $ some_NA : int 0 2 4 1 5 3 0 2 4 1 ...
# $ some_NA_factor: Factor w/ 6 levels "0","1","2","3",..: 4 3 1 6 2 5 4 3 1 6 ...
# $ Group : num 1 1 1 1 1 1 1 1 1 1 ...
# $ Time : Date, format: "2010-01-02" "2010-02-02" "2010-03-02" ...
# $ wt : num 0.15 0.3 0.15 0.9 1.35 1.2 1.2 0.75 0.6 1.2 ...
# $ Income : num -4.4 -6.41 2.28 -3.85 -0.02 ...
# $ Happiness : int 3 10 6 9 5 7 4 1 2 8 ...
# $ Sex : num 0.61 1.18 0.55 0.69 0.63 0.65 0.67 0.9 0.7 0.6 ...
# $ Age : int 15 2 65 67 73 17 84 5 41 91 ...
# $ Educ : num 0.54 1.04 1.29 0.43 0.76 0.63 0.6 0.44 0.48 1.13 ...
# $ uniqueID : int 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, ".internal.selfref")=<externalptr>

This last approach will technically be faster with data of any size, but the difference is only likely to be noticeable with larger data (more columns and/or rows).
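
One thing worth checking before relying on the pattern above: because `[\\d.]+` accepts any run of digits and dots, a string such as "1.2.3" passes the test but still becomes NA under as.numeric. The sketch below shows this on a few sample strings, and `safe_numeric` is a hypothetical, stricter alternative (not part of the original answer) that accepts a column only if coercion introduces no NA among its non-missing values.

```r
pattern <- "^-?[\\d.]+(?:e-?\\d+)?$"
x <- c("1", "-2.5", "1e-3", "abc", "1.2.3")

hits <- grepl(pattern, x, perl = TRUE)
hits
# TRUE TRUE TRUE FALSE TRUE

nums <- suppressWarnings(as.numeric(x))
is.na(nums)
# FALSE FALSE FALSE TRUE TRUE

# Hypothetical helper: let as.numeric itself decide what counts as numeric
safe_numeric <- function(z) {
  is.character(z) && !anyNA(suppressWarnings(as.numeric(z[!is.na(z)])))
}
safe_numeric(c("1", "2.5"))
# TRUE
safe_numeric(c("1", "1.2.3"))
# FALSE
```

Note that this helper follows as.numeric's own rules, so it also accepts strings such as "Inf" that the regular expression rejects.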

Convert multiple columns to factor and give them numerical values

We can use mutate with across

df <- df %>%
  mutate(across(contains('growth'), ~ ordered(.,
    levels = c("NG", "SG", "LG", "MG", "HG"),
    labels = c('0', '2.5', '12', '40', '100'))))

Or with lapply in base R

nm1 <- grep('growth', names(df), value = TRUE)
df[nm1] <- lapply(df[nm1], function(x) ordered(x,
  levels = c("NG", "SG", "LG", "MG", "HG"),
  labels = c('0', '2.5', '12', '40', '100')))

This can also be done with ftransform (or ftransformv, for multiple columns) from the collapse package

library(collapse)
f1 <- function(x) {
  ordered(x, levels = c("NG", "SG", "LG", "MG", "HG"),
          labels = c('0', '2.5', '12', '40', '100'))
}

i1 <- grep('growth', names(df))
ftransformv(df, i1, f1)

-output

#    ABC_growth ZFG_growth
# 1          40       <NA>
# 2          40       <NA>
# 3          40       <NA>
# 4          40       <NA>
# 5          40       <NA>
# 6          12         12
# 7          12         12
# 8          12         12
# 9          12         12
# 10         12         12
# 11          0        2.5
# 12          0        2.5
# 13          0        2.5
# 14          0        2.5
# 15          0        2.5
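
The results above are still ordered factors whose labels merely look like numbers. If actual numeric values are wanted, one option is to go through as.character first. A minimal sketch on an invented vector (`growth` and the unmatched code "XX" are assumptions for illustration):

```r
growth <- c("MG", "LG", "NG", "SG", "HG", "XX")

o <- ordered(growth,
             levels = c("NG", "SG", "LG", "MG", "HG"),
             labels = c('0', '2.5', '12', '40', '100'))

as.character(o)
# "40" "12" "0" "2.5" "100" NA

# as.integer gives the position of each level, not the label
as.integer(o)
# 4 3 1 2 5 NA

# Converting the labels themselves gives the intended values
as.numeric(as.character(o))
# 40 12 0 2.5 100 NA
```

Any value not listed in levels, such as "XX" here, becomes NA.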

Converting DF columns to factor is less than straightforward

To change multiple columns to factor, use:

DF[,1:3] <- lapply(DF[,1:3], factor)

To change from factor to numeric, remember to use as.numeric(as.character(x)), like this:

DF[,1:3] <- lapply(DF[,1:3], function(x) as.numeric(as.character(x)))
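
The reason for the extra as.character step is that as.numeric applied directly to a factor returns the internal integer codes rather than the values shown in the labels. A short sketch with a made-up factor `f`:

```r
f <- factor(c("10", "20", "10", "5"))

levels(f)
# "10" "20" "5"   (levels are sorted as strings)

# Integer codes, i.e. positions in levels(f)
as.numeric(f)
# 1 2 1 3

# The values the labels represent
as.numeric(as.character(f))
# 10 20 10 5

# Equivalent: convert the levels once, then index by the factor
as.numeric(levels(f))[f]
# 10 20 10 5
```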

Convert data.frame columns from factors to characters

Just following on from Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can rebuild it with an lapply call:

bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

This will convert all variables to class "character". If you want to convert only the factors, see Marek's solution below.

As @hadley points out, the following is more concise.

bob[] <- lapply(bob, as.character)

In both cases, lapply outputs a list; however, owing to the magical properties of R, the use of [] in the second case keeps the data.frame class of the bob object, thereby eliminating the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
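
To restrict the conversion to factor columns, one possible approach (a sketch on an invented `bob`, not necessarily the solution referred to above) is to build a logical index with is.factor and assign only into those columns:

```r
bob <- data.frame(name = factor(c("x", "y")),
                  n    = 1:2,
                  flag = factor(c("a", "b")))

# TRUE for each factor column
is_fac <- vapply(bob, is.factor, logical(1))

# Only the selected columns are replaced; bob stays a data.frame
bob[is_fac] <- lapply(bob[is_fac], as.character)

class(bob)
# "data.frame"
sapply(bob, class)
#        name           n        flag
# "character"   "integer" "character"
```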


