How to Get the Classes of All Columns in a Data Frame

How do I get the classes of all columns in a data frame?

One option is to use lapply and class. For example:

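> # stringsAsFactors = TRUE reproduces the factor column on R >= 4.0.0, where the default is FALSE
> foo <- data.frame(c("a", "b"), c(1, 2), stringsAsFactors = TRUE)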
> names(foo) <- c("SomeFactor", "SomeNumeric")
> lapply(foo, class)
$SomeFactor
[1] "factor"

$SomeNumeric
[1] "numeric"

Another option is str:

> str(foo)
'data.frame': 2 obs. of 2 variables:
$ SomeFactor : Factor w/ 2 levels "a","b": 1 2
$ SomeNumeric: num 1 2

Checking class of all columns in data.frame

You are looking for lapply(diamonds, class).

apply also runs without error, but the result is misleading: it reports every column as character.

apply works on arrays and matrices, not data frames. When you call it on a data frame, the data frame is first converted to a matrix, and a matrix can hold only one type.
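The difference can be seen on a small data frame. This is a minimal sketch; the data frame df and its column names are made up for illustration and are not the diamonds data from the question.

```r
# Hypothetical data frame with three different column types
df <- data.frame(num = c(1.5, 2.5),
                 flag = c(TRUE, FALSE),
                 txt = c("a", "b"),
                 stringsAsFactors = FALSE)

# apply() coerces df to a character matrix first, so every column
# is reported with the same class
via_apply <- apply(df, 2, class)

# sapply() iterates over the columns as they are stored in the data frame
via_sapply <- sapply(df, class)

print(via_apply)
print(via_sapply)
```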

Getting the class of all columns of data.frames in a list

Credit to @det (see comments).

dfList <- lapply(dfList, function(x) lapply(x, class))
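As a rough illustration of the nested approach, the sketch below builds a hypothetical dfList (the data frames and column names are invented here) and stores the classes in a separate object, so the original list is not overwritten. It uses sapply on the inner level to get one named vector per data frame, which is a small variation on the line above.

```r
# Hypothetical list of data frames
dfList <- list(
  first  = data.frame(id = 1:3, label = c("a", "b", "c"),
                      stringsAsFactors = FALSE),
  second = data.frame(value = c(0.1, 0.2), ok = c(TRUE, FALSE))
)

# Outer lapply walks the list, inner sapply walks the columns of each data frame
classes <- lapply(dfList, function(df) sapply(df, class))

str(classes)
```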

How to find out the classes of a data.frame

apply doesn't work for you because, as the documentation for apply explains:

 If ‘X’ is not an array but an object of a class with a non-null
‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it
to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data
frame) or via ‘as.array’.

So your data frame is converted to a matrix, and a matrix can hold only one type. Every column is coerced to the most general type that can represent all of the values, which in this case is character:

> as.matrix(Example)
Col1 Col2 Col3
[1,] " 2" "Hello" " TRUE"
[2,] " 5" "I am a" "FALSE"
[3,] "10" "Factor" " TRUE"

Use sapply instead:

> sapply(Example, class)
Col1 Col2 Col3
"numeric" "factor" "logical"
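One caveat worth knowing: sapply returns a plain character vector only when every column has exactly one class. Some columns, such as date-times, carry more than one class, and sapply then falls back to a list. The sketch below, using a made-up data frame, shows two ways to still get one string per column with vapply.

```r
# Hypothetical data frame; the POSIXct column has two classes
df <- data.frame(
  when = as.POSIXct(c("2020-01-01", "2020-01-02"), tz = "UTC"),
  n    = c(1L, 2L)
)

# Full class vectors, kept as a list
all_classes <- lapply(df, class)

# Only the first (most specific) class of each column
first_class <- vapply(df, function(col) class(col)[1], character(1))

# All classes of each column collapsed into a single string
collapsed <- vapply(df, function(col) paste(class(col), collapse = "/"),
                    character(1))

print(first_class)
print(collapsed)
```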

Determine the data types of a data frame's columns

A good place to start is str() (see ?str). To explore some examples, let's make some data:

set.seed(3221)  # this makes the example exactly reproducible
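# stringsAsFactors = TRUE reproduces the factor column on R >= 4.0.0
my.data <- data.frame(y = rnorm(5),
                      x1 = c(1:5),
                      x2 = c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3 = letters[1:5],
                      stringsAsFactors = TRUE)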

@Wilmer E Henao H's solution is very streamlined:

sapply(my.data, class)
y x1 x2 X3
"numeric" "integer" "logical" "factor"

Using str() gets you that information plus extra goodies (such as the levels of your factors and the first few values of each variable):

str(my.data)
'data.frame': 5 obs. of 4 variables:
$ y : num 1.03 1.599 -0.818 0.872 -2.682
$ x1: int 1 2 3 4 5
$ x2: logi TRUE TRUE FALSE FALSE FALSE
$ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

@Gavin Simpson's approach is also streamlined, but provides slightly different information than class():

sapply(my.data, typeof)
y x1 x2 X3
"double" "integer" "logical" "integer"

For more information about class, typeof, and the middle child, mode, see this excellent SO thread: "A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient".
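To compare the three functions directly, they can be laid side by side in one table. This is only a sketch: it rebuilds a smaller version of my.data with fixed values and an explicit factor so that it runs on its own, and the overview name is introduced here for illustration.

```r
# Smaller stand-in for my.data, with an explicit factor column
my.data <- data.frame(y  = c(0.5, 1.5),
                      x1 = 1:2,
                      x2 = c(TRUE, FALSE),
                      X3 = factor(c("a", "b")))

# One row per column of my.data, one column per inspection function
overview <- data.frame(
  class  = sapply(my.data, class),
  typeof = sapply(my.data, typeof),
  mode   = sapply(my.data, mode),
  stringsAsFactors = FALSE
)

print(overview)
```

The factor column is the interesting row: its class is "factor", but it is stored as an integer vector, so typeof reports "integer" and mode reports "numeric".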

Set multiple column classes from a vector in data.table

Same idea as @RonakShah's answer but assuming the OP has explicitly named the columns rather than passing by position:

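library(data.table)  # for setDT()

# different input format: col_classes named by the columns of dtnew
cc <- setNames(col_classes, names(dtnew))

# usage: look up the matching as.<class>() function for each column and apply it
res <- lapply(setNames(, names(cc)), function(n)
  match.fun(sprintf("as.%s", cc[[n]]))(dtnew[[n]])
)
setDT(res)[]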

Some other ways the problem might be solved:

  • If reading the data in, use the colClasses= argument to fread() or a similar function.

  • Maybe also consider type.convert which will automatically guess and apply a class to each column. It cannot return a mix of character and factor columns, however.
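Both alternatives can be sketched in base R. The example below uses read.csv as the "similar function" (not fread, so that it runs without data.table), and the inline CSV text, column names, and object names are all invented for illustration.

```r
# Option 1: declare the classes while reading the data
csv_text <- "id,score,label\n1,2.5,a\n2,3.5,b"
read_in <- read.csv(text = csv_text,
                    colClasses = c("character", "numeric", "character"))

# Option 2: start from all-character columns and let type.convert guess
raw <- data.frame(id = c("1", "2"),
                  score = c("1.5", "2.5"),
                  label = c("x", "y"),
                  flag = c("TRUE", "FALSE"),
                  stringsAsFactors = FALSE)
# as.is = TRUE keeps text columns as character instead of making factors
converted <- type.convert(raw, as.is = TRUE)

print(sapply(read_in, class))
print(sapply(converted, class))
```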

R: Classes of specific columns in list of dataframes

Here's how I would use lapply to find the class of column a in a list of 2 data frames, named x and y.

# stringsAsFactors = TRUE reproduces the factor result on R >= 4.0.0
datalist <- list(x = data.frame(a = letters, stringsAsFactors = TRUE),
                 y = data.frame(a = 1:26))
lapply(datalist, function(x) class(x$a))

$x
[1] "factor"

$y
[1] "integer"

Changing Class of Column Across Multiple Dataframes

We can collect the data frames into a list with mget (assuming the objects already exist in the global environment and dts holds their names as a character vector), loop over the list with map_dfr, and convert the Name column to character inside mutate. The _dfr suffix means the results are row-bound into a single data frame:

library(dplyr)
library(purrr)
out <- map_dfr(mget(dts), ~ .x %>%
                 mutate(Name = as.character(Name)))

If many columns differ in class across the data frames, it may be better to convert every column to a single class, bind the rows, and then let type.convert restore suitable types:

out <- map_dfr(mget(dts), ~ .x %>%
                 mutate(across(everything(), as.character)))
out <- type.convert(out, as.is = TRUE)

If the dplyr version is < 1.0.0, use mutate_all:

out <- map_dfr(mget(dts), ~ .x %>%
                 mutate_all(as.character))
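For readers without dplyr and purrr, roughly the same idea can be written in base R. This is a swapped-in alternative, not the approach used above, and the data frames df_a and df_b are hypothetical stand-ins for the objects named in dts.

```r
# Hypothetical data frames whose Name columns differ in class
df_a <- data.frame(Name = c("x", "y"), Score = 1:2, stringsAsFactors = FALSE)
df_b <- data.frame(Name = factor("z"), Score = 3L)
dts <- c("df_a", "df_b")

# Collect the objects by name from the current environment
dfs <- mget(dts, envir = environment())

# Convert Name to character in each data frame
fixed <- lapply(dfs, function(df) {
  df$Name <- as.character(df$Name)
  df
})

# Row-bind the converted data frames
out <- do.call(rbind, fixed)

print(sapply(out, class))
```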

