How do I get the classes of all columns in a data frame?
One option is to use lapply and class. For example:
> foo <- data.frame(c("a", "b"), c(1, 2))
> names(foo) <- c("SomeFactor", "SomeNumeric")
> lapply(foo, class)
$SomeFactor
[1] "factor"
$SomeNumeric
[1] "numeric"
Another option is str:
> str(foo)
'data.frame': 2 obs. of 2 variables:
$ SomeFactor : Factor w/ 2 levels "a","b": 1 2
$ SomeNumeric: num 1 2
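The "factor" result above matches versions of R before 4.0.0, where data.frame() converted strings to factors by default; from 4.0.0 on, the same call gives "character". A minimal sketch, assuming a small data frame like the one above (the names foo_factor and foo_string are illustrative only), that sets stringsAsFactors explicitly so the result does not depend on the R version:

```r
# Make string handling explicit so the result is the same on old and new R.
foo_factor <- data.frame(SomeFactor = c("a", "b"), SomeNumeric = c(1, 2),
                         stringsAsFactors = TRUE)
foo_string <- data.frame(SomeFactor = c("a", "b"), SomeNumeric = c(1, 2),
                         stringsAsFactors = FALSE)

# lapply() returns a named list with one element per column.
classes_factor <- lapply(foo_factor, class)
classes_string <- lapply(foo_string, class)

print(classes_factor$SomeFactor)  # "factor"
print(classes_string$SomeFactor)  # "character"
```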
Checking class of all columns in data.frame
You are looking for lapply(diamonds, class).
apply also runs, but its result is wrong: it reports every column as character. apply works on arrays and matrices, not data frames, so when you call it on a data frame, the data frame is first converted to a matrix.
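To see the difference side by side, here is a small sketch on a made-up data frame (mixed and its column names are hypothetical, not taken from the question):

```r
# Hypothetical mixed-type data frame; names are illustrative only.
mixed <- data.frame(n = c(1.5, 2.5), s = c("a", "b"), stringsAsFactors = FALSE)

# apply() first coerces the data frame with as.matrix(). Because one column
# holds strings, the result is a character matrix, so every column is
# reported as "character".
via_apply <- apply(mixed, 2, class)

# sapply() visits the columns of the data frame itself, so classes survive.
via_sapply <- sapply(mixed, class)

print(via_apply)
print(via_sapply)
```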
Getting the class of all columns of data.frames in a list
Credit to @det (see comments).
dfList <- lapply(dfList , function(x) lapply(x, class))
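Note that the line above overwrites dfList with the class information. A sketch, assuming you want to keep the data frames, that stores the result under a separate name (classList and the sample data are hypothetical):

```r
# Hypothetical list of data frames; contents are illustrative only.
dfList <- list(
  first  = data.frame(id = 1:3, label = c("a", "b", "c"),
                      stringsAsFactors = FALSE),
  second = data.frame(flag = c(TRUE, FALSE), score = c(0.5, 1.5))
)

# Outer lapply walks the list, inner lapply walks each data frame's columns.
# Assigning to a new name leaves the original data frames untouched.
classList <- lapply(dfList, function(x) lapply(x, class))

str(classList)
```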
How to find out the classes of a data.frame
apply doesn't work for you because, as in the docs:
If ‘X’ is not an array but an object of a class with a non-null
‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it
to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data
frame) or via ‘as.array’.
So your data frame becomes a matrix, and because a matrix holds a single type, every column is converted to the simplest type that can represent all of them, in this case character:
> as.matrix(Example)
Col1 Col2 Col3
[1,] " 2" "Hello" " TRUE"
[2,] " 5" "I am a" "FALSE"
[3,] "10" "Factor" " TRUE"
Use sapply instead:
> sapply(Example,class)
Col1 Col2 Col3
"numeric" "factor" "logical"
Determine the data types of a data frame's columns
Your best bet to start is to use ?str(). To explore some examples, let's make some data:
set.seed(3221) # this makes the example exactly reproducible
my.data <- data.frame(y=rnorm(5),
                      x1=c(1:5),
                      x2=c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3=letters[1:5])
@Wilmer E Henao H's solution is very streamlined:
sapply(my.data, class)
y x1 x2 X3
"numeric" "integer" "logical" "factor"
Using str() gets you that information plus extra goodies (such as the levels of your factors and the first few values of each variable):
str(my.data)
'data.frame': 5 obs. of 4 variables:
$ y : num 1.03 1.599 -0.818 0.872 -2.682
$ x1: int 1 2 3 4 5
$ x2: logi TRUE TRUE FALSE FALSE FALSE
$ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
@Gavin Simpson's approach is also streamlined, but provides slightly different information than class():
sapply(my.data, typeof)
y x1 x2 X3
"double" "integer" "logical" "integer"
For more information about class, typeof, and the middle child, mode, see this excellent SO thread: "A comprehensive survey of the types of things in R. 'mode' and 'class' and 'typeof' are insufficient".
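To compare the three functions directly, the following sketch builds a small table with one row per function. The data frame demo.data is a hypothetical stand-in with the same column types as the example, and stringsAsFactors = TRUE is set explicitly so the string column is a factor on any R version:

```r
# Hypothetical data with double, integer, logical and factor columns.
demo.data <- data.frame(y  = c(0.5, 1.5),
                        x1 = 1:2,
                        x2 = c(TRUE, FALSE),
                        X3 = c("a", "b"),
                        stringsAsFactors = TRUE)

# class() can return several values (for example for date-times), so keep
# only the first one to get exactly three strings per column.
type_table <- sapply(demo.data, function(col) {
  c(class = class(col)[1], typeof = typeof(col), mode = mode(col))
})

# Rows are class/typeof/mode, columns are the data frame's columns.
print(type_table)
```

The factor column is the interesting case: its class is "factor", but it is stored as integer codes, so typeof gives "integer" and mode gives "numeric".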
Set multiple column classes from a vector in data.table
Same idea as @RonakShah's answer but assuming the OP has explicitly named the columns rather than passing by position:
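# different input format: a named vector mapping column name -> class
cc <- setNames(col_classes, names(dtnew))
# usage: look up the as.<class> function for each column and apply it
res <- lapply(setNames(nm = names(cc)), function(n)
  match.fun(sprintf("as.%s", cc[[n]]))(dtnew[[n]])
)
setDT(res)[]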
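For readers who want to try the idea without data.table, here is a base R sketch of the same name-based lookup. It is a variation, not the answer's code: the contents of dtnew and col_classes are made up for illustration, and a plain data frame stands in for the data.table.

```r
# Hypothetical input: every name and value here is illustrative only.
dtnew <- data.frame(a = c("1", "2"), b = c(1.5, 2.5), c = c("x", "y"),
                    stringsAsFactors = FALSE)
col_classes <- c("integer", "character", "factor")

# Name the target classes by column so lookup is by name, not position.
cc <- setNames(col_classes, names(dtnew))

# For each column, find the matching as.<class> function and apply it.
converted <- dtnew
converted[names(cc)] <- lapply(names(cc), function(n) {
  convert <- match.fun(sprintf("as.%s", cc[[n]]))
  convert(dtnew[[n]])
})

print(sapply(converted, class))
```

This only works for classes that have an as.<class> function, such as as.integer or as.factor.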
Some other ways the problem might be solved:
If reading the data in, use the colClasses= argument to fread() or a similar function.
Maybe also consider type.convert, which will automatically guess and apply a class to each column. It cannot return a mix of character and factor columns, however.
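Both suggestions can be sketched in base R. The example below uses read.csv as the "similar function" (it also accepts colClasses) rather than fread, so that it runs without data.table; the CSV text and variable names are hypothetical.

```r
# Hypothetical CSV content, supplied inline instead of from a file.
csv_text <- "id,value,label\n001,1.5,x\n002,2.5,y"

# Option 1: declare the classes while reading. Keeping id as character
# preserves the leading zeros.
typed <- read.csv(text = csv_text,
                  colClasses = c(id = "character", value = "numeric",
                                 label = "character"))

# Option 2: read everything as character, then let type.convert() guess.
# as.is = TRUE keeps strings as character instead of making factors.
all_strings <- read.csv(text = csv_text, colClasses = "character")
guessed <- type.convert(all_strings, as.is = TRUE)

print(sapply(typed, class))
print(sapply(guessed, class))
```

Guessing turns the id column into a number and drops its leading zeros, which is one reason to prefer explicit colClasses when the types are known.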
R: Classes of specific columns in list of dataframes
Here's how I would use lapply to find the class of column a in a list of 2 data frames, named x and y.
datalist <- list(x = data.frame(a = letters),
y = data.frame(a = 1:26))
lapply(datalist, function(x) class(x$a))
$x
[1] "factor"
$y
[1] "integer"
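If a named character vector is more convenient than a list, vapply is one alternative. This is a sketch rather than part of the original answer; stringsAsFactors = TRUE is set explicitly because, from R 4.0.0 on, data.frame(a = letters) would otherwise give a character column instead of a factor.

```r
# Same shape as the example above, with string handling made explicit.
datalist <- list(x = data.frame(a = letters, stringsAsFactors = TRUE),
                 y = data.frame(a = 1:26))

# vapply() checks that each result is a single string. class(...)[1] keeps
# only the first class for columns that have more than one.
col_class <- vapply(datalist, function(df) class(df$a)[1], character(1))

print(col_class)
```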
Changing Class of Column Across Multiple Dataframes
We can load the datasets into a list with mget (assuming the dataset objects already exist in the global environment), loop over the list with map, change the class of the 'Name' column in mutate, and row-bind the results by using the _dfr variant of map:
library(dplyr)
library(purrr)
out <- map_dfr(mget(dts), ~ .x %>%
mutate(Name = as.character(Name)))
If many columns differ in class, it may be better to convert all columns to a single class and then bind:
out <- map_dfr(mget(dts), ~ .x %>%
mutate(across(everything(), as.character)))
out <- type.convert(out, as.is = TRUE)
If the dplyr version is < 1.0.0, use mutate_all:
out <- map_dfr(mget(dts), ~ .x %>%
mutate_all(as.character))
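The same convert-then-bind idea can be sketched in base R for readers without dplyr and purrr. This is a variation under stated assumptions, not the answer's code: df1, df2 and their contents are made up, and dts is assumed to be a character vector of object names, which is what mget expects.

```r
# Hypothetical data frames whose Name columns differ in class.
df1 <- data.frame(Name = c("a", "b"), Value = 1:2, stringsAsFactors = TRUE)
df2 <- data.frame(Name = c("c", "d"), Value = 3:4, stringsAsFactors = FALSE)
dts <- c("df1", "df2")  # assumed: names of the objects to combine

# Fetch the objects by name, then give Name the same class everywhere.
harmonised <- lapply(mget(dts, envir = environment()), function(d) {
  d$Name <- as.character(d$Name)
  d
})

# Row-bind the harmonised data frames and drop the generated row names.
out <- do.call(rbind, harmonised)
rownames(out) <- NULL

str(out)
```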