Determine the data types of a data frame's columns
Your best bet to start is to use str() (see ?str). To explore some examples, let's make some data:
set.seed(3221) # this makes the example exactly reproducible
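my.data <- data.frame(y=rnorm(5),
                      x1=c(1:5),
                      x2=c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3=letters[1:5],
                      stringsAsFactors=TRUE) # explicit because R >= 4.0 no longer converts strings to factors by default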
@Wilmer E Henao H's solution is very streamlined:
sapply(my.data, class)
y x1 x2 X3
"numeric" "integer" "logical" "factor"
Using str() gets you that information plus extra goodies (such as the levels of your factors and the first few values of each variable):
str(my.data)
'data.frame': 5 obs. of 4 variables:
$ y : num 1.03 1.599 -0.818 0.872 -2.682
$ x1: int 1 2 3 4 5
$ x2: logi TRUE TRUE FALSE FALSE FALSE
$ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
@Gavin Simpson's approach is also streamlined, but provides slightly different information than class():
sapply(my.data, typeof)
y x1 x2 X3
"double" "integer" "logical" "integer"
For more information about class, typeof, and the middle child, mode, see this excellent SO thread: A comprehensive survey of the types of things in R. 'mode' and 'class' and 'typeof' are insufficient.
pandas how to check dtype for all columns in a dataframe?
The singular form dtype checks the data type of a single column. The plural form dtypes applies to a whole data frame and returns the data types of all columns. Essentially:
For a single column:
dataframe.column.dtype
For all columns:
dataframe.dtypes
Example:
import pandas as pd
df = pd.DataFrame({'A': [1,2,3], 'B': [True, False, False], 'C': ['a', 'b', 'c']})
df.A.dtype
# dtype('int64')
df.B.dtype
# dtype('bool')
df.C.dtype
# dtype('O')
df.dtypes
#A int64
#B bool
#C object
#dtype: object
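As a small sketch building on the example above: df.dtypes is itself a Series indexed by column name, so a single entry can be looked up by label, and bracket indexing gives the same result as attribute access. The column name 'my col' is a made-up one, chosen only to show a name that cannot be reached as an attribute; the pandas.api.types helpers are used here instead of comparing against a specific dtype string.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_integer_dtype

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [True, False, False],
                   'my col': ['a', 'b', 'c']})  # hypothetical name containing a space

# Bracket indexing works for any column name; attribute access (df.A) cannot
# reach names with spaces or names that clash with DataFrame attributes.
single = df['my col'].dtype

# df.dtypes is a Series indexed by column name, so one entry can be looked up by label.
from_series = df.dtypes['my col']
same = single == from_series

# Type checks that do not depend on the exact integer width or platform.
a_is_int = is_integer_dtype(df['A'].dtype)
b_is_bool = is_bool_dtype(df.dtypes['B'])
```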
Pandas: Setting no. of max rows
Set display.max_rows:
pd.set_option('display.max_rows', 500)
For older versions of pandas (<=0.11.0) you need to change both display.height and display.max_rows.
pd.set_option('display.height', 500)
pd.set_option('display.max_rows', 500)
See also pd.describe_option('display').
You can also set an option temporarily, for a single block of code, like this:
from IPython.display import display
with pd.option_context('display.max_rows', 100, 'display.max_columns', 10):
    display(df)  # display() is needed to show the dataframe inside a with block in Jupyter
    # some pandas stuff
You can also reset an option back to its default value like this:
pd.reset_option('display.max_rows')
And reset all of them back:
pd.reset_option('all')
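To check how these calls interact, here is a minimal sketch that reads the option back with pd.get_option at each step. It assumes a fresh session in which display.max_rows has not been changed yet; the variable names are illustrative only.

```python
import pandas as pd

# Value in effect before any change (the default in a fresh session).
default_rows = pd.get_option('display.max_rows')

# option_context changes the option only inside the with block.
with pd.option_context('display.max_rows', 100):
    inside = pd.get_option('display.max_rows')
after_context = pd.get_option('display.max_rows')

# set_option changes it until it is set again or reset.
pd.set_option('display.max_rows', 500)
after_set = pd.get_option('display.max_rows')

# reset_option restores the default for that one option.
pd.reset_option('display.max_rows')
after_reset = pd.get_option('display.max_rows')
```

The temporary form is handy when only one large frame needs to be printed in full and the rest of the session should keep the usual truncated display.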
How do I get the classes of all columns in a data frame?
One option is to use lapply and class. For example:
> foo <- data.frame(c("a", "b"), c(1, 2), stringsAsFactors = TRUE)  # explicit for R >= 4.0
> names(foo) <- c("SomeFactor", "SomeNumeric")
> lapply(foo, class)
$SomeFactor
[1] "factor"
$SomeNumeric
[1] "numeric"
Another option is str:
> str(foo)
'data.frame': 2 obs. of 2 variables:
$ SomeFactor : Factor w/ 2 levels "a","b": 1 2
$ SomeNumeric: num 1 2
Determining Pandas Column DataType
This is only a partial answer, but you can get frequency counts of the element types in each column of the DataFrame as follows:
dtypeCount =[df.iloc[:,i].apply(type).value_counts() for i in range(df.shape[1])]
This returns
dtypeCount
[<class 'numpy.int32'> 4
Name: a, dtype: int64,
<class 'int'> 2
<class 'str'> 2
Name: b, dtype: int64,
<class 'numpy.int32'> 4
Name: c, dtype: int64]
It doesn't print this nicely, but you can pull out information for any variable by location:
dtypeCount[1]
<class 'int'> 2
<class 'str'> 2
Name: b, dtype: int64
which should get you started in finding what data types are causing the issue and how many of them there are.
You can then inspect the rows that have a str object in the second variable using
df[df.iloc[:,1].map(lambda x: type(x) == str)]
a b c
1 1 n 4
3 3 g 6
Data:
import pandas as pd
df = pd.DataFrame({'a': range(4),
                   'b': [6, 'n', 7, 'g'],
                   'c': range(3, 7)})
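The per-column counts above can be condensed into a quick check for which columns mix element types. The following is only a sketch along the same lines as the answer: mixed_type_columns is a hypothetical helper name, not a pandas function, and it reuses the same apply(type) idea.

```python
import pandas as pd

def mixed_type_columns(frame):
    """Return the names of columns whose elements have more than one Python type."""
    return [col for col in frame.columns
            if frame[col].apply(type).nunique() > 1]

df = pd.DataFrame({'a': range(4),
                   'b': [6, 'n', 7, 'g'],
                   'c': range(3, 7)})

# Only column 'b' holds both integers and strings.
mixed = mixed_type_columns(df)

# Rows where column 'b' holds a string, selected by name rather than by position.
str_rows = df[df['b'].apply(lambda x: isinstance(x, str))]
```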
Get list of pandas dataframe columns based on data type
If you want a list of columns of a certain type, you can use groupby:
>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df
A B C D E
0 1 2.3456 c d 78
[1 rows x 5 columns]
>>> df.dtypes
A int64
B float64
C object
D object
E int64
dtype: object
>>> g = df.columns.to_series().groupby(df.dtypes).groups
>>> g
{dtype('int64'): ['A', 'E'], dtype('float64'): ['B'], dtype('O'): ['C', 'D']}
>>> {k.name: v for k, v in g.items()}
{'object': ['C', 'D'], 'int64': ['A', 'E'], 'float64': ['B']}
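If only one kind of column is needed, a different technique from the groupby approach above is DataFrame.select_dtypes, which filters columns by dtype directly. This is a sketch of that alternative, not the method from the answer; the 'number' selector is used so the result does not depend on the exact integer or float width.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))

# Columns with any numeric dtype (integer or float).
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Everything else, here the two string columns.
other_cols = df.select_dtypes(exclude='number').columns.tolist()
```

The groupby version is the better fit when a full mapping from every dtype to its columns is wanted in one pass.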