Flagging which columns hold only numerical values, in a frame shaped like the original data frame
num_cols = list(df2.select_dtypes(include=[np.number]).columns.values)
values = ["numerical" if c in num_cols else "" for c in df2.columns]
# values: ['numerical', '']
desired_result = pd.DataFrame(values).T
desired_result.columns = df2.columns
# desired_result:
# column1 column2
# 0 numerical
How to check the dtype of a column in Python pandas
You can access the data type of a column with dtype:
# treat_numeric and treat_str stand in for your own handlers
for y in agg.columns:
    if agg[y].dtype == np.float64 or agg[y].dtype == np.int64:
        treat_numeric(agg[y])
    else:
        treat_str(agg[y])
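The explicit comparison above only matches float64 and int64, so columns stored as int32 or float32 would fall into the string branch. A minimal sketch of a broader check, assuming `pandas.api.types.is_numeric_dtype` fits your case (note that it also treats bool columns as numeric); the sample frame and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample frame; the column names are illustrative only.
agg = pd.DataFrame({
    "small_int": np.array([1, 2, 3], dtype=np.int32),
    "price": [1.5, 2.5, 3.5],
    "label": ["a", "b", "c"],
})

# Map each column name to whether pandas considers its dtype numeric.
numeric_flags = {col: is_numeric_dtype(agg[col]) for col in agg.columns}
print(numeric_flags)
```

Here the int32 column is flagged as numeric even though it would not equal np.int64.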
How to check if float pandas column contains only integer numbers?
Comparison with astype(int)
Tentatively convert your column to int and test with np.array_equal:
np.array_equal(df.v, df.v.astype(int))
True
float.is_integer
You can use this Python function in conjunction with apply:
df.v.apply(float.is_integer).all()
True
Or, using Python's all with a generator expression, for space efficiency:
all(x.is_integer() for x in df.v)
True
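One caveat with the approaches above: if the column contains NaN, astype(int) raises an error and float.is_integer returns False for the missing entries. A minimal sketch of a variant that ignores missing values, assuming that is the behaviour you want; the function name is hypothetical:

```python
import numpy as np
import pandas as pd

def is_whole_number_column(s):
    """Return True if every non-missing value in a float Series is a whole number."""
    non_missing = s.dropna()
    # x % 1 is 0 for whole numbers; for infinities it is NaN, which compares unequal to 0.
    return bool((non_missing % 1 == 0).all())

with_gap = pd.Series([1.0, 2.0, np.nan])
with_fraction = pd.Series([1.0, 2.5])
print(is_whole_number_column(with_gap), is_whole_number_column(with_fraction))
```

An empty or all-NaN column returns True under this sketch, since there is no value that contradicts it.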
How to check if a variable is either a python list, numpy array or pandas series
You can do it using isinstance:
import pandas as pd
import numpy as np
def f(l):
    # pd.Series is the same class as pd.core.series.Series
    if isinstance(l, (list, pd.Series, np.ndarray)):
        print(5)
    else:
        raise Exception('wrong type')
Then f([1,2,3]) prints 5 while f(3.34) raises an error.
Check if dataframe column is Categorical
Use the name property of the dtype to do the comparison instead; it should always work because it's just a string:
>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'
>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'
So, to sum up, you can end up with a simple, straightforward function:
def is_categorical(array_like):
    return array_like.dtype.name == 'category'
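As an alternative to comparing the dtype's name, the dtype object itself can be tested against pd.CategoricalDtype. This is a sketch of a different technique from the string comparison above, and the function name is hypothetical:

```python
import pandas as pd

def is_categorical_by_type(array_like):
    """Return True if the object's dtype is a pandas CategoricalDtype."""
    return isinstance(array_like.dtype, pd.CategoricalDtype)

cat_series = pd.Series(["a", "b", "a"], dtype="category")
int_series = pd.Series([1, 2, 3])
print(is_categorical_by_type(cat_series), is_categorical_by_type(int_series))
```

Both versions expect an object with a dtype attribute, so a plain Python list would raise AttributeError in either one.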
Is there an efficient method of checking whether a column has mixed dtypes?
Here is an approach that uses the fact that in Python 3 values of different types generally cannot be ordered against each other. The idea is to run max over the array, which, being a builtin, should be reasonably fast. It also short-circuits: it stops at the first comparison that fails.
def ismixed(a):
    try:
        max(a)
        return False
    except TypeError as e:  # we take this to imply mixed type
        msg, fst, and_, snd = str(e).rsplit(' ', 3)
        assert msg == "'>' not supported between instances of"
        assert and_ == "and"
        assert fst != snd
        return True
    except ValueError as e:  # catch empty arrays
        assert str(e) == "max() arg is an empty sequence"
        return False
It doesn't catch mixed numeric types, though. Also, objects that just do not support comparison may trip this up.
But it's reasonably fast. If we strip away all pandas overhead:
v = df.values
list(map(ismixed, v.T))
# [True, False, False]
timeit(lambda: list(map(ismixed, v.T)), number=1000)
# 0.008936170022934675
For comparison, using infer_dtype from pandas.api.types:
timeit(lambda: list(map(infer_dtype, v.T)), number=1000)
# 0.02499613002873957
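Since the max-based check misses mixed numeric types, a different technique is to count the distinct Python types in the column. The sketch below is not the method timed above and has not been benchmarked against it; it is likely slower, and the function name is hypothetical:

```python
import pandas as pd

def has_mixed_types(column):
    """Return True if the non-missing values belong to more than one Python type."""
    # map(type) yields the concrete class of each element; nunique counts distinct classes.
    return column.dropna().map(type).nunique() > 1

mixed_text = pd.Series(["a", 1, 2.5], dtype=object)
mixed_numeric = pd.Series([1, 2.0], dtype=object)
only_text = pd.Series(["a", "b"], dtype=object)
print(has_mixed_types(mixed_text), has_mixed_types(mixed_numeric), has_mixed_types(only_text))
```

This only makes sense for object columns, because a column with a concrete dtype such as int64 or float64 already holds a single type.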
How to determine if a pandas column type can be reduced from int64 to int32 or from float64 to float32?
I have a dataframe which is huge (8 GB). I am trying to find out if I will lose any information if I downsize the columns from int64 to int32 ...
The simplest way to cast integers to a smaller type and make sure that you are not losing information is to use
df['col'] = pd.to_numeric(df['col'], downcast='integer')
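This will both do the conversion and check that the conversion didn't lose data. Note that downcast='integer' picks the smallest integer dtype that can hold the values, which may be narrower than int32. You'll need to do that for each integer column in your dataframe.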
... or from float64 to float32.
Casting a number to a smaller floating point number always loses some information, unless you are dealing with an exact binary fraction. In practice, you can use 32-bit float if you need around 7 digits or fewer of precision.
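To check before converting, rather than convert and inspect, the two questions can be tested directly. A minimal sketch, assuming an integer column without missing values and a float column where an exact round trip is the criterion you care about; both function names are hypothetical:

```python
import numpy as np
import pandas as pd

def fits_int32(s):
    """Return True if every value of an integer Series lies within the int32 range."""
    info = np.iinfo(np.int32)
    return bool(s.min() >= info.min and s.max() <= info.max)

def float32_roundtrip_exact(s):
    """Return True if casting to float32 and back reproduces every value exactly."""
    back = s.astype(np.float32).astype(np.float64)
    # equal_nan=True treats NaN in the same position as equal.
    return bool(np.array_equal(s.to_numpy(), back.to_numpy(), equal_nan=True))

print(fits_int32(pd.Series([1, 2, 3])), fits_int32(pd.Series([2**40])))
print(float32_roundtrip_exact(pd.Series([0.5, 0.25])), float32_roundtrip_exact(pd.Series([0.1])))
```

Values such as 0.5 and 0.25 are exact binary fractions and survive the round trip, while 0.1 does not. If approximate equality is acceptable, a tolerance-based comparison such as np.allclose may be the more useful test.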