check if variable is dataframe
Use isinstance, nothing else:
if isinstance(x, pd.DataFrame):
    ...  # do something
PEP 8 says explicitly that isinstance is the preferred way to check types:
No: type(x) is pd.DataFrame
No: type(x) == pd.DataFrame
Yes: isinstance(x, pd.DataFrame)
And don't even think about
if obj.__class__.__name__ == 'DataFrame':
    expect_problems_some_day()
isinstance handles inheritance (see What are the differences between type() and isinstance()?). For example, in Python 2 it will tell you if a variable is a string (either str or unicode), because both derive from basestring:
if isinstance(obj, basestring):
    i_am_string(obj)
In Python 3, basestring no longer exists; check against str instead.
Specifically for pandas DataFrame objects:
import pandas as pd
isinstance(var, pd.DataFrame)
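To make the inheritance point concrete, here is a minimal sketch. The TaggedFrame subclass is hypothetical and exists only for this illustration; it shows that isinstance accepts a subclass while an exact type comparison rejects it.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, used only for this illustration."""


plain = pd.DataFrame({"a": [1, 2]})
tagged = TaggedFrame({"a": [1, 2]})

# isinstance accepts the base class and any subclass of it.
plain_ok = isinstance(plain, pd.DataFrame)
tagged_ok = isinstance(tagged, pd.DataFrame)

# An exact type comparison rejects the subclass.
tagged_exact = type(tagged) is pd.DataFrame

# A single column is a Series, not a DataFrame.
series_ok = isinstance(plain["a"], pd.DataFrame)

print(plain_ok, tagged_ok, tagged_exact, series_ok)
```

The subclass case is where type(x) is pd.DataFrame quietly gives the wrong answer, which is the main reason to prefer isinstance.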
Check if certain value is contained in a dataframe column in pandas
Use str.contains if you need the rows where the values of column date contain the string 07311954:
print(df[df['date'].astype(str).str.contains('07311954')])
If date is stored as integers, as in the sample below, the leading zero is lost in the conversion, so search for '7311954' instead.
Or, if the date column already holds strings:
print(df[df['date'].str.contains('07311954')])
If you want to check whether the last 4 digits of column date are 1954:
print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
Sample:
print(df['date'])
0 8152007
1 9262007
2 7311954
3 2252011
4 2012011
5 2012011
6 2222011
7 2282011
Name: date, dtype: int64
print(df['date'].astype(str).str[-4:].str.contains('1954'))
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
Name: date, dtype: bool
print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
cmte_id trans_typ entity_typ state employer occupation date \
2 C00119040 24K CCM MD NaN NaN 7311954
amount fec_id cand_id
2 1000 C00140715 H2MD05155
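The following is a small self-contained sketch of the same idea on a cut-down frame; the four dates are copied from the sample above, and everything else is an assumption made for illustration. It also shows why a pattern with a leading zero finds nothing once an integer column has been converted to strings.

```python
import pandas as pd

# A reduced version of the sample: dates stored as integers.
df = pd.DataFrame({"date": [8152007, 9262007, 7311954, 2252011]})

# Integers carry no leading zeros, so 7311954 becomes "7311954".
as_text = df["date"].astype(str)

# regex=False treats the pattern as a plain substring.
with_zero = as_text.str.contains("07311954", regex=False)
without_zero = as_text.str.contains("7311954", regex=False)

# For a fixed suffix, str.endswith is a simpler alternative
# to slicing the last four characters and calling str.contains.
ends_1954 = as_text.str.endswith("1954")

matches = df[ends_1954]
print(matches)
```

If the leading zero matters, one option is to keep the column as strings from the moment it is loaded.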
how to check if a value exists in a dataframe
Use DataFrame.isin to check all columns and DataFrame.any to check for at least one True per column:
m = df.isin(my_word).any()
print (m)
0 False
1 True
2 False
dtype: bool
Then get the column names by filtering:
cols = m.index[m].tolist()
print(cols)
[1]
Data:
print (df)
0 1 2
0 NaN good employee
1 Not available best employer
2 not required well manager
3 not eligible super reportee
Detail:
print (df.isin(my_word))
0 1 2
0 False False False
1 False False False
2 False True False
3 False False False
print (df.isin(my_word).any())
0 False
1 True
2 False
dtype: bool
EDIT: Selecting the matching columns and converting them with .values.tolist() gives nested lists, so flattening is necessary:
my_word=["well","manager"]
m = df.isin(my_word).any()
print (m)
0 False
1 True
2 True
dtype: bool
nested = df.loc[:,m].values.tolist()
flat_list = [item for sublist in nested for item in sublist]
print (flat_list)
['good', 'employee', 'best', 'employer', 'well', 'manager', 'super', 'reportee']
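As a runnable sketch of the steps above: the frame below is rebuilt from the printed data, and my_word = ["well"] is an assumption inferred from the output (the original does not show how it was defined). Note that isin expects a list-like, so a bare string has to be wrapped in a list.

```python
import pandas as pd

# Rebuilt from the printed data above; None stands in for NaN.
df = pd.DataFrame([
    [None, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])

my_word = ["well"]  # assumed value; isin needs a list-like

hits = df.isin(my_word)

col_mask = hits.any()          # default axis=0: one flag per column
row_mask = hits.any(axis=1)    # one flag per row

# Collapse to a single yes/no answer for the whole frame.
found = bool(col_mask.any())

cols = col_mask[col_mask].index.tolist()
rows = row_mask[row_mask].index.tolist()

print(found, cols, rows)
```

Passing axis=1 to any answers "which rows contain the value", while the default answers "which columns contain it".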
How to check if a value in the list exists in the dataframe?
Try str.extract:
lst = ['glock', 'siper']
df['D'] = df.apply(lambda x: x.str.extract(fr"\b({'|'.join(lst)})\b")
                              .bfill().iloc[0].fillna('unknown'), axis=1)
print(df)
# Output
A B C D
0 lfkdjs siper ldjkslkdjq siper
1 the glock hammer ldksqjflsdkj dljkfdslkfjs glock
2 lfdkslkdfjsdl dflskjfsdlkjf tipper unknown
3 fdlsjkfsldkjf dlfjksdflkdsjfs The glockmaster hammer unknown
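To see the word-boundary behaviour in isolation, here is a simplified sketch that searches a single Series rather than every column. The four strings are taken from the output above; using re.escape on the list items is an added precaution, not part of the original answer.

```python
import re

import pandas as pd

lst = ["glock", "siper"]
s = pd.Series([
    "siper",
    "the glock hammer",
    "tipper",
    "The glockmaster hammer",
])

# \b on both sides requires a whole-word match, so "glockmaster" is skipped.
# re.escape guards against list items that contain regex metacharacters.
pattern = r"\b(" + "|".join(map(re.escape, lst)) + r")\b"

# expand=False returns a Series for a single capture group.
found = s.str.extract(pattern, expand=False).fillna("unknown")

print(found.tolist())
```

Dropping the \b anchors would make "The glockmaster hammer" match as well, which may or may not be what you want.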
Check if value from one dataframe exists in another dataframe
Use isin
Df1.name.isin(Df2.IDs).astype(int)
0 1
1 1
2 0
3 0
Name: name, dtype: int32
Show result in data frame
Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))
name InDf2
0 Marc 1
1 Jake 1
2 Sam 0
3 Brad 0
In a Series object
pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)
Marc 1
Jake 1
Sam 0
Brad 0
dtype: int32
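A self-contained version of the above might look like this. The contents of Df2 are invented for illustration, since the original only shows Df1 and the result.

```python
import pandas as pd

Df1 = pd.DataFrame({"name": ["Marc", "Jake", "Sam", "Brad"]})
# Hypothetical lookup table; only the matches with Df1 matter here.
Df2 = pd.DataFrame({"IDs": ["Jake", "John", "Marc", "Tony", "Bob"]})

# Boolean mask: True where the name appears anywhere in Df2.IDs.
in_df2 = Df1["name"].isin(Df2["IDs"])

# Store the flag as 0/1 in a new column, leaving Df1 itself unchanged.
result = Df1.assign(InDf2=in_df2.astype(int))

# The mask can also be used directly to keep only the matching rows.
only_matches = Df1[in_df2]

print(result)
```

isin compares values only, so row order and index alignment between the two frames do not matter.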
Check if value is in Pandas dataframe column
You don't need an if statement or a loop. You can use Series.eq with any directly to check whether any row has -1 in this column:
In [990]: df['PositionEMA25M50M'].eq(-1).any()
Out[990]: True
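The same check as a standalone sketch; the column values here are made up, and only the column name comes from the question.

```python
import pandas as pd

# Hypothetical values for the column from the question.
df = pd.DataFrame({"PositionEMA25M50M": [0, 1, -1, 0]})

is_minus_one = df["PositionEMA25M50M"].eq(-1)

# any() reduces the boolean mask to a single answer.
has_minus_one = bool(is_minus_one.any())

# The same mask also gives the count and the matching rows.
n_minus_one = int(is_minus_one.sum())
rows = df[is_minus_one]

print(has_minus_one, n_minus_one)
```

Wrapping the result in bool() gives a plain Python value that is safe to use in an if statement.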
How to check if a pandas dataframe contains only numeric values column-wise?
You can check that using to_numeric and coercing errors:
pd.to_numeric(df['column'], errors='coerce').notnull().all()
For all columns, you can iterate through the columns or just use apply:
df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())
E.g.
df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})
Outputs
col False
col2 False
col3 True
dtype: bool
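One thing the check above does not distinguish is a value that failed to convert from a value that was already missing: both end up as NaN. The sketch below reproduces the example and adds a variant that ignores pre-existing NaN values. The col4 column and the helper name is_numeric_ignoring_nan are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col": [1, 2, 10, np.nan, "a"],
    "col2": ["a", 10, 30, 40, 50],
    "col3": [1, 2, 3, 4, 5.0],
    "col4": [1, 2, np.nan, 4, 5],  # added: numeric, but with a missing value
})


def is_numeric_ignoring_nan(s):
    """Return True if every non-missing value in s converts to a number."""
    converted = pd.to_numeric(s, errors="coerce")
    # A failure is a value that was present before but is NaN afterwards.
    failed = converted.isna() & s.notna()
    return not bool(failed.any())


# Strict check from the answer: any NaN, old or new, counts as non-numeric.
strict = df.apply(lambda s: pd.to_numeric(s, errors="coerce").notnull().all())

# Lenient check: only values that failed to convert count against a column.
lenient = df.apply(is_numeric_ignoring_nan)

print(strict)
print(lenient)
```

Which variant is appropriate depends on whether missing values should disqualify a column.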
How to determine whether a Pandas Column contains a particular value
Using in on a Series checks whether the value is in the index:
In [11]: s = pd.Series(list('abc'))
In [12]: s
Out[12]:
0 a
1 b
2 c
dtype: object
In [13]: 1 in s
Out[13]: True
In [14]: 'a' in s
Out[14]: False
One option is to see if it's in unique values:
In [21]: s.unique()
Out[21]: array(['a', 'b', 'c'], dtype=object)
In [22]: 'a' in s.unique()
Out[22]: True
or a Python set:
In [23]: set(s)
Out[23]: {'a', 'b', 'c'}
In [24]: 'a' in set(s)
Out[24]: True
As pointed out by @DSM, it may be more efficient (especially if you're just doing this for one value) to just use in directly on the values:
In [31]: s.values
Out[31]: array(['a', 'b', 'c'], dtype=object)
In [32]: 'a' in s.values
Out[32]: True
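The options above can be compared side by side in a short script. The isin and == variants at the end are additional alternatives not mentioned in the answer; they are included here as a sketch.

```python
import pandas as pd

s = pd.Series(list("abc"))

# "in" on a Series looks at the index labels, not the values.
label_hit = 1 in s
value_miss = "a" in s

# Looking at the underlying values gives the intended answer.
in_values = "a" in s.values
in_unique = "a" in s.unique()
in_set = "a" in set(s)

# Vectorised alternatives (not from the original answer).
via_isin = bool(s.isin(["a"]).any())
via_eq = bool((s == "a").any())

print(label_hit, value_miss, in_values, in_unique, in_set, via_isin, via_eq)
```

isin is handy when checking several candidate values at once, since it accepts a list.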
How to check if a value is in the list in selection from pandas data frame?
Use isin
df_new[df_new['l_ext'].isin([31, 22, 30, 25, 64])]
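A runnable sketch of that selection follows; the sample values and the item column are hypothetical, and only the l_ext name and the list of wanted values come from the answer.

```python
import pandas as pd

# Hypothetical data; only the column name l_ext comes from the question.
df_new = pd.DataFrame({
    "l_ext": [31, 5, 22, 99, 64],
    "item": ["a", "b", "c", "d", "e"],
})

wanted = [31, 22, 30, 25, 64]

# Keep rows whose l_ext is one of the wanted values.
selected = df_new[df_new["l_ext"].isin(wanted)]

# ~ inverts the mask to keep everything else.
excluded = df_new[~df_new["l_ext"].isin(wanted)]

print(selected)
```

Values in the list that never occur in the column (30 and 25 here) are simply ignored.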