Check If Value Is in Data Frame

Check if a variable is a DataFrame

Use isinstance, nothing else:

if isinstance(x, pd.DataFrame):
    ...  # do something

PEP 8 says explicitly that object type comparisons should use isinstance() instead of comparing types directly:

No:  type(x) is pd.DataFrame
No:  type(x) == pd.DataFrame
Yes: isinstance(x, pd.DataFrame)

And don't even think about:

if obj.__class__.__name__ == 'DataFrame':
    expect_problems_some_day()

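isinstance handles inheritance (see What are the differences between type() and isinstance()?). For example, in Python 2 it will tell you whether a variable is a string (either str or unicode), because both derive from basestring. basestring does not exist in Python 3, where you would check isinstance(obj, str) instead.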

if isinstance(obj, basestring):  # Python 2 only
    i_am_string(obj)

Specifically for pandas DataFrame objects:

import pandas as pd
isinstance(var, pd.DataFrame)
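To see why isinstance is preferred, here is a minimal sketch with a hypothetical DataFrame subclass (TaggedFrame and is_dataframe are illustrative names, not part of pandas). isinstance accepts the subclass, while an exact type() comparison rejects it:

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, defined only for this illustration."""

    @property
    def _constructor(self):
        # pandas' documented hook so operations on a subclass return the subclass
        return TaggedFrame


def is_dataframe(obj):
    """Return True for DataFrames and DataFrame subclasses."""
    return isinstance(obj, pd.DataFrame)


plain = pd.DataFrame({"a": [1, 2]})
tagged = TaggedFrame({"a": [1, 2]})
column = plain["a"]  # selecting one column gives a Series, not a DataFrame

print(is_dataframe(plain))           # True
print(is_dataframe(tagged))          # True: isinstance follows inheritance
print(type(tagged) is pd.DataFrame)  # False: exact type check misses the subclass
print(is_dataframe(column))          # False
```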

Check if a certain value is contained in a DataFrame column in pandas

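I think you need str.contains if you want the rows where the values of column date contain the string 07311954 (if date is stored as integers, as in the sample below, the leading zero is lost, so search for '7311954' instead):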

print(df[df['date'].astype(str).str.contains('07311954')])

Or, if the date column already holds strings:

print(df[df['date'].str.contains('07311954')])

If you want to check only the last 4 digits for the string 1954 in column date:

print(df[df['date'].astype(str).str[-4:].str.contains('1954')])

Sample:

print(df['date'])
0 8152007
1 9262007
2 7311954
3 2252011
4 2012011
5 2012011
6 2222011
7 2282011
Name: date, dtype: int64

print(df['date'].astype(str).str[-4:].str.contains('1954'))
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
Name: date, dtype: bool

print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
cmte_id trans_typ entity_typ state employer occupation date \
2 C00119040 24K CCM MD NaN NaN 7311954

amount fec_id cand_id
2 1000 C00140715 H2MD05155
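The sample above stores date as int64, so a value written as 07311954 is really 7311954 and a search for '07311954' finds nothing. A minimal sketch, assuming the dates are meant to be 8-character MMDDYYYY strings, pads them with Series.str.zfill before searching (the data frame here is a hypothetical subset of the sample):

```python
import pandas as pd

# Hypothetical data in the same MMDDYYYY shape as the sample; because the
# column is int64, 07311954 is stored as 7311954 (the leading zero is lost).
df = pd.DataFrame({"date": [8152007, 9262007, 7311954, 2252011]})

as_text = df["date"].astype(str)
print(as_text.str.contains("07311954").any())  # False: no leading zero to match

# Assumption: every date should be 8 characters (MMDDYYYY), so pad with zeros
padded = as_text.str.zfill(8)
exact_day = df[padded.str.contains("07311954")]

# For the year only, an exact comparison of the last 4 characters gives the
# same rows here without going through the regex engine
year_1954 = df[as_text.str[-4:] == "1954"]

print(exact_day)
print(year_1954)
```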

How to check if a value exists in a DataFrame

Use DataFrame.isin to check all columns and DataFrame.any to check for at least one True per column:

m = df.isin(my_word).any()
print (m)
0 False
1 True
2 False
dtype: bool

Then get the column names by filtering:

cols = m.index[m].tolist()
print(cols)
[1]

Data:

print (df)
0 1 2
0 NaN good employee
1 Not available best employer
2 not required well manager
3 not eligible super reportee

Detail:

print (df.isin(my_word))
0 1 2
0 False False False
1 False False False
2 False True False
3 False False False

print (df.isin(my_word).any())
0 False
1 True
2 False
dtype: bool
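Putting the steps above together, here is a self-contained sketch that rebuilds the sample data and also shows the row-wise variant (any(axis=1)) and a single yes/no answer for the whole frame. my_word = ['well'] is an assumption inferred from the Detail output, since the original does not show it:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data; column labels default to 0, 1, 2
df = pd.DataFrame([
    [np.nan, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])

my_word = ["well"]  # assumed search list; isin expects a list-like, not a bare string

mask = df.isin(my_word)      # element-wise True/False, same shape as df
col_hits = mask.any()        # default axis=0: at least one True per column
row_hits = mask.any(axis=1)  # axis=1: at least one True per row

cols = col_hits.index[col_hits].tolist()  # column labels with a match
rows = df[row_hits]                       # rows with a match
found = bool(mask.to_numpy().any())       # single answer for the whole frame

print(cols)   # [1]
print(rows)
print(found)  # True
```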

EDIT: Selecting the matching columns and converting them with .values.tolist() gives a nested list (one inner list per row), so flattening is necessary:

my_word=["well","manager"]

m = df.isin(my_word).any()
print (m)
0 False
1 True
2 True
dtype: bool

nested = df.loc[:,m].values.tolist()
flat_list = [item for sublist in nested for item in sublist]
print (flat_list)
['good', 'employee', 'best', 'employer', 'well', 'manager', 'super', 'reportee']
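The list comprehension works; as a sketch of two equivalent alternatives, itertools.chain.from_iterable flattens the nested rows, and NumPy's ravel flattens the selected block in row-major order, which gives the same ordering here:

```python
import itertools

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])
my_word = ["well", "manager"]
m = df.isin(my_word).any()  # columns 1 and 2 contain a match

nested = df.loc[:, m].values.tolist()  # one inner list per row

# Two equivalent ways to flatten the nested rows
flat_chain = list(itertools.chain.from_iterable(nested))
flat_ravel = df.loc[:, m].to_numpy().ravel().tolist()  # row-major (C) order

print(flat_chain)
```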

How to check if a value in the list exists in the dataframe?

Try str.extract:

lst = ['glock', 'siper']

df['D'] = df.apply(lambda x: x.str.extract(fr"\b({'|'.join(lst)})\b")
                              .bfill().iloc[0].fillna('unknown'), axis=1)
print(df)

# Output
A B C D
0 lfkdjs siper ldjkslkdjq siper
1 the glock hammer ldksqjflsdkj dljkfdslkfjs glock
2 lfdkslkdfjsdl dflskjfsdlkjf tipper unknown
3 fdlsjkfsldkjf dlfjksdflkdsjfs The glockmaster hammer unknown
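The row-wise apply above calls str.extract once per row. A vectorized sketch of the same idea (a swapped-in approach, not the original answer's) extracts per column and then takes the left-most match with bfill(axis=1), which mirrors the bfill().iloc[0] step above. It also passes the terms through re.escape in case they contain regex metacharacters. The data frame is hypothetical:

```python
import re

import pandas as pd

lst = ["glock", "siper"]
# re.escape protects against regex metacharacters in the search terms
pattern = r"\b(" + "|".join(map(re.escape, lst)) + r")\b"

# Hypothetical data; only the text matters
df = pd.DataFrame({
    "A": ["x siper y", "the glock hammer", "nothing here"],
    "B": ["foo", "bar", "The glockmaster hammer"],
})

# Extract the first match in each column (NaN where there is none),
# then take the left-most non-null match per row
matches = pd.concat(
    {col: df[col].str.extract(pattern, expand=False) for col in ["A", "B"]},
    axis=1,
)
df["D"] = matches.bfill(axis=1).iloc[:, 0].fillna("unknown")
print(df)
```

As in the original output, the word boundaries keep "glockmaster" from counting as "glock".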

Check if value from one dataframe exists in another dataframe

Use isin

Df1.name.isin(Df2.IDs).astype(int)

0 1
1 1
2 0
3 0
Name: name, dtype: int32

Show result in data frame

Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))

name InDf2
0 Marc 1
1 Jake 1
2 Sam 0
3 Brad 0

In a Series object

pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)

Marc 1
Jake 1
Sam 0
Brad 0
dtype: int32
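A self-contained sketch of the same check, with hypothetical contents for Df2 (the original does not show it). It also shows DataFrame.merge with indicator=True as an alternative that labels each row instead of producing 0/1:

```python
import pandas as pd

# Hypothetical frames shaped like the example (Df2 contents are assumed)
Df1 = pd.DataFrame({"name": ["Marc", "Jake", "Sam", "Brad"]})
Df2 = pd.DataFrame({"IDs": ["Jake", "Tom", "Marc"]})

# Bracket access is safer than Df1.name, which could collide with an attribute
flags = Df1["name"].isin(Df2["IDs"])
result = Df1.assign(InDf2=flags.astype("int64"))  # explicit, platform-independent dtype

# Alternative: a left merge with indicator=True tags rows 'both' or 'left_only'
merged = Df1.merge(
    Df2.drop_duplicates(), left_on="name", right_on="IDs",
    how="left", indicator=True,
)

print(result)
print(merged[["name", "_merge"]])
```

The int32 in the outputs above is platform-dependent (astype(int) follows NumPy's default integer, which has historically been 32-bit on Windows), so astype('int64') is used here for a predictable dtype. drop_duplicates() keeps the merge from repeating rows of Df1 when Df2 lists an ID more than once.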

Check if value is in Pandas dataframe column

You don't need an if statement or a loop. You can use Series.eq with any directly to check whether any row of this column has -1:

In [990]: df['PositionEMA25M50M'].eq(-1).any()
Out[990]: True
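A minimal sketch with hypothetical column values, showing that the same boolean mask also gives the count and the index labels of the matching rows, and how it reads inside an if:

```python
import pandas as pd

# Hypothetical values for the column from the question
df = pd.DataFrame({"PositionEMA25M50M": [1, 0, -1, 1]})
col = df["PositionEMA25M50M"]

is_minus_one = col.eq(-1)                  # boolean Series, one entry per row
has_minus_one = is_minus_one.any()         # True if at least one row is -1
count = int(is_minus_one.sum())            # how many rows are -1
labels = col.index[is_minus_one].tolist()  # index labels of those rows

if has_minus_one:
    print(f"{count} row(s) equal -1 at {labels}")  # 1 row(s) equal -1 at [2]
```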

How to check if a pandas dataframe contains only numeric values column-wise?

You can check that using to_numeric with errors='coerce', which turns values that cannot be parsed into NaN:

pd.to_numeric(df['column'], errors='coerce').notnull().all()

For all columns, you can iterate through the columns or just use apply:

df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())

E.g.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col' : [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

Output:

col     False
col2    False
col3     True
dtype: bool
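to_numeric(errors='coerce') inspects values, so it accepts numbers stored as text. If you only care about the column's dtype, pandas.api.types.is_numeric_dtype is a different check. A sketch comparing the two on the example data plus a hypothetical col4 of numeric strings:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "col": [1, 2, 10, np.nan, "a"],
    "col2": ["a", 10, 30, 40, 50],
    "col3": [1, 2, 3, 4, 5.0],
    "col4": ["1", "2", "3", "4", "5"],  # hypothetical: numbers stored as text
})


def all_numeric(s):
    # Unparseable values become NaN, so notna().all() is False for them.
    # A NaN that was already in the data also makes this False.
    return pd.to_numeric(s, errors="coerce").notna().all()


value_check = df.apply(all_numeric)       # based on the values
dtype_check = df.apply(is_numeric_dtype)  # based on the column dtype only

print(pd.DataFrame({"values": value_check, "dtype": dtype_check}))
```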

How to determine whether a Pandas Column contains a particular value

Using in on a Series checks whether the value is in the index, not in the values:

In [11]: s = pd.Series(list('abc'))

In [12]: s
Out[12]:
0 a
1 b
2 c
dtype: object

In [13]: 1 in s
Out[13]: True

In [14]: 'a' in s
Out[14]: False

One option is to check whether it is among the unique values:

In [21]: s.unique()
Out[21]: array(['a', 'b', 'c'], dtype=object)

In [22]: 'a' in s.unique()
Out[22]: True

or in a Python set:

In [23]: set(s)
Out[23]: {'a', 'b', 'c'}

In [24]: 'a' in set(s)
Out[24]: True

As pointed out by @DSM, it may be more efficient (especially if you are only checking one value) to use in directly on the values:

In [31]: s.values
Out[31]: array(['a', 'b', 'c'], dtype=object)

In [32]: 'a' in s.values
Out[32]: True
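A compact recap of the checks above, plus two vectorized variants (Series.eq and Series.isin) that also return a single boolean; the Series is the same as in the session above:

```python
import pandas as pd

s = pd.Series(list("abc"))

in_index = 1 in s                          # True: `in` checks the index labels 0, 1, 2
value_as_label = "a" in s                  # False: 'a' is a value, not an index label
in_values = "a" in s.values                # True: searches the underlying NumPy array
via_eq = bool(s.eq("a").any())             # True: vectorized element-wise comparison
via_isin = bool(s.isin(["a", "z"]).any())  # True: matches any of several candidates

print(in_index, value_as_label, in_values, via_eq, via_isin)
```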

How to check if a value is in the list in selection from pandas data frame?

Use isin

df_new[df_new['l_ext'].isin([31, 22, 30, 25, 64])]
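A self-contained sketch with a hypothetical df_new, also showing the complement selection with ~ (rows whose value is not in the list):

```python
import pandas as pd

# Hypothetical data; only the l_ext column matters
df_new = pd.DataFrame({"l_ext": [31, 10, 22, 99, 64], "other": list("abcde")})
wanted = [31, 22, 30, 25, 64]

mask = df_new["l_ext"].isin(wanted)
inside = df_new[mask]    # rows whose l_ext is in the list
outside = df_new[~mask]  # the complement: rows whose l_ext is not in the list

print(inside)
print(outside)
```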

