Search for "Does-Not-Contain" on a Dataframe in Pandas

Search for does-not-contain on a DataFrame in pandas

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by the right-hand side (RHS).

contains also accepts a regular expression (regex=True is the default).
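Because the pattern is treated as a regex by default, characters such as | or + have special meaning. The sketch below uses a small hypothetical DataFrame to show a regex alternation, and regex=False for a literal match.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"col": ["apple pie", "banana split", "cherry tart", "a+b"]})

# Regex alternation: drop rows mentioning "apple" OR "cherry"
no_fruit = df[~df["col"].str.contains("apple|cherry")]

# As a regex, "a+b" means "one or more 'a' followed by 'b'", so it matches none of these rows
as_regex = df["col"].str.contains("a+b")

# regex=False matches the text "a+b" literally
no_plus = df[~df["col"].str.contains("a+b", regex=False)]

print(no_fruit["col"].tolist())  # ['banana split', 'a+b']
print(as_regex.tolist())         # [False, False, False, False]
print(no_plus["col"].tolist())   # ['apple pie', 'banana split', 'cherry tart']
```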


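If the above raises a ValueError or TypeError, the column most likely contains mixed data types or missing values, for which contains returns NaN instead of a boolean. Pass na=False to treat those entries as non-matches: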

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]
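The two variants differ on the non-string rows. In this sketch (hypothetical data with a number and a NaN in an object column), na=False keeps those rows in the "does not contain" result, while the == False comparison drops them, because NaN == False is False.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type column: two strings, an int and a missing value
df = pd.DataFrame({"col": ["foo bar", "baz", 42, np.nan]})
word = "foo"

# na=False turns the NaN results for non-strings into False, so ~ keeps those rows
new_df = df[~df["col"].str.contains(word, na=False)]
print(new_df.index.tolist())  # [1, 2, 3]

# "== False" keeps only rows whose result is exactly False, so non-strings are dropped
alt_df = df[df["col"].str.contains(word) == False]
print(alt_df.index.tolist())  # [1]
```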

How to filter a pandas dataframe by cells that DO NOT contain a substring?

Just do frame[~frame['Media'].str.contains('Site')]

The ~ negates the boolean condition.

So your method becomes:

def rbs():  # removes blocked sites
    frame = fill_rate()
    return frame[~frame['Media'].str.contains('Site')]

EDIT

Judging by your errors, it looks like you have NaN values, so you have to filter these out first. Your method becomes:

def rbs():  # removes blocked sites
    frame = fill_rate()
    frame = frame[frame['Media'].notnull()]
    return frame[~frame['Media'].str.contains('Site')]

notnull filters out the missing values before str.contains runs.
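A runnable sketch of the method above. fill_rate here is a hypothetical stand-in for the asker's function. It also shows that passing na=True to contains drops the NaN rows in one step (they count as "contains", so ~ removes them); na=False would instead keep them.

```python
import numpy as np
import pandas as pd

def fill_rate():
    # Hypothetical stand-in for the asker's fill_rate(); includes one missing value
    return pd.DataFrame({"Media": ["Site A", "App B", np.nan, "Site C", "App D"]})

def rbs():  # removes blocked sites: drop NaN rows first, then filter
    frame = fill_rate()
    frame = frame[frame["Media"].notnull()]
    return frame[~frame["Media"].str.contains("Site")]

def rbs_one_step():  # hypothetical variant: NaN counts as a match, so ~ drops it too
    frame = fill_rate()
    return frame[~frame["Media"].str.contains("Site", na=True)]

print(rbs().index.tolist())           # [1, 4]
print(rbs_one_step().index.tolist())  # [1, 4]
```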

Python Pandas: String Contains and Doesn't Contain

You're almost there; you just haven't got the syntax quite right. It should be (here df is a Series of strings, since .str works on a Series rather than a DataFrame):

df[(df.str.contains("b") == True) & (df.str.contains("a") == False)]

Another approach, which might be cleaner if you have a lot of conditions to apply, is to chain your filters together with reduce or a loop:

from functools import reduce
filters = [("a", False), ("b", True)]
reduce(lambda df, f: df[df.str.contains(f[0]) == f[1]], filters, df)
#outputs b2
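The answer mentions a loop but only shows reduce. Below is a sketch of both side by side, assuming a hypothetical Series ["ab1", "ab2", "b2", "c3"] (chosen so that the result is b2); each (pattern, wanted) pair narrows the previous result.

```python
from functools import reduce

import pandas as pd

# Hypothetical Series of strings
df = pd.Series(["ab1", "ab2", "b2", "c3"])

# (substring, should_contain) pairs
filters = [("a", False), ("b", True)]

# reduce version from the answer
via_reduce = reduce(lambda s, f: s[s.str.contains(f[0]) == f[1]], filters, df)

# Equivalent plain loop
via_loop = df
for pattern, wanted in filters:
    via_loop = via_loop[via_loop.str.contains(pattern) == wanted]

print(via_reduce.tolist())  # ['b2']
print(via_loop.tolist())    # ['b2']
```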

Filter dataframe for words which do not contain any of the letters in a list

Use Series.str.contains with the letters joined by | (regex "or") to match any of them, and add ~ to invert the mask, so you get the rows that do not match:

df = df[~df['WORD'].str.contains('|'.join(letter_list))]
print(df)
   ID     WORD
2   3  'green'
3   4   'blue'
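Since the joined string is used as a regex, an entry such as "?" or "." would be interpreted as a regex operator. A minimal sketch, assuming hypothetical data and letters, that escapes each entry with re.escape before joining:

```python
import re

import pandas as pd

# Hypothetical data and letter list, for illustration only
df = pd.DataFrame({"ID": [1, 2, 3, 4], "WORD": ["pink", "yellow", "green", "blue"]})
letter_list = ["p", "y", "?"]  # unescaped, "?" would be a regex quantifier

# Escape each entry so it is matched literally, then OR them together
pattern = "|".join(map(re.escape, letter_list))
result = df[~df["WORD"].str.contains(pattern)]

print(pattern)                   # p|y|\?
print(result["WORD"].tolist())   # ['green', 'blue']
```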

Select rows in Pandas which does not contain a specific character

You want df['string_column'].str.contains('c')

>>> df
  str_name
0   aaabaa
1   aabbcb
2   baabba
3   aacbba
4   baccaa
5   ababaa
>>> df['str_name'].str.contains('c')
0    False
1     True
2    False
3     True
4     True
5    False
Name: str_name, dtype: bool

Now you can "delete" the matching rows like this:

>>> df = df[~df['str_name'].str.contains('c')]
>>> df
  str_name
0   aaabaa
2   baabba
5   ababaa
>>>
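The same steps as a self-contained script, rebuilding the sample frame from the session above:

```python
import pandas as pd

# The sample frame from the session above
df = pd.DataFrame({"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"]})

mask = df["str_name"].str.contains("c")  # True where the string has a "c"
df = df[~mask]                           # keep only the rows without a "c"

print(df.index.tolist())        # [0, 2, 5]
print(df["str_name"].tolist())  # ['aaabaa', 'baabba', 'ababaa']
```

The original index labels are kept; call reset_index(drop=True) afterwards if you want them renumbered.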

Edited to add:

If you only want to check the first k characters, you can slice. Suppose k=3:

>>> df.str_name.str.slice(0,3)
0    aaa
1    aab
2    baa
3    aac
4    bac
5    aba
Name: str_name, dtype: object
>>> df.str_name.str.slice(0,3).str.contains('c')
0    False
1    False
2    False
3     True
4     True
5    False
Name: str_name, dtype: bool

Note that Series.str.slice slices each string element rather than the Series itself, so it does not behave like a typical Python slice of the Series.
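Element-wise slicing can also be written with the .str[:k] indexing shorthand. The sketch below, using the sample data from above, checks that both forms agree and then drops the rows whose first k characters contain "c":

```python
import pandas as pd

df = pd.DataFrame({"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"]})
k = 3

# Both cut every string element down to its first k characters
first_k = df["str_name"].str.slice(0, k)
print(first_k.equals(df["str_name"].str[:k]))  # True

# Drop rows whose first k characters contain "c"
kept = df[~first_k.str.contains("c")]
print(kept["str_name"].tolist())  # ['aaabaa', 'aabbcb', 'baabba', 'ababaa']
```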

python: if substring not part of string in a pandas df.column

You may use:

interim_2_df[~ interim_2_df.Stelle.str.contains('Vgl')]

Output:

     Kuerzel AT/NT       Stelle                                              Zitat
5  2Cor_5,19    nt  2 Kor 5,19.  was warhafftig inn Christo und versönet die we...
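A reproducible sketch with hypothetical rows (only Kuerzel and Stelle are filled in). str.contains is case-sensitive by default; its case=False parameter also drops spellings such as "vgl".

```python
import pandas as pd

# Hypothetical rows; only the Stelle column matters for the filter
interim_2_df = pd.DataFrame({
    "Kuerzel": ["Mt_1,1", "2Cor_5,19", "Joh_3,16"],
    "Stelle": ["Vgl. Mt 1,1", "2 Kor 5,19.", "vgl. Joh 3,16"],
})

# Case-sensitive (default): only "Vgl" is excluded
sensitive = interim_2_df[~interim_2_df.Stelle.str.contains("Vgl")]

# case=False: "Vgl", "vgl", "VGL", ... are all excluded
insensitive = interim_2_df[~interim_2_df.Stelle.str.contains("Vgl", case=False)]

print(sensitive.Kuerzel.tolist())    # ['2Cor_5,19', 'Joh_3,16']
print(insensitive.Kuerzel.tolist())  # ['2Cor_5,19']
```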

Filter all rows that do not contain letters (alpha) in pandas

You can use str.contains with boolean indexing to keep only the values that contain letters:

df = df[df['Col A.'].str.contains('[A-Za-z]')]
print(df)
Col A.
2 dog
3 cat 583
4 rabbit 444

If there are NaN values, pass na=False:

df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print(df)
Col A.
3 dog
4 cat 583
5 rabbit 444
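Building the mask once makes it easy to get both the rows that contain letters and, with ~, the rows that do not. A sketch on hypothetical data; note that [A-Za-z] only matches ASCII letters.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing words, digits and a missing value
df = pd.DataFrame({"Col A.": ["123", "dog", "cat 583", "rabbit 444", np.nan]})

# na=False makes the NaN row count as "no letters"
has_letter = df["Col A."].str.contains("[A-Za-z]", na=False)

with_letters = df[has_letter]      # rows with at least one ASCII letter
without_letters = df[~has_letter]  # the complement: digit-only rows and NaN

print(with_letters.index.tolist())     # [1, 2, 3]
print(without_letters.index.tolist())  # [0, 4]
```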

str.contains function AND does not contain

You can use the bitwise operators & (and) and ~ (not) in combination.
The syntax looks like df.loc[(condition_A) & (~condition_B)]

An example relevant for your question would be:

df = df_merged

selected = df.loc[
    (df[rule_col].astype(str).str.contains('Applicant Age', na=False)) &
    (~df[rule_col].astype(str).str.contains('200', na=False))
]
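A runnable version of the example, assuming a hypothetical rule_col name and a small stand-in for df_merged. The parentheses matter because & binds more tightly than comparison operators, and ~ applies only to the second mask.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's df_merged and rule_col
rule_col = "rule"
df = pd.DataFrame({rule_col: ["Applicant Age > 18", "Applicant Age < 200", "Income > 200", None]})

selected = df.loc[
    (df[rule_col].astype(str).str.contains("Applicant Age", na=False)) &  # must contain
    (~df[rule_col].astype(str).str.contains("200", na=False))             # must not contain
]

print(selected[rule_col].tolist())  # ['Applicant Age > 18']
```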

