Search for does-not-contain on a DataFrame in pandas
You can use the invert (~) operator (which acts like a not for boolean data):
new_df = df[~df["col"].str.contains(word)]
where new_df is the copy returned by the right-hand side.
contains also accepts a regular expression...
If the above throws a ValueError or TypeError, the likely cause is missing values or mixed datatypes in the column, so pass na=False:
new_df = df[~df["col"].str.contains(word, na=False)]
Or,
new_df = df[df["col"].str.contains(word) == False]
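To make the effect of na=False concrete, here is a minimal sketch. The column contents and the search word are made up for illustration and do not come from the original question.

```python
import pandas as pd

# Hypothetical data: one value in the column is missing.
df = pd.DataFrame({"col": ["apple pie", "banana", None, "pineapple"]})
word = "apple"

# na=False makes the missing value count as "does not contain",
# so the mask is purely boolean and can be inverted safely with ~.
mask = df["col"].str.contains(word, na=False)
new_df = df[~mask]

print(new_df.index.tolist())  # [1, 2]
```

Note that the row with the missing value is kept here, because it is treated as not containing the word.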
How to filter a pandas dataframe by cells that DO NOT contain a substring?
Just do frame[~frame['Media'].str.contains('Site')]
The ~ negates the boolean condition, so your method becomes:
def rbs():  # removes blocked sites
    frame = fill_rate()
    return frame[~frame['Media'].str.contains('Site')]
EDIT
It looks like you have NaN values, judging by your errors, so you have to filter these out first. Your method then becomes:
def rbs():  # removes blocked sites
    frame = fill_rate()
    frame = frame[frame['Media'].notnull()]
    return frame[~frame['Media'].str.contains('Site')]
The notnull call filters out the missing values.
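A self-contained sketch of the same idea follows. Since fill_rate() is not shown in the question, this version takes the frame as an argument, and both the function name remove_blocked_sites and the sample data are assumptions made for illustration.

```python
import pandas as pd

def remove_blocked_sites(frame):
    # First drop rows whose 'Media' value is missing,
    # then drop rows whose 'Media' value contains 'Site'.
    frame = frame[frame["Media"].notnull()]
    return frame[~frame["Media"].str.contains("Site")]

# Hypothetical stand-in for the result of fill_rate().
frame = pd.DataFrame({"Media": ["Site A", "Blog", None, "News Site", "Forum"]})
result = remove_blocked_sites(frame)

print(result["Media"].tolist())  # ['Blog', 'Forum']
```

Unlike passing na=False to str.contains, this approach removes the rows with missing values from the result.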
Python Pandas: String Contains and Doesn't Contain
You're almost there, you just haven't got the syntax quite right. Assuming df here is a Series (a DataFrame has no .str accessor), it should be:
df[(df.str.contains("b") == True) & (df.str.contains("a") == False)]
Another approach, which might be cleaner if you have a lot of conditions to apply, would be to chain your filters together with reduce or a loop:
from functools import reduce
filters = [("a", False), ("b", True)]
reduce(lambda df, f: df[df.str.contains(f[0]) == f[1]], filters, df)
#outputs b2
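For a runnable version of the reduce approach, here is a small sketch. The Series s and its values are hypothetical (the original data is not shown), so the result differs from the output quoted above.

```python
from functools import reduce

import pandas as pd

# Hypothetical data standing in for the original Series.
s = pd.Series(["a1", "a2", "b1", "b2", "ab"])

# Each filter is (substring, should_contain).
filters = [("a", False), ("b", True)]

# Apply the filters one after another, narrowing the Series each time.
result = reduce(lambda cur, f: cur[cur.str.contains(f[0]) == f[1]], filters, s)

print(result.tolist())  # ['b1', 'b2']
```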
Filter dataframe for words which do not contain any of the letters in a list
Use Series.str.contains with the letters joined by | (regex or) to match the values, and add ~ to invert the mask and get the rows that do not match:
df = df[~df['WORD'].str.contains('|'.join(letter_list))]
print (df)
ID WORD
2 3 'green'
3 4 'blue'
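One caveat: because the joined string is treated as a regular expression, characters such as + or . in the list have a special meaning. A hedged sketch of one way to handle that is to escape each item with re.escape before joining. The data and letter_list below are invented for illustration.

```python
import re

import pandas as pd

# Hypothetical data; "c++" contains a regex metacharacter.
df = pd.DataFrame({"ID": [1, 2, 3, 4], "WORD": ["red", "c++", "green", "blue"]})
letter_list = ["d", "+"]

# Escape every item so it is matched literally, then join with | (regex or).
pattern = "|".join(re.escape(letter) for letter in letter_list)

df = df[~df["WORD"].str.contains(pattern)]
print(df["WORD"].tolist())  # ['green', 'blue']
```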
Select rows in Pandas which does not contain a specific character
You want df['string_column'].str.contains('c')
>>> df
str_name
0 aaabaa
1 aabbcb
2 baabba
3 aacbba
4 baccaa
5 ababaa
>>> df['str_name'].str.contains('c')
0 False
1 True
2 False
3 True
4 True
5 False
Name: str_name, dtype: bool
Now, you can "delete" like this
>>> df = df[~df['str_name'].str.contains('c')]
>>> df
str_name
0 aaabaa
2 baabba
5 ababaa
>>>
Edited to add:
If you only want to check the first k characters, you can slice. Suppose k=3:
>>> df.str_name.str.slice(0,3)
0 aaa
1 aab
2 baa
3 aac
4 bac
5 aba
Name: str_name, dtype: object
>>> df.str_name.str.slice(0,3).str.contains('c')
0 False
1 False
2 False
3 True
4 True
5 False
Name: str_name, dtype: bool
Note that Series.str.slice does not behave like a typical Python slice.
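Putting the two steps together, here is a minimal sketch that drops rows with a 'c' in the first k characters, using the sample data from above. It uses the .str[:k] indexing form, which on this data selects the same characters as .str.slice(0, k).

```python
import pandas as pd

df = pd.DataFrame(
    {"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"]}
)
k = 3

# Take the first k characters of every value.
first_k = df["str_name"].str[:k]

# Keep only the rows whose first k characters do not contain 'c'.
filtered = df[~first_k.str.contains("c")]

print(filtered["str_name"].tolist())
# ['aaabaa', 'aabbcb', 'baabba', 'ababaa']
```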
python: if substring not part of string in a pandas df.column
You may use:
interim_2_df[~ interim_2_df.Stelle.str.contains('Vgl')]
Output:
Kuerzel AT/NT Stelle Zitat
5 2Cor_5,19 nt 2 Kor 5,19. was warhafftig inn Christo und versönet die we...
Filter all rows that do not contain letters (alpha) in pandas
I think you'd need str.contains to filter values which contain letters by means of boolean indexing:
df = df[df['Col A.'].str.contains('[A-Za-z]')]
print (df)
Col A.
2 dog
3 cat 583
4 rabbit 444
If there are some NaN values, you can pass the na parameter:
df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print (df)
Col A.
3 dog
4 cat 583
5 rabbit 444
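For the opposite selection (rows that do not contain any letters), the same mask can be inverted with ~. This is a sketch on made-up data, and it assumes rows with missing values should count as having no letters.

```python
import pandas as pd

# Hypothetical data, including one missing value.
df = pd.DataFrame({"Col A.": ["123", "4 56", "dog", "cat 583", None]})

# True where the value contains at least one ASCII letter; NaN counts as False.
has_letter = df["Col A."].str.contains("[A-Za-z]", na=False)

# Invert the mask to keep the rows without letters.
no_letters = df[~has_letter]

print(no_letters.index.tolist())  # [0, 1, 4]
```

If rows with missing values should be dropped instead, pass na=True so that they are excluded after the inversion.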
str.contains function AND does not contain
You can use the bitwise operators & (and) and ~ (not) in combination.
The syntax looks like df.loc[(condition_A) & (~condition_B)]
An example relevant for your question would be:
df = df_merged
selected = df.loc[
    (df[rule_col].astype(str).str.contains('Applicant Age', na=False)) &
    (~df[rule_col].astype(str).str.contains('200', na=False))
]
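Since df_merged and rule_col are not defined in the snippet, here is a self-contained sketch of the same pattern. The column name "rule" and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_merged.
df = pd.DataFrame(
    {"rule": ["Applicant Age > 200", "Applicant Age < 18", "Income 200", None]}
)
rule_col = "rule"

# Convert once to text so both conditions work on the same Series.
text = df[rule_col].astype(str)

# Rows that contain 'Applicant Age' AND do not contain '200'.
selected = df.loc[
    text.str.contains("Applicant Age", na=False)
    & ~text.str.contains("200", na=False)
]

print(selected[rule_col].tolist())  # ['Applicant Age < 18']
```

The parentheses around each condition matter in the inline form, because & and ~ bind more tightly than comparison operators.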