How to filter rows containing a string pattern from a Pandas dataframe
In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4
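For reference, here is a minimal self-contained sketch of the same filter. The row at index 2 ('cnut') is an assumption, since the output above only shows the rows that matched. Passing regex=False and na=False is optional, but it avoids surprises with regex metacharacters and missing values.

```python
import pandas as pd

# Sample data; the 'cnut' row is assumed, the other rows mirror the output above.
df = pd.DataFrame({
    "ids": ["aball", "bball", "cnut", "fball"],
    "vals": [1, 2, 3, 4],
})

# regex=False treats "ball" as a literal substring;
# na=False makes missing values count as "no match" instead of producing NaN.
mask = df["ids"].str.contains("ball", regex=False, na=False)
result = df[mask]
print(result)
```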
How to filter Pandas DataFrame rows which contain any string from a list?
Setup
import pandas as pd

df = pd.DataFrame(dict(
    A=['I need avocado', 'something', 'useless', 'nothing'],
    B=['something', 'I eat margarina', 'eat apple', 'more nothing']
))
includeKeywords = ["apple", "avocado", "bannana"]
Problem
                A                B
0  I need avocado        something  # True 'avocado' in A
1       something  I eat margarina
2         useless        eat apple  # True 'apple' in B
3         nothing     more nothing
Solution
Use pandas.DataFrame.stack to make df a Series and enable us to use the pandas.Series.str accessor functions.
Use pandas.Series.str.contains with '|'.join(includeKeywords).
Use groupby(level=0) followed by any, because we added a level to the index when we stacked. (Older pandas versions accepted pandas.Series.any with argument level=0; that argument was removed in pandas 2.0.)
df[df.stack().str.contains('|'.join(includeKeywords)).groupby(level=0).any()]
                A          B
0  I need avocado  something
2         useless  eat apple
Details
Joining the keywords with '|' produces a regex search string. In regex, '|' means or. So for a regex search, this says match 'apple', 'avocado', or 'bannana'.
kwstr = '|'.join(includeKeywords)
print(kwstr)
apple|avocado|bannana
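One caveat with a plain join: if a keyword contains regex metacharacters such as '+' or '.', they are interpreted as regex syntax. A hedged sketch of one way to guard against this with re.escape follows; the keyword list here is hypothetical and not part of the original question.

```python
import re

# Hypothetical keywords; "c++" and "a.b" contain regex metacharacters.
keywords = ["c++", "a.b", "apple"]

# Escape each keyword so it is matched literally, then join with '|' for OR.
pattern = "|".join(re.escape(k) for k in keywords)
print(pattern)

# The escaped pattern matches the literal text only.
print(bool(re.search(pattern, "I write c++")))
print(bool(re.search(pattern, "axb")))
```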
Stacking will flatten our DataFrame
df.stack()
0 A I need avocado
B something
1 A something
B I eat margarina
2 A useless
B eat apple
3 A nothing
B more nothing
dtype: object
Fortunately, the pandas.Series.str.contains method can handle regex and it will produce a boolean Series
df.stack().str.contains(kwstr)
0 A True
B False
1 A False
B False
2 A False
B True
3 A False
B False
dtype: bool
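At which point we can group by level=0 of the index and call any, so that each original row is True if any of its cells matched. (Older pandas versions allowed pandas.Series.any with level=0; that argument was removed in pandas 2.0.)
mask = df.stack().str.contains(kwstr).groupby(level=0).any()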
mask
0 True
1 False
2 True
3 False
dtype: bool
By using level=0 we preserved the original index in the resulting Series. This makes it perfect for filtering df
df[mask]
                A          B
0  I need avocado  something
2         useless  eat apple
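Putting the steps together, the following is a runnable sketch that should work on current pandas releases. It reduces the stacked result with groupby(level=0).any(), and also shows a column-wise variant that skips stacking altogether. The variable names mask_stacked and mask_columns are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=['I need avocado', 'something', 'useless', 'nothing'],
    B=['something', 'I eat margarina', 'eat apple', 'more nothing'],
))
includeKeywords = ["apple", "avocado", "bannana"]
kwstr = '|'.join(includeKeywords)

# Variant 1: stack, test every cell, then reduce per original row label.
mask_stacked = df.stack().str.contains(kwstr).groupby(level=0).any()

# Variant 2: test each column separately, then reduce across columns.
mask_columns = df.apply(lambda col: col.str.contains(kwstr, na=False)).any(axis=1)

print(df[mask_stacked])
```

Both masks select the same rows here; the column-wise variant may be easier to read when the frame has only a few string columns.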
How to filter rows containing specific string values with an AND operator
df[df['ids'].str.contains("ball")]
Would become:
df[df['ids'].str.contains("ball") & df['ids'].str.contains("field")]
If you are into neater code:
contains_balls = df['ids'].str.contains("ball")
contains_fields = df['ids'].str.contains("field")
filtered_df = df[contains_balls & contains_fields]
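When the number of required substrings is not fixed, the same idea can be generalized by building one mask per substring and combining them with &. This is a sketch under assumptions: the sample data and the name required are hypothetical.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"ids": ["ballfield", "football", "field", "ball on the field"]})
required = ["ball", "field"]

# Build one boolean mask per substring, then AND them all together.
masks = [df["ids"].str.contains(s, regex=False, na=False) for s in required]
all_present = reduce(operator.and_, masks)

filtered_df = df[all_present]
print(filtered_df)
```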
Filter dataframe rows containing a set of strings in Python
Use join with | for regex OR, with \b for word boundary:
L = ['cat', 'dog']
# (?:...) groups the alternatives so that \b applies to both ends of every word
pat = r'\b(?:{})\b'.format('|'.join(L))
df[df["B"].str.contains(pat, case=False, na=False)]
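To see what the word boundaries buy you, here is a small sketch on hypothetical data. Note that it wraps the alternatives in a non-capturing group, (?:...), so that \b applies on both sides of every word; without a group around the alternation, the boundaries bind only to the first and last alternative.

```python
import pandas as pd

# Hypothetical sample data for illustration, including a missing value.
df = pd.DataFrame({"B": ["my Cat sleeps", "concatenate", "hot dog", "dogma", None]})

L = ['cat', 'dog']
# Non-capturing group so that \b wraps every alternative.
pat = r'\b(?:{})\b'.format('|'.join(L))

# case=False ignores letter case; na=False treats missing values as no match.
result = df[df["B"].str.contains(pat, case=False, na=False)]
print(result)
```

Only whole-word matches survive: "concatenate" and "dogma" are rejected even though they contain "cat" and "dog".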
Pandas filtering rows with regex pattern present in the row itself
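After a bit of modification, here is the result (it needs import re; note that match only tests the start of the string, so use search instead if the pattern may occur anywhere):
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']) is not None, axis=1)]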
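A self-contained sketch of the row-wise approach is shown below. The sample data and the helper row_matches are hypothetical, and the sketch uses re.search rather than match so that a pattern can hit anywhere in the string; swap in re.match if anchoring at the start is what you want.

```python
import re

import pandas as pd

# Hypothetical sample data: each row carries its own regex in 'pattern'.
df = pd.DataFrame({
    "pattern": [r"\d+", r"ab+c", r"^x"],
    "data": ["123 main", "xx abbbc", "yxz"],
})

def row_matches(row):
    # re.search finds the pattern anywhere; re.match would only test the start.
    return re.search(row["pattern"], row["data"]) is not None

# axis=1 passes each row to the function as a Series.
mask = df.apply(row_matches, axis=1)
result = df[mask]
print(result)
```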
How to drop/delete/filter rows in pandas dataframe based on string pattern condition?
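You should make a list of the values that are the conditions for dropping rows (avoid naming it list, which shadows the built-in):
to_drop = ['<character>', '|',....]
and then filter your df by
df = df[~df['your column'].isin(to_drop)]
Note that isin compares whole cell values exactly, so the pipe character needs no escaping here. The \| escape is only needed when the pipe is part of a regex, for example in a pattern passed to str.contains.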
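If the goal is to drop rows whose text merely contains one of the characters, rather than equals it, a pattern-based filter is one option. The sketch below uses hypothetical data and a hypothetical chars_to_drop list, and relies on re.escape to produce the \| escape for the pipe.

```python
import re

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"your column": ["a|b", "plain", "x#y", "clean"]})

# Characters whose presence should drop the row (assumed for this example).
chars_to_drop = ["|", "#"]

# re.escape turns "|" into "\|" so it is matched literally, not as regex OR.
pattern = "|".join(re.escape(c) for c in chars_to_drop)

# ~ negates the mask, keeping only rows that contain none of the characters.
df = df[~df["your column"].str.contains(pattern, na=False)]
print(df)
```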