Efficient way to apply multiple filters to pandas DataFrame or Series
Pandas (and numpy) allow for boolean indexing, which will be much more efficient:
In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]:
1    1
2    2
Name: col1
In [12]: df[df['col1'] >= 1]
Out[12]:
   col1  col2
1     1    11
2     2    12
In [13]: df[(df['col1'] >= 1) & (df['col1'] <= 1)]
Out[13]:
   col1  col2
1     1    11
If you want to write helper functions for this, consider something along these lines (assuming numpy is imported as np, and ge and le come from the operator module):
In [14]: def b(x, col, op, n):
   ....:     return op(x[col], n)
In [15]: def f(x, *b):
   ....:     return x[np.logical_and.reduce(b)]
In [16]: b1 = b(df, 'col1', ge, 1)
In [17]: b2 = b(df, 'col1', le, 1)
In [18]: f(df, b1, b2)
Out[18]:
   col1  col2
1     1    11
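The helper idea above can be generalised to any number of conditions. The following is only a sketch: build_mask is a hypothetical name, and the sample frame is an assumption chosen to match the outputs shown, since the original frame is not given.

```python
from operator import ge, le

import numpy as np
import pandas as pd


def build_mask(frame, conditions):
    """Combine (column, operator, value) triples with a logical AND."""
    masks = [op(frame[col], value) for col, op, value in conditions]
    # reduce() handles any number of masks, whereas np.logical_and(a, b)
    # accepts exactly two input arrays.
    return np.logical_and.reduce(masks)


# Assumed sample data; the original frame is not shown in the session above.
df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})

result = df[build_mask(df, [("col1", ge, 1), ("col1", le, 1)])]
print(result)
```

Passing the conditions as data keeps the call site short and makes it easy to build the list at runtime.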
Update: pandas 0.13 added a query method for this kind of use case. Assuming the column names are valid identifiers, the following works (and can be more efficient for large frames, as it uses numexpr behind the scenes):
In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
   col1  col2
1     1    11
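When the bounds come from Python variables rather than literals, query can reference them with an @ prefix. A minimal sketch, assuming a small sample frame and hypothetical variable names lo and hi:

```python
import pandas as pd

# Assumed sample data, chosen to match the outputs shown above.
df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})

lo, hi = 1, 1  # hypothetical bounds supplied by the caller

# '@name' refers to a Python variable in the calling scope.
by_query = df.query("col1 >= @lo and col1 <= @hi")

# Equivalent boolean-mask form, for comparison.
by_mask = df[(df["col1"] >= lo) & (df["col1"] <= hi)]

print(by_query)
```

Both forms should select the same rows; query is mainly a readability choice unless the frame is large.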
I want to apply multiple filters and change a column value accordingly in pandas [Working now]
The code is working! It had no error.
The value that I referred to here as 'a' was messy in my real dataset, and that caused the problem.
pythonic way to apply multiple filters to a dataframe based on user input
Since the filter performs an and (&), it would make sense to do it like this:
import pandas as pd

def filter_data(df, filter_col, filter_val, filter_amount):
    out = df.copy()
    for i in range(filter_amount):
        out = out[out[filter_col[i]] == filter_val[i]]
    return out

def main():
    x = pd.DataFrame({"Age": [12, 44, 23], "Ethnicity": ["White", "Black", "White"], "Height": [180, 182, 168]})
    #    Age Ethnicity  Height
    # 0   12     White     180
    # 1   44     Black     182
    # 2   23     White     168
    y = filter_data(x, ["Ethnicity", "Height"], ["White", 180], 1)
    #    Age Ethnicity  Height
    # 0   12     White     180
    # 2   23     White     168
    z = filter_data(x, ["Ethnicity", "Height"], ["White", 180], 2)
    #    Age Ethnicity  Height
    # 0   12     White     180
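A possible variation on the function above takes the filters as a dict and builds a single mask before indexing once. This is a sketch only; filter_by_values and the dict-based interface are assumptions, not part of the original answer.

```python
import pandas as pd


def filter_by_values(frame, filters):
    """Keep rows where every column in `filters` equals its given value."""
    mask = pd.Series(True, index=frame.index)
    for column, value in filters.items():
        mask &= frame[column] == value
    return frame[mask]


people = pd.DataFrame({
    "Age": [12, 44, 23],
    "Ethnicity": ["White", "Black", "White"],
    "Height": [180, 182, 168],
})

one_filter = filter_by_values(people, {"Ethnicity": "White"})
two_filters = filter_by_values(people, {"Ethnicity": "White", "Height": 180})
print(two_filters)
```

With a dict there is no separate count argument to keep in sync with the two lists, and an empty dict simply returns every row.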
Pandas - Create multiple filters and apply to dataframe
You can use np.logical_and.reduce:
filterlist = [filt1, filt2, filt3]
df[np.logical_and.reduce(filterlist)]
Or use concat with DataFrame.all to test for all Trues per row:
df[pd.concat(filterlist, axis=1).all(axis=1)]
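For a self-contained comparison of the two approaches, here is a small sketch. The frame and the three masks are hypothetical stand-ins for filt1, filt2 and filt3.

```python
import numpy as np
import pandas as pd

# Hypothetical data; filt1..filt3 stand in for the masks named above.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": list("xyxy")})

filt1 = df["a"] > 1
filt2 = df["b"] < 40
filt3 = df["c"] == "x"
filterlist = [filt1, filt2, filt3]

# Combines the masks position by position as plain arrays.
via_reduce = df[np.logical_and.reduce(filterlist)]

# Aligns the masks on their index labels first, then requires all True.
via_concat = df[pd.concat(filterlist, axis=1).all(axis=1)]

print(via_reduce)
```

When every mask is derived from the same frame the two give the same rows; the concat form is the safer choice if the masks might carry different indexes.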
If possible, use | for regex or:
filt = ~df["message"].str.contains("<Media omitted>|http://|Dropped pin", na=False)
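If the unwanted substrings come from a list, the pattern can be assembled programmatically. A sketch under assumed data; the messages are made up, and re.escape is added here as a precaution for substrings that contain regex metacharacters.

```python
import re

import pandas as pd

# Hypothetical messages for illustration.
df = pd.DataFrame({
    "message": ["hello", "<Media omitted>", "see http://example.com", None, "Dropped pin"],
})

unwanted = ["<Media omitted>", "http://", "Dropped pin"]
# re.escape guards against substrings that contain regex metacharacters.
pattern = "|".join(re.escape(s) for s in unwanted)

# na=False treats missing messages as "does not contain", so they are kept.
filt = ~df["message"].str.contains(pattern, na=False)
kept = df[filt]
print(kept)
```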
Pandas: Filtering multiple conditions
Use () because of operator precedence:
temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
Alternatively, create the conditions on separate lines:
cond1 = df["bin"] == 3
cond2 = df["days since"] > 7
cond3 = ~df["Def"]
temp2 = df[cond1 & cond2 & cond3]
Sample:
df = pd.DataFrame({'Def': [True] * 2 + [False] * 4,
                   'days since': [7, 8, 9, 14, 2, 13],
                   'bin': [1, 3, 5, 3, 3, 3]})
print(df)
     Def  bin  days since
0   True    1           7
1   True    3           8
2  False    5           9
3  False    3          14
4  False    3           2
5  False    3          13
temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print(temp2)
     Def  bin  days since
3  False    3          14
5  False    3          13
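To see why the parentheses matter, it can help to run the unparenthesised form. The sketch below reuses the sample frame; the try/except is added here only to show the failure without stopping the script.

```python
import pandas as pd

df = pd.DataFrame({
    "Def": [True] * 2 + [False] * 4,
    "days since": [7, 8, 9, 14, 2, 13],
    "bin": [1, 3, 5, 3, 3, 3],
})

# Without parentheses, & binds tighter than > and ==, so this parses as the
# chained comparison  df["days since"] > (7 & df["bin"]) == 3  and fails
# because a Series has no single truth value.
try:
    df[df["days since"] > 7 & df["bin"] == 3]
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison is evaluated first and then combined.
temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print(raised)
print(temp2)
```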
Pandas Data Frame Filtering Multiple Conditions
You could do:
mask = ~df[['year', 'month']].apply(tuple, 1).isin([(1990, 7), (1990, 8), (1991, 1)])
print(df[mask])
Output:
   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450
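An alternative that avoids the row-wise apply is to test membership on a MultiIndex built from the key columns. This is a sketch, not the original answer's method; rows 0, 1 and 4 of the frame are assumptions, since the input data is not shown.

```python
import pandas as pd

# Rows 0, 1 and 4 are assumed; the original input frame is not shown.
df = pd.DataFrame({
    "year": [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
    "month": [7, 8, 9, 9, 1, 2, 3, 7],
    "data1": [2500, 2500, 2500, 1500, 250, 350, 350, 450],
})

excluded = [(1990, 7), (1990, 8), (1991, 1)]

# Build a MultiIndex from the key columns and test membership on it.
keys = pd.MultiIndex.from_frame(df[["year", "month"]])
mask = ~keys.isin(excluded)
result = df[mask]
print(result)
```

MultiIndex.from_frame needs a reasonably recent pandas; on older versions the apply(tuple, 1) form above remains the fallback.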
how to filter several columns with LIKE and follow the sequence?
Use regex instead of like:
out = df.filter(regex='^(NAME|EMAIL)')
print(out)
# Output (sample)
   NAME_1  EMAIL_1  NAME_2  EMAIL_2
0       5        9       5        9
1       8        2       3        9
2       8        8       1        5
3       6        7       9        5
4       6        6       4        3
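The difference between like and regex is easier to see on a frame with extra columns. A sketch with made-up data; the ID and PHONE_1 columns are added purely for illustration.

```python
import pandas as pd

# Hypothetical frame; the ID and PHONE_1 columns are added for illustration.
df = pd.DataFrame({
    "ID": [1, 2],
    "NAME_1": [5, 8],
    "EMAIL_1": [9, 2],
    "PHONE_1": [0, 0],
    "NAME_2": [5, 3],
    "EMAIL_2": [9, 9],
})

# like= does a plain substring match against a single pattern.
by_like = df.filter(like="NAME")

# regex= can match several prefixes at once and keeps the original column order.
by_regex = df.filter(regex="^(NAME|EMAIL)")

print(list(by_regex.columns))
```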