Efficient way to apply multiple filters to pandas DataFrame or Series

Pandas (and numpy) allow for boolean indexing, which will be much more efficient:

In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]:
1    1
2    2
Name: col1

In [12]: df[df['col1'] >= 1]
Out[12]:
   col1  col2
1     1    11
2     2    12

In [13]: df[(df['col1'] >= 1) & (df['col1'] <= 1)]
Out[13]:
   col1  col2
1     1    11

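The examples above assume a small DataFrame; the following sketch reconstructs it from the outputs shown (the exact values are inferred, not from the original question):

```python
import pandas as pd

# Reconstructed from the outputs above: col1 holds 0..2, col2 holds 10..12
df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})

# Boolean indexing keeps only the rows where the mask is True
res = df[(df['col1'] >= 1) & (df['col1'] <= 1)]
print(res)
```

Note that each comparison produces a boolean Series, and `&` combines them element-wise, so the whole filter is evaluated in vectorized form rather than row by row.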
If you want to write helper functions for this, consider something along these lines (they assume from operator import ge, le and import numpy as np):

In [14]: def b(x, col, op, n):
    ...:     return op(x[col], n)

In [15]: def f(x, *masks):
    ...:     return x[np.logical_and(*masks)]

In [16]: b1 = b(df, 'col1', ge, 1)

In [17]: b2 = b(df, 'col1', le, 1)

In [18]: f(df, b1, b2)
Out[18]:
   col1  col2
1     1    11

Update: as of pandas 0.13 there is a query method for these kinds of use case. Assuming the column names are valid Python identifiers, the following works (and can be more efficient for large frames, because it uses numexpr behind the scenes):

In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
   col1  col2
1     1    11
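query can also reference local variables with the @ prefix, which is handy when the bounds come from user input, and it supports chained comparisons. A small sketch (lo and hi are made-up names, and the DataFrame is reconstructed for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})

lo, hi = 1, 1  # hypothetical bounds, e.g. from user input
# @name pulls a local variable into the query; chained comparisons are allowed
result = df.query('@lo <= col1 <= @hi')
print(result)
```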

I want to apply multiple filters and change a column value accordingly in pandas [Working now]

The code is working; it had no error.

The value I referred to here as 'a' was malformed in my real dataset, which is what caused the problem.

pythonic way to apply multiple filters to a dataframe based on user input

Since the filters are combined with AND (&), it makes sense to apply them like this:

import pandas as pd

def filter_data(df, filter_col, filter_val, filter_amount):
    out = df.copy()
    for i in range(filter_amount):
        out = out[out[filter_col[i]] == filter_val[i]]
    return out

def main():
    x = pd.DataFrame({"Age": [12, 44, 23],
                      "Ethnicity": ["White", "Black", "White"],
                      "Height": [180, 182, 168]})
    #    Age Ethnicity  Height
    # 0   12     White     180
    # 1   44     Black     182
    # 2   23     White     168

    y = filter_data(x, ["Ethnicity", "Height"], ["White", 180], 1)
    #    Age Ethnicity  Height
    # 0   12     White     180
    # 2   23     White     168

    z = filter_data(x, ["Ethnicity", "Height"], ["White", 180], 2)
    #    Age Ethnicity  Height
    # 0   12     White     180
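A slightly more idiomatic variant iterates over the column/value pairs directly with zip, which also removes the need for the separate filter_amount argument (a sketch; filter_data2 is a made-up name, not from the original answer):

```python
import pandas as pd

def filter_data2(df, filter_cols, filter_vals):
    # Apply each equality filter in turn; zip pairs each column with its value
    out = df.copy()
    for col, val in zip(filter_cols, filter_vals):
        out = out[out[col] == val]
    return out

x = pd.DataFrame({"Age": [12, 44, 23],
                  "Ethnicity": ["White", "Black", "White"],
                  "Height": [180, 182, 168]})
y = filter_data2(x, ["Ethnicity", "Height"], ["White", 180])
print(y)
```

To filter on fewer conditions, simply pass shorter lists instead of adjusting a count argument.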

Pandas - Create multiple filters and apply to dataframe

You can use np.logical_and.reduce:

filterlist = [filt1, filt2, filt3]

df[np.logical_and.reduce(filterlist)]

Or use concat with DataFrame.all to test whether all values per row are True:

df[pd.concat(filterlist, axis=1).all(axis=1)]

If possible, use | inside the pattern for a regex OR:

filt = ~df["message"].str.contains("<Media omitted>|http://|Dropped pin", na=False)
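A self-contained sketch of the reduce approach (the DataFrame and the three filters are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]})

filt1 = df['a'] > 1
filt2 = df['b'] < 40
filt3 = df['a'] != 3
filterlist = [filt1, filt2, filt3]

# logical_and.reduce combines any number of boolean masks element-wise
out1 = df[np.logical_and.reduce(filterlist)]

# Equivalent: concat the masks as columns, then require all True per row
out2 = df[pd.concat(filterlist, axis=1).all(axis=1)]
print(out1)
```

Both forms scale to an arbitrary number of filters, so you can build filterlist dynamically without chaining & by hand.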

Pandas: Filtering multiple conditions

Use parentheses because of operator precedence: & binds more tightly than comparison operators like > and ==.

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]

Alternatively, create the conditions on separate lines:

cond1 = df["bin"] == 3    
cond2 = df["days since"] > 7
cond3 = ~df["Def"]

temp2 = df[cond1 & cond2 & cond3]

Sample:

df = pd.DataFrame({'Def': [True] * 2 + [False] * 4,
                   'days since': [7, 8, 9, 14, 2, 13],
                   'bin': [1, 3, 5, 3, 3, 3]})

print(df)
     Def  bin  days since
0   True    1           7
1   True    3           8
2  False    5           9
3  False    3          14
4  False    3           2
5  False    3          13

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print(temp2)
     Def  bin  days since
3  False    3          14
5  False    3          13

Pandas Data Frame Filtering Multiple Conditions

You could do:

mask = ~df[['year', 'month']].apply(tuple, axis=1).isin([(1990, 7), (1990, 8), (1991, 1)])
print(df[mask])

Output

   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450
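A runnable sketch of the tuple/isin approach (the excluded rows 0, 1 and 4 are reconstructed to match the filtered-out (year, month) pairs; their data1 values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'year':  [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
                   'month': [7, 8, 9, 9, 1, 2, 3, 7],
                   'data1': [100, 200, 2500, 1500, 300, 350, 350, 450]})

# Build a (year, month) tuple per row, then keep rows NOT in the excluded set
mask = ~df[['year', 'month']].apply(tuple, axis=1).isin([(1990, 7), (1990, 8), (1991, 1)])
print(df[mask])
```

The apply(tuple, axis=1) step turns each row of the two selected columns into a hashable pair, which is what lets isin test against a list of pairs.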

how to filter several columns with LIKE and follow the sequence?

Use a regex instead of LIKE:

out = df.filter(regex='^(NAME|EMAIL)')
print(out)

# Output (sample)
   NAME_1  EMAIL_1  NAME_2  EMAIL_2
0       5        9       5       9
1       8        2       3       9
2       8        8       1       5
3       6        7       9       5
4       6        6       4       3
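A self-contained sketch (the DataFrame is made up; the extra PHONE_* columns are there to show that only NAME_*/EMAIL_* columns survive, in their original order):

```python
import pandas as pd

df = pd.DataFrame({'NAME_1': [5], 'EMAIL_1': [9], 'PHONE_1': [1],
                   'NAME_2': [5], 'EMAIL_2': [9], 'PHONE_2': [2]})

# filter(regex=...) keeps the columns whose names match the pattern,
# preserving the DataFrame's original column order
out = df.filter(regex='^(NAME|EMAIL)')
print(list(out.columns))
```

Because column order is preserved, the NAME/EMAIL pairs stay in sequence without any extra sorting step.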

