Efficient Way to Apply Multiple Filters to Pandas Dataframe or Series

Efficient way to apply multiple filters to pandas DataFrame or Series

Pandas (and numpy) allow for boolean indexing, which will be much more efficient:

In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]: 
1    1
2    2
Name: col1

In [12]: df[df['col1'] >= 1]
Out[12]: 
   col1  col2
1     1    11
2     2    12

In [13]: df[(df['col1'] >= 1) & (df['col1'] <=1 )]
Out[13]: 
   col1  col2
1     1    11

If you want to write helper functions for this, consider something along these lines:

In [14]: def b(x, col, op, n): 
             return op(x[col],n)

In [15]: def f(x, *b):
             return x[(np.logical_and(*b))]

In [16]: b1 = b(df, 'col1', ge, 1)

In [17]: b2 = b(df, 'col1', le, 1)

In [18]: f(df, b1, b2)
Out[18]: 
   col1  col2
1     1    11

Update: pandas 0.13 has a query method for these kind of use cases, assuming column names are valid identifiers the following works (and can be more efficient for large frames as it uses numexpr behind the scenes):

In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
   col1  col2
1     1    11

I want to apply multiple filters and change a column value accordingly in pandas [Working now]

The code is working! It had no error.

The value that I referred to here as 'a' was a mess in my real dataset that caused the problem.

pythonic way to apply multiple filters to a dataframe based on user input

Since the filter performs an and (&), it would make sense to do it like this:

import pandas as pd

def filter_data(df, filter_col, filter_val, filter_amount):
    out = df.copy()
    for i in range(filter_amount):
        out = out[out[filter_col[i]] == filter_val[i]]
    return out

def main():
    x = pd.DataFrame({"Age": [12, 44, 23], "Ethnicity": ["White", "Black", "White"], "Height": [180, 182, 168]})
    #    Age Ethnicity  Height
    # 0   12     White     180
    # 1   44     Black     182
    # 2   23     White     168

    y = filter_data(x, ["Ethnicity", "Height"], ["White", 180], 1)
    #    Age Ethnicity  Height
    # 0   12     White     180
    # 2   23     White     168

    z = filter_data(x, ["Ethnicity", "Height"], ["White", 180], 2)
    #    Age Ethnicity  Height
    # 0   12     White     180

Pandas - Create multiple filters and apply to dataframe

You can use np.logical_and.reduce:

filterlist = [filt1, filt2, filt3]

df[np.logical_and.reduce(filterlist)]

Or concat with DataFrame.all for test all Trues per rows:

df[pd.concat(filterlist, axis=1).all(axis=1)]

If possible use | for regex or:

filt = ~df["message"].str.contains("<Media omitted>|http://|Dropped pin", na=False)

Pandas: Filtering multiple conditions

Use () because operator precedence:

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]

Alternatively, create conditions on separate rows:

cond1 = df["bin"] == 3    
cond2 = df["days since"] > 7
cond3 = ~df["Def"]

temp2 = df[cond1 & cond2 & cond3]

Sample:

df = pd.DataFrame({'Def':[True] *2 + [False]*4,
                   'days since':[7,8,9,14,2,13],
                   'bin':[1,3,5,3,3,3]})

print (df)
     Def  bin  days since
0   True    1           7
1   True    3           8
2  False    5           9
3  False    3          14
4  False    3           2
5  False    3          13

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print (temp2)
     Def  bin  days since
3  False    3          14
5  False    3          13

Pandas Data Frame Filtering Multiple Conditions

You could do:

mask = ~df[['year', 'month']].apply(tuple, 1).isin([(1990, 7), (1990, 8), (1991, 1)])
print(df[mask])

Output

   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450

how to filter several columns with LIKE and follow the sequence?

Use regex instead of like:

out = df.filter(regex=('^(NAME|EMAIL)'))
print(out)

# Output (sample)
   NAME_1  EMAIL_1  NAME_2  EMAIL_2
0       5        9       5        9
1       8        2       3        9
2       8        8       1        5
3       6        7       9        5
4       6        6       4        3

Efficient Way to Apply Multiple Filters to Pandas Dataframe or Series