Pandas: Filtering Multiple Conditions

Efficient way to apply multiple filters to pandas DataFrame or Series

Pandas (and NumPy) support boolean indexing, which is much more efficient than filtering row by row in Python:

In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]: 
1    1
2    2
Name: col1

In [12]: df[df['col1'] >= 1]
Out[12]: 
   col1  col2
1     1    11
2     2    12

In [13]: df[(df['col1'] >= 1) & (df['col1'] <= 1)]
Out[13]: 
   col1  col2
1     1    11
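For reference, here is a self-contained version of the session above. The original frame is not shown, so the data below (col1 = 0, 1, 2 and col2 = 10, 11, 12) is an assumption chosen to reproduce those outputs. It also shows Series.between as a shortcut for an inclusive range.

```python
import pandas as pd

# Assumed data, chosen so the results match the session above
df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})

# Each comparison yields a boolean Series aligned on df's index
ge_mask = df['col1'] >= 1   # False, True, True
le_mask = df['col1'] <= 1   # True, True, False

# & combines the masks element-wise
both = df[ge_mask & le_mask]
print(both)

# Series.between(left, right) is inclusive on both ends by default
same = df[df['col1'].between(1, 1)]
print(same)
```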

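If you want to write helper functions for this, consider something along these lines:

In [14]: from operator import ge, le
         import numpy as np

         def b(x, col, op, n):
             # one boolean mask, e.g. op=ge gives x[col] >= n
             return op(x[col], n)

In [15]: def f(x, *b):
             # AND together any number of masks (np.logical_and itself takes only two)
             return x[np.logical_and.reduce(b)]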

In [16]: b1 = b(df, 'col1', ge, 1)

In [17]: b2 = b(df, 'col1', le, 1)

In [18]: f(df, b1, b2)
Out[18]: 
   col1  col2
1     1    11
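Because np.logical_and is a binary ufunc, combining three or more masks needs a reduction. The sketch below uses functools.reduce with operator.and_ instead, which chains m1 & m2 & m3 ... and keeps pandas index alignment. The helper names mask and filter_all and the sample data are hypothetical.

```python
import functools
import operator

import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})  # assumed data


def mask(frame, col, op, value):
    """Hypothetical helper: boolean Series for op(frame[col], value)."""
    return op(frame[col], value)


def filter_all(frame, *masks):
    """Hypothetical helper: keep rows where every mask is True."""
    # reduce(and_, [m1, m2, m3]) evaluates (m1 & m2) & m3
    return frame[functools.reduce(operator.and_, masks)]


result = filter_all(
    df,
    mask(df, 'col1', operator.ge, 1),
    mask(df, 'col1', operator.le, 1),
    mask(df, 'col2', operator.gt, 10),
)
print(result)
```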

Update: pandas 0.13 added a query method for this kind of use case. Assuming the column names are valid identifiers, the following works (and can be more efficient for large frames, because it uses numexpr behind the scenes when numexpr is installed):

In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
   col1  col2
1     1    11
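query can also reference local Python variables with @ and, since pandas 0.25, column names that are not valid identifiers by quoting them in backticks. A minimal sketch on a made-up frame with a 'days since' column; inside a query string, and/& both mean element-wise AND:

```python
import pandas as pd

# Assumed sample data; 'days since' deliberately contains a space
df = pd.DataFrame({'bin': [1, 3, 5, 3],
                   'days since': [7, 8, 9, 14]})

threshold = 7  # local variable, referenced in the query as @threshold

# Backticks quote a column name that is not a valid Python identifier
result = df.query('`days since` > @threshold and bin == 3')
print(result)
```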

Pandas: Filtering multiple conditions

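Use parentheses because of operator precedence: & binds more tightly than comparison operators such as > and ==, so each comparison has to be wrapped in its own ():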

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]

Alternatively, define each condition on a separate line:

cond1 = df["bin"] == 3
cond2 = df["days since"] > 7
cond3 = ~df["Def"]

temp2 = df[cond1 & cond2 & cond3]

Sample:

import pandas as pd

df = pd.DataFrame({'Def': [True] * 2 + [False] * 4,
                   'days since': [7, 8, 9, 14, 2, 13],
                   'bin': [1, 3, 5, 3, 3, 3]})

print(df)
     Def  bin  days since
0   True    1           7
1   True    3           8
2  False    5           9
3  False    3          14
4  False    3           2
5  False    3          13

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print(temp2)
     Def  bin  days since
3  False    3          14
5  False    3          13
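To see why the parentheses matter, here is a small sketch reusing the sample's 'days since' and 'bin' columns. Without parentheses, & is applied first, so Python reads the expression as the chained comparison df['days since'] > (7 & df['bin']) == 3, which needs the truth value of a whole Series and raises ValueError.

```python
import pandas as pd

df = pd.DataFrame({'days since': [7, 8, 9, 14, 2, 13],
                   'bin': [1, 3, 5, 3, 3, 3]})

# Parsed as: df['days since'] > (7 & df['bin']) == 3  (a chained comparison)
try:
    df[df['days since'] > 7 & df['bin'] == 3]
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # "The truth value of a Series is ambiguous..."

print(error_message)

# With parentheses each comparison is evaluated first, then combined by &
ok = df[(df['days since'] > 7) & (df['bin'] == 3)]
print(ok)
```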

Pandas Filtering Multiple Conditions on Single Column

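I think pd.to_datetime combined with pd.Series.between should work for you. This parses only the date part (everything before the first space) and assumes semester_start and semester_end are datetime-like bounds such as pd.Timestamp values; between includes both endpoints by default:

login_day = pd.to_datetime(df['Login Date'].str.split(' ').str[0], format="%Y/%m/%d")
filtered_df = df[login_day.between(semester_start, semester_end)]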
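A self-contained sketch of the same idea. The 'Login Date' values and the semester bounds below are made up for illustration; only the format string and the split-then-parse approach come from the answer.

```python
import pandas as pd

# Hypothetical data: 'Login Date' strings with a date and a time part
df = pd.DataFrame({'Login Date': ['2021/08/30 09:15',
                                  '2021/09/01 08:00',
                                  '2021/12/31 23:59',
                                  '2022/01/03 10:30']})

# Hypothetical semester bounds
semester_start = pd.Timestamp('2021-09-01')
semester_end = pd.Timestamp('2021-12-31')

# Keep only the date part, parse it, then test the (inclusive) range
login_day = pd.to_datetime(df['Login Date'].str.split(' ').str[0],
                           format='%Y/%m/%d')
filtered_df = df[login_day.between(semester_start, semester_end)]
print(filtered_df)
```

Because the time part is dropped before parsing, a login late on the last day still compares equal to semester_end and is kept.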

Filter pandas dataframe rows based on multiple conditions

Assuming df1 and df2 are the two DataFrames, you can do an inner merge on the three key columns and then keep only df1's columns:

df1.merge(df2,
          left_on=['first.seqnames', 'first.start', 'first.end'],
          right_on=['Chrom', 'Start', 'End'],
          how='inner'
         )[df1.columns]

output:

  first.seqnames  first.start  first.end                                        first.width first.strand second.seqnames  second.start  second.end                                       second.width second.strand
0           chr1     10590184   10590618  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...            *            chr1      10730773    10731207  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...             *
1           chr1     10590958   10591541  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...            *            chr1      10731548    10732131  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...             *
2           chr1     10597414   10597918  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...            *            chr1      10738018    10738522  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...             *
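One thing to watch with the merge: if a key triple appears more than once in df2, the matching df1 row is repeated. A membership test on the key tuples avoids that and keeps df1's original index. The sketch below uses pd.MultiIndex.isin as a swapped-in alternative to the merge; the toy data is hypothetical and only the column names come from the answer.

```python
import pandas as pd

# Hypothetical toy frames using the column names from the answer
df1 = pd.DataFrame({'first.seqnames': ['chr1', 'chr1', 'chr2'],
                    'first.start': [100, 200, 300],
                    'first.end': [150, 250, 350]})
df2 = pd.DataFrame({'Chrom': ['chr1', 'chr2', 'chr2'],
                    'Start': [200, 300, 300],   # (chr2, 300, 350) is duplicated
                    'End': [250, 350, 350]})

left_cols = ['first.seqnames', 'first.start', 'first.end']
right_cols = ['Chrom', 'Start', 'End']

# Inner merge: the duplicated df2 key yields two copies of df1's chr2 row
merged = df1.merge(df2, left_on=left_cols, right_on=right_cols,
                   how='inner')[df1.columns]

# Membership test: each df1 row is kept at most once, original index preserved
keys = list(df2[right_cols].itertuples(index=False, name=None))
mask = pd.MultiIndex.from_frame(df1[left_cols]).isin(keys)
filtered = df1[mask]

print(merged)
print(filtered)
```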

python pandas: filter out rows with multiple conditions

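If I understand correctly, first replace all empty or whitespace-only strings with missing values, then keep the rows where the chosen column is missing:

import numpy as np

# no ' ' argument: by default str.split splits on any run of whitespace
df = df['test_column'].str.split(expand=True)
# number the new columns from 1 instead of 0
df.columns += 1

# turn empty or whitespace-only strings into NaN
df = df.replace(r'^\s*$', np.nan, regex=True)

a = 122  # the column label to check; 122 is just an example
df = df[df[a].isna()]
print(df)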
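A small self-contained sketch of the replace-then-filter step on a made-up frame, showing both the rows that were blank (isna) and their complement (notna):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'a' holds an empty and a whitespace-only string
df = pd.DataFrame({'a': ['x', '', '   ', 'y'],
                   'b': [1, 2, 3, 4]})

# ^\s*$ matches strings that are empty or contain only whitespace;
# the regex is only applied to string values, so column 'b' is unchanged
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

blank_rows = cleaned[cleaned['a'].isna()]   # rows whose 'a' was blank
kept_rows = cleaned[cleaned['a'].notna()]   # everything else
print(blank_rows)
print(kept_rows)
```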

Pandas - Filter based on multiple conditions

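To drop only the rows where Time is 'November' and Default is 1, match not equal on both columns (for example dff.Default != 1) and combine the conditions with bitwise OR (|):

df = dff[(dff.Time != 'November') | (dff.Default != 1)]

Or invert the mask: change | to & (bitwise AND), change != to ==, and negate the whole condition with ~:

df = dff[~((dff.Time == 'November') & (dff.Default == 1))]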
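The two forms are equivalent by De Morgan's law: not (A and B) is the same as (not A) or (not B). A quick check on hypothetical data:

```python
import pandas as pd

# Hypothetical data
dff = pd.DataFrame({'Time': ['November', 'November', 'October', 'October'],
                    'Default': [1, 0, 1, 0]})

# Keep every row except those that are November AND Default == 1
with_or = dff[(dff.Time != 'November') | (dff.Default != 1)]
with_not = dff[~((dff.Time == 'November') & (dff.Default == 1))]

print(with_or)
```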

How to filter with multiple conditions in a pandas dataframe that include unwanted entries?

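Get your parentheses right. & binds more tightly than |, so the two Discipline tests must be grouped in their own parentheses before being combined with the Edition test:

>>> df[(df.Edition == 2004) &
...    ((df.Discipline == "Athletics") |
...     (df.Discipline == "Aquatics"))]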
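When one column is compared against several values, Series.isin can replace the chain of == ... | == ... comparisons and removes one level of parentheses. A sketch on hypothetical data using the question's column names:

```python
import pandas as pd

# Hypothetical data with the column names from the question
df = pd.DataFrame({'Edition': [2004, 2004, 2004, 2008],
                   'Discipline': ['Athletics', 'Aquatics', 'Boxing', 'Athletics']})

explicit = df[(df.Edition == 2004) &
              ((df.Discipline == 'Athletics') | (df.Discipline == 'Aquatics'))]

# isin tests membership in a list, equivalent to the OR chain above
with_isin = df[(df.Edition == 2004) & df.Discipline.isin(['Athletics', 'Aquatics'])]
print(with_isin)
```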