Pandas: Filtering Multiple Conditions

Efficient way to apply multiple filters to pandas DataFrame or Series

Pandas (and NumPy) support boolean indexing, which is typically much more efficient than filtering with Python-level loops or apply:

In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]:
1 1
2 2
Name: col1

In [12]: df[df['col1'] >= 1]
Out[12]:
col1 col2
1 1 11
2 2 12

In [13]: df[(df['col1'] >= 1) & (df['col1'] <=1 )]
Out[13]:
col1 col2
1 1 11
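The snippets above can be reproduced with a small frame. This is a minimal sketch: the original post does not show how df was built, so the sample values below are an assumption chosen to match the printed outputs.

```python
import pandas as pd

# Assumed sample data (not shown in the original post).
df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})

mask = df["col1"] >= 1             # boolean Series, one entry per row
col_only = df.loc[mask, "col1"]    # filter rows and select a single column
rows = df[mask]                    # filter rows, keep every column

# Combine two masks element-wise; each comparison needs its own parentheses.
both = df[(df["col1"] >= 1) & (df["col1"] <= 1)]

print(col_only)
print(rows)
print(both)
```

The mask is just a boolean Series aligned on the index, so it can be stored, inspected, and reused before being applied.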

If you want to write helper functions for this, consider something along these lines:

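In [14]: from operator import ge, le
   ....: def b(x, col, op, n):
   ....:     # build a boolean mask, e.g. op=ge gives x[col] >= n
   ....:     return op(x[col], n)

In [15]: def f(x, *b):
   ....:     # reduce accepts any number of masks; logical_and(*b) only works for exactly two
   ....:     return x[np.logical_and.reduce(b)]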

In [16]: b1 = b(df, 'col1', ge, 1)

In [17]: b2 = b(df, 'col1', le, 1)

In [18]: f(df, b1, b2)
Out[18]:
col1 col2
1 1 11
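A self-contained variant of the helper idea might look like the sketch below. The names make_mask and filter_all are hypothetical, the sample frame is assumed, and functools.reduce with operator.and_ is swapped in so that any number of masks (at least one) can be combined.

```python
from functools import reduce
from operator import and_, ge, le

import pandas as pd

# Assumed sample data (not shown in the original post).
df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})


def make_mask(frame, col, op, n):
    """Return the boolean mask op(frame[col], n), e.g. frame[col] >= n for op=ge."""
    return op(frame[col], n)


def filter_all(frame, *masks):
    """Keep only the rows where every mask is True (needs at least one mask)."""
    return frame[reduce(and_, masks)]


result = filter_all(
    df,
    make_mask(df, "col1", ge, 1),
    make_mask(df, "col1", le, 2),
    make_mask(df, "col2", ge, 12),
)
print(result)
```

Building the masks separately keeps each condition testable on its own, at the cost of one extra boolean Series per condition.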

Update: pandas 0.13 added a query method for this kind of use case. Assuming the column names are valid identifiers, the following works (and can be more efficient for large frames, as it uses numexpr behind the scenes):

In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
col1 col2
1 1 11
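As a rough sketch of how query can be used beyond the one-liner above: local variables can be referenced with @, and in newer pandas versions (0.25 and later) column names that are not valid identifiers can be quoted with backticks. The frame and the variable names lo and hi are assumptions for illustration.

```python
import pandas as pd

# Assumed sample data; "days since" is deliberately not a valid identifier.
df = pd.DataFrame({"col1": [0, 1, 2], "days since": [7, 14, 2]})

lo, hi = 1, 2

# Reference local Python variables with the @ prefix.
in_range = df.query("col1 >= @lo and col1 <= @hi")

# Quote column names containing spaces with backticks.
recent = df.query("`days since` > 7")

print(in_range)
print(recent)
```

query falls back to plain Python evaluation when numexpr is not installed, so the speed benefit is not guaranteed.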

Pandas: Filtering multiple conditions

Wrap each comparison in parentheses, because & binds more tightly than comparison operators such as > and ==:

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]

Alternatively, create the conditions on separate lines:

cond1 = df["bin"] == 3    
cond2 = df["days since"] > 7
cond3 = ~df["Def"]

temp2 = df[cond1 & cond2 & cond3]

Sample:

df = pd.DataFrame({'Def': [True] * 2 + [False] * 4,
                   'days since': [7, 8, 9, 14, 2, 13],
                   'bin': [1, 3, 5, 3, 3, 3]})

print (df)
Def bin days since
0 True 1 7
1 True 3 8
2 False 5 9
3 False 3 14
4 False 3 2
5 False 3 13

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print (temp2)
Def bin days since
3 False 3 14
5 False 3 13
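To see why the parentheses matter, the sketch below drops them on purpose. Without them, Python evaluates 7 & df["bin"] first and then treats the rest as a chained comparison, which raises a ValueError. Only two of the sample columns are used here.

```python
import pandas as pd

df = pd.DataFrame({"days since": [7, 8, 9, 14, 2, 13],
                   "bin": [1, 3, 5, 3, 3, 3]})

error = None
try:
    # Parsed as: df["days since"] > (7 & df["bin"]) == 3
    bad = df[df["days since"] > 7 & df["bin"] == 3]
except ValueError as exc:
    # pandas refuses to collapse a whole Series into a single True/False.
    error = str(exc)
    print("ValueError:", exc)

# With parentheses, each comparison yields a mask before & combines them.
good = df[(df["days since"] > 7) & (df["bin"] == 3)]
print(good.index.tolist())  # [1, 3, 5]
```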

Pandas Filtering Multiple Conditions on Single Column

I think using pd.to_datetime and pd.Series.between should work for you:

filtered_df = df[pd.to_datetime(df['Login Date'].str.split(' ').str[0], format="%Y/%m/%d").between(semester_start, semester_end)]
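A runnable sketch of the same idea follows. The original post does not show the data or the semester bounds, so the 'Login Date' values and the two timestamps below are hypothetical.

```python
import pandas as pd

# Hypothetical data in "YYYY/MM/DD HH:MM" form.
df = pd.DataFrame({"Login Date": ["2021/08/30 09:15",
                                  "2021/09/15 10:30",
                                  "2022/01/05 08:00"]})
semester_start = pd.Timestamp("2021-09-01")  # assumed bound
semester_end = pd.Timestamp("2021-12-20")    # assumed bound

# Keep only the date part, parse it, then test membership in the range.
login_day = pd.to_datetime(df["Login Date"].str.split(" ").str[0],
                           format="%Y/%m/%d")
filtered_df = df[login_day.between(semester_start, semester_end)]
print(filtered_df)
```

between is inclusive on both ends by default, so it replaces the pair of >= and <= conditions on the same column.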

Filter pandas dataframe rows based on multiple conditions

Assuming df1 and df2 are the two DataFrames, you can use an inner merge:

df1.merge(df2,
left_on=['first.seqnames', 'first.start', 'first.end'],
right_on=['Chrom', 'Start', 'End'],
how='inner'
)[df1.columns]

output:

  first.seqnames  first.start  first.end                                        first.width first.strand second.seqnames  second.start  second.end                                       second.width second.strand
0 chr1 10590184 10590618 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... * chr1 10730773 10731207 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... *
1 chr1 10590958 10591541 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... * chr1 10731548 10732131 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... *
2 chr1 10597414 10597918 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... * chr1 10738018 10738522 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... *
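The merge-as-filter pattern can be tried on a toy example. The frames below are made up and contain only the key columns; the real data in the question has more columns.

```python
import pandas as pd

# Made-up frames using the same key column names as the answer above.
df1 = pd.DataFrame({"first.seqnames": ["chr1", "chr1", "chr2"],
                    "first.start": [100, 200, 300],
                    "first.end": [150, 250, 350]})
df2 = pd.DataFrame({"Chrom": ["chr1", "chr2"],
                    "Start": [100, 300],
                    "End": [150, 999]})

# An inner merge keeps only df1 rows whose three keys all match a df2 row;
# selecting df1.columns afterwards drops the columns brought in from df2.
matched = df1.merge(df2,
                    left_on=["first.seqnames", "first.start", "first.end"],
                    right_on=["Chrom", "Start", "End"],
                    how="inner")[df1.columns]
print(matched)
```

If df2 contains duplicate key combinations, the matching df1 rows are repeated in the result, so deduplicating df2 on the key columns first may be worthwhile.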

python pandas: filter out rows with multiple conditions

If I understand correctly, first replace all empty or whitespace-only strings with missing values:

# no separator is passed, so the split is on arbitrary whitespace by default
df = df['test_column'].str.split(expand = True)
# number the columns starting from 1
df.columns += 1

df = df.replace(r'^\s*$', np.nan, regex=True)

a = 122 #just a column name
df = df[df[a].isna()]
print (df)
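The steps above can be sketched on a tiny example. The sample strings are an assumption, and the last column is picked programmatically instead of hard-coding 122.

```python
import numpy as np
import pandas as pd

# Assumed sample data: rows with different numbers of whitespace-separated tokens.
df = pd.DataFrame({"test_column": ["a b c", "a b", "a"]})

# Split on arbitrary whitespace; shorter rows are padded with missing values.
parts = df["test_column"].str.split(expand=True)
parts.columns = parts.columns + 1  # number columns from 1 instead of 0

# Turn empty or whitespace-only cells into NaN as well.
parts = parts.replace(r"^\s*$", np.nan, regex=True)

last = parts.columns[-1]                  # here: 3
short_rows = parts[parts[last].isna()]    # rows with no token in the last column
print(short_rows)
```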

Pandas - Filter based on multiple conditions

You need to combine the not-equal condition dff.Default != 1 with the other condition using bitwise OR (|):

df = dff[(dff.Time != 'November') | (dff.Default != 1) ]

Or invert the mask with ~, changing | to & (bitwise AND) and != to ==:

df = dff[~((dff.Time == 'November') & (dff.Default == 1)) ]
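The two forms are equivalent by De Morgan's law, which can be checked directly. The frame below is hypothetical, since the original question's data is not shown.

```python
import pandas as pd

# Hypothetical data: only row 0 is both 'November' and Default == 1.
dff = pd.DataFrame({"Time": ["November", "November", "March"],
                    "Default": [1, 0, 1]})

mask_or = (dff.Time != "November") | (dff.Default != 1)
mask_not_and = ~((dff.Time == "November") & (dff.Default == 1))

# Both masks select exactly the same rows.
same = mask_or.equals(mask_not_and)
df = dff[mask_or]
print(same)
print(df)
```

The inverted form often reads more naturally when the goal is to drop rows matching a specific combination.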

How to filter with multiple conditions in a pandas dataframe that include unwanted entries?

Get your parentheses right:

>>> df[(df.Edition == 2004) & 
((df.Discipline == "Athletics") |
(df.Discipline == "Aquatics"))]
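When several values of the same column are acceptable, Series.isin is an alternative to chaining == with |. This is a different technique from the one in the answer above, shown here as a sketch on made-up data.

```python
import pandas as pd

# Made-up data using the column names from the question.
df = pd.DataFrame({"Edition": [2004, 2004, 2004, 2000],
                   "Discipline": ["Athletics", "Aquatics", "Boxing", "Athletics"]})

# Explicit form: the OR group needs its own parentheses.
explicit = df[(df.Edition == 2004) &
              ((df.Discipline == "Athletics") |
               (df.Discipline == "Aquatics"))]

# isin form: one membership test replaces the OR chain.
wanted = df[(df.Edition == 2004) &
            df.Discipline.isin(["Athletics", "Aquatics"])]

print(wanted)
```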

