Efficient way to apply multiple filters to pandas DataFrame or Series
Pandas (and NumPy) support boolean indexing, which is vectorized and much more efficient than filtering rows in a Python loop:
In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]:
1 1
2 2
Name: col1
In [12]: df[df['col1'] >= 1]
Out[12]:
col1 col2
1 1 11
2 2 12
In [13]: df[(df['col1'] >= 1) & (df['col1'] <= 1)]
Out[13]:
col1 col2
1 1 11
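The range test in In [13] can also be written with Series.between, which includes both endpoints by default and is therefore equivalent to `(s >= low) & (s <= high)`. A minimal sketch, assuming a small frame shaped like the df above (its values are made up for illustration):

```python
import pandas as pd

# Hypothetical data shaped like the df used above
df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})

# Explicit form: two comparisons combined with & (parentheses required)
explicit = df[(df["col1"] >= 1) & (df["col1"] <= 1)]

# Shorthand: between() includes both endpoints by default
shorthand = df[df["col1"].between(1, 1)]

print(shorthand)
```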
If you want to write helper functions for this, consider something along these lines:
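In [14]: from operator import ge, le
    ...: import numpy as np
    ...: def b(x, col, op, n):
    ...:     return op(x[col], n)
In [15]: def f(x, *masks):
    ...:     return x[np.logical_and.reduce(masks)]  # ANDs any number of masks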
In [16]: b1 = b(df, 'col1', ge, 1)
In [17]: b2 = b(df, 'col1', le, 1)
In [18]: f(df, b1, b2)
Out[18]:
col1 col2
1 1 11
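If the conditions arrive as data, for example a list of (column, operator, value) triples, the masks can be built in a loop and combined with np.logical_and.reduce. `apply_filters` and the triple format are hypothetical names chosen for illustration, not a pandas API:

```python
import operator

import numpy as np
import pandas as pd


def apply_filters(df, specs):
    """Keep rows satisfying every (column, op, value) triple in specs.

    apply_filters and the triple format are illustrative, not part of pandas.
    """
    if not specs:
        return df  # nothing to filter on
    masks = [op(df[col], value) for col, op, value in specs]
    # logical_and.reduce ANDs any number of boolean arrays element-wise
    return df[np.logical_and.reduce(masks)]


df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})
specs = [("col1", operator.ge, 1), ("col2", operator.le, 11)]
print(apply_filters(df, specs))
```

Unlike calling np.logical_and with more than two arrays (its third positional argument is the output buffer), the `.reduce` form accepts any number of masks.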
Update: pandas 0.13 added a query method for this kind of use case. Assuming the column names are valid Python identifiers, the following works (and can be more efficient for large frames, because it uses numexpr behind the scenes when it is installed):
In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
col1 col2
1 1 11
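query can also reference Python variables with the @ prefix and accepts chained comparisons, which keeps range filters readable. A sketch with made-up data; if numexpr is not installed, pandas falls back to its Python engine:

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 1, 2], "col2": [10, 11, 12]})

low, high = 1, 2  # local variables referenced with @ inside the query string

# Chained comparison: equivalent to (col1 >= low) & (col1 <= high)
in_range = df.query("@low <= col1 <= @high")

# Combining conditions on different columns with 'and'
combined = df.query("col1 >= @low and col2 < 12")
print(in_range)
print(combined)
```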
Pandas: Filtering multiple conditions
Use parentheses around each comparison because of operator precedence: & binds more tightly than comparisons such as > and ==:
temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
Alternatively, create conditions on separate rows:
cond1 = df["bin"] == 3
cond2 = df["days since"] > 7
cond3 = ~df["Def"]
temp2 = df[cond1 & cond2 & cond3]
Sample:
import pandas as pd
df = pd.DataFrame({'Def': [True] * 2 + [False] * 4,
                   'days since': [7, 8, 9, 14, 2, 13],
                   'bin': [1, 3, 5, 3, 3, 3]})
print(df)
Def bin days since
0 True 1 7
1 True 3 8
2 False 5 9
3 False 3 14
4 False 3 2
5 False 3 13
temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print(temp2)
Def bin days since
3 False 3 14
5 False 3 13
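To see why the parentheses matter: without them, Python groups `df["days since"] > 7 & df["bin"] == 3` as the chained comparison `df["days since"] > (7 & df["bin"]) == 3`, and pandas raises a ValueError because it cannot reduce a Series to a single bool. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"days since": [7, 8, 14], "bin": [3, 3, 1]})

raised = False
try:
    # Parsed as: df["days since"] > (7 & df["bin"]) == 3  (a chained comparison)
    bad = df[df["days since"] > 7 & df["bin"] == 3]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With parentheses each comparison is evaluated first, then combined with &
good = df[(df["days since"] > 7) & (df["bin"] == 3)]
print(good)
```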
Pandas Filtering Multiple Conditions on Single Column
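Using pd.to_datetime together with pd.Series.between should work: parse the date part of 'Login Date' and keep the rows whose date falls between semester_start and semester_end (both bounds are inclusive by default):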
filtered_df = df[pd.to_datetime(df['Login Date'].str.split(' ').str[0], format="%Y/%m/%d").between(semester_start, semester_end)]
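A self-contained sketch of the same idea, assuming 'Login Date' holds strings like '2021/09/15 10:30' and that semester_start and semester_end are Timestamps (the format, the sample rows, and the bounds are all assumptions):

```python
import pandas as pd

# Hypothetical login records; the "YYYY/MM/DD HH:MM" format is an assumption
df = pd.DataFrame({
    "User": ["a", "b", "c", "d"],
    "Login Date": ["2021/08/30 09:00", "2021/09/01 08:15",
                   "2021/12/20 17:45", "2022/01/10 12:00"],
})

semester_start = pd.Timestamp("2021-09-01")  # assumed bounds, inclusive
semester_end = pd.Timestamp("2021-12-31")

# Keep only the date part before the space, then parse it as a datetime
login_day = pd.to_datetime(df["Login Date"].str.split(" ").str[0], format="%Y/%m/%d")

filtered_df = df[login_day.between(semester_start, semester_end)]
print(filtered_df)
```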
Filter pandas dataframe rows based on multiple conditions
Assuming df1 and df2 are the two DataFrames, you can do an inner merge on the key columns and keep only df1's columns:
df1.merge(df2,
left_on=['first.seqnames', 'first.start', 'first.end'],
right_on=['Chrom', 'Start', 'End'],
how='inner'
)[df1.columns]
output:
first.seqnames first.start first.end first.width first.strand second.seqnames second.start second.end second.width second.strand
0 chr1 10590184 10590618 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... * chr1 10730773 10731207 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... *
1 chr1 10590958 10591541 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... * chr1 10731548 10732131 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... *
2 chr1 10597414 10597918 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... * chr1 10738018 10738522 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... *
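An inner merge repeats a row of df1 once for every matching key in df2, so duplicate (Chrom, Start, End) keys in df2 produce duplicate output rows. If you only need a row filter, one alternative (a different technique from the merge above) is to test key tuples for membership with MultiIndex.isin. A sketch on small made-up frames that reuse the same key column names:

```python
import pandas as pd

# Hypothetical miniature versions of df1 and df2
df1 = pd.DataFrame({
    "first.seqnames": ["chr1", "chr1", "chr2"],
    "first.start": [100, 200, 300],
    "first.end": [150, 250, 350],
})
df2 = pd.DataFrame({
    "Chrom": ["chr1", "chr1", "chr2"],
    "Start": [100, 100, 999],  # the first key appears twice in df2
    "End": [150, 150, 999],
})

left_keys = ["first.seqnames", "first.start", "first.end"]
right_keys = ["Chrom", "Start", "End"]

# Build (seqname, start, end) tuples for df1 and test each against df2's keys
key_index = pd.MultiIndex.from_frame(df1[left_keys])
mask = key_index.isin(list(df2[right_keys].itertuples(index=False, name=None)))

print(df1[mask])
```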
python pandas: filter out rows with multiple conditions
If I understand correctly, split the column into separate columns and then replace all empty or whitespace-only strings with missing values:
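import numpy as np
# no separator given, so str.split splits on any run of whitespace
df = df['test_column'].str.split(expand=True)
# number the new columns starting from 1 instead of 0
df.columns += 1
# turn empty or whitespace-only cells into NaN
df = df.replace(r'^\s*$', np.nan, regex=True)
a = 122  # label of the column to test
# keep rows that have no value in column a
df = df[df[a].isna()]
print(df)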
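Because the columns are numbered from 1, column a is missing exactly when a non-missing string has fewer than a whitespace-separated tokens. If you would rather keep the original rows than the expanded columns, counting tokens with str.len() gives the same selection (an alternative approach, not the answer's; the sample strings and a = 3 are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"test_column": ["a b c", "a b", "x  y  z w", "single"]})
a = 3  # hypothetical: keep rows with fewer than 3 tokens

# Answer's approach: expand into columns 1..n, keep rows where column a is missing
wide = df["test_column"].str.split(expand=True)
wide.columns += 1
wide = wide.replace(r"^\s*$", np.nan, regex=True)
short_wide = wide[wide[a].isna()]

# Alternative: count tokens directly and filter the original rows
token_count = df["test_column"].str.split().str.len()
short_rows = df[token_count < a]
print(short_rows)
```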
Pandas - Filter based on multiple conditions
You need to match the not-equal condition dff.Default != 1 and combine it with a bitwise OR (|):
df = dff[(dff.Time != 'November') | (dff.Default != 1)]
Or invert the mask: change | to & (bitwise AND), change != to ==, and negate the whole expression with ~:
df = dff[~((dff.Time == 'November') & (dff.Default == 1))]
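Both masks select the same rows by De Morgan's law: not (A and B) is the same as (not A) or (not B). A quick sketch on made-up data that checks this:

```python
import pandas as pd

# Hypothetical data with the same column names as dff
dff = pd.DataFrame({
    "Time": ["November", "November", "October", "October"],
    "Default": [1, 0, 1, 0],
})

# Keep rows unless they are both November AND Default == 1
mask_or = (dff.Time != "November") | (dff.Default != 1)
mask_not_and = ~((dff.Time == "November") & (dff.Default == 1))

assert mask_or.equals(mask_not_and)  # De Morgan: the two forms agree
print(dff[mask_or])
```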
How to filter with multiple conditions in a pandas dataframe that include unwanted entries?
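Get your parentheses right: the OR of the two Discipline conditions needs its own parentheses so that it is evaluated before &:
>>> df[(df.Edition == 2004) &
...    ((df.Discipline == "Athletics") |
...     (df.Discipline == "Aquatics"))]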
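When one column must match any of several values, Series.isin avoids the nested OR and the extra parentheses it requires. A sketch with made-up rows using the same column names:

```python
import pandas as pd

df = pd.DataFrame({
    "Edition": [2004, 2004, 2004, 2008],
    "Discipline": ["Athletics", "Aquatics", "Boxing", "Athletics"],
})

# Same result as (df.Discipline == "Athletics") | (df.Discipline == "Aquatics")
wanted = df.Discipline.isin(["Athletics", "Aquatics"])
result = df[(df.Edition == 2004) & wanted]
print(result)
```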