Select DataFrame rows between two dates
There are two possible solutions:
- Use a boolean mask, then use df.loc[mask]
- Set the date column as a DatetimeIndex, then use df[start_date : end_date]
Using a boolean mask:
Ensure df['date'] is a Series with dtype datetime64[ns]:
df['date'] = pd.to_datetime(df['date'])
Make a boolean mask. start_date and end_date can be datetime.datetimes, np.datetime64s, pd.Timestamps, or even datetime strings:
# greater than the start date and smaller than or equal to the end date
mask = (df['date'] > start_date) & (df['date'] <= end_date)
Select the sub-DataFrame:
df.loc[mask]
or re-assign to df
df = df.loc[mask]
For example,
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])
yields
0 1 2 date
153 0.208875 0.727656 0.037787 2000-06-02
154 0.750800 0.776498 0.237716 2000-06-03
155 0.812008 0.127338 0.397240 2000-06-04
156 0.639937 0.207359 0.533527 2000-06-05
157 0.416998 0.845658 0.872826 2000-06-06
158 0.440069 0.338690 0.847545 2000-06-07
159 0.202354 0.624833 0.740254 2000-06-08
160 0.465746 0.080888 0.155452 2000-06-09
161 0.858232 0.190321 0.432574 2000-06-10
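As a quick check of the end-point behaviour, here is a minimal sketch comparing the mask above with Series.between, which includes both end-points by default. The value column and the variable names half_open and closed are made up for illustration.

```python
import pandas as pd

# Illustrative frame: 200 consecutive days starting 2000-01-01.
df = pd.DataFrame({"date": pd.date_range("2000-01-01", periods=200, freq="D")})
df["value"] = range(len(df))

start_date, end_date = "2000-06-01", "2000-06-10"

# Mask from the text: excludes start_date, includes end_date.
mask = (df["date"] > start_date) & (df["date"] <= end_date)
half_open = df.loc[mask]

# Series.between includes both end-points by default.
closed = df.loc[df["date"].between(start_date, end_date)]

print(len(half_open), len(closed))  # 9 10
```

Pick whichever matches the interval you actually want; the only difference here is whether 2000-06-01 itself is kept.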
Using a DatetimeIndex:
If you are going to do a lot of selections by date, it may be quicker to set the date column as the index first. Then you can select rows by date using df.loc[start_date:end_date].
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2000-6-1':'2000-6-10'])
yields
0 1 2
date
2000-06-01 0.040457 0.326594 0.492136 # <- includes start_date
2000-06-02 0.279323 0.877446 0.464523
2000-06-03 0.328068 0.837669 0.608559
2000-06-04 0.107959 0.678297 0.517435
2000-06-05 0.131555 0.418380 0.025725
2000-06-06 0.999961 0.619517 0.206108
2000-06-07 0.129270 0.024533 0.154769
2000-06-08 0.441010 0.741781 0.470402
2000-06-09 0.682101 0.375660 0.009916
2000-06-10 0.754488 0.352293 0.339337
While Python list indexing, e.g. seq[start:end], includes start but not end, Pandas df.loc[start_date : end_date] includes both end-points in the result if they are in the index. Neither start_date nor end_date has to be in the index, however.
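To illustrate the last point, here is a small sketch with an index that only contains every other day, so the slice bounds fall on dates that are missing. The data is invented, and the index is sorted, which this kind of slicing relies on.

```python
import pandas as pd

# Every other day: 2000-06-01, 2000-06-03, 2000-06-05, ...
idx = pd.date_range("2000-06-01", periods=10, freq="2D")
df = pd.DataFrame({"value": range(10)}, index=idx)

# Neither bound is in the index; rows falling between them are returned.
missing_bounds = df.loc["2000-06-02":"2000-06-08"]

# Both bounds are in the index; both are included in the result.
present_bounds = df.loc["2000-06-03":"2000-06-07"]

print(missing_bounds.index.tolist() == present_bounds.index.tolist())  # True
```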
Also note that pd.read_csv has a parse_dates parameter which you could use to parse the date column as datetime64s. Thus, if you use parse_dates, you would not need to use df['date'] = pd.to_datetime(df['date']).
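A minimal sketch of the parse_dates route, reading from an in-memory string instead of a real file. The CSV contents and the value column are assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (contents are made up).
csv_text = "date,value\n2000-06-01,1\n2000-06-02,2\n2000-06-03,3\n"

# parse_dates converts the named column while reading,
# so no separate pd.to_datetime call is needed afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The column can be compared against date strings right away.
later = df.loc[df["date"] >= "2000-06-02"]
print(later)
```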
Filter pandas dataframe on dates and wrong format
Use boolean indexing with 2 masks:
# save date as datetime in series
date = pd.to_datetime(df['Date'], errors='coerce', dayfirst=True)
# is it NaT?
m1 = date.isna()
# is it in the last 6 months?
m2 = date.ge(pd.to_datetime('today')-pd.DateOffset(months=6))
# if any condition is True, keep the row
out = df[m1|m2]
output:
Date
0 01/06/2022
1 03/07/2022
2 18/05/2022
4 WK28
5 WK30
intermediate masks:
Date m1 m2 m1|m2
0 01/06/2022 False True True
1 03/07/2022 False True True
2 18/05/2022 False True True
3 12/02/2021 False False False
4 WK28 True False True
5 WK30 True False True
6 15/09/2021 False False False
7 09/02/2021 False False False
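For a reproducible version of the above, here is a sketch that rebuilds the sample column and swaps 'today' for a fixed reference date. The reference date 2022-08-01 is an assumption chosen so the result matches the table, and an explicit format is passed instead of dayfirst=True so that non-matching strings such as WK28 reliably become NaT.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01/06/2022", "03/07/2022", "18/05/2022", "12/02/2021",
                            "WK28", "WK30", "15/09/2021", "09/02/2021"]})

# Assumed stand-in for pd.to_datetime('today'), so the output is stable.
reference = pd.Timestamp("2022-08-01")

# Strings that do not match day/month/year become NaT.
date = pd.to_datetime(df["Date"], format="%d/%m/%Y", errors="coerce")

m1 = date.isna()                                   # unparseable values
m2 = date.ge(reference - pd.DateOffset(months=6))  # within the last 6 months

out = df[m1 | m2]
print(out)
```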
How to filter a dataframe of dates by a particular month/day?
Map an anonymous function that extracts the month onto the series and compare the result to 11 for November.
That will give you a boolean mask. You can then use that mask to filter your dataframe.
nov_mask = df['Dates'].map(lambda x: x.month) == 11
df[nov_mask]
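If the column already has a datetime dtype, the same mask can be built without a lambda through the .dt accessor. A short sketch on invented dates:

```python
import pandas as pd

df = pd.DataFrame({"Dates": pd.to_datetime(
    ["2013-10-31", "2013-11-01", "2013-11-30", "2014-11-15", "2014-12-01"])})

# .dt.month is vectorised and gives the same result as mapping lambda x: x.month.
nov_mask = df["Dates"].dt.month == 11
nov_rows = df[nov_mask]
print(nov_rows)
```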
I don't think there is a straightforward way to filter the way you want while ignoring the year, so try this.
nov_mar_series = pd.Series(pd.date_range("2013-11-15", "2014-03-15"))
#create timestamp without year
nov_mar_no_year = nov_mar_series.map(lambda x: x.strftime("%m-%d"))
#add a yearless timestamp to the dataframe
df["no_year"] = df['Date'].map(lambda x: x.strftime("%m-%d"))
no_year_mask = df['no_year'].isin(nov_mar_no_year)
df[no_year_mask]
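A variation on the same idea, not the approach above: since zero-padded "%m-%d" strings sort in calendar order, the window can also be expressed as two string comparisons joined with OR (the window wraps around New Year). The sample dates are invented. Note one difference: this version also keeps 29 February, which the date_range above never generates because 2014 is not a leap year.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(
    ["2012-11-14", "2012-11-15", "2013-01-20", "2013-03-15",
     "2013-03-16", "2014-07-04", "2015-12-31"])})

# Zero-padded month-day strings compare in calendar order.
month_day = df["Date"].dt.strftime("%m-%d")

# 15 Nov .. 15 Mar spans the year boundary, so either condition may hold.
no_year_mask = (month_day >= "11-15") | (month_day <= "03-15")
in_window = df[no_year_mask]
print(in_window)
```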
How to filter python pandas dataframe column by date
Let's start with how you read your DataFrame:
df = pd.DataFrame(pd.read_csv("Dates.csv"))
Note that pd.read_csv already returns a DataFrame, so there is no need to create another DataFrame from the first one.
A simpler approach is: df = pd.read_csv("Dates.csv").
But this is not all. If you have a column containing dates, convert it
to datetime type as early as when you read the DataFrame. So, assuming that
your file contains only Met By and Date columns (no index column),
the proper way to read it is:
df = pd.read_csv("Dates.csv", parse_dates=[1])
And now how to filter your DataFrame:
The first hint is not to use the datetime module, as Pandas has its own
pd.Timestamp.today method and pd.Timedelta type.
As the Date column is now of the proper (datetime) type, you don't need any conversions.
Just use:
df[df.Date > pd.Timestamp.today() - pd.Timedelta('30D')]
If you also have future dates and want to filter them out, combine the two conditions with & (not and), each in its own parentheses:
df[(df.Date > pd.Timestamp.today() - pd.Timedelta('30D'))
   & (df.Date < pd.Timestamp.today())]
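Putting the reading and filtering steps together, here is a minimal sketch. The CSV contents are invented, the file is replaced by an in-memory string, and a fixed now is assumed in place of pd.Timestamp.today() so the result does not depend on when you run it.

```python
import io

import pandas as pd

# Stand-in for Dates.csv with only "Met By" and "Date" columns (made-up rows).
csv_text = ("Met By,Date\n"
            "Alice,2024-02-15\n"
            "Bob,2024-03-10\n"
            "Carol,2024-03-25\n"
            "Dave,2024-04-20\n")

# parse_dates=[1] parses the second column (position 1) as datetime.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[1])

# Assumed fixed "today"; use pd.Timestamp.today() in real code.
now = pd.Timestamp("2024-03-31")
cutoff = now - pd.Timedelta("30D")

# Last 30 days only, with future dates filtered out.
recent = df[(df["Date"] > cutoff) & (df["Date"] < now)]
print(recent)
```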
Filtering dataframe for previous week dates in Python
I found the solution to my problem. The values of column 'Date' in my dataframe were being compared with entire columns of the table returned by my function week_range(start), which is not possible. I needed a scalar value to filter my dataframe.
The simplest way to write it is as follows:
df = df[(df['Date'] >= prev_week.Prev_week_start[0]) & (df['Date'] <= prev_week.Prev_week_end[0])][["Date","Actual Call Volume","Forecasted Call Volume"]]
I simply specified the index for Prev_week_start & Prev_week_end by adding index [0].
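The week_range(start) function is not shown here, so the following is only a sketch of one way to get scalar bounds for the previous week. The helper previous_week_bounds is hypothetical, it assumes weeks run Monday to Sunday, and the sample data is invented.

```python
import pandas as pd


def previous_week_bounds(today):
    """Hypothetical helper: (Monday, Sunday) of the week before the one containing `today`."""
    today = pd.Timestamp(today).normalize()
    this_monday = today - pd.Timedelta(days=today.weekday())
    prev_start = this_monday - pd.Timedelta(days=7)
    prev_end = this_monday - pd.Timedelta(days=1)
    return prev_start, prev_end


df = pd.DataFrame({"Date": pd.date_range("2024-03-01", periods=20, freq="D"),
                   "Actual Call Volume": range(20)})

# Both bounds are scalar Timestamps, so the comparison broadcasts over the column.
prev_start, prev_end = previous_week_bounds("2024-03-13")
prev_week_rows = df[(df["Date"] >= prev_start) & (df["Date"] <= prev_end)]
print(prev_week_rows)
```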
How to filter pandas dataframe based on date value with exact match
Use dt.date, cast it to string with astype(str), then compare, i.e.
df[df['Date'].dt.date.astype(str) == '2017-03-20']
Output:
StaffID Date
0 90047 2017-03-20 19:00:00
1 90049 2017-03-20 19:00:00
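An alternative that avoids the string conversion is to normalise each timestamp to midnight and compare against a Timestamp. This is a different technique from the one above, sketched here on rows modelled after the output shown, plus one invented row from another day.

```python
import pandas as pd

df = pd.DataFrame({
    "StaffID": [90047, 90049, 90050],
    "Date": pd.to_datetime(["2017-03-20 19:00:00",
                            "2017-03-20 19:00:00",
                            "2017-03-21 08:00:00"]),
})

# normalize() sets the time part to 00:00:00, leaving only the calendar date.
same_day = df[df["Date"].dt.normalize() == pd.Timestamp("2017-03-20")]
print(same_day)
```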
How do I filter by a certain date and hour using Pandas dataframe in python
You need the .dt accessor, with parentheses around the second and third conditions:
newData = data[(data.Datetime.dt.day == data.Datetime.dt.day.max()) &
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]
To convert to days only once:
s = data.Datetime.dt.day
newData = data[(s == s.max()) &
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]
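Here is a small runnable sketch of the same filter on invented data (the price column is an assumption). Keep in mind that dt.day is the day of the month, so if the data spans several months the maximum day number can match more than one calendar date.

```python
import pandas as pd

data = pd.DataFrame({
    "Datetime": pd.to_datetime(["2020-01-14 09:30", "2020-01-15 09:30",
                                "2020-01-15 09:31", "2020-01-15 10:30"]),
    "price": [1.0, 2.0, 3.0, 4.0],
})

# Compute the day-of-month Series once and reuse it.
s = data.Datetime.dt.day

# Each condition sits in its own parentheses because & binds tighter than ==.
newData = data[(s == s.max()) &
               (data.Datetime.dt.hour == 9) &
               (data.Datetime.dt.minute == 30)]
print(newData)
```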