Filtering Pandas Dataframes on Dates

Select DataFrame rows between two dates

There are two possible solutions:

  • Use a boolean mask, then use df.loc[mask]
  • Set the date column as a DatetimeIndex, then use df.loc[start_date:end_date]

Using a boolean mask:

Ensure df['date'] is a Series with dtype datetime64[ns]:

df['date'] = pd.to_datetime(df['date'])  

Make a boolean mask. start_date and end_date can be datetime.datetimes,
np.datetime64s, pd.Timestamps, or even datetime strings:

# later than start_date (exclusive) and up to and including end_date
mask = (df['date'] > start_date) & (df['date'] <= end_date)

Select the sub-DataFrame:

df.loc[mask]

or reassign the result to df:

df = df.loc[mask]

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])

yields

            0         1         2       date
153  0.208875  0.727656  0.037787 2000-06-02
154  0.750800  0.776498  0.237716 2000-06-03
155  0.812008  0.127338  0.397240 2000-06-04
156  0.639937  0.207359  0.533527 2000-06-05
157  0.416998  0.845658  0.872826 2000-06-06
158  0.440069  0.338690  0.847545 2000-06-07
159  0.202354  0.624833  0.740254 2000-06-08
160  0.465746  0.080888  0.155452 2000-06-09
161  0.858232  0.190321  0.432574 2000-06-10

Using a DatetimeIndex:

If you are going to do a lot of selections by date, it may be quicker to set the
date column as the index first. Then you can select rows by date using
df.loc[start_date:end_date].

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2000-6-1':'2000-6-10'])

yields

                   0         1         2
date
2000-06-01  0.040457  0.326594  0.492136  # <- includes start_date
2000-06-02  0.279323  0.877446  0.464523
2000-06-03  0.328068  0.837669  0.608559
2000-06-04  0.107959  0.678297  0.517435
2000-06-05  0.131555  0.418380  0.025725
2000-06-06  0.999961  0.619517  0.206108
2000-06-07  0.129270  0.024533  0.154769
2000-06-08  0.441010  0.741781  0.470402
2000-06-09  0.682101  0.375660  0.009916
2000-06-10  0.754488  0.352293  0.339337

Python list slicing, e.g. seq[start:end], includes start but not end. In contrast, pandas df.loc[start_date:end_date] includes both end-points in the result if they are in the index. Neither start_date nor end_date has to be in the index, however.
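To make the end-point behaviour concrete, here is a minimal sketch on a small, made-up frame (the column name value and the dates are illustrative assumptions, not taken from the example above). It slices once with bounds that are in the index and once with bounds that are not:

```python
import pandas as pd

# Small made-up frame with a sorted DatetimeIndex that has gaps.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(
        ["2000-06-01", "2000-06-03", "2000-06-05", "2000-06-10"]
    ),
)

# Both end-points are in the index, so both rows are returned.
both = df.loc["2000-06-01":"2000-06-10"]

# Neither end-point is in the index; the rows that fall inside
# the range are still returned.
inner = df.loc["2000-06-02":"2000-06-09"]

print(both)
print(inner)
```

The first slice should return all four rows, and the second only the rows for 2000-06-03 and 2000-06-05.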


Also note that pd.read_csv has a parse_dates parameter which you could use to parse the date column as datetime64s. Thus, if you use parse_dates, you would not need to use df['date'] = pd.to_datetime(df['date']).
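As a rough illustration of parse_dates, the sketch below reads a tiny in-memory CSV (the contents and the value column are made up for the example; io.StringIO stands in for a real file) and filters it with a boolean mask without calling pd.to_datetime:

```python
import io

import pandas as pd

# Made-up CSV contents; io.StringIO stands in for a file on disk.
csv_text = "date,value\n2000-06-01,1\n2000-06-02,2\n2000-06-03,3\n"

# parse_dates converts the named column to a datetime dtype while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])

# The column can be compared with date strings straight away.
mask = (df["date"] > "2000-06-01") & (df["date"] <= "2000-06-03")
selected = df.loc[mask]

print(is_datetime)
print(selected)
```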

Filter pandas dataframe on dates and wrong format

Use boolean indexing with 2 masks:

# save date as datetime in series
date = pd.to_datetime(df['Date'], errors='coerce', dayfirst=True)
# is it NaT?
m1 = date.isna()
# is it in the last 6 months?
m2 = date.ge(pd.to_datetime('today')-pd.DateOffset(months=6))

# if any condition is True, keep the row
out = df[m1|m2]

output:

         Date
0  01/06/2022
1  03/07/2022
2  18/05/2022
4        WK28
5        WK30

intermediate masks:

         Date     m1     m2  m1|m2
0  01/06/2022  False   True   True
1  03/07/2022  False   True   True
2  18/05/2022  False   True   True
3  12/02/2021  False  False  False
4        WK28   True  False   True
5        WK30   True  False   True
6  15/09/2021  False  False  False
7  09/02/2021  False  False  False
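The two masks can be reproduced with the sketch below. It makes two assumptions that are not in the answer: a fixed reference date, now, replaces pd.to_datetime('today') so the result does not change from day to day, and an explicit format="%d/%m/%Y" is passed instead of dayfirst=True, which should parse these strings the same way.

```python
import pandas as pd

df = pd.DataFrame({"Date": [
    "01/06/2022", "03/07/2022", "18/05/2022", "12/02/2021",
    "WK28", "WK30", "15/09/2021", "09/02/2021",
]})

# Assumption: a fixed "today" so the example is reproducible.
now = pd.Timestamp("2022-07-15")

# Strings that do not match the format (e.g. "WK28") become NaT.
date = pd.to_datetime(df["Date"], format="%d/%m/%Y", errors="coerce")

m1 = date.isna()                                # could not be parsed
m2 = date.ge(now - pd.DateOffset(months=6))     # within the last 6 months

# Keep a row if either condition holds.
out = df[m1 | m2]
print(out)
```

With this reference date, the rows kept should be the ones at index 0, 1, 2, 4 and 5, as in the output shown above.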

How to filter a dataframe of dates by a particular month/day?

Map an anonymous function that extracts the month over the series and compare the result to 11 (November).
That gives you a boolean mask, which you can then use to filter your dataframe.

nov_mask = df['Dates'].map(lambda x: x.month) == 11
df[nov_mask]
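For a column that already has a datetime dtype, the .dt accessor should give the same mask without a lambda. A small sketch on made-up dates (the data is an assumption for illustration only):

```python
import pandas as pd

# Made-up data; 'Dates' must already have a datetime dtype.
df = pd.DataFrame({"Dates": pd.to_datetime(
    ["2013-11-15", "2013-12-01", "2014-11-30", "2014-03-15"]
)})

# Mask built with map and a lambda, as in the answer.
nov_mask_map = df["Dates"].map(lambda x: x.month) == 11

# Same mask built with the vectorised .dt accessor.
nov_mask_dt = df["Dates"].dt.month == 11

november = df[nov_mask_dt]
print(november)
```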

I don't think there is a straightforward way to filter on a date range while ignoring the year, so try this:

nov_mar_series = pd.Series(pd.date_range("2013-11-15", "2014-03-15"))
# create month-day strings without the year
nov_mar_no_year = nov_mar_series.map(lambda x: x.strftime("%m-%d"))
# add a yearless month-day string to the dataframe
df["no_year"] = df['Date'].map(lambda x: x.strftime("%m-%d"))
no_year_mask = df['no_year'].isin(nov_mar_no_year)
df[no_year_mask]
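Here is a self-contained run of the same idea on made-up dates (the sample rows are assumptions). It uses .dt.strftime instead of map with a lambda, which should produce the same strings:

```python
import pandas as pd

# Made-up dates spanning several years.
df = pd.DataFrame({"Date": pd.to_datetime([
    "2010-12-25", "2011-06-01", "2012-02-29",
    "2015-03-15", "2015-03-16", "2016-11-14",
])})

# Month-day strings for every day from 15 Nov to 15 Mar.
window = pd.Series(pd.date_range("2013-11-15", "2014-03-15"))
window_md = window.dt.strftime("%m-%d")

# Compare each row's month-day string against the window.
mask = df["Date"].dt.strftime("%m-%d").isin(window_md)
selected = df[mask]
print(selected)
```

Only 2010-12-25 and 2015-03-15 should be selected. Note that 2012-02-29 is not matched, because the window is built from 2013-2014, which contains no leap day.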

How to filter python pandas dataframe column by date

Let's start with how you read your DataFrame:

df = pd.DataFrame(pd.read_csv("Dates.csv"))

Note that:

  • pd.read_csv already returns a DataFrame,
  • so there is no need to create another DataFrame from the first one.

A simpler approach is: df = pd.read_csv("Dates.csv").

But this is not all. If you have a column containing dates, convert it
to a datetime type as early as when you read the DataFrame. So, assuming that
your file contains only the Met By and Date columns (no index column),
the proper way to read it is:

df = pd.read_csv("Dates.csv", parse_dates=[1])

And now how to filter your DataFrame:

The first hint is that you don't need the datetime module, as pandas has its own
pd.Timestamp.today and pd.Timedelta.
As the Date column is now of a proper datetime type, you don't need any conversions.
Just use:

df[df.Date > pd.Timestamp.today() - pd.Timedelta('30D')]

If you also have future dates and want to filter them out, run:

now = pd.Timestamp.today()
df[(df.Date > now - pd.Timedelta('30D')) & (df.Date < now)]
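The 30-day filter can be tried out on made-up data as sketched below. A fixed timestamp, now, is assumed in place of pd.Timestamp.today() so that the result is reproducible; the rows are invented for the example.

```python
import pandas as pd

# Assumption: a fixed "now" instead of pd.Timestamp.today().
now = pd.Timestamp("2022-07-15")

# Made-up rows with the same two columns as the CSV.
df = pd.DataFrame({
    "Met By": ["A", "B", "C", "D"],
    "Date": pd.to_datetime(
        ["2022-05-01", "2022-06-20", "2022-07-10", "2022-08-01"]
    ),
})

# Combine the element-wise conditions with &, each in its own
# parentheses; the Python keyword `and` does not work on Series.
recent = df[(df["Date"] > now - pd.Timedelta("30D")) & (df["Date"] < now)]
print(recent)
```

With this reference date, only the rows for B and C should remain: A is older than 30 days and D lies in the future.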

Filtering dataframe for previous week dates in Python

I found the solution to my problem. The values of the 'Date' column in my dataframe were being compared with entire columns of the table returned by my week_range(start) function, which is not possible. I needed scalar values to filter my dataframe.

The simplest way to write it is as follows:

df = df[(df['Date'] >= prev_week.Prev_week_start[0]) & (df['Date'] <= prev_week.Prev_week_end[0])][["Date","Actual Call Volume","Forecasted Call Volume"]]

I simply selected the first row of Prev_week_start and Prev_week_end by adding the index [0], which turns each column into a scalar.
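Since week_range(start) is not shown, the sketch below uses a hypothetical one-row prev_week table with made-up dates and call volumes to illustrate the scalar comparison. It assumes the intent is to keep dates from the start of the previous week up to and including its end.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by week_range(start).
prev_week = pd.DataFrame({
    "Prev_week_start": [pd.Timestamp("2022-07-04")],
    "Prev_week_end": [pd.Timestamp("2022-07-10")],
})

# Made-up call volume data.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2022-07-03", "2022-07-04", "2022-07-10", "2022-07-11"]
    ),
    "Actual Call Volume": [10, 20, 30, 40],
    "Forecasted Call Volume": [11, 19, 33, 38],
})

# Pull out scalars so each comparison is column-versus-value.
start = prev_week["Prev_week_start"].iloc[0]
end = prev_week["Prev_week_end"].iloc[0]

last_week = df[(df["Date"] >= start) & (df["Date"] <= end)]
print(last_week)
```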

How to filter pandas dataframe based on date value with exact match

Use dt.date, convert it to strings with astype(str), then compare, i.e.

df[df['Date'].dt.date.astype(str) == '2017-03-20']

Output:

   StaffID                Date
0    90047 2017-03-20 19:00:00
1    90049 2017-03-20 19:00:00
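An alternative that avoids the string conversion is to drop the time part with dt.normalize() and compare against a Timestamp. This is a different technique from the one in the answer; the sketch below runs both on made-up rows (the third row is an assumption added for contrast) to show that they should select the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "StaffID": [90047, 90049, 90050],
    "Date": pd.to_datetime([
        "2017-03-20 19:00:00",
        "2017-03-20 19:00:00",
        "2017-03-21 08:00:00",   # made-up row on a different day
    ]),
})

# Approach from the answer: compare the date part as a string.
by_string = df[df["Date"].dt.date.astype(str) == "2017-03-20"]

# Alternative: set the time to midnight, then compare timestamps.
by_normalize = df[df["Date"].dt.normalize() == pd.Timestamp("2017-03-20")]

print(by_string)
print(by_normalize)
```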

How do I filter by a certain date and hour using Pandas dataframe in python

You need the .dt accessor, with parentheses around the second and third conditions:

newData = data[(data.Datetime.dt.day == data.Datetime.dt.day.max()) & 
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]

To compute the day only once:

s = data.Datetime.dt.day
newData = data[(s == s.max()) &
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]
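A self-contained run of this filter on made-up timestamps (the data and the value column are assumptions for illustration) might look like this:

```python
import pandas as pd

# Made-up timestamps, all within the same month.
data = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2017-03-19 09:30:00",
        "2017-03-20 09:30:00",
        "2017-03-20 09:31:00",
        "2017-03-20 10:30:00",
    ]),
    "value": [1, 2, 3, 4],
})

dt = data["Datetime"].dt   # reuse the accessor for all three conditions
s = dt.day                 # day of the month, computed once

# Each condition sits in its own parentheses because & binds
# more tightly than ==.
newData = data[(s == s.max()) & (dt.hour == 9) & (dt.minute == 30)]
print(newData)
```

Only the row for 2017-03-20 09:30:00 should remain. Note that dt.day is the day of the month, so s.max() identifies the latest date only when all rows fall within the same month.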

