Get Business Days Between Start and End Date Using Pandas

Get business days between start and end date using pandas

Use BDay() to get the business days in range.

from datetime import datetime
from pandas.tseries.offsets import BDay

In [185]: s
Out[185]:
2011-01-01 -0.011629
2011-01-02 -0.089666
2011-01-03 -1.314430
2011-01-04 -1.867307
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
2011-01-08 0.800262
2011-01-09 0.376406
2011-01-10 -0.469988
Freq: D

In [186]: s.asfreq(BDay())
Out[186]:
2011-01-03 -1.314430
2011-01-04 -1.867307
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
2011-01-10 -0.469988
Freq: B

With slicing:

In [187]: x=datetime(2011, 1, 5)

In [188]: y=datetime(2011, 1, 9)

In [189]: s.loc[x:y]
Out[189]:
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
2011-01-08 0.800262
2011-01-09 0.376406
Freq: D

In [190]: s.loc[x:y].asfreq(BDay())
Out[190]:
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
Freq: B

Then call count() to get the number of business days:

In [191]: s.loc[x:y].asfreq(BDay()).count()
Out[191]: 3
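
For reference, here is a minimal self-contained sketch of the same count. The series values are placeholders made up for illustration (only the index matters), and the names n_asfreq and n_direct are hypothetical. It also shows pd.bdate_range as a shortcut that needs no series at all.

```python
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.tseries.offsets import BDay

# Daily series covering 2011-01-01 .. 2011-01-10 (values are placeholders)
idx = pd.date_range("2011-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10.0), index=idx)

x = datetime(2011, 1, 5)
y = datetime(2011, 1, 9)

# Slice by label, keep only business days, then count the non-NaN entries
n_asfreq = s.loc[x:y].asfreq(BDay()).count()

# Same count without a series: business days from x to y, both ends included
n_direct = len(pd.bdate_range(x, y))

print(n_asfreq, n_direct)
```

Note that count() skips NaN values, so the two numbers only agree when the series has no missing data in the range.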

How to calculate the quantity of business days between two dates using Pandas

pd.date_range's parameters need to be datetimes, not series.

For this reason, we can use df.apply to apply the function to each row.

In addition, pandas has bdate_range which is just date_range with freq defaulting to business days, which is exactly what you need.

Using apply and a lambda function, we can create a new Series calculating business days between each start and current date for each row.

projects_df['start_date'] = pd.to_datetime(projects_df['start_date'])
projects_df['current_date'] = pd.to_datetime(projects_df['current_date'])

projects_df['bdays'] = projects_df.apply(lambda row: len(pd.bdate_range(row['start_date'], row['current_date'])), axis=1)

Using a random sample of 10 date pairs, my output is the following:

           start_date        current_date  bdays
0 2022-01-03 17:08:04 2022-05-20 00:53:46    100
1 2022-04-18 09:43:02 2022-06-10 16:56:16     40
2 2022-09-01 12:02:34 2022-09-25 14:59:29     17
3 2022-04-02 14:24:12 2022-04-24 21:05:55     15
4 2022-01-31 02:15:46 2022-07-02 16:16:02    110
5 2022-08-02 22:05:15 2022-08-17 17:25:10     12
6 2022-03-06 05:30:20 2022-07-04 08:43:00     86
7 2022-01-15 17:01:33 2022-08-09 21:48:41    147
8 2022-06-04 14:47:53 2022-12-12 18:05:58    136
9 2022-02-16 11:52:03 2022-10-18 01:30:58    175
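
A small runnable sketch of the approach above, using the first two date pairs from the sample output. It also compares the result with np.busday_count, which is not part of the original answer and is included only to illustrate the endpoint convention: bdate_range normalizes timestamps to midnight by default and includes both endpoints, while np.busday_count excludes the end date. The name half_open is hypothetical.

```python
import numpy as np
import pandas as pd

projects_df = pd.DataFrame({
    "start_date": ["2022-01-03 17:08:04", "2022-04-18 09:43:02"],
    "current_date": ["2022-05-20 00:53:46", "2022-06-10 16:56:16"],
})
projects_df["start_date"] = pd.to_datetime(projects_df["start_date"])
projects_df["current_date"] = pd.to_datetime(projects_df["current_date"])

# Row-wise count: both the start and the end date are included
projects_df["bdays"] = projects_df.apply(
    lambda row: len(pd.bdate_range(row["start_date"], row["current_date"])),
    axis=1,
)

# Column-wise count with NumPy: the end date itself is NOT counted
half_open = np.busday_count(
    projects_df["start_date"].values.astype("datetime64[D]"),
    projects_df["current_date"].values.astype("datetime64[D]"),
)

print(projects_df)
print(half_open)
```

Both end dates here fall on a Friday, so the NumPy counts come out one lower than the bdate_range counts.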

Add business days to pandas dataframe with dates and skip over holidays python

Input data

import pandas as pd

df = pd.DataFrame(['2021-02-09', '2021-02-10', '2021-06-28', '2021-06-29', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])

Suggested solution using apply

from pandas.tseries.holiday import USFederalHolidayCalendar

def offset_date(start, offset):
    # Add `offset` business days, skipping weekends and US federal holidays
    return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USFederalHolidayCalendar())

offset = 5
df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)

      DATE        END
2021-02-09 2021-02-17
2021-02-10 2021-02-18
2021-06-28 2021-07-06
2021-06-29 2021-07-07
2021-07-02 2021-07-12

PS: If you want to use a particular calendar such as the NYSE instead of USFederalHolidayCalendar, I recommend following the instructions in this answer about creating a custom calendar.

Alternative solution which I do not recommend

Currently, to the best of my knowledge, pandas does not support a vectorized approach to your problem. But if you want to follow an approach similar to the one you mentioned, here is what you should do.

First, you will have to define an arbitrary far away end date that includes all the periods you might need and use it to create a list of holidays.

holidays = USFederalHolidayCalendar().holidays(start='2021-02-09', end='2030-02-09')

Then, you pass the holidays list to CustomBusinessDay through the holidays parameter instead of the calendar to generate the desired offset.

offset = 5
bday_us = pd.offsets.CustomBusinessDay(n=offset, holidays=holidays)
df['END'] = df['DATE'] + bday_us

However, this approach is not truly vectorized, even though it looks like it. See the following SO answer for further clarification. Under the hood, it is probably performing an inefficient conversion, which is why it yields the following warning.

PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex
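
If a vectorized computation matters, one option outside pandas is NumPy's np.busday_offset, which accepts a holidays array. This is a swapped-in technique, not the approach from the answer above, and the sketch assumes every start date is itself a business day (for other start dates, NumPy's roll rule and pandas' offset arithmetic can differ by a day). The names holidays_d, start_d and end_d are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["2021-02-09", "2021-02-10", "2021-06-28", "2021-06-29", "2021-07-02"]
    )
})

# Holidays covering the period of interest, converted to day resolution
holidays = USFederalHolidayCalendar().holidays(start="2021-01-01", end="2021-12-31")
holidays_d = holidays.values.astype("datetime64[D]")

# np.busday_offset works on whole arrays of day-resolution dates
start_d = df["DATE"].values.astype("datetime64[D]")
end_d = np.busday_offset(start_d, 5, roll="forward", holidays=holidays_d)

df["END"] = pd.to_datetime(end_d)
print(df)
```

For the sample input this gives the same END column as the apply-based solution.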

How to find business days between dates with Pandas CDay?

You can simply pass your week mask and holidays from the CDay calendar directly to np.busday_count.

np.busday_count(start_date, end_date,
                weekmask=bday_cust.weekmask, holidays=bday_cust.holidays)

Alternatively (but certainly slower), you can use pd.date_range, and pass your custom CDay calendar as the freq.

pd.date_range(datetime(2017, 3, 5), datetime(2017, 3, 12), freq=bday_cust).size

This has the unfortunate side-effect of creating an intermediary date range only to use its size.


Example

Let's set up a meaningless custom business day calendar.

from datetime import datetime
import numpy as np
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
weekmask = 'Mon Wed Fri Sat'
holidays = [datetime(2017, 3, 6), datetime(2017, 3, 11)]

bday_cust = CustomBusinessDay(holidays=holidays, weekmask=weekmask)

The week mask makes Monday, Wednesday, Friday and Saturday business days, and the Monday and the Saturday of the week of March 5th to 11th are additionally marked as holidays. Looking at that particular date range, we can count the remaining business days (2, the Wednesday and the Friday).

>>> np.busday_count(datetime(2017, 3, 5), datetime(2017, 3, 12),
...                 weekmask=bday_cust.weekmask,
...                 holidays=bday_cust.holidays)
2
>>> pd.date_range(datetime(2017, 3, 5), datetime(2017, 3, 12), freq=bday_cust).size
2
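
The example can also be written as a self-contained script. In this sketch the dates are passed as day-resolution np.datetime64 values, which is the form np.busday_count works with natively, and the holidays are converted explicitly. Both choices are assumptions made for robustness, and holidays_d and n_days are hypothetical names.

```python
from datetime import datetime

import numpy as np
from pandas.tseries.offsets import CustomBusinessDay

# Same meaningless calendar as above: Mon/Wed/Fri/Sat are business days,
# with the Monday and the Saturday of that week marked as holidays
bday_cust = CustomBusinessDay(
    holidays=[datetime(2017, 3, 6), datetime(2017, 3, 11)],
    weekmask="Mon Wed Fri Sat",
)

start = np.datetime64("2017-03-05")
end = np.datetime64("2017-03-12")  # the end date itself is not counted

# Make sure the holidays are day-resolution datetime64 values
holidays_d = np.array(bday_cust.holidays, dtype="datetime64[D]")

n_days = np.busday_count(start, end,
                         weekmask=bday_cust.weekmask,
                         holidays=holidays_d)
print(n_days)
```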

Rough benchmark on example

%timeit np.busday_count(datetime(2017, 3, 5), datetime(2017, 3, 12),
                        weekmask=bday_cust.weekmask,
                        holidays=bday_cust.holidays)
100000 loops, best of 3: 17.2 us per loop

%timeit pd.date_range(datetime(2017, 3, 5), datetime(2017, 3, 12), freq=bday_cust).size
1000 loops, best of 3: 573 us per loop

Pandas Dataframe Calculate Num Business Days

Use np.busday_count from NumPy.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Signin Date": ["2018-01-01", "2018-02-01"]})
df["Signin Date"] = pd.to_datetime(df["Signin Date"])
df['Signin Date Shifted'] = pd.DatetimeIndex(df['Signin Date']) + pd.DateOffset(months=1)

df["bussDays"] = np.busday_count( df["Signin Date"].values.astype('datetime64[D]'), df['Signin Date Shifted'].values.astype('datetime64[D]'))
print(df)

Output:

  Signin Date Signin Date Shifted  bussDays
0  2018-01-01          2018-02-01        23
1  2018-02-01          2018-03-01        20


Difference between datetimes in terms of number of business days using pandas

I think np.busday_count is a good idea here; converting to NumPy arrays is not necessary either:

s1 = pd.Series(pd.date_range(start='05/01/2019',end='05/10/2019'))
s2 = pd.Series(pd.date_range(start='05/04/2019', periods=10, freq='5D'))

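# iterate over plain date objects, which np.busday_count accepts as day-resolution dates
s = pd.Series([np.busday_count(a, b) for a, b in zip(s1.dt.date, s2.dt.date)])
print(s)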
0 3
1 5
2 7
3 10
4 14
5 17
6 19
7 23
8 25
9 27
dtype: int64
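
As a cross-check, the same numbers can be obtained in a single call by handing np.busday_count the two columns as day-resolution arrays. This is only a sketch of an equivalent formulation, and the name counts is hypothetical.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(pd.date_range(start="05/01/2019", end="05/10/2019"))
s2 = pd.Series(pd.date_range(start="05/04/2019", periods=10, freq="5D"))

# One call for all pairs: element i counts business days in [s1[i], s2[i])
counts = pd.Series(
    np.busday_count(
        s1.values.astype("datetime64[D]"),
        s2.values.astype("datetime64[D]"),
    )
)
print(counts)
```

For long series this avoids the Python-level loop of the list comprehension.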

