Get business days between start and end date using pandas
Use BDay() to get the business days in range.
from pandas.tseries.offsets import *
In [185]: s
Out[185]:
2011-01-01 -0.011629
2011-01-02 -0.089666
2011-01-03 -1.314430
2011-01-04 -1.867307
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
2011-01-08 0.800262
2011-01-09 0.376406
2011-01-10 -0.469988
Freq: D
In [186]: s.asfreq(BDay())
Out[186]:
2011-01-03 -1.314430
2011-01-04 -1.867307
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
2011-01-10 -0.469988
Freq: B
With slicing:
In [187]: x=datetime(2011, 1, 5)
In [188]: y=datetime(2011, 1, 9)
In [189]: s.loc[x:y]
Out[189]:
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
2011-01-08 0.800262
2011-01-09 0.376406
Freq: D
In [190]: s.loc[x:y].asfreq(BDay())
Out[190]:
2011-01-05 0.779609
2011-01-06 0.588950
2011-01-07 -2.505803
Freq: B
and count()
In [191]: s.loc[x:y].asfreq(BDay()).count()
Out[191]: 3
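A self-contained sketch of the same steps, in case it helps to run them end to end. The random values and the names `start`, `end`, and `n_bdays` are assumptions made for illustration; only the date range mirrors the session above.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import BDay

# Daily series covering 2011-01-01 .. 2011-01-10 (the values are arbitrary).
idx = pd.date_range("2011-01-01", periods=10, freq="D")
s = pd.Series(np.random.default_rng(0).standard_normal(len(idx)), index=idx)

start = pd.Timestamp("2011-01-05")
end = pd.Timestamp("2011-01-09")

# Label-based slicing keeps both endpoints; asfreq(BDay()) then keeps weekdays only.
n_bdays = s.loc[start:end].asfreq(BDay()).count()
print(n_bdays)  # 3 (Wed the 5th, Thu the 6th, Fri the 7th)
```

Note that count() only counts non-missing values, so a NaN on a business day would lower the result.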
How to calculate the quantity of business days between two dates using Pandas
pd.date_range's parameters need to be datetimes, not series. For this reason, we can use df.apply to apply the function to each row.
In addition, pandas has bdate_range, which is just date_range with freq defaulting to business days, which is exactly what you need.
Using apply and a lambda function, we can create a new Series calculating business days between each start and current date for each row.
projects_df['start_date'] = pd.to_datetime(projects_df['start_date'])
projects_df['current_date'] = pd.to_datetime(projects_df['current_date'])
projects_df['days_count'] = projects_df.apply(lambda row: len(pd.bdate_range(row['start_date'], row['current_date'])), axis=1)
Using a random sample of 10 date pairs, my output is the following:
start_date current_date days_count
0 2022-01-03 17:08:04 2022-05-20 00:53:46 100
1 2022-04-18 09:43:02 2022-06-10 16:56:16 40
2 2022-09-01 12:02:34 2022-09-25 14:59:29 17
3 2022-04-02 14:24:12 2022-04-24 21:05:55 15
4 2022-01-31 02:15:46 2022-07-02 16:16:02 110
5 2022-08-02 22:05:15 2022-08-17 17:25:10 12
6 2022-03-06 05:30:20 2022-07-04 08:43:00 86
7 2022-01-15 17:01:33 2022-08-09 21:48:41 147
8 2022-06-04 14:47:53 2022-12-12 18:05:58 136
9 2022-02-16 11:52:03 2022-10-18 01:30:58 175
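As a quick check of what the count means, here is a minimal sketch showing that bdate_range counts both endpoints when they are business days. The variable names are illustrative, and the last call reuses the dates from row 0 of the table above.

```python
import pandas as pd

# Monday 2022-01-03 through Friday 2022-01-07: both endpoints are counted.
n_week = len(pd.bdate_range("2022-01-03", "2022-01-07"))
print(n_week)  # 5

# A range that starts and ends on a weekend still counts the weekdays inside it.
n_weekend_bounds = len(pd.bdate_range("2022-01-01", "2022-01-09"))
print(n_weekend_bounds)  # 5

# Row 0 of the sample output: the time-of-day parts do not change the count.
n_row0 = len(pd.bdate_range("2022-01-03 17:08:04", "2022-05-20 00:53:46"))
print(n_row0)  # 100
```

If you need "days elapsed" rather than "days touched", subtract one whenever the start date is itself a business day.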
Add business days to pandas dataframe with dates and skip over holidays python
Input data
df = pd.DataFrame(['2021-02-09', '2021-02-10', '2021-06-28', '2021-06-29', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])
Suggested solution using apply
from pandas.tseries.holiday import USFederalHolidayCalendar
def offset_date(start, offset):
    # Move `start` forward by `offset` business days, skipping US federal holidays.
    return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USFederalHolidayCalendar())
offset = 5
df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)
DATE END
2021-02-09 2021-02-17
2021-02-10 2021-02-18
2021-06-28 2021-07-06
2021-06-29 2021-07-07
2021-07-02 2021-07-12
PS: If you want to use a particular calendar such as the NYSE, instead of the default USFederalHolidayCalendar, I recommend following the instructions on this answer, about creating a custom calendar.
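For reference, a custom calendar can be sketched by subclassing AbstractHolidayCalendar. The class name `ExampleTradingCalendar` and its rule list are hypothetical and chosen only for illustration; this is not an official exchange calendar, so check the actual holiday schedule before relying on it.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    GoodFriday,
    Holiday,
    USLaborDay,
    USMemorialDay,
    USThanksgivingDay,
    nearest_workday,
)
from pandas.tseries.offsets import CustomBusinessDay


class ExampleTradingCalendar(AbstractHolidayCalendar):
    """Hypothetical calendar for illustration only."""

    rules = [
        Holiday("New Year's Day", month=1, day=1, observance=nearest_workday),
        GoodFriday,
        USMemorialDay,
        USLaborDay,
        USThanksgivingDay,
        Holiday("Christmas Day", month=12, day=25, observance=nearest_workday),
    ]


bday_example = CustomBusinessDay(calendar=ExampleTradingCalendar())

# Good Friday 2021 fell on April 2, so one business day after Thursday
# April 1 is Monday April 5 under this calendar.
end = pd.Timestamp("2021-04-01") + bday_example
print(end.date())  # 2021-04-05
```

The resulting offset can be passed to offset_date-style helpers in place of the one built from USFederalHolidayCalendar.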
Alternative solution which I do not recommend
Currently, to the best of my knowledge, pandas does not support a vectorized approach to your problem. But if you want to follow a similar approach to the one you mentioned, here is what you should do.
First, you will have to define an arbitrary far away end date that includes all the periods you might need and use it to create a list of holidays.
holidays = USFederalHolidayCalendar().holidays(start='2021-02-09', end='2030-02-09')
Then, you pass the holidays list to CustomBusinessDay through the holidays parameter instead of the calendar to generate the desired offset.
offset = 5
bday_us = pd.offsets.CustomBusinessDay(n=offset, holidays=holidays)
df['END'] = df['DATE'] + bday_us
However, this type of approach is not a true vectorized solution, even though it might seem like it. See the following SO answer for further clarification. Under the hood, this approach is probably doing a conversion that is not efficient. This is why it yields the following warning.
PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex
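If the warning is a concern, one option outside pandas' offsets is numpy's np.busday_offset, which works on whole arrays. This is a swapped-in technique rather than the approach described above, and the sketch below assumes the same sample dates and a holiday window covering 2021 only.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame(
    {"DATE": pd.to_datetime(
        ["2021-02-09", "2021-02-10", "2021-06-28", "2021-06-29", "2021-07-02"]
    )}
)

# numpy expects day-resolution dates for both the inputs and the holidays.
holidays = (
    USFederalHolidayCalendar()
    .holidays(start="2021-01-01", end="2021-12-31")
    .values.astype("datetime64[D]")
)
dates = df["DATE"].values.astype("datetime64[D]")

# roll="forward" first moves a non-business start date to the next business day,
# then the offset of 5 business days is applied.
end = np.busday_offset(dates, 5, roll="forward", holidays=holidays)
df["END"] = pd.to_datetime(end)
print(df)
```

All sample dates here are business days, so the result matches the apply-based solution. For start dates that fall on a weekend or holiday, the roll rule may give a different answer than CustomBusinessDay, so that case is worth testing separately.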
How to find business days between dates with Pandas CDay?
You can simply pass the week mask and holidays from your CDay calendar directly to np.busday_count.
np.busday_count(start_date, end_date,
                weekmask=bday_cust.weekmask, holidays=bday_cust.holidays)
Alternatively (but certainly slower), you can use pd.date_range and pass your custom CDay calendar as the freq.
pd.date_range(datetime(2017, 3, 5), datetime(2017, 3, 12), freq=bday_cust).size
This has the unfortunate side-effect of creating an intermediary date range only to use its size.
Example
Let's set up a meaningless custom business day calendar.
from datetime import datetime
import numpy as np
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
weekmask = 'Mon Wed Fri Sat'
holidays = [datetime(2017, 3, 6), datetime(2017, 3, 11)]
bday_cust = CustomBusinessDay(holidays=holidays, weekmask=weekmask)
The Monday (March 6th) and the Saturday (March 11th) of the week of March 5th to 11th are business days under the week mask, but both are also listed as holidays, so they do not count. That leaves 2 business days in that date range:
>>> np.busday_count(datetime(2017, 3, 5), datetime(2017, 3, 12),
                    weekmask=bday_cust.weekmask,
                    holidays=bday_cust.holidays)
2
>>> pd.date_range(datetime(2017, 3, 5), datetime(2017, 3, 12), freq=bday_cust).size
2
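The two calls agree above because March 12th is not a business day. When the end date is itself a business day they differ: np.busday_count excludes the end date, while pd.date_range includes it. A minimal sketch of that difference, reusing the same week mask and holidays (the variable names `count_np`, `count_pd`, and `count_np_inclusive` are illustrative):

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

weekmask = "Mon Wed Fri Sat"
holidays = ["2017-03-06", "2017-03-11"]
bday_cust = CustomBusinessDay(holidays=holidays, weekmask=weekmask)

begin = np.datetime64("2017-03-05")
end = np.datetime64("2017-03-10")  # a Friday, which is a business day here

# np.busday_count treats the range as half-open: the end date is not counted.
count_np = np.busday_count(begin, end, weekmask=weekmask, holidays=holidays)
print(count_np)  # 1 (Wednesday only)

# pd.date_range includes the end date.
count_pd = pd.date_range("2017-03-05", "2017-03-10", freq=bday_cust).size
print(count_pd)  # 2 (Wednesday and Friday)

# Shifting the end by one day makes the numpy count inclusive as well.
count_np_inclusive = np.busday_count(
    begin, end + np.timedelta64(1, "D"), weekmask=weekmask, holidays=holidays
)
print(count_np_inclusive)  # 2
```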
Rough benchmark on example
%timeit np.busday_count(datetime(2017, 3, 5), datetime(2017, 3, 12),
                        weekmask=bday_cust.weekmask,
                        holidays=bday_cust.holidays)
100000 loops, best of 3: 17.2 us per loop
%timeit pd.date_range(datetime(2017, 3, 5), datetime(2017, 3, 12), freq=bday_cust).size
1000 loops, best of 3: 573 us per loop
Pandas Dataframe Calculate Num Business Days
Use busday_count from numpy.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Signin Date": ["2018-01-01", "2018-02-01"]})
df["Signin Date"] = pd.to_datetime(df["Signin Date"])
df['Signin Date Shifted'] = pd.DatetimeIndex(df['Signin Date']) + pd.DateOffset(months=1)
df["bussDays"] = np.busday_count( df["Signin Date"].values.astype('datetime64[D]'), df['Signin Date Shifted'].values.astype('datetime64[D]'))
print(df)
Output:
Signin Date Signin Date Shifted bussDays
0 2018-01-01 2018-02-01 23
1 2018-02-01 2018-03-01 20
Difference between datetimes in terms of number of business days using pandas
I think np.busday_count is a good idea here; converting to numpy arrays is not necessary:
s1 = pd.Series(pd.date_range(start='05/01/2019',end='05/10/2019'))
s2 = pd.Series(pd.date_range(start='05/04/2019',periods=10, freq='5D'))
s = pd.Series([np.busday_count(a, b) for a, b in zip(s1, s2)])
print (s)
0 3
1 5
2 7
3 10
4 14
5 17
6 19
7 23
8 25
9 27
dtype: int64
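For longer series, the Python-level loop can be avoided by handing numpy day-resolution arrays in a single call. This is a sketch under the assumption that both series have the same length and contain no missing dates; the name `counts` is illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(pd.date_range(start="2019-05-01", end="2019-05-10"))
s2 = pd.Series(pd.date_range(start="2019-05-04", periods=10, freq="5D"))

# Cast to day resolution, which is the unit np.busday_count works in,
# and count element-wise across both arrays in one call.
counts = np.busday_count(
    s1.values.astype("datetime64[D]"),
    s2.values.astype("datetime64[D]"),
)
s = pd.Series(counts, index=s1.index)
print(s)
```

The values match the list-comprehension result above; the trade-off is the explicit cast to datetime64[D].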