Remove Weekend Data in a Dataframe


Convert the date column to POSIXlt, e.g.:

date <- as.POSIXlt(date,format="%Y-%m-%d")

Then you can access the day of the week (0 = Sunday, ..., 6 = Saturday) using

date$wday

and subset the data frame accordingly, e.g. keeping only the rows whose wday is between 1 and 5 (Monday to Friday).

How do I remove weekends and holidays from time series data

As holidays are country- and year-specific, you will need to use a package for this.

I would recommend the holidays package:

import holidays

for day in holidays.UnitedStates(years=2021).items():
    print(day)

prints a (date, name) tuple for every holiday in that year:

(datetime.date(2021, 1, 1), "New Year's Day")
(datetime.date(2021, 12, 31), "New Year's Day (Observed)")
(datetime.date(2021, 1, 18), 'Martin Luther King Jr. Day')
(datetime.date(2021, 2, 15), "Washington's Birthday")
...

Next, convert your day column to pandas datetimes so that it can be compared with those dates:

import pandas as pd

df = pd.DataFrame([{"id":1, "day":"2021-07-22 08:41:36.625573856+00:00"}, {"id":1, "day":"2021-12-31 08:41:36.625573856+00:00"}])

df.day = pd.to_datetime(df.day)

Afterwards it is easy to check whether each day is one of the holidays. Build the set of holiday dates once rather than once per row:

holiday_dates = {d for d, _ in holidays.UnitedStates(years=2021).items()}
df["isholiday"] = df.day.apply(lambda ts: ts.date() in holiday_dates)

df
   id                                 day  isholiday
0   1 2021-07-22 08:41:36.625573856+00:00      False
1   1 2021-12-31 08:41:36.625573856+00:00       True

The same approach works for weekends: check whether the dt.dayofweek property is 5 or 6 (Saturday or Sunday, since Monday is 0).
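Putting both checks together, here is a minimal pandas sketch that drops weekend and holiday rows in one step. To keep it runnable without the holidays package, `holiday_dates` is a hard-coded stand-in (an assumption) for the set you would build from `holidays.UnitedStates(years=2021)`, and the sample timestamps are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the dates from holidays.UnitedStates(years=2021).items();
# in practice: holiday_dates = {d for d, _ in holidays.UnitedStates(years=2021).items()}
holiday_dates = {datetime.date(2021, 7, 5), datetime.date(2021, 12, 31)}

# Illustrative data: a Thursday, a Saturday, a holiday Friday and a Monday
df = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "day": pd.to_datetime([
        "2021-07-22 08:41:36+00:00",
        "2021-07-24 10:00:00+00:00",
        "2021-12-31 08:41:36+00:00",
        "2022-01-03 09:00:00+00:00",
    ]),
})

# dt.dayofweek counts Monday = 0 ... Sunday = 6, so 5 and 6 are the weekend
is_weekend = df["day"].dt.dayofweek.isin([5, 6])
# Compare the calendar date of each timestamp against the holiday set
is_holiday = df["day"].apply(lambda ts: ts.date() in holiday_dates)

business_days = df[~is_weekend & ~is_holiday]
print(business_days)
```

Only the Thursday and Monday rows remain. Using a set makes each membership test constant-time, which matters once the frame has many rows.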

Remove non-business days rows from pandas dataframe

One simple solution is to keep only the rows that fall on Monday to Friday:

In [11]: s[s.index.dayofweek < 5]
Out[11]:
2016-05-02 00:00:00 4.780
2016-05-02 00:01:00 4.777
2016-05-02 00:02:00 4.780
2016-05-02 00:03:00 4.780
2016-05-02 00:04:00 4.780
Name: closeAsk, dtype: float64

Note: this doesn't take into account bank holidays etc.
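A self-contained sketch of the same filter, using a made-up minute-level series instead of the original data. To address the bank-holiday caveat it also swaps in pandas' `USFederalHolidayCalendar`, which is an addition and not part of the original answer; it only covers US federal holidays, so for other markets you would substitute your own list of holiday dates.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Illustrative minute-level series from Friday 2016-05-27 to Tuesday 2016-05-31
idx = pd.date_range("2016-05-27 00:00", "2016-05-31 23:59", freq="min")
s = pd.Series(4.78, index=idx, name="closeAsk")

# Step 1: keep Monday (0) to Friday (4); this drops Saturday 28 and Sunday 29
weekdays_only = s[s.index.dayofweek < 5]

# Step 2: also drop US federal holidays (here Memorial Day, Monday 2016-05-30)
federal_holidays = USFederalHolidayCalendar().holidays(start=idx.min(), end=idx.max())
business_only = weekdays_only[~weekdays_only.index.normalize().isin(federal_holidays)]

print(sorted(set(business_only.index.date)))
```

Normalizing the index to midnight lets intraday timestamps be matched against the holiday dates, which the calendar returns as midnight timestamps.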

Pandas dataframe: omit weekends and days near holidays

The first part is easy with the pandas DatetimeIndex.dayofweek property, which numbers the days from Monday = 0 to Sunday = 6.

df[df.index.dayofweek < 5] will give you only the weekdays.


For the second part you can use the datetime module. Below is an example for a single date, 2017-12-25; you can easily generalize it to a list of dates, for example by defining a helper function.

from datetime import datetime, timedelta

N = 3

df[abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N)]

This will give all dates that are more than N=3 days away from 2017-12-25. That is, it will exclude an interval of 7 days from 2017-12-22 to 2017-12-28.
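One way to write such a helper, as a minimal sketch: `drop_near_dates` is a hypothetical name, the index is assumed to be a timezone-naive DatetimeIndex, and timestamps are normalized to midnight so that only calendar dates are compared, mirroring the `df.index.date` comparison above.

```python
import numpy as np
import pandas as pd


def drop_near_dates(df, dates, n_days):
    """Hypothetical helper: drop rows whose date is within n_days of any date in `dates`."""
    day = df.index.normalize()  # compare calendar days, ignoring time of day
    keep = np.ones(len(df), dtype=bool)
    for d in pd.to_datetime(dates):
        # Keep a row only if it is more than n_days away from this date
        keep &= np.asarray(abs(day - d) > pd.Timedelta(days=n_days))
    return df[keep]


# Illustrative daily data from 2017-12-15 to 2018-01-10 (27 days)
df = pd.DataFrame({"value": range(27)},
                  index=pd.date_range("2017-12-15", "2018-01-10", freq="D"))
out = drop_near_dates(df, ["2017-12-25", "2018-01-01"], n_days=3)
print(out.index.min().date(), out.index.max().date(), len(out))
```

With N = 3, each date removes a 7-day window, so the two windows here (2017-12-22 to 2017-12-28 and 2017-12-29 to 2018-01-04) leave 13 of the 27 rows.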


Finally, you can combine the two criteria with the & operator. Keep the parentheses around each condition, because & binds more tightly than comparison operators such as < and >:

df[
(df.index.dayofweek < 5)
&
(abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N))
]
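For reference, a self-contained version of the combined filter on made-up daily data around Christmas 2017 (the frame and its `value` column are assumptions for illustration). Naming each mask first sidesteps the operator-precedence pitfall; if you inline the conditions instead, keep the parentheses as in the snippet above.

```python
import pandas as pd

# Illustrative daily data from Friday 2017-12-15 to Thursday 2018-01-04
df = pd.DataFrame({"value": range(21)},
                  index=pd.date_range("2017-12-15", periods=21, freq="D"))

N = 3
holiday = pd.Timestamp("2017-12-25")

# Monday = 0 ... Friday = 4
is_weekday = df.index.dayofweek < 5
# More than N days away from the holiday, compared on calendar days
far_from_holiday = abs(df.index.normalize() - holiday) > pd.Timedelta(days=N)

result = df[is_weekday & far_from_holiday]
print(result.index.strftime("%Y-%m-%d").tolist())
```

Both masks are plain boolean arrays of the same length as the frame, so they can be combined element-wise with & (and negated with ~) before indexing.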

Exclude Weekends in Dot in Dplyr

We can remove weekend days from both the training data and the future data frame before calling predict():

df %>%
  group_by(group) %>%
  mutate(weekday = weekdays(ds)) %>%  # day names depend on the locale
  filter(!weekday %in% c("Saturday", "Sunday")) %>%
  do({
    # Fit the model once per group and reuse it for the future data frame
    m <- prophet(., daily.seasonality = TRUE, yearly.seasonality = TRUE)
    future <- make_future_dataframe(m, periods = 14)
    future <- filter(future, !weekdays(ds) %in% c("Saturday", "Sunday"))
    predict(m, future)
  }) %>%
  select(ds, group, yhat)

