Remove weekend data in a dataframe
Convert the date column to POSIXlt, e.g.
date <- as.POSIXlt(date,format="%Y-%m-%d")
Then you can access the day of the week using
date$wday
and subset the frame accordingly. Note that wday runs from 0 (Sunday) to 6 (Saturday), so the weekend days are the values 0 and 6.
How do I remove weekends and holidays from time series data
As holidays are country and year specific, you will need to use a package for this.
I would recommend the holidays package:
import holidays
for day in holidays.UnitedStates(years=2021).items():
    print(day)
will give you a (date, name) tuple for each holiday in the respective year:
(datetime.date(2021, 1, 1), "New Year's Day")
(datetime.date(2021, 12, 31), "New Year's Day (Observed)")
(datetime.date(2021, 1, 18), 'Martin Luther King Jr. Day')
(datetime.date(2021, 2, 15), "Washington's Birthday")
...
Next step would be to cast your days to the same format:
import pandas as pd
df = pd.DataFrame([{"id":1, "day":"2021-07-22 08:41:36.625573856+00:00"}, {"id":1, "day":"2021-12-31 08:41:36.625573856+00:00"}])
df.day = pd.to_datetime(df.day)
Afterwards it is easy to check whether each day is one of the holidays. Build the holiday calendar once rather than once per row:
us_holidays = holidays.UnitedStates(years=2021)
df["isholiday"] = df.day.dt.date.map(lambda d: d in us_holidays)
df
   id                                 day  isholiday
0   1 2021-07-22 08:41:36.625573856+00:00      False
1   1 2021-12-31 08:41:36.625573856+00:00       True
The same works for weekends: check whether the dt.dayofweek property is in [5, 6] (days are zero-indexed from Monday, so 5 is Saturday and 6 is Sunday).
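The two checks can be combined into a single filter. The following is a minimal sketch rather than a definitive implementation: the hard-coded `holiday_dates` set is an assumption that stands in for the dates returned by the holidays package, and the sample frame is made up for illustration.

```python
import datetime

import pandas as pd

# Assumption: a hand-written stand-in for the dates returned by the holidays package.
holiday_dates = {datetime.date(2021, 1, 1), datetime.date(2021, 12, 31)}

# Illustrative data: a Thursday, a Saturday, and an observed holiday (a Friday).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "day": pd.to_datetime([
        "2021-07-22 08:41:36+00:00",
        "2021-07-24 08:41:36+00:00",
        "2021-12-31 08:41:36+00:00",
    ]),
})

is_weekend = df["day"].dt.dayofweek.isin([5, 6])  # Saturday = 5, Sunday = 6
is_holiday = df["day"].dt.date.isin(holiday_dates)

# Keep only rows that are neither weekend days nor holidays.
business_rows = df[~is_weekend & ~is_holiday]
```

Computing both masks as columns-wide boolean Series avoids a row-by-row apply, which matters once the frame gets large.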
Remove non-business days rows from pandas dataframe
One simple solution is to slice out the days not in Monday to Friday:
In [11]: s[s.index.dayofweek < 5]
Out[11]:
2016-05-02 00:00:00 4.780
2016-05-02 00:01:00 4.777
2016-05-02 00:02:00 4.780
2016-05-02 00:03:00 4.780
2016-05-02 00:04:00 4.780
Name: closeAsk, dtype: float64
Note: this doesn't take into account bank holidays etc.
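If US federal holidays are the relevant bank holidays, pandas ships a calendar that can extend the weekday filter. This is a minimal sketch under that assumption; the daily series `s` below is invented for illustration and is not the data shown above.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Illustrative daily series from Fri 2016-05-27 to Tue 2016-05-31 (made-up values).
idx = pd.date_range("2016-05-27", "2016-05-31", freq="D")
s = pd.Series(range(len(idx)), index=idx, name="closeAsk")

# Federal holidays in the covered range (Memorial Day, 2016-05-30, falls inside it).
holidays_idx = USFederalHolidayCalendar().holidays(start=idx.min(), end=idx.max())

is_weekday = s.index.dayofweek < 5
# normalize() drops the time of day so intraday timestamps match the holiday dates.
is_holiday = s.index.normalize().isin(holidays_idx)

business_days = s[is_weekday & ~is_holiday]
```

For other countries or exchange calendars the holiday dates would have to come from a different source.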
Pandas dataframe: omit weekends and days near holidays
The first part can be easily accomplished using the pandas DatetimeIndex.dayofweek property, which counts weekdays from Monday as 0 to Sunday as 6.
df[df.index.dayofweek < 5]
will give you only the weekdays.
For the second part you can use the datetime module. Below is an example for a single date, namely 2017-12-25. You can generalize it to a list of dates, for example by defining a helper function.
from datetime import datetime, timedelta
N = 3
df[abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N)]
This will give all rows whose dates are more than N=3 days away from 2017-12-25. That is, it will exclude the 7-day interval from 2017-12-22 to 2017-12-28.
Lastly, you can combine the two criteria using the & operator:
df[
    (df.index.dayofweek < 5)
    & (abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N))
]
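The generalization to several dates mentioned earlier could look like the following sketch. The helper name `far_from_all`, the second holiday date, and the sample frame are hypothetical, and the index is assumed to be a timezone-naive DatetimeIndex.

```python
import numpy as np
import pandas as pd

def far_from_all(index, dates, n_days):
    """Return a boolean mask that is True where the index day is more
    than n_days away from every date in `dates`."""
    days = index.normalize()  # drop the time of day
    mask = np.ones(len(index), dtype=bool)
    for d in pd.to_datetime(list(dates)):
        distance = np.abs(np.asarray((days - d).days))
        mask &= distance > n_days
    return mask

# Illustrative frame with one row per day (made-up values).
idx = pd.date_range("2017-12-18", "2018-01-05", freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

N = 3
keep = (df.index.dayofweek < 5) & far_from_all(df.index, ["2017-12-25", "2018-01-01"], N)
filtered = df[keep]
```

With these inputs the windows around the two dates overlap, so everything from 2017-12-22 through 2018-01-04 is dropped.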
Exclude Weekends in Dot in Dplyr
We could remove the weekend days from both the training data and the future dataframe before calling predict:
df %>%
  group_by(group) %>%
  mutate(weekdays = weekdays(ds)) %>%
  filter(weekdays != "Saturday" & weekdays != "Sunday") %>%
  do(predict(
    prophet(., daily.seasonality = TRUE, yearly.seasonality = TRUE),
    filter(
      make_future_dataframe(prophet(., daily.seasonality = TRUE, yearly.seasonality = TRUE),
                            periods = 14),
      weekdays(ds) != "Saturday" & weekdays(ds) != "Sunday"
    )
  )) %>%
  select(ds, group, yhat)