As.Date Produces Unexpected Result in a Sequence of Week-Based Dates

as.Date produces unexpected result in a sequence of week-based dates

Working with week of the year can become very tricky. You may try to convert the dates using the ISOweek package:

# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))

The result

#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"

is of class Date.

Note that the ISO week-based date format is yyyy-Www-d with a capital W preceeding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.

So, in order to convert the date strings provided by the OP using ISOweek2date() it is necessary to insert a W after the first hyphen which is accomplished by replacing the first - by -W in each string.

Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.

About the ISOweek package

Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying as I had to prepare weekly reports including week-on-week comparisons. I spent hours to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits as the aforementioned formats are ignored on input (see ?strptime for details).

POSIX date from dates in weekly time format

As I have to deal a lot with reporting by ISO weeks, I've created the ISOweek package some years ago.

The package includes the function ISOweek2date() which returns the date of a given weekdate (year, week of the year, day of week according to ISO 8601). It's the inverse function to date2ISOweek().

With ISOweek, your examples become:

library(ISOweek)

# define dates to convert
dates <- as.Date(c("2016-12-01", "2016-01-01"))

# convert to full ISO 8601 week-based date yyyy-Www-d
(week_dates <- date2ISOweek(dates))
[1] "2016-W48-4" "2015-W53-5"
# convert back to class Date 
ISOweek2date(week_dates)
[1] "2016-12-01" "2016-01-01"

Note that date2ISOweek() requires a full ISO week-based date in the format yyyy-Www-d including the day of the week (1 to 7, Monday to Sunday).

So, if you only have year and ISO week number you have to create a character string with a day of the week specified.

A typical phrase in many reports is, e.g., "reporting week 31 ending 2017-08-06":h

yr <- 2017
wk <- 31
ISOweek2date(sprintf("%4i-W%02i-%1i", yr, wk, 7))
 [1] "2017-08-06"

Addendum

Please, see this answer for another use case and more background information on the ISOweek package.

Fill in missing date and fill with the data above

The following solution uses lubridate and tidyr packages. Note that in OP example, dates are malformed, but implies having data with last-day-of-month input, so tried to replicate it here. Solution creates a sequence of dates from min input date to max input date to get all possible months of interest. Note that input dates are normalized to first-day-of-month to ensure proper sequence generation. With the sequence created, a left-join merge is done to merge data we have and identify missing data. Then fill() is applied to columns to fill in the missing NAs.

library(lubridate)
library(tidyr)
#Note OP has month of Feb with 31 days... Corrected to 28 but this fails to parse as a date
df <- data.frame(client_id=c(123,123,123),value=c(100,200,300),repmonth=c("2012-01-31","2012-02-29","2012-05-31"),stringsAsFactors = F)

df$repmonth <- ymd(df$repmonth) #convert character dates to Dates
start_month <- min(df$repmonth)
start_month <- start_month - days(day(start_month)-1) #first day of month to so seq.Date sequences properly

all_dates <- seq.Date(from=start_month,to=max(df$repmonth),by="1 month")
all_dates <- (all_dates %m+% months(1)) - days(1) #all end-of-month-day since OP suggests having last-day-of-month input?
all_dates <- data.frame(repmonth=all_dates)
df<-merge(x=all_dates,y=df,by="repmonth",all.x=T)

df <- fill(df,c("client_id","value"))

Solution yields:

> df
repmonth client_id value
1 2012-01-31 123 100
2 2012-02-29 123 200
3 2012-03-31 123 200
4 2012-04-30 123 200
5 2012-05-31 123 300

generate date range between min and max dates Athena presto SQL sequence error

You need to provide interval to generate date sequence (in this case interval '1' day):

WITH dataset AS (
SELECT *
FROM
( VALUES
('A', DATE '2021-08-21'), ('A', DATE '2021-08-25'),
('B', DATE '2021-08-07'), ('B', DATE '2021-08-24')
) AS d (job_name, run_date)
)

select job_name, sequence(min(run_date), max(run_date), interval '1' day) seq
from dataset
group by job_name

Output:



















job_nameseq
A[2021-08-21 00:00:00.000, 2021-08-22 00:00:00.000, 2021-08-23 00:00:00.000, 2021-08-24 00:00:00.000, 2021-08-25 00:00:00.000]
B[2021-08-07 00:00:00.000, 2021-08-08 00:00:00.000, 2021-08-09 00:00:00.000, 2021-08-10 00:00:00.000, 2021-08-11 00:00:00.000, 2021-08-12 00:00:00.000, 2021-08-13 00:00:00.000, 2021-08-14 00:00:00.000, 2021-08-15 00:00:00.000, 2021-08-16 00:00:00.000, 2021-08-17 00:00:00.000, 2021-08-18 00:00:00.000, 2021-08-19 00:00:00.000, 2021-08-20 00:00:00.000, 2021-08-21 00:00:00.000, 2021-08-22 00:00:00.000, 2021-08-23 00:00:00.000, 2021-08-24 00:00:00.000]



Related Topics



Leave a reply



Submit