Generate records in an R data frame between two dates
Another option using tidyverse
Input %>%
mutate(Frequency = case_when(Frequency == "Weekly" ~ 7L,
Frequency == "Fortnightly" ~ 14L,
Frequency == "Monthly" ~ 30L,
TRUE ~ 0L)) %>%
group_by(CusID) %>%
group_modify(~ {PaymentDate <- seq.Date(from = .x$FromDate, to = .x$ToDate, by = .x$Frequency)
crossing(.x[,1], PaymentDate)})
# A tibble: 7 x 3
# Groups: CusID [3]
CusID PaymentDate Amount
<dbl> <date> <dbl>
1 1 2019-01-01 5
2 1 2019-01-08 5
3 1 2019-01-15 5
4 2 2019-01-04 10
5 3 2019-02-02 12
6 3 2019-03-04 12
7 3 2019-04-03 12
Payment dates are a little different from your expected output because seq.Date
adds 30 days taking into account the different number of days in those months.
UPDATE:
Here is a more verbatim solution
Input %>%
mutate(PaymentDate = FromDate,
RFrequency = case_when(Frequency == "Weekly" ~ '1 week',
Frequency == "Fortnightly" ~ '2 weeks',
Frequency == "Monthly" ~ '1 month')) %>%
group_by(CusID, Amount) %>%
expand(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency))
# A tibble: 7 x 3
# Groups: CusID, Amount [3]
CusID Amount PaymentDate
<dbl> <dbl> <date>
1 1 5 2019-01-01
2 1 5 2019-01-08
3 1 5 2019-01-15
4 2 10 2019-01-04
5 3 12 2019-02-02
6 3 12 2019-03-02
7 3 12 2019-04-02
Generate dates between two dates in a dataframe
Assuming that the 'min_date/max_date' columns are Date
class, we use Map
to get the sequence of each 'min_date' with the corresponding 'max_date' in a list
, replicate the sequence of rows of 'df1' with the number of rows of the list
elements, create a data.frame
by expanding the dataset based on 'i1' and get create 'dates' by concatenating the 'lst' elements.
lst <- Map(function(x, y) seq(x,y, by = "1 day"), df1$min_date, df1$max_date)
i1 <- rep(1:nrow(df1), lengths(lst))
data.frame(df1[i1,-3], dates = do.call("c", lst))
Or if we are using dplyr
library(dplyr)
df1 %>%
rowwise() %>%
do(data.frame(.[1:2], date = seq(.$min_date, .$max_date, by = "1 day")))
Or using data.table
, we can do this in a single line of code
library(data.table)
setDT(df1)[,.(date = seq(min_date, max_date, by = "1 day")) ,.(id1, id2)]
Generate rows between two dates into a data frame in R
Since you don't state otherwise, this answer assumes the stDte
and edDte
columns are both of "Date" class.
In base R you could use Map()
to create the sequence of dates, then data.frame
to bring the new data frame together after creating the new Name
column with rep.int()
.
M <- Map(seq, df$stDte, df$edDte, by = "month")
df2 <- data.frame(
Name = rep.int(df$Name, vapply(M, length, 1L)),
Dte = do.call(c, M)
)
str(df2)
# 'data.frame': 65 obs. of 2 variables:
# $ Name: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
# $ Dte : Date, format: "2010-05-01" "2010-06-01" ...
head(df2, 3)
# Name Dte
# 1 A 2010-05-01
# 2 A 2010-06-01
# 3 A 2010-07-01
tail(df2, 3)
# Name Dte
# 63 B 2013-12-01
# 64 B 2014-01-01
# 65 B 2014-02-01
Or you can use the data.table
package and do
library(data.table)
setDT(df)[, .(Dte = seq(stDte, edDte, by = "month")), by = Name]
How to fill dates between two dates
Here, we may need to use by
as sequence of rows
library(data.table)
setDT(df)[, .(date = seq(as.Date(start_date), as.Date(end_date),
by = 'day')), .(rn = seq_len(nrow(df)), name, value)][, rn := NULL][]
Or create a list
column by looping over corresponding elements of 'start_date', 'end_date' to create a sequence of dates in Map
and then unnest
the list
library(tidyr)
library(magrittr)
setDT(df)[, .(name, date = Map(seq, MoreArgs = list(by = '1 day'),
as.Date(start_date), as.Date(end_date)), value)] %>%
unnest(date)
# A tibble: 731 x 3
# name date value
# <chr> <date> <dbl>
# 1 A 2020-01-23 8.1
# 2 A 2020-01-24 8.1
# 3 A 2020-01-25 8.1
# 4 A 2020-01-26 8.1
# 5 A 2020-01-27 8.1
# 6 A 2020-01-28 8.1
# 7 A 2020-01-29 8.1
# 8 A 2020-01-30 8.1
# 9 A 2020-01-31 8.1
#10 A 2020-02-01 8.1
# … with 721 more rows
R: Insert rows into dataframe by date
Here is something you can try - let me know if this is helpful.
First, make sure your dates are Date
format (if not already):
df$date <- as.Date(df$date, format = "%d-%m-%Y")
df$death <- as.Date(df$death, format = "%d-%m-%Y")
You can use tidyverse
and lubridate
for most functions needed. Also note that padr
can be useful for padding dates.
In this case, first group_by
id. You can use complete
to fill in missing dates. The sequence of dates used will be from the first date to 89 days later (total 90 days). The fill
will make the score
zero for these additional dates added.
Another fill
can be used to include the date of death for each subsequent row added (since it appears to be the structure used already).
Finally, you can mutate
the score
- if death
is not NA
and the date
for a row comes after death
, then make the value 20 (otherwise just retain the current score
).
library(tidyverse)
library(lubridate)
df %>%
group_by(id) %>%
complete(date = seq(date[1], date[1] + days(89), by = "1 day"), fill = list(score = 0)) %>%
fill(death, .direction = "down") %>%
mutate(score = ifelse(!is.na(death) & date > death, 20, score)) %>%
print(n=180)
Output
# A tibble: 180 x 4
# Groups: id [2]
id date death score
<int> <date> <date> <dbl>
1 1 2001-01-01 2001-02-04 14
2 1 2001-01-02 2001-02-04 16
3 1 2001-01-03 2001-02-04 0
4 1 2001-01-04 2001-02-04 0
5 1 2001-01-05 2001-02-04 0
6 1 2001-01-06 2001-02-04 0
7 1 2001-01-07 2001-02-04 0
8 1 2001-01-08 2001-02-04 0
9 1 2001-01-09 2001-02-04 0
10 1 2001-01-10 2001-02-04 0
11 1 2001-01-11 2001-02-04 0
12 1 2001-01-12 2001-02-04 0
13 1 2001-01-13 2001-02-04 0
14 1 2001-01-14 2001-02-04 0
15 1 2001-01-15 2001-02-04 0
16 1 2001-01-16 2001-02-04 0
17 1 2001-01-17 2001-02-04 0
18 1 2001-01-18 2001-02-04 0
19 1 2001-01-19 2001-02-04 0
20 1 2001-01-20 2001-02-04 0
21 1 2001-01-21 2001-02-04 0
22 1 2001-01-22 2001-02-04 0
23 1 2001-01-23 2001-02-04 0
24 1 2001-01-24 2001-02-04 0
25 1 2001-01-25 2001-02-04 0
26 1 2001-01-26 2001-02-04 0
27 1 2001-01-27 2001-02-04 0
28 1 2001-01-28 2001-02-04 0
29 1 2001-01-29 2001-02-04 0
30 1 2001-01-30 2001-02-04 0
31 1 2001-01-31 2001-02-04 15
32 1 2001-02-01 2001-02-04 16
33 1 2001-02-02 2001-02-04 12
34 1 2001-02-03 2001-02-04 0
35 1 2001-02-04 2001-02-04 0
36 1 2001-02-05 2001-02-04 20
37 1 2001-02-06 2001-02-04 20
38 1 2001-02-07 2001-02-04 20
39 1 2001-02-08 2001-02-04 20
40 1 2001-02-09 2001-02-04 20
41 1 2001-02-10 2001-02-04 20
42 1 2001-02-11 2001-02-04 20
43 1 2001-02-12 2001-02-04 20
44 1 2001-02-13 2001-02-04 20
45 1 2001-02-14 2001-02-04 20
46 1 2001-02-15 2001-02-04 20
47 1 2001-02-16 2001-02-04 20
48 1 2001-02-17 2001-02-04 20
49 1 2001-02-18 2001-02-04 20
50 1 2001-02-19 2001-02-04 20
51 1 2001-02-20 2001-02-04 20
52 1 2001-02-21 2001-02-04 20
53 1 2001-02-22 2001-02-04 20
54 1 2001-02-23 2001-02-04 20
55 1 2001-02-24 2001-02-04 20
56 1 2001-02-25 2001-02-04 20
57 1 2001-02-26 2001-02-04 20
58 1 2001-02-27 2001-02-04 20
59 1 2001-02-28 2001-02-04 20
60 1 2001-03-01 2001-02-04 20
61 1 2001-03-02 2001-02-04 20
62 1 2001-03-03 2001-02-04 20
63 1 2001-03-04 2001-02-04 20
64 1 2001-03-05 2001-02-04 20
65 1 2001-03-06 2001-02-04 20
66 1 2001-03-07 2001-02-04 20
67 1 2001-03-08 2001-02-04 20
68 1 2001-03-09 2001-02-04 20
69 1 2001-03-10 2001-02-04 20
70 1 2001-03-11 2001-02-04 20
71 1 2001-03-12 2001-02-04 20
72 1 2001-03-13 2001-02-04 20
73 1 2001-03-14 2001-02-04 20
74 1 2001-03-15 2001-02-04 20
75 1 2001-03-16 2001-02-04 20
76 1 2001-03-17 2001-02-04 20
77 1 2001-03-18 2001-02-04 20
78 1 2001-03-19 2001-02-04 20
79 1 2001-03-20 2001-02-04 20
80 1 2001-03-21 2001-02-04 20
81 1 2001-03-22 2001-02-04 20
82 1 2001-03-23 2001-02-04 20
83 1 2001-03-24 2001-02-04 20
84 1 2001-03-25 2001-02-04 20
85 1 2001-03-26 2001-02-04 20
86 1 2001-03-27 2001-02-04 20
87 1 2001-03-28 2001-02-04 20
88 1 2001-03-29 2001-02-04 20
89 1 2001-03-30 2001-02-04 20
90 1 2001-03-31 2001-02-04 20
91 2 2001-04-01 NA 5
92 2 2001-04-02 NA 0
93 2 2001-04-03 NA 6
94 2 2001-04-04 NA 0
95 2 2001-04-05 NA 0
96 2 2001-04-06 NA 0
97 2 2001-04-07 NA 0
98 2 2001-04-08 NA 0
99 2 2001-04-09 NA 0
100 2 2001-04-10 NA 0
101 2 2001-04-11 NA 0
102 2 2001-04-12 NA 0
103 2 2001-04-13 NA 0
104 2 2001-04-14 NA 0
105 2 2001-04-15 NA 0
106 2 2001-04-16 NA 0
107 2 2001-04-17 NA 0
108 2 2001-04-18 NA 0
109 2 2001-04-19 NA 0
110 2 2001-04-20 NA 0
111 2 2001-04-21 NA 0
112 2 2001-04-22 NA 0
113 2 2001-04-23 NA 0
114 2 2001-04-24 NA 0
115 2 2001-04-25 NA 0
116 2 2001-04-26 NA 0
117 2 2001-04-27 NA 0
118 2 2001-04-28 NA 0
119 2 2001-04-29 NA 0
120 2 2001-04-30 NA 0
121 2 2001-05-01 NA 0
122 2 2001-05-02 NA 0
123 2 2001-05-03 NA 0
124 2 2001-05-04 NA 0
125 2 2001-05-05 NA 9
126 2 2001-05-06 NA 2
127 2 2001-05-07 NA 10
128 2 2001-05-08 NA 0
129 2 2001-05-09 NA 0
130 2 2001-05-10 NA 0
131 2 2001-05-11 NA 0
132 2 2001-05-12 NA 0
133 2 2001-05-13 NA 0
134 2 2001-05-14 NA 0
135 2 2001-05-15 NA 0
136 2 2001-05-16 NA 0
137 2 2001-05-17 NA 0
138 2 2001-05-18 NA 0
139 2 2001-05-19 NA 0
140 2 2001-05-20 NA 0
141 2 2001-05-21 NA 0
142 2 2001-05-22 NA 0
143 2 2001-05-23 NA 0
144 2 2001-05-24 NA 0
145 2 2001-05-25 NA 0
146 2 2001-05-26 NA 0
147 2 2001-05-27 NA 0
148 2 2001-05-28 NA 0
149 2 2001-05-29 NA 0
150 2 2001-05-30 NA 0
151 2 2001-05-31 NA 0
152 2 2001-06-01 NA 0
153 2 2001-06-02 NA 0
154 2 2001-06-03 NA 0
155 2 2001-06-04 NA 0
156 2 2001-06-05 NA 0
157 2 2001-06-06 NA 0
158 2 2001-06-07 NA 0
159 2 2001-06-08 NA 0
160 2 2001-06-09 NA 0
161 2 2001-06-10 NA 0
162 2 2001-06-11 NA 0
163 2 2001-06-12 NA 0
164 2 2001-06-13 NA 0
165 2 2001-06-14 NA 0
166 2 2001-06-15 NA 0
167 2 2001-06-16 NA 0
168 2 2001-06-17 NA 0
169 2 2001-06-18 NA 0
170 2 2001-06-19 NA 0
171 2 2001-06-20 NA 0
172 2 2001-06-21 NA 0
173 2 2001-06-22 NA 0
174 2 2001-06-23 NA 0
175 2 2001-06-24 NA 0
176 2 2001-06-25 NA 0
177 2 2001-06-26 NA 0
178 2 2001-06-27 NA 0
179 2 2001-06-28 NA 0
180 2 2001-06-29 NA 0
R Create new rows in data frame from given dates between two time points
So I modified the initial code slightly :
library(lubridate)
library(data.table)
ID1 <- c("AUT","AUT","AUT","BEL","BEL","BEL")
start_date <- ymd("2008-12-02", "2013-12-16", "2016-05-17", "2007-06-10", "2007-12-21", "2008-03-20")
end_date <- ymd("2013-12-15", "2016-05-16", "2017-11-30", "2007-12-20", "2008-03-19", "2008-12-29")
ID2 <- 1:6
df <- data.table(ID1, start_date, end_date, ID2)
df[,yearDiff:=year(end_date)-year(start_date)]
df<-df[,cbind(.SD,year=(year(start_date)+1):year(end_date)),by="ID2"]
df[,dateInterval:=interval(df$start_date,df$end_date)]
df[,IsYearWithinDate:=(ymd(paste0(year,"01","01",sep="-"))%within% dateInterval)]
df[,.(ID1,start_date,end_date,ID2,year,IsYearWithinDate)]
resulting in (via df[,.(ID1,start_date,end_date,ID2,year
):
ID1 start_date end_date ID2 year
1: AUT 2008-12-02 2013-12-15 1 2009
2: AUT 2008-12-02 2013-12-15 1 2010
3: AUT 2008-12-02 2013-12-15 1 2011
4: AUT 2008-12-02 2013-12-15 1 2012
5: AUT 2008-12-02 2013-12-15 1 2013
6: AUT 2013-12-16 2016-05-16 2 2014
7: AUT 2013-12-16 2016-05-16 2 2015
8: AUT 2013-12-16 2016-05-16 2 2016
9: AUT 2016-05-17 2017-11-30 3 2017
10: BEL 2007-06-10 2007-12-20 4 2008
11: BEL 2007-06-10 2007-12-20 4 2007
12: BEL 2007-12-21 2008-03-19 5 2008
13: BEL 2008-03-20 2008-12-29 6 2009
14: BEL 2008-03-20 2008-12-29 6 2008
insert rows between dates by group
By using dplyr
and tidyr
:)(one line solution )
library(dplyr)
library(tidyr)
dt %>% group_by(user) %>% complete(date=full_seq(date,1),fill=list(dummy=0))
# A tibble: 9 x 3
# Groups: user [2]
user date dummy
<fctr> <date> <dbl>
1 A 2017-01-03 1
2 A 2017-01-04 0
3 A 2017-01-05 0
4 A 2017-01-06 1
5 B 2016-05-01 1
6 B 2016-05-02 0
7 B 2016-05-03 1
8 B 2016-05-04 0
9 B 2016-05-05 1
Related Topics
How to Get Confidence Interval for Smooth.Spline
Unicode Characters Conversion in R
Convert Time Object to Categorical (Morning, Afternoon, Evening, Night) Variable in R
Only Source Functions in a .R File
How to Pass Vector to Integrate Function
Keep First Row by Multiple Columns in an R Data.Table
Global Variable in a Package - Which Approach Is More Recommended
Transposition of a Tibble Using Pivot_Longer() and Pivot_Wider (Tidyverse)
Is There an R Library That Estimates a Multivariate Natural Cubic Spline (Or Similar) Function
Ordered Factors in Ggplot2 Bar Chart
Collapse a Data.Frame into a Vector
How to Convert Time Stamp String "2014-07-20T05:11:49.988Z" into Posixt in R
Order of Dates Is Not Chronological in Ggplot2
Replicate a List to Create a List-Of-Lists
Time Series and Stl in R: Error Only Univariate Series Are Allowed
Getting File Path from Shiny UI (Not Just Directory) Using Browse Button Without Uploading the File
Error When Plotting Sf Object --- Error: Could Not Find Function "Geom_Sf"