Return Data Subset Time Frames Within Another Timeframes

Return data subset time frames within another timeframes?

You can use the .index* family of functions to get certain months or certain days of the month. See ?index for the full list of functions. For example:

library(quantmod)
getSymbols("SPY")
SPY[.indexmon(SPY)==0]   # January for all years (note zero-based indexing!)
SPY[.indexmday(SPY)==1]  # The first of every month (only present when it was a trading day)
SPY[.indexwday(SPY)==1]  # All Mondays
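The .index* conditions are ordinary logical vectors, so they can be combined. Below is a minimal sketch that uses a synthetic daily series in place of SPY (the data and the names x, jan, jan_mondays and q1 are made up for illustration), so it runs without a download:

```r
library(xts)

# Synthetic daily series standing in for SPY (hypothetical data)
dates <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

# January of every year (months are zero-based: 0 = January)
jan <- x[.indexmon(x) == 0]

# Combine conditions with & : Mondays that fall in January
jan_mondays <- x[.indexmon(x) == 0 & .indexwday(x) == 1]

# Use %in% to pick several months at once: January to March
q1 <- x[.indexmon(x) %in% 0:2]
```

Because the series here contains every calendar day, the subsets include weekends; with real price data only trading days would appear.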

subset list of xts objects

You can use lapply to loop over all the elements in your list, and use an anonymous function to subset them.

lapply(xts_list, function(x) x["2011/"])
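As a small self-contained illustration of the lapply approach, the sketch below builds a made-up xts_list (the helper make_series and the series names a and b are hypothetical) and keeps only the observations from 2011 onwards in each element:

```r
library(xts)

# Hypothetical helper: 24 monthly observations starting at `start`
make_series <- function(start) {
  dates <- seq(as.Date(start), by = "month", length.out = 24)
  xts(seq_along(dates), order.by = dates)
}

xts_list <- list(a = make_series("2010-01-01"),
                 b = make_series("2010-07-01"))

# "2011/" is xts range notation for "from the start of 2011 to the end"
from_2011 <- lapply(xts_list, function(x) x["2011/"])

# Number of rows left in each element
sapply(from_2011, nrow)
```

The result is still a list with the same names, so it can be passed on to further lapply calls.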

Subsetting data frames in R

The difference is most likely due to how the two methods treat NA values. If you remove the rows containing NA from the data frame, both methods should give the same result:

dat_clean = na.omit(dat)
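The two methods from the question are not shown here. Assuming they were bracket indexing and subset(), which is a common source of this discrepancy, the following sketch with made-up data shows how an NA in the condition is handled differently:

```r
# Hypothetical data with one missing value in column a
dat <- data.frame(a = c(1, NA, 3, 4),
                  b = c("w", "x", "y", "z"))

# Bracket indexing: an NA in the condition produces a row of NAs
by_bracket <- dat[dat$a > 2, ]

# subset(): rows where the condition is NA are dropped
by_subset <- subset(dat, a > 2)

nrow(by_bracket)  # 3 (includes the all-NA row)
nrow(by_subset)   # 2

# After removing incomplete rows, both approaches agree
dat_clean <- na.omit(dat)
nrow(dat_clean[dat_clean$a > 2, ]) == nrow(subset(dat_clean, a > 2))
```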

Extract subset of multiple time series

We create a unique index g for each combination of group and x with group_indices(). We then keep only the combinations that have at least 3 observations and, where x != 1, only the rows whose row_number() is %in% the range n()-2 to n() (n() being the group size), so that only the 3 observations before a change of x occurs are kept.

library(dplyr)

df %>%
  mutate(g = group_indices_(., .dots = c("group", "x"))) %>%
  group_by(g) %>%
  mutate(condition = ifelse(x == 1, NA, row_number())) %>%
  filter(n() >= 3, ifelse(is.na(condition), TRUE, condition %in% n():(n()-2)))

Which gives:

#Source: local data frame [13 x 5]
#Groups: g [4]
#
#   group     x  time     g condition
#   <int> <int> <int> <int>     <int>
#1      1     0  1636     1         1
#2      1     0  1637     1         2
#3      1     0  1638     1         3
#4      1     1  1639     2        NA
#5      1     1  1640     2        NA
#6      1     1  1641     2        NA
#7      1     1  1642     2        NA
#8      2     0  1686     3         4
#9      2     0  1687     3         5
#10     2     0  1688     3         6
#11     2     1  1689     4        NA
#12     2     1  1690     4        NA
#13     2     1  1691     4        NA

You can optionally remove the g and condition columns by adding select(-(g:condition)) to the chain.

Data

df <- structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), x = c(0L, 0L, 0L, 1L, 
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L), 
    time = c(1636L, 1637L, 1638L, 1639L, 1640L, 1641L, 1642L, 
    1683L, 1684L, 1685L, 1686L, 1687L, 1688L, 1689L, 1690L, 1691L, 
    1638L, 1639L, 1640L)), .Names = c("group", "x", "time"), 
class = "data.frame", row.names = c(NA, -19L))
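The same selection can probably be written without the helper columns g and condition. This is a sketch rather than the original answer's method, assuming the goal is: for every group/x combination with at least 3 rows, keep all rows where x == 1 and the last 3 rows where x == 0. It rebuilds df compactly so that it runs on its own:

```r
library(dplyr)

# Same values as the df above, written compactly
df <- data.frame(
  group = rep(1:3, times = c(7, 9, 3)),
  x     = c(rep(0:1, times = c(3, 4)), rep(0:1, times = c(6, 3)), 0L, 1L, 1L),
  time  = c(1636:1642, 1683:1691, 1638:1640)
)

res <- df %>%
  group_by(group, x) %>%
  # keep combinations with >= 3 rows; for x == 0 keep only the last 3 rows
  filter(n() >= 3, x == 1 | row_number() > n() - 3) %>%
  ungroup()

res
```

On this data the result has the same 13 rows as the output shown above, without the g and condition columns.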

Using apply to run functions over subset of time series

Put your data in long format using reshape2, then use ddply from plyr to apply a function to each province.

library(reshape2)
dat.m <- melt(dat,id.vars=c('date','province'))
library(plyr)

ddply(dat.m, .(province), function(ts) {
  ## each ts looks like this (here for alpha)
  ## you can process it here and return the result (e.g. a data frame)
  # date province variable      value
  # 1 2014-09-21  region1    alpha  0.3981059
  # 2 2015-01-06  region1    alpha -0.6120264
})
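To make the placeholder above concrete, here is a minimal sketch with made-up data (the columns alpha and beta and all values are hypothetical) in which the function applied to each province returns a one-row summary:

```r
library(reshape2)
library(plyr)

# Hypothetical wide data: one row per date and province
dat <- data.frame(
  date     = rep(as.Date("2014-09-21") + 0:2, times = 2),
  province = rep(c("region1", "region2"), each = 3),
  alpha    = c(1, 2, 3, 4, 5, 6),
  beta     = c(10, 20, 30, 40, 50, 60)
)

# Long format: one row per date, province and variable
dat.m <- melt(dat, id.vars = c("date", "province"))

# ts is the long-format subset for one province
res <- ddply(dat.m, .(province), function(ts) {
  data.frame(n_obs = nrow(ts), mean_value = mean(ts$value))
})

res
```

ddply binds the per-province data frames together and adds the province column automatically.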

Dplyr grouped percentages in different timeframes

We can create a function to do the calculation

library(dplyr)
library(purrr)
library(lubridate)  # provides today(), used below

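f1 <- function(data) {
  data %>%
    filter(ELIGIBLE == 1) %>%
    group_by(GROUP) %>%
    # summarise() gives one row per GROUP; transmute() would repeat the
    # totals on every row and inflate the full_join further down
    summarise(count_Eligible = sum(ELIGIBLE == 1),
              count_events = sum(EVENT == 1),
              Percentage = round(100 * count_events / count_Eligible, 2))
}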

Then, loop over the 'lookback' periods, subset the data based on the 'DATE' column and apply the function

map2_dfr(list(three_month_lookback, six_month_lookback,
              one_year_lookback),
         list(today(), three_month_lookback, today()),
         ~ data %>%
           mutate(DATE = as.Date(DATE)) %>%
           filter(DATE >= .x, DATE <= .y) %>%
           f1(.), .id = 'grp'
    )

If we need to combine by columns

map2(list(three_month_lookback, six_month_lookback,
          one_year_lookback),
     list(today(), three_month_lookback, today()),
     ~ data %>%
       mutate(DATE = as.Date(DATE)) %>%
       filter(DATE >= .x, DATE <= .y) %>%
       f1(.)
    ) %>%
      reduce(full_join, by = "GROUP")
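The lookback variables are not defined in this answer. The sketch below shows one way the pieces might fit together, under several assumptions: a fixed ref_date stands in for today() so the result is reproducible, the lookback dates are derived with lubridate's %m-%, the data frame is made up, and pct_by_group is a hypothetical helper that returns one row per GROUP.

```r
library(dplyr)
library(purrr)
library(lubridate)

ref_date <- as.Date("2021-06-30")  # assumed stand-in for today()
three_month_lookback <- ref_date %m-% months(3)
six_month_lookback   <- ref_date %m-% months(6)
one_year_lookback    <- ref_date %m-% years(1)

# Hypothetical data with the columns used in the answer
data <- data.frame(
  DATE = as.Date(c("2021-06-01", "2021-05-15", "2021-02-10",
                   "2021-01-20", "2020-09-05", "2020-08-01")),
  GROUP    = c("A", "A", "A", "B", "B", "B"),
  ELIGIBLE = c(1, 1, 1, 1, 1, 0),
  EVENT    = c(1, 0, 1, 0, 1, 0)
)

# Hypothetical helper: percentage of events among eligible rows, per GROUP
pct_by_group <- function(d) {
  d %>%
    filter(ELIGIBLE == 1) %>%
    group_by(GROUP) %>%
    summarise(count_Eligible = n(),
              count_events = sum(EVENT == 1),
              Percentage = round(100 * count_events / count_Eligible, 2))
}

# Window start and end dates, paired by position
starts <- list(three_month = three_month_lookback,
               six_month   = six_month_lookback,
               one_year    = one_year_lookback)
ends <- list(ref_date, three_month_lookback, ref_date)

# One block of rows per window; the list names end up in the 'window' column
result <- map2_dfr(starts, ends,
                   ~ data %>%
                     filter(DATE >= .x, DATE <= .y) %>%
                     pct_by_group(),
                   .id = "window")

result
```

Naming the elements of the first list makes the .id column show the window name instead of a position number.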
