Return Data Subset Time Frames Within Another Timeframes

Return data subset time frames within another timeframes?

You can use the .index* family of functions to get certain months or certain days of the month. See ?index for the full list of functions. For example:

library(quantmod)
getSymbols("SPY")
SPY[.indexmon(SPY)==0] # January for all years (note zero-based indexing!)
SPY[.indexmday(SPY)==1] # The first of every month
SPY[.indexwday(SPY)==1] # All Mondays
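
The same selectors can be tried without downloading anything. The following is a minimal sketch on a synthetic daily series (the `dates` and `x` objects are made up for illustration and stand in for the SPY data); it also shows that the conditions can be combined with `&`.

```r
library(xts)

# Synthetic daily series for 2021 (hypothetical stand-in for downloaded prices)
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

january     <- x[.indexmon(x) == 0]                       # months are zero-based
first_days  <- x[.indexmday(x) == 1]                      # day of month is one-based
mondays     <- x[.indexwday(x) == 1]                      # 0 = Sunday, 1 = Monday
jan_mondays <- x[.indexmon(x) == 0 & .indexwday(x) == 1]  # Mondays in January only
```

Because each `.index*` call returns a plain integer vector, any logical expression built from them can be used inside `[`.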

subset list of xts objects

You can use lapply to loop over all the elements in your list, and use an anonymous function to subset them.

lapply(xts_list, function(x) x["2011/"])
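
As a rough illustration, here is a self-contained sketch of that call. The list contents and the `make_series` helper are hypothetical; only the `lapply` line comes from the answer above.

```r
library(xts)

# Hypothetical helper: a monthly series of 36 observations starting at `start`
make_series <- function(start) {
  dates <- seq(as.Date(start), by = "month", length.out = 36)
  xts(seq_along(dates), order.by = dates)
}

xts_list <- list(a = make_series("2009-01-01"),
                 b = make_series("2010-06-01"))

# "2011/" is an ISO-8601 style range: everything from 2011 onward
from_2011 <- lapply(xts_list, function(x) x["2011/"])

# First remaining date in each element
sapply(from_2011, function(x) format(start(x)))
```

`lapply` keeps the list names, so each subset can still be looked up by the name of the original series.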

Subsetting data frames in R

The discrepancy most likely comes from the two methods treating NA values differently. If you remove the rows containing NA from the data frame, both should return the same result:

dat_clean = na.omit(dat)
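
The answer does not say which two methods were compared. The sketch below assumes they were logical indexing with `[` and `subset()`, which is a common source of this mismatch; the data frame is made up for illustration.

```r
# Hypothetical data with one missing value in column a
dat <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))

by_bracket <- dat[dat$a > 1, ]     # an NA in the condition yields an all-NA row
by_subset  <- subset(dat, a > 1)   # subset() treats NA in the condition as FALSE

# After dropping incomplete rows, the two approaches agree
dat_clean     <- na.omit(dat)
clean_bracket <- dat_clean[dat_clean$a > 1, ]
clean_subset  <- subset(dat_clean, a > 1)
```

On the raw data `by_bracket` has one more row than `by_subset`; on `dat_clean` both return the same single row.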

Extract subset of multiple time series

We first build a group index g for each combination of group and x with group_indices_(). Within each g we then drop groups that have fewer than 3 observations and, for rows where x != 1, keep only those whose row_number() is %in% the range n() (the group size) down to n()-2. This retains just the 3 observations immediately before x changes.

library(dplyr)

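df %>%
  # one index per (group, x) combination; group_indices_() is deprecated in recent dplyr versions
  mutate(g = group_indices_(., .dots = c("group", "x"))) %>%
  group_by(g) %>%
  # row position within the group for x != 1 rows, NA for x == 1 rows
  mutate(condition = ifelse(x == 1, NA, row_number())) %>%
  # keep groups of at least 3 rows; for x != 1 keep only the last 3 rows
  filter(n() >= 3, ifelse(is.na(condition), TRUE, condition %in% n():(n()-2)))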

Which gives:

#Source: local data frame [13 x 5]
#Groups: g [4]
#
#   group     x  time     g condition
#   <int> <int> <int> <int>     <int>
#1      1     0  1636     1         1
#2      1     0  1637     1         2
#3      1     0  1638     1         3
#4      1     1  1639     2        NA
#5      1     1  1640     2        NA
#6      1     1  1641     2        NA
#7      1     1  1642     2        NA
#8      2     0  1686     3         4
#9      2     0  1687     3         5
#10     2     0  1688     3         6
#11     2     1  1689     4        NA
#12     2     1  1690     4        NA
#13     2     1  1691     4        NA

You can optionally remove the g and condition columns by adding select(-(g:condition)) to the chain.


Data

df <- structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), x = c(0L, 0L, 0L, 1L,
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L),
time = c(1636L, 1637L, 1638L, 1639L, 1640L, 1641L, 1642L,
1683L, 1684L, 1685L, 1686L, 1687L, 1688L, 1689L, 1690L, 1691L,
1638L, 1639L, 1640L)), .Names = c("group", "x", "time"),
class = "data.frame", row.names = c(NA, -19L))
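
For readers on a current dplyr release, the following is a hedged sketch of the same filter without the underscored verb. It swaps in `cur_group_id()` (available from dplyr 1.0.0) and replaces the helper column with a direct `row_number()` test; it is an alternative formulation, not the original answer's code, and like the original it assumes the rows are already ordered by time within each group.

```r
library(dplyr)

# Same values as the data above, written compactly
df <- data.frame(
  group = rep(1:3, times = c(7, 9, 3)),
  x     = c(0, 0, 0, 1, 1, 1, 1,  0, 0, 0, 0, 0, 0, 1, 1, 1,  0, 1, 1),
  time  = c(1636:1642, 1683:1691, 1638:1640)
)

res <- df %>%
  group_by(group, x) %>%
  mutate(g = cur_group_id()) %>%             # index of the (group, x) combination
  filter(n() >= 3,                           # drop groups with fewer than 3 rows
         x == 1 | row_number() > n() - 3) %>%  # for x != 1 keep only the last 3 rows
  ungroup()
```

The result keeps the same 13 rows as the output shown earlier, without the condition column.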

Using apply to run functions over subset of time series

Put your data in long format using reshape2, then use ddply from plyr to apply a function to each province.

library(reshape2)
dat.m <- melt(dat,id.vars=c('date','province'))
library(plyr)

ddply(dat.m, .(province), function(ts) {
  ## each ts looks like this (here for alpha)
  ## you can process it
  #         date province variable      value
  # 1 2014-09-21  region1    alpha  0.3981059
  # 2 2015-01-06  region1    alpha -0.6120264
})
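
The function body above is left as a placeholder. One way it could be filled in is sketched below, with made-up data and a made-up summary (observation count and mean per province and variable); the column names alpha and beta and all the values are assumptions for illustration.

```r
library(reshape2)
library(plyr)

# Hypothetical wide data: one row per date and province
dat <- data.frame(
  date     = as.Date(c("2014-09-21", "2015-01-06", "2014-09-21", "2015-01-06")),
  province = c("region1", "region1", "region2", "region2"),
  alpha    = c(1, 3, 10, 20),
  beta     = c(2, 4, 30, 40)
)

# Long format: one row per date, province and variable
dat.m <- melt(dat, id.vars = c("date", "province"))

# Apply a function to each (province, variable) series; it must return a data frame
out <- ddply(dat.m, .(province, variable), function(ts) {
  data.frame(n_obs = nrow(ts), mean_value = mean(ts$value))
})
```

Grouping by variable as well as province means each call receives a single series, so the function does not have to separate alpha from beta itself.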

Dplyr grouped percentages in different timeframes

We can create a function to do the calculation:

library(dplyr)
library(purrr)

f1 <- function(data) {
  data %>%
    filter(ELIGIBLE == 1) %>%
    group_by(GROUP) %>%
    transmute(count_Eligible = sum(ELIGIBLE == 1),
              count_events = sum(EVENT == 1),
              Percentage = round(100 * count_events / count_Eligible, 2))
}

Then, loop over the 'lookback' periods, subset the data based on the 'DATE' column and apply the function:

# today() comes from lubridate; the *_lookback dates are assumed to be defined already
map2_dfr(list(three_month_lookback, six_month_lookback,
              one_year_lookback),
         list(today(), three_month_lookback, today()),
         ~ data %>%
           mutate(DATE = as.Date(DATE)) %>%
           filter(DATE >= .x, DATE <= .y) %>%
           f1(.), .id = 'grp'
)

If we need to combine by columns:

map2(list(three_month_lookback, six_month_lookback,
          one_year_lookback),
     list(today(), three_month_lookback, today()),
     ~ data %>%
       mutate(DATE = as.Date(DATE)) %>%
       filter(DATE >= .x, DATE <= .y) %>%
       f1(.)
) %>%
  reduce(full_join, by = "GROUP")
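
To make the windowing concrete, here is a small sketch with a fixed reference date in place of today() and made-up data. The names ref_date, starts, ends and summarise_window are hypothetical, and the helper uses summarise() rather than transmute() so that each group yields one row; treat it as one possible variant, not the answer's exact method.

```r
library(dplyr)
library(purrr)

ref_date <- as.Date("2020-07-01")   # hypothetical fixed stand-in for today()

# Window start dates, computed with base R date sequences
starts <- list(
  three_month = seq(ref_date, by = "-3 months", length.out = 2)[2],
  six_month   = seq(ref_date, by = "-6 months", length.out = 2)[2],
  one_year    = seq(ref_date, by = "-1 year",   length.out = 2)[2]
)
# Window end dates, mirroring the pattern in the answer above
ends <- list(ref_date, starts$three_month, ref_date)

# Made-up event data
data <- data.frame(
  DATE     = as.Date(c("2020-06-01", "2020-05-15", "2020-02-10",
                       "2019-09-01", "2019-08-01")),
  GROUP    = c("A", "A", "B", "A", "B"),
  ELIGIBLE = c(1, 1, 1, 1, 0),
  EVENT    = c(1, 0, 1, 1, 1)
)

# One row per group: eligible count, event count, percentage
summarise_window <- function(d) {
  d %>%
    filter(ELIGIBLE == 1) %>%
    group_by(GROUP) %>%
    summarise(count_eligible = n(),
              count_events   = sum(EVENT == 1),
              percentage     = round(100 * count_events / count_eligible, 2),
              .groups = "drop")
}

# Filter to each window, summarise, and stack the results with a window label
res <- map2_dfr(starts, ends,
                ~ data %>%
                  filter(DATE >= .x, DATE <= .y) %>%
                  summarise_window(),
                .id = "window")
```

Naming the elements of `starts` lets `.id` label each block of rows with the window name instead of a position number.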

