Indicator Function in R

Indicator function in R

If i am able to understand you correctly then you want to make changes into entire data frame,assuming of which i can suggest you to use apply like below, where df is your data frame.

apply(df,2,function(x)ifelse((x>0),1,0))

You can also use if its for only one vector something like below:

x <- c(-2,3,1,0)
y <- ifelse(x>0,1,0)
print(y)
[1] 0 1 1 0 #Output

Hope this helps

How to create an indicator function based on existence of past occurrences?

I'll demonstrate using zoo::rollapplyr for the "this and previous 3 quarters" (4-wide window) in a dplyr pipe. We first need to "fill out" the quarters, which I'll do by summarizing, converting to Date, then sequence by 3 months, merge back in to df, and then roll the calcs.

Note, compared to your result, in some cases my "forecasting" goes further into the future. I don't know if this is an inconsistency or just an indication of you estimating what you wanted result to look like.

Up front, the full code and results, then I'll walk through it.

library(dplyr)
# library(purrr) # map2
# library(tidyr) # unnest
out <- df %>%
group_by(fund_id) %>%
mutate(miny = min(year_q), maxy = max(year_q)) %>%
distinct(fund_id, stock_id, miny, maxy) %>%
group_by(fund_id, stock_id) %>%
mutate(across(c(miny, maxy), ~ as.Date(paste0(., "-01")))) %>%
transmute(year_q = purrr::map2(miny, maxy, ~ format(seq(.x, .y, by = "3 months"), format = "%Y-%m"))) %>%
tidyr::unnest(year_q) %>%
full_join(df, by = c("fund_id", "stock_id", "year_q")) %>%
arrange(fund_id, stock_id, year_q) %>% # only 'year_q' is strictly required, other 2 are aesthetic
mutate(
Indicator2 = zoo::rollapplyr(value, 4, FUN = function(z) any(!is.na(z)), partial = TRUE)
) %>%
ungroup() %>%
mutate(value = coalesce(value, 0))

out
# # A tibble: 38 x 5
# fund_id stock_id year_q value Indicator2
# <dbl> <dbl> <chr> <dbl> <lgl>
# 1 1 1 2011-03 1 TRUE
# 2 1 1 2011-06 2 TRUE
# 3 1 1 2011-09 1 TRUE
# 4 1 1 2011-12 3 TRUE
# 5 1 1 2012-03 4 TRUE
# 6 1 1 2012-06 2 TRUE
# 7 1 1 2012-09 0 TRUE
# 8 1 1 2012-12 0 TRUE
# 9 1 1 2013-03 0 TRUE
# 10 1 1 2013-06 0 FALSE
# # ... with 28 more rows

Quick validation, works conveniently because the order is the same:

with(full_join(out, result, by = c("fund_id", "stock_id", "year_q")),
identical(Indicator, Indicator2))
# [1] TRUE

Walk-through:

  1. Find the min/max year_q for each fund_id:

    df %>%
    group_by(fund_id) %>%
    mutate(miny = min(year_q), maxy = max(year_q)) %>%
    distinct(fund_id, stock_id, miny, maxy)
    # # A tibble: 4 x 4
    # # Groups: fund_id [2]
    # fund_id stock_id miny maxy
    # <dbl> <dbl> <chr> <chr>
    # 1 1 1 2011-03 2013-06
    # 2 1 2 2011-03 2013-06
    # 3 2 1 2013-03 2015-03
    # 4 2 3 2013-03 2015-03
  2. "Fill" that out so that each stock_id is covered for the full time-frame of the fund_id:

    ... %>%
    group_by(fund_id, stock_id) %>%
    mutate(across(c(miny, maxy), ~ as.Date(paste0(., "-01")))) %>%
    transmute(year_q = purrr::map2(miny, maxy, ~ format(seq(.x, .y, by = "3 months"), format = "%Y-%m"))) %>%
    tidyr::unnest(year_q)
    # # A tibble: 38 x 3
    # # Groups: fund_id, stock_id [4]
    # fund_id stock_id year_q
    # <dbl> <dbl> <chr>
    # 1 1 1 2011-03
    # 2 1 1 2011-06
    # 3 1 1 2011-09
    # 4 1 1 2011-12
    # 5 1 1 2012-03
    # 6 1 1 2012-06
    # 7 1 1 2012-09
    # 8 1 1 2012-12
    # 9 1 1 2013-03
    # 10 1 1 2013-06
    # # ... with 28 more rows
  3. Join the original data back in. This shows us quarters that were missing.

    ... %>%
    full_join(df, by = c("fund_id", "stock_id", "year_q")) %>%
    arrange(fund_id, stock_id, year_q)
    # # A tibble: 38 x 4
    # # Groups: fund_id, stock_id [4]
    # fund_id stock_id year_q value
    # <dbl> <dbl> <chr> <dbl>
    # 1 1 1 2011-03 1
    # 2 1 1 2011-06 2
    # 3 1 1 2011-09 1
    # 4 1 1 2011-12 3
    # 5 1 1 2012-03 4
    # 6 1 1 2012-06 2
    # 7 1 1 2012-09 NA
    # 8 1 1 2012-12 NA
    # 9 1 1 2013-03 NA
    # 10 1 1 2013-06 NA
    # # ... with 28 more rows
  4. Perform the "rolling" calculation. This is the majority of "work": for each value, we look at it and the previous 3 values (4 minus 1 for the current gives us 3 previous values) and determine if any are non-NA. partial=TRUE means that for the first value (that has no previous), we look at it alone; for the second value, we look at it and the 1 previous; etc. With partial=TRUE, this returns the same length as value; if partial=FALSE (the default), then it would return length(value) - (4-1) values, which is not what we want.

    ... %>%
    # only 'year_q' is strictly required, other 2 are aesthetic
    mutate(
    Indicator2 = zoo::rollapplyr(value, 4, FUN = function(z) any(!is.na(z)), partial = TRUE)
    )
    # # A tibble: 38 x 5
    # # Groups: fund_id, stock_id [4]
    # fund_id stock_id year_q value Indicator2
    # <dbl> <dbl> <chr> <dbl> <lgl>
    # 1 1 1 2011-03 1 TRUE
    # 2 1 1 2011-06 2 TRUE
    # 3 1 1 2011-09 1 TRUE
    # 4 1 1 2011-12 3 TRUE
    # 5 1 1 2012-03 4 TRUE
    # 6 1 1 2012-06 2 TRUE
    # 7 1 1 2012-09 NA TRUE
    # 8 1 1 2012-12 NA TRUE
    # 9 1 1 2013-03 NA TRUE
    # 10 1 1 2013-06 NA FALSE
    # # ... with 28 more rows
  5. Finally, arrange (for convenience only) and convert all is.na(value) to 0.



data.table

library(data.table)
DT <- as.data.table(df)
year_qs <- distinct(
DT[, c("miny", "maxy") := .(min(year_q), max(year_q)), by = .(fund_id)
][, .(fund_id, stock_id, miny, maxy)]
)[, c("miny","maxy") := lapply(.SD, function(z) as.Date(paste0(z, "-01"))), .SDcols = c("miny","maxy")
][, .(year_q = Map(function(a, b) format(seq.Date(a, b, by = "3 months"), format = "%Y-%m"), miny, maxy)),
by = .(fund_id, stock_id)
][, tidyr::unnest(.SD, year_q) ]
setDT(year_qs)
DT[, c("miny", "maxy") := NULL]

DT <- rbindlist(list(
DT,
year_qs[!DT, on = .(fund_id, stock_id, year_q)]),
use.names = TRUE, fill = TRUE)
setorder(DT, fund_id, stock_id, year_q)

DT[, Indicator := zoo::rollapplyr(value, 4, FUN = function(z) any(!is.na(z)), partial = TRUE),
by = .(fund_id, stock_id)
][, value := fcoalesce(value, 0)]

To find the sum using indicator function?

You can define a indicator function Z using ifelse function where Z takes 1 when X>0 and takes 0 when X=0.

Z <- ifelse(X>0 , 1 , 0)

(Z-1)*X+Z*Y

[1] 2.7460 2.7760 2.7850 2.7880 0.0000 2.7780 0.0000 0.0000 2.5460 -1.3201 0.9050 2.7770 -11.7463 0.0000 -0.2417
[16] 0.0000 0.0000 2.3524 1.7479 0.0000

Plotting a function with a indicator part in R

Define your function as func

func <- function(n,x){
k <- seq(1:2^n)
tt <- 2^n *(exp((k+1)/2^n) - exp(k/2^n)) * I(k/2^n < x & x <= (k+1)/2^n)
return(sum(tt))
}

and I think support of that func is in (0,1) so

x <- seq(from = 0, to = 1, by = 0.001)
fx <- sapply(x, function(t) func(3,t))
plot(x, fx)

func

Plotting a Function with Indicator terms

Every case is different, but in your case it is very easy to handle individuals values of x using the following function:

f <- function(t, x) prod((x <= 5) * x / t + (x > 5) * x^2)

It is based on vectorization, e.g. comparison x <= 5 is done element-wise and returns a logical vector.

library(ggplot2)
ts <- seq(1, 2, length = 1000)
data <- data.frame(t = ts, y = sapply(ts, f, x = 1:10))
ggplot(data = data, aes(x = t, y = y)) + geom_line()

Sample Image

Using stat_function we redefine f to take only one argument (t), specify x separately, give a range of values of t, and use Vectorize to vectorize f w.r.t. t:

x <- 1:10
f <- function(t) prod((x <= 5) * x / t + (x > 5) * x^2)

library(ggplot2)
ggplot(data.frame(t = c(1, 2)), aes(t)) +
stat_function(fun = Vectorize(f), geom = "line")


Related Topics



Leave a reply



Submit