How to Cumulatively Add Values in One Vector in R

Here is a succinct dplyr solution for the same problem.

NOTE: Make sure that stringsAsFactors = FALSE while reading in the data.

library(dplyr)
dat %>%
  group_by(name, job) %>%
  filter(job != "Boss" | year == min(year)) %>%
  mutate(cumu_job2 = cumsum(job2))

Output:

   id name year     job job2 cumu_job2
1   1 Jane 1980  Worker    0         0
2   1 Jane 1981 Manager    1         1
3   1 Jane 1982 Manager    1         2
4   1 Jane 1983 Manager    1         3
5   1 Jane 1984 Manager    1         4
6   1 Jane 1985 Manager    1         5
7   1 Jane 1986    Boss    0         0
8   2  Bob 1985  Worker    0         0
9   2  Bob 1986  Worker    0         0
10  2  Bob 1987 Manager    1         1
11  2  Bob 1988    Boss    0         0

Explanation

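Take the dataset.
Group by name and job.
Within each group, keep every row whose job is not "Boss"; for "Boss" rows, keep only the earliest year.
Add the cumu_job2 column as the running sum of job2 within each group.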
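
As a rough cross-check of the steps above, here is a base R sketch that needs no packages. The data frame toy and its values are made up for illustration (the original dat is not shown), and ave() stands in for group_by() plus mutate().

```r
# Hypothetical toy data; not the original `dat`
toy <- data.frame(
  name = c("Jane", "Jane", "Jane", "Jane", "Jane", "Bob", "Bob", "Bob"),
  year = c(1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987),
  job  = c("Worker", "Manager", "Manager", "Boss", "Boss",
           "Worker", "Manager", "Boss"),
  job2 = c(0, 1, 1, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Earliest year within each name/job group
first_year <- ave(toy$year, toy$name, toy$job, FUN = min)

# Keep every non-Boss row, and only the earliest Boss row per person
kept <- toy[toy$job != "Boss" | toy$year == first_year, ]

# Running sum of job2 within each name/job group
kept$cumu_job2 <- ave(kept$job2, kept$name, kept$job, FUN = cumsum)
kept
```

In this toy data, Jane's second "Boss" year (1984) is the only row dropped by the filter.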

Vector of cumulative sums in R

Please correct me if I'm misunderstanding, but I believe you simply want this:

I <- cumsum(sqrt(1 - U^2))

It is unclear why you want to use for loops.
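
To illustrate why the loop is unnecessary, here is a small sketch comparing an explicit loop with the vectorised call. The values in U are hypothetical, since the original U is not shown.

```r
# Hypothetical input; the original U is not shown
U <- c(0, 0.6, 0.8, 1)

# Explicit loop with a preallocated result vector
I_loop <- numeric(length(U))
running <- 0
for (k in seq_along(U)) {
  running <- running + sqrt(1 - U[k]^2)
  I_loop[k] <- running
}

# Vectorised equivalent
I_vec <- cumsum(sqrt(1 - U^2))

all.equal(I_loop, I_vec)
```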

Cumulatively sum a portion of a previous value with its next value

I would use a for loop to do this. Preallocate the result vector first rather than growing it inside the loop, especially if you're working with a large data set.

# initialize
newx <- vector("numeric", length(df$x))
newx[1] <- df$x[1]

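# seq_along(...)[-1] is empty for a one-element x, whereas 2:length(x) would count down from 2 to 1
for (i in seq_along(df$x)[-1]) {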
  newx[i] <- df$x[i] + (0.8 * newx[i-1])
}

newx
# [1]  1.00000  2.80000  5.24000  8.19200 11.55360 15.24288 19.19430 23.35544 27.68435 32.14748
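
The same recurrence can also be written without an explicit loop. The sketch below assumes x is 1:10, which is consistent with the printed output but is not stated in the original. It shows two base R options: Reduce() with accumulate = TRUE, and stats::filter() with method = "recursive".

```r
x <- 1:10  # assumed input, inferred from the output above

# Reduce() carries the previous result forward as `prev`
via_reduce <- Reduce(function(prev, this) this + 0.8 * prev, x,
                     accumulate = TRUE)

# A recursive filter computes y[i] = x[i] + 0.8 * y[i - 1]
via_filter <- as.numeric(stats::filter(x, 0.8, method = "recursive"))

all.equal(via_reduce, via_filter)
round(via_reduce[1:4], 5)
```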

R cumulative sum with condition

A faster C++ version:

library(Rcpp)
Cpp_boundedCumsum <- cppFunction('NumericVector boundedCumsum(NumericVector x){
  int n = x.size();
  NumericVector out(n);
  double tmp;
  out[0] = x[0];
  for(int i = 1; i < n; ++i){
     tmp = out[i-1] + x[i];
     if(tmp < 0.0 || tmp > 1.0) 
        out[i] = out[i-1];
     else 
        out[i] = tmp;
  }
  return out;
}')

Comparison with R version:

R_boundedCumsum <- function(x){ 
    for (i in seq_along(x)[-1]){
        x[i] <- x[i-1]+x[i]
        if(x[i]<0 || x[i]>1) 
            x[i] <- x[i-1]
    }
    x
}

x <- runif(1000)
all.equal(R_boundedCumsum(x), Cpp_boundedCumsum(x))
[1] TRUE

library(microbenchmark)
microbenchmark(R_boundedCumsum(x), Cpp_boundedCumsum(x))
Unit: microseconds
                 expr      min        lq       mean   median       uq      max neval
   R_boundedCumsum(x) 2062.629 2262.2225 2460.65661 2319.358 2562.256 4112.540   100
 Cpp_boundedCumsum(x)    3.636    4.3475    7.06454    5.792    9.127   25.703   100
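
If a compiler is not available, the same rule can be sketched in base R with Reduce(). The helper name reduce_boundedCumsum is hypothetical, and I have not benchmarked it; expect it to be much closer to the R loop than to the C++ version.

```r
# Hypothetical helper: skip any step that would leave the interval [0, 1]
reduce_boundedCumsum <- function(x) {
  Reduce(function(prev, this) {
    tmp <- prev + this
    if (tmp < 0 || tmp > 1) prev else tmp
  }, x, accumulate = TRUE)
}

# 0.5 -> 0.9 -> (1.2 rejected) 0.9 -> 0.7 -> (-0.2 rejected) 0.7
bounded <- reduce_boundedCumsum(c(0.5, 0.4, 0.3, -0.2, -0.9))
bounded
```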

How can I cumulatively apply a custom function to a vector in R? In an efficient and idiomatic way?

You can use this approach:

set.seed(42)
df <- data.frame(measurement = rnorm(1000))

res <- sapply(seq(nrow(df)), function(x) 
  quantile(df[seq(x), "measurement"], c(.01, .99)))

It creates a matrix with nrow(df) columns and 2 rows, one row for the 1st percentile and one row for the 99th percentile.

You can add this information to your data frame df (as two columns):

df <- setNames(cbind(df, t(res)), c(names(df), "lower", "upper"))
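
A possible variant, shown here on a smaller made-up sample, uses vapply() so that the shape of each result is checked. The names m and res2 are only for illustration. Note that either version recomputes the quantiles over the whole prefix at each step, so the total work grows roughly quadratically with the number of rows.

```r
set.seed(42)
m <- rnorm(50)  # hypothetical sample, smaller than the one above

# Each call must return exactly two numbers, or vapply() stops with an error
res2 <- vapply(seq_along(m),
               function(i) quantile(m[seq_len(i)], c(.01, .99), names = FALSE),
               numeric(2))

dim(res2)  # 2 rows (1st and 99th percentile), one column per prefix
```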

Calculating cumulative sum for each row

You want cumsum():

df <- within(df, acc_sum <- cumsum(count))
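
For a quick check of what this produces, here is a tiny sketch on made-up data (the data frame toy and its count values are hypothetical):

```r
toy <- data.frame(count = c(2, 5, 1, 4))  # hypothetical data

# Each row of acc_sum is the sum of count up to and including that row
toy <- within(toy, acc_sum <- cumsum(count))
toy$acc_sum
```

Writing toy$acc_sum <- cumsum(toy$count) gives the same column without within().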

Cumulative vector in data table

This is Henrik's answer (and if they come back, I'll happily give this answer to them ... somehow):

dat[, res := .(Reduce(c, j, accumulate=TRUE)), by = gr]
#        j    gr         res
#    <num> <num>      <list>
# 1:     3     9           3
# 2:     8     9         3,8
# 3:     9     9       3,8,9
# 4:    11     9  3, 8, 9,11
# 5:    10    10          10
# 6:    28    10       10,28
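
For readers without data.table, the core of the call can be sketched in base R. The vectors j and gr below are copied from the output shown above; split() plus lapply() is my stand-in for by = gr, and simplify = FALSE is added so that a one-row group still yields a list.

```r
j  <- c(3, 8, 9, 11, 10, 28)
gr <- c(9, 9, 9, 9, 10, 10)

# One group: each element is the vector accumulated so far
Reduce(c, j[gr == 9], accumulate = TRUE, simplify = FALSE)

# All groups: a named list ("9", "10") of accumulated lists
by_group <- lapply(split(j, gr), function(v) {
  Reduce(c, v, accumulate = TRUE, simplify = FALSE)
})
by_group[["10"]]
```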

Reduce is similar to sapply except that it operates on the current value and the result of the previous operation. For instance, we can see

sapply(1:3, function(z) z*2)
# [1] 2 4 6

This, unrolled, equates to

1*2 # 2
2*2 # 4
3*2 # 6

That is, the calculation on one element of the vector/list is completely independent, never knowing the results from previous iterations.

However, Reduce is explicitly given the results of the previous calculation. By default, it will only return the last calculation, which would be analogous to tail(sapply(...), 1):

Reduce(function(prev, this) prev + this*2, 11:13)
# [1] 61

That seems a bit obscure ... let's look at all of the interim steps, where the answer above is the last:

Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
# [1] 11 35 61

In this case (without specifying init=, wait for it), the first result is just the first value in x=, not run through the function. If we unroll this, we'll see

11        # 11 is the first value in x
   _________/
  /
 v
11 + 12*2 # 35
35 + 13*2 # 61

Sometimes we need the first value in x= to be run through the function, with a starting condition (a first-time value for prev when we don't have a previous iteration to use). For that, we can use init=; we can think of the use of init= by looking at two perfectly-equivalent calls:

Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
Reduce(function(prev, this) prev + this*2, 12:13, init = 11, accumulate = TRUE)
# [1] 11 35 61

(Without init=, Reduce will take the first element of x= and assign it to init= and remove it from x=.)

Now let's say we want the starting condition (injected "previous" value) to be 0, then we would do

Reduce(function(prev, this) prev + this*2, 11:13, init = 0, accumulate = TRUE)
# [1]  0 22 46 72

### unrolled
 0        # 0 is the init= value
   ________/
  /
 v
 0 + 11*2 # 22
22 + 12*2 # 46
46 + 13*2 # 72
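
One way to make the unrolled steps concrete is to write the accumulation as an explicit loop. The helper accumulate_loop below is hypothetical and only mimics Reduce(f, x, init, accumulate = TRUE) for the simple case where every result has length one.

```r
# Hypothetical helper; a sketch, not a replacement for Reduce()
accumulate_loop <- function(f, x, init) {
  out <- vector("list", length(x) + 1L)
  out[[1L]] <- init                       # the injected "previous" value
  for (i in seq_along(x)) {
    out[[i + 1L]] <- f(out[[i]], x[[i]])  # prev = last result, this = x[i]
  }
  unlist(out)
}

f <- function(prev, this) prev + this * 2
looped <- accumulate_loop(f, 11:13, init = 0)
looped
```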

Let's bring that back to this question and this data. I'll inject a browser() and change the function a little so that we can look at all intermediate values.

> dat[, res := .(Reduce(function(prev, this) { browser(); c(prev, this); }, j, accumulate=TRUE)), by = gr]
Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=9`, row 2
[1] 3
Browse[2]> this
[1] 8
Browse[2]> c(prev, this)
[1] 3 8
Browse[2]> c                                       # 'c'ontinue

Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=9`, row 3
[1] 3 8
Browse[2]> this
[1] 9
Browse[2]> c(prev, this)
[1] 3 8 9
Browse[2]> c                                       # 'c'ontinue

Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=9`, row 4
[1] 3 8 9
Browse[2]> this
[1] 11
Browse[2]> c(prev, this)
[1]  3  8  9 11
Browse[2]> c                                       # 'c'ontinue

Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=10`, row 6
[1] 10
Browse[2]> this
[1] 28
Browse[2]> c(prev, this)
[1] 10 28
Browse[2]> c                                       # 'c'ontinue

Notice how we didn't "see" rows 1 or 5, since they were the init= conditions for the reduction (the first prev value seen in each group).

Reduce can be a difficult function to visualize and work with. When I use it, I almost always pre-insert browser() into the anonymous function and walk through the first three steps: the first to ensure the init= is correct, the second to make sure the anonymous function is doing what I think I want with the init and next value, and the third to make sure that it continues properly. This is similar to proof by induction: the nth calculation will be correct because we know the (n-1)th calculation is correct.
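
When an interactive browser() session is not practical (for example in a script), a rough alternative is to record each step instead. The names steps and f_logged are hypothetical; this is only a sketch of the idea.

```r
steps <- list()

# Same function as in the earlier examples, but it records every call
f_logged <- function(prev, this) {
  result <- prev + this * 2
  steps[[length(steps) + 1L]] <<- c(prev = prev, this = this, result = result)
  result
}

logged <- Reduce(f_logged, 11:13, accumulate = TRUE)
step_table <- do.call(rbind, steps)  # one row per call to f_logged
step_table
```

As with the browser() session, the first element of x never appears as this, because it serves as the initial prev.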

Calculate cumulative sum (cumsum) by group

df$csum <- ave(df$value, df$id, FUN=cumsum)

ave is the "go-to" function if you want a by-group vector of equal length to an existing vector and it can be computed from those sub vectors alone. If you need by-group processing based on multiple "parallel" values, the base strategy is do.call(rbind, by(dfrm, grp, FUN)).
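
Both strategies can be sketched on a small made-up data frame (toy and its columns are hypothetical). The by() version is more than is needed for a plain cumulative sum; it is shown only because the function there sees the whole per-group data frame.

```r
toy <- data.frame(id = c("a", "a", "b", "b", "b"), value = 1:5)

# ave(): result has the same length and order as toy$value
toy$csum <- ave(toy$value, toy$id, FUN = cumsum)

# by(): the function receives each group's rows as a data frame
per_group <- by(toy, toy$id, function(d) {
  d$csum_by <- cumsum(d$value)
  d
})
combined <- do.call(rbind, per_group)
combined
```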

How to calculate cumulative sum?

# replace the second column with its cumulative sum
data[, 2] <- cumsum(data[, 2])
