How to cumulatively add values in one vector in R
Here is a succinct dplyr solution for the same problem.
NOTE: Make sure that stringsAsFactors = FALSE while reading in the data.
library(dplyr)
dat %>%
  group_by(name, job) %>%
  filter(job != "Boss" | year == min(year)) %>%
  mutate(cumu_job2 = cumsum(job2))
Output:
   id name year     job job2 cumu_job2
1   1 Jane 1980  Worker    0         0
2   1 Jane 1981 Manager    1         1
3   1 Jane 1982 Manager    1         2
4   1 Jane 1983 Manager    1         3
5   1 Jane 1984 Manager    1         4
6   1 Jane 1985 Manager    1         5
7   1 Jane 1986    Boss    0         0
8   2  Bob 1985  Worker    0         0
9   2  Bob 1986  Worker    0         0
10  2  Bob 1987 Manager    1         1
11  2  Bob 1988    Boss    0         0
Explanation
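- Take the dataset
- Group by name and job
- Filter each group: keep every non-Boss row, and within a Boss group keep only the row with the earliest year
- Add the cumu_job2 column as the running sum of job2 within each group.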
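The steps above can also be sketched in base R, without dplyr. This is a swapped-in technique rather than the answer's own method: ave() stands in for group_by() plus mutate(), and the data frame is retyped from the output table shown earlier.

```r
# Data retyped from the output table (columns id, name, year, job, job2)
dat <- data.frame(
  id   = c(rep(1, 7), rep(2, 4)),
  name = c(rep("Jane", 7), rep("Bob", 4)),
  year = c(1980:1986, 1985:1988),
  job  = c("Worker", rep("Manager", 5), "Boss",
           "Worker", "Worker", "Manager", "Boss"),
  job2 = c(0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Earliest year within each (name, job) group, repeated on every row
first_year <- ave(dat$year, dat$name, dat$job, FUN = min)

# Keep every non-Boss row, plus only the earliest Boss row per group
out <- dat[dat$job != "Boss" | dat$year == first_year, ]

# Running total of job2 within each (name, job) group
out$cumu_job2 <- ave(out$job2, out$name, out$job, FUN = cumsum)
out
```

With this particular data each person has a single Boss row, so the filter keeps all 11 rows.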
Vector of cumulative sums in R
Please correct me if I'm misunderstanding, but I believe you simply want this:
I <- cumsum(sqrt(1 - U^2))
It is unclear why you want to use for loops.
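As a rough check, the vectorised call and an explicit loop should give the same running totals. The input U below is a hypothetical stand-in (the question's actual U is not shown), and the result names I_vec and I_loop are made up for this sketch.

```r
set.seed(1)
U <- runif(5)  # hypothetical input; the original U is not shown

# Vectorised: one call computes every partial sum
I_vec <- cumsum(sqrt(1 - U^2))

# Equivalent explicit loop, for comparison only
I_loop <- numeric(length(U))
running <- 0
for (k in seq_along(U)) {
  running <- running + sqrt(1 - U[k]^2)
  I_loop[k] <- running
}

all.equal(I_vec, I_loop)
```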
Cumulatively sum a portion of a previous value with its next value
I would use a for loop to do this. It's important to initialize a vector first, especially if you're working with a large data set.
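# initialize the result vector up front (avoids growing it inside the loop)
newx <- numeric(length(df$x))
newx[1] <- df$x[1]
# seq_along(...)[-1] is empty when df$x has one element, unlike 2:length(df$x)
for (i in seq_along(df$x)[-1]) {
  newx[i] <- df$x[i] + (0.8 * newx[i - 1])
}
newx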
# [1] 1.00000 2.80000 5.24000 8.19200 11.55360 15.24288 19.19430 23.35544 27.68435 32.14748
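The same recurrence can also be written with Reduce(), as a sketch of an alternative to the loop. It assumes df$x is 1:10, which is consistent with the printed output above but is not stated in the answer.

```r
df <- data.frame(x = 1:10)  # assumed input, inferred from the printed result

# prev is the previous accumulated value, this is the next element of x
newx <- Reduce(function(prev, this) this + 0.8 * prev, df$x, accumulate = TRUE)
newx
```

For long vectors the pre-allocated for loop is usually at least as fast; the Reduce() form is mainly more compact.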
R cumulative sum with condition
A faster C++ version:
library(Rcpp)
Cpp_boundedCumsum <- cppFunction('NumericVector boundedCumsum(NumericVector x){
  int n = x.size();
  NumericVector out(n);
  if (n == 0) return out;  // nothing to accumulate for empty input
  double tmp;
  out[0] = x[0];
  for(int i = 1; i < n; ++i){
    tmp = out[i-1] + x[i];
    // keep the previous total when the new one leaves [0, 1]
    if(tmp < 0.0 || tmp > 1.0)
      out[i] = out[i-1];
    else
      out[i] = tmp;
  }
  return out;
}')
Comparison with R version:
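R_boundedCumsum <- function(x){
  # seq_along(x)[-1] is empty for length-0 or length-1 input, unlike 2:length(x)
  for (i in seq_along(x)[-1]){
    x[i] <- x[i-1] + x[i]
    # revert to the previous total when the new one leaves [0, 1]
    if(x[i] < 0 || x[i] > 1)
      x[i] <- x[i-1]
  }
  x
}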
x <- runif(1000)
all.equal(R_boundedCumsum(x), Cpp_boundedCumsum(x))
# [1] TRUE
library(microbenchmark)
microbenchmark(R_boundedCumsum(x), Cpp_boundedCumsum(x))
Unit: microseconds
                 expr      min        lq       mean   median       uq      max neval
   R_boundedCumsum(x) 2062.629 2262.2225 2460.65661 2319.358 2562.256 4112.540   100
 Cpp_boundedCumsum(x)    3.636    4.3475    7.06454    5.792    9.127   25.703   100
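To see what the bounded rule does on a small input, here is a minimal sketch that expresses the same logic with Reduce(). The function name bounded_cumsum and the test vector are made up for illustration, and this variant is not part of the benchmark above.

```r
# Running sum that ignores any step that would leave the interval [0, 1]
bounded_cumsum <- function(x) {
  Reduce(function(prev, this) {
    tmp <- prev + this
    if (tmp < 0 || tmp > 1) prev else tmp
  }, x, accumulate = TRUE)
}

x <- c(0.5, 0.4, 0.3, -0.2, -0.9)  # hypothetical input
res <- bounded_cumsum(x)
res  # the third and fifth steps are rejected, so the total is carried forward
```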
How can I cumulatively apply a custom function to a vector in R? In an efficient and idiomatic way?
You can use this approach:
set.seed(42)
df <- data.frame(measurement = rnorm(1000))
res <- sapply(seq(nrow(df)), function(x)
quantile(df[seq(x), "measurement"], c(.01, .99)))
It creates a matrix with nrow(df) columns and 2 rows, one row for the 1st percentile and one row for the 99th percentile.
You can add this information to your data frame df (as two columns):
df <- setNames(cbind(df, t(res)), c(names(df), "lower", "upper"))
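A quick way to check the shape of the result is to run the same idea on a small data frame. The sketch below uses 20 rows instead of 1000; since every column recomputes the quantiles on a growing prefix, the total work grows roughly quadratically with the number of rows.

```r
set.seed(42)
df <- data.frame(measurement = rnorm(20))  # smaller than the original, for inspection

# Column k holds the quantiles of the first k measurements
res <- sapply(seq_len(nrow(df)), function(k)
  quantile(df$measurement[seq_len(k)], c(.01, .99)))

dim(res)       # 2 rows, one column per row of df
rownames(res)  # "1%" "99%"

# The last column uses all rows, so it matches the quantiles of the full vector
all.equal(res[, nrow(df)], quantile(df$measurement, c(.01, .99)))
```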
Calculating cumulative sum for each row
You want cumsum():
df <- within(df, acc_sum <- cumsum(count))
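For illustration, here is the same call on a made-up data frame; the values in count are hypothetical.

```r
df <- data.frame(count = c(2, 5, 1, 4))  # hypothetical data

# within() evaluates the assignment inside df and returns the modified copy
df <- within(df, acc_sum <- cumsum(count))
df$acc_sum  # 2 7 8 12
```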
Cumulative vector in data table
This is Henrik's answer (and if they come back, I'll happily give this answer to them ... somehow):
dat[, res := .(Reduce(c, j, accumulate=TRUE)), by = gr]
# j gr res
# <num> <num> <list>
# 1: 3 9 3
# 2: 8 9 3,8
# 3: 9 9 3,8,9
# 4: 11 9 3, 8, 9,11
# 5: 10 10 10
# 6: 28 10 10,28
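The per-group computation can be tried without data.table. This sketch runs the same Reduce() call on the j values of group gr = 9 from the output above.

```r
j <- c(3, 8, 9, 11)  # the j values of group gr = 9

# Each step appends the next value to everything accumulated so far
res <- Reduce(c, j, accumulate = TRUE)
str(res)  # a list of 4 numeric vectors of lengths 1 to 4
```

Because the accumulated vectors have different lengths, Reduce() returns a list, which is why res is a list column in the data.table result.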
Reduce is similar to sapply except that it operates on the current value and results of the previous operation. For instance, we can see
sapply(1:3, function(z) z*2)
# [1] 2 4 6
This, unrolled, equates to
1*2 # 2
2*2 # 4
3*2 # 6
That is, the calculation on one element of the vector/list is completely independent, never knowing the results from previous iterations.
However, Reduce is explicitly given the results of the previous calculation. By default, it will only return the last calculation, which would be analogous to tail(sapply(...), 1):
Reduce(function(prev, this) prev + this*2, 11:13)
# [1] 61
That seems a bit obscure ... let's look at all of the interim steps, where the answer above is the last:
Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
# [1] 11 35 61
In this case (without specifying init=, wait for it), the first result is just the first value in x=, not run through the function. If we unroll this, we'll see
          11   # 11 is the first value in x
 _________/
/
v
11 + 12*2      # 35
35 + 13*2      # 61
Sometimes we need the first value in x= to be run through the function, with a starting condition (a first-time value for prev when we don't have a previous iteration to use). For that, we can use init=; we can think of the use of init= by looking at two perfectly equivalent calls:
Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
Reduce(function(prev, this) prev + this*2, 12:13, init = 11, accumulate = TRUE)
# [1] 11 35 61
(Without init=, Reduce will take the first element of x= and assign it to init= and remove it from x=.)
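The equivalence of the two calls can be checked directly. The sketch below also compares them with the manually unrolled steps; the helper name f is introduced here only to avoid repeating the anonymous function.

```r
f <- function(prev, this) prev + this * 2

a <- Reduce(f, 11:13, accumulate = TRUE)             # first element acts as init
b <- Reduce(f, 12:13, init = 11, accumulate = TRUE)  # init given explicitly

# Manual unrolling of the same three steps
manual <- c(11, f(11, 12), f(f(11, 12), 13))

all.equal(a, b)
all.equal(a, manual)
```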
Now let's say we want the starting condition (injected "previous" value) to be 0, then we would do
Reduce(function(prev, this) prev + this*2, 11:13, init = 0, accumulate = TRUE)
# [1] 0 22 46 72
### unrolled
         0     # 0 is the init= value
 ________/
/
v
0 + 11*2       # 22
22 + 12*2      # 46
46 + 13*2      # 72
Let's bring that back to this question and this data. I'll inject a browser() and change the function a little so that we can look at all intermediate values.
> dat[, res := .(Reduce(function(prev, this) { browser(); c(prev, this); }, j, accumulate=TRUE)), by = gr]
Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=9`, row 2
[1] 3
Browse[2]> this
[1] 8
Browse[2]> c(prev, this)
[1] 3 8
Browse[2]> c # 'c'ontinue
Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=9`, row 3
[1] 3 8
Browse[2]> this
[1] 9
Browse[2]> c(prev, this)
[1] 3 8 9
Browse[2]> c # 'c'ontinue
Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=9`, row 4
[1] 3 8 9
Browse[2]> this
[1] 11
Browse[2]> c(prev, this)
[1] 3 8 9 11
Browse[2]> c # 'c'ontinue
Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev # group `gr=10`, row 6
[1] 10
Browse[2]> this
[1] 28
Browse[2]> c(prev, this)
[1] 10 28
Browse[2]> c # 'c'ontinue
Notice how we didn't "see" rows 1 or 5, since they were the init= conditions for the reduction (the first prev value seen in each group).
Reduce can be a difficult function to visualize and work with. When I use it, I almost always pre-insert browser() into the anonymous function and walk through the first three steps: the first to ensure the init= is correct, the second to make sure the anonymous function is doing what I think I want with the init and next value, and the third to make sure that it continues properly. This is similar to proof by induction: the nth calc will be correct because we know the (n-1)th calc is correct.
Calculate cumulative sum (cumsum) by group
df$csum <- ave(df$value, df$id, FUN=cumsum)
ave is the "go-to" function if you want a by-group vector of equal length to an existing vector and it can be computed from those sub vectors alone. If you need by-group processing based on multiple "parallel" values, the base strategy is do.call(rbind, by(dfrm, grp, FUN)).
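Both strategies can be sketched on a small made-up data frame. The column names id, value and weight, and the derived column wsum, are hypothetical and chosen only for this illustration.

```r
df <- data.frame(
  id     = c("a", "a", "b", "b", "a"),
  value  = c(1, 2, 3, 4, 5),
  weight = 2,
  stringsAsFactors = FALSE
)

# ave(): one input vector, result stays in the original row order
df$csum <- ave(df$value, df$id, FUN = cumsum)
df$csum  # 1 3 3 7 8

# by() + rbind: the function sees the whole sub data frame, so it can
# combine several columns; rows come back grouped by id
res <- do.call(rbind, by(df, df$id, function(d) {
  d$wsum <- cumsum(d$value * d$weight)
  d
}))
res$wsum  # 2 6 16 6 14
```

Note that the by() route reorders rows by group, whereas ave() leaves the original order untouched.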
How to calculate cumulative sum?
# replace the second column with the cumulative sum of its original values
data[, 2] <- cumsum(data[, 2])