How to Add a Cumulative Column to an R Dataframe Using Dplyr

How to add a cumulative column to an R dataframe using dplyr?

Like this?

library(dplyr)

df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = sample(1:15))

mutate(group_by(df, id), csum = cumsum(value))

Or, with the pipe operator %>% that dplyr provides:

df %>% group_by(id) %>% mutate(csum = cumsum(value))

Result in both cases (value comes from sample(), so your numbers will differ):

Source: local data frame [15 x 4]
Groups: id

   id hour value csum
1   1    1     4    4
2   1    2    14   18
3   1    3     8   26
4   1    4     2   28
5   1    5     3   31
6   2    1    10   10
7   2    2     7   17
8   2    3     5   22
9   2    4    12   34
10  2    5     9   43
11  3    1     6    6
12  3    2    15   21
13  3    3     1   22
14  3    4    13   35
15  3    5    11   46
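
Because the data above is random, a small fixed example makes the grouped running total easier to check by hand. This is only a sketch: the toy data frame and its values are made up for illustration. It also sorts by hour first, since cumsum() accumulates in whatever order the rows currently have.

```r
library(dplyr)

# Hypothetical fixed data, deliberately out of hour order within each id
toy <- data.frame(id    = c(1, 1, 1, 2, 2, 2),
                  hour  = c(3, 1, 2, 2, 3, 1),
                  value = c(30, 10, 20, 5, 7, 1))

res <- toy %>%
  arrange(id, hour) %>%            # order rows the way the running total should follow
  group_by(id) %>%                 # restart the sum for every id
  mutate(csum = cumsum(value)) %>%
  ungroup()                        # drop the grouping once the column is added

res$csum
```

Calling ungroup() at the end is optional, but it keeps later operations from silently running per group.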

A function to add the cumulative sum of multiple columns

You can rewrite colCumsum like this:

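colCumsum <- function(x) {
  # logical vector marking which columns are numeric
  check <- sapply(x, is.numeric)
  # add one "<name>_cumsum" column per numeric column
  x[paste0(names(x)[check], "_cumsum")] <- lapply(x[check], cumsum)
  x
}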

Here it is used on your sample data:

colCumsum(df)
#   name    sex born v4  v5 v6 v7 v4_cumsum v5_cumsum v6_cumsum v7_cumsum
# 1  tim   male 1985  5  10  1  0         5        10         1         0
# 2  tom   male 1986  4  20  2  0         9        30         3         0
# 3  ben   male 1985  3 600  3 20        12       630         6        20
# 4 mary female 1986  2  20  4  4        14       650        10        24
# 5 jane female 1984  1   5  5 60        15       655        15        84

For reference, you can rewrite your loop to just focus on the numeric columns to get it to work:

colCumsum2 <- function(x) {
  for (i in seq_len(ncol(x))) {
    # x[[i]] extracts the column as a vector, for data frames and tibbles alike
    if (is.numeric(x[[i]])) {
      x[[paste0(names(x)[i], "_cumsum")]] <- cumsum(x[[i]])
    }
  }
  x
}
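
Since the question is about dplyr, the same idea can also be sketched with mutate() and across() (available from dplyr 1.0.0). The people data frame below is reconstructed from the printed output above. Storing born as character is an assumption, made because the output shows no born_cumsum column.

```r
library(dplyr)

# Sample data reconstructed from the printed output; born is stored as
# character here (an assumption) so that it is not treated as numeric
people <- data.frame(
  name = c("tim", "tom", "ben", "mary", "jane"),
  sex  = c("male", "male", "male", "female", "female"),
  born = c("1985", "1986", "1985", "1986", "1984"),
  v4   = c(5, 4, 3, 2, 1),
  v5   = c(10, 20, 600, 20, 5),
  v6   = c(1, 2, 3, 4, 5),
  v7   = c(0, 0, 20, 4, 60)
)

# Apply cumsum to every numeric column and name each result "<col>_cumsum"
res <- people %>%
  mutate(across(where(is.numeric), cumsum, .names = "{.col}_cumsum"))

names(res)
```

The .names argument keeps the original columns and appends the new ones, which matches what colCumsum returns.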

Create series of variables that are cumulative sums of other variables using dplyr

You could take rowSums over a growing set of columns, adding one column at a time.

df[letters[5:8]] <- do.call(cbind, lapply(seq(ncol(df)),
                            function(x) rowSums(df[1:x])))
df

#   a  b  c d  e  f  g  h
#1 10 30  5 5 10 40 45 50
#2 10 50  5 5 10 60 65 70
#3 20 60  5 5 20 80 85 90
#4 20 20  1 3 20 40 41 44
#5 30 10 10 5 30 40 50 55

Or, if you are interested in a tidyverse solution:

library(dplyr)

# run on the original df, i.e. before columns e to h were added above
df %>%
  bind_cols(purrr::map_dfc(seq(ncol(df)),
                           ~df %>% select(1:.x) %>% rowSums) %>%
              setNames(letters[5:8]))
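
As a cross-check on both versions, the same running sums across columns can be sketched in base R with Reduce() and accumulate = TRUE, which avoids recomputing rowSums for every prefix of columns. The wide data frame is reconstructed from columns a to d of the printed output above. The names wide, running and out are only illustrative.

```r
# Input reconstructed from columns a-d of the printed output
wide <- data.frame(a = c(10, 10, 20, 20, 30),
                   b = c(30, 50, 60, 20, 10),
                   c = c(5, 5, 5, 1, 10),
                   d = c(5, 5, 5, 3, 5))

# Reduce() with accumulate = TRUE keeps every intermediate result:
# a, a + b, a + b + c, a + b + c + d
running <- Reduce(`+`, wide, accumulate = TRUE)
names(running) <- letters[5:8]

# Bind the four running-sum columns onto the original data
out <- cbind(wide, as.data.frame(running))
out
```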


