Summing Multiple Columns in an R Data-Frame Quickly

Here's an alternative approach using tidyverse:

library(tidyverse)

# input columns of interest
cols <- c("mpg", "cyl", "disp", "hp", "drat")

mtcars %>%
  select(all_of(cols)) %>%             # keep only the columns to be summed
  group_by(id = row_number()) %>%      # for each row
  nest(data = all_of(cols)) %>%        # nest selected columns (tidyr >= 1.0.0 syntax)
  mutate(SUM = map_dbl(data, sum))     # calculate the sum of those columns

# # A tibble: 32 x 3
# id data SUM
# <int> <list> <dbl>
# 1 1 <tibble [1 x 5]> 301.
# 2 2 <tibble [1 x 5]> 301.
# 3 3 <tibble [1 x 5]> 232.
# 4 4 <tibble [1 x 5]> 398.
# 5 5 <tibble [1 x 5]> 565.
# 6 6 <tibble [1 x 5]> 357.
# 7 7 <tibble [1 x 5]> 631.
# 8 8 <tibble [1 x 5]> 241.
# 9 9 <tibble [1 x 5]> 267.
# 10 10 <tibble [1 x 5]> 320.
# # ... with 22 more rows

The output here is a data frame containing the row id (id), the data used at each row (data) and the calculated sum (SUM).

You can get a vector of the calculated SUM if you add ... %>% pull(SUM).
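For comparison, the same row totals can be obtained without nesting. The following is a minimal sketch using base R's rowSums on the same mtcars columns; the name row_totals is only illustrative and not part of the answer above.

```r
# columns of interest (same selection as in the tidyverse example)
cols <- c("mpg", "cyl", "disp", "hp", "drat")

# rowSums operates on the selected columns directly and
# returns a numeric vector named by the row names of mtcars
row_totals <- rowSums(mtcars[cols])

head(round(row_totals), 3)
```

Because rowSums is vectorised over rows, this is usually the quicker choice when the aggregation is a plain sum.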

Sum across multiple columns with dplyr

dplyr >= 1.0.0 using across

sum up each row using rowSums (rowwise works for any aggregation, but is slower)

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))
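The rowwise alternative mentioned above might look like the following sketch. The data frame here is hypothetical, and c_across is assumed to be available (dplyr >= 1.0.0).

```r
library(dplyr)

# hypothetical data with missing values
df <- tibble(a = c(1, NA, 3), b = c(4, 5, NA))

# rowwise() treats each row as its own group, so any summary
# function (not only sum) can be applied across the selected columns
out <- df %>%
  rowwise() %>%
  mutate(total = sum(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  ungroup()
```

Here na.rm = TRUE takes the place of replacing missing values with 0 beforehand.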

sum down each column

df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

dplyr < 1.0.0

sum up each row

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))

sum down each column using the superseded summarise_all:

df %>%
  replace(is.na(.), 0) %>%
  summarise_all(sum)

How to sum multiple columns in two data frames in r

Here's a base R option :

tmp <- cbind(df1, df2)
data.frame(sapply(split.default(tmp, names(tmp)), rowSums))

# V1 V2 V3 V4 V5
#1 4 8 5 5 4
#2 6 10 7 7 0

data

df1 <- structure(list(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L)),
class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L,
7L)), class = "data.frame", row.names = c(NA, -2L))

sum across multiple columns of a data frame based on multiple patterns R

We can reshape to 'long' format with pivot_longer, and get the sum while reshaping back to 'wide'

library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = starts_with("X"), names_to = "name1") %>%
mutate(name1 = str_remove(name1, "\\.\\d+$")) %>%
pivot_wider(names_from = name1, values_from = value,
values_fn = ~ sum(.x, na.rm = TRUE))

Output:

# A tibble: 4 × 3
name X1990 X1991
<chr> <dbl> <dbl>
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15

Or in base R, use split.default to split the data into a list of datasets based on the column name pattern, get the rowSums and cbind with the first column

cbind(df[1], sapply(split.default(df[-1], 
trimws(names(df)[-1], whitespace = "\\.\\d+")), rowSums, na.rm = TRUE))
name X1990 X1991
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
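To make the split.default step easier to follow, here is a small self-contained sketch. The data frame is hypothetical (the original df is not shown), and sub is used instead of trimws purely for illustration.

```r
# hypothetical data: two measurements per year, distinguished by a numeric suffix
df <- data.frame(
  name = c("name1", "name2"),
  X1990.1 = c(1, 2), X1990.2 = c(10, 20),
  X1991.1 = c(3, NA), X1991.2 = c(30, 40)
)

# strip the trailing ".<digits>" to get one grouping key per column
key <- sub("\\.\\d+$", "", names(df)[-1])

# split the columns by key, then sum each group of columns row by row
sums <- sapply(split.default(df[-1], key), rowSums, na.rm = TRUE)
res <- cbind(df[1], sums)
```

Each element produced by split.default is a data frame holding the columns that share a key, which is why rowSums can be applied to it directly.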

How to sum data from multiple rows and columns according to unique variable in R?

dat <- tibble::tribble(
~Employee, ~Day, ~Car_Sales, ~Van_Sales, ~Truck_Sales,
"Tim", "1/1", 5, 2, 3,
"Tim", "1/2", 4, 2, 3,
"Tim", "1/3", 7, 1, 3,
"Craig", "1/1", 2, 6, 1,
"Craig", "1/2", 5, 7, 2,
"Samantha", "1/1", 4, 3, 2)

library(dplyr)

dat %>%
  mutate(sales = Car_Sales + Van_Sales + Truck_Sales) %>%  # total sales per row
  group_by(Employee) %>%
  summarise(Avg = mean(sales))                             # average daily total per employee

Output should look like this:

Employee  Avg
Craig     11.5
Samantha   9.0
Tim       10.0
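If totals per vehicle type are wanted instead of an average, a possible variant is sketched below. It assumes dplyr >= 1.0.0 for across; the name totals is illustrative.

```r
library(dplyr)

dat <- tibble::tribble(
  ~Employee,  ~Day,  ~Car_Sales, ~Van_Sales, ~Truck_Sales,
  "Tim",      "1/1", 5, 2, 3,
  "Tim",      "1/2", 4, 2, 3,
  "Tim",      "1/3", 7, 1, 3,
  "Craig",    "1/1", 2, 6, 1,
  "Craig",    "1/2", 5, 7, 2,
  "Samantha", "1/1", 4, 3, 2
)

# sum every *_Sales column within each employee
totals <- dat %>%
  group_by(Employee) %>%
  summarise(across(ends_with("_Sales"), sum))
```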

Sum every 3 columns of a dataframe to form new columns

In base R you can do something like this:

num_cols <- df[-c(1:2)]

cbind(df[1:2], do.call(cbind,
  lapply(setNames(seq(1, length(num_cols), 3),
                  paste0("sum", seq(length(num_cols) / 3))), \(a) {
    apply(num_cols[a:(a + 2)], 1, \(b) sum(as.numeric(gsub(",", "", b))))
  })))

Because the values contain commas, gsub removes them before the conversion to numeric.
setNames gives each new column a dynamic name (sum1, sum2, ...).
apply is used within lapply to sum each row of every three-column block.

   ID. Type. sum1 sum2
1 ob1, 1, 6 15
2 ob1, 2, 12 14

sum two columns in R

The sum function will add all numbers together to produce a single number, not a vector (well, at least not a vector of length greater than 1).

It looks as though at least one of your columns is a factor. You could convert them into numeric vectors, but first check the result of the conversion:

head(as.numeric(data$col1))  # make sure this gives you the right output

And if that looks right, do

data$col1 <- as.numeric(data$col1)
data$col2 <- as.numeric(data$col2)

If a column is a factor, as.numeric returns the internal level codes rather than the displayed values, so you might have to convert to character first. In which case do

data$col1 <- as.numeric(as.character(data$col1))
data$col2 <- as.numeric(as.character(data$col2))

It's hard to tell which you should do without being able to see your data.
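The difference between the two conversions can be seen on a small factor. The values below are hypothetical and only serve to illustrate the behaviour.

```r
# a factor whose labels look like numbers (hypothetical values)
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                # integer codes of the levels
values <- as.numeric(as.character(f))  # the labelled values themselves

codes   # 1 2 1
values  # 10 20 10
```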

Once the columns are numeric, you just have to do

data$col3 <- data$col1 + data$col2

R: How to sum multiple columns of data frames in a list?

You should not name your list ls, because ls is a function.

lapply(myList, function(x) data.frame(c = x$c, new = rowSums(x[, c("a", "b", "d")], na.rm = TRUE)))

Here is a solution where you specify the dropped columns only (after edit):

dropped <- c("a", "b", "d")
lapply(myList, function(x) {
  x$new <- rowSums(x[, dropped], na.rm = TRUE)  # row total of the columns to be dropped
  x[!names(x) %in% dropped]                     # keep only the remaining columns
})
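As a quick check of the second version, here is a sketch on a hypothetical myList (the original list is not shown); the column names a, b, c and d follow the code above.

```r
# hypothetical list of data frames sharing columns a, b, c, d
myList <- list(
  data.frame(a = 1:2, b = 3:4, c = c("x", "y"), d = c(5, NA)),
  data.frame(a = 10, b = 20, c = "z", d = 30)
)

dropped <- c("a", "b", "d")
res <- lapply(myList, function(x) {
  x$new <- rowSums(x[, dropped], na.rm = TRUE)  # row total of the summed columns
  x[!names(x) %in% dropped]                     # keep only the remaining columns
})
```

Each element of res then contains only column c and the new row total.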

