summing multiple columns in an R data-frame quickly
Here's an alternative approach using tidyverse:
library(tidyverse)
# input columns of interest
cols <- c("mpg", "cyl", "disp", "hp", "drat")
mtcars %>%
  group_by(id = row_number()) %>%   # for each row
  nest(cols) %>%                    # nest selected columns (tidyr >= 1.0: nest(data = all_of(cols)))
  mutate(SUM = map_dbl(data, sum))  # calculate the sum of those columns
# # A tibble: 32 x 3
# id data SUM
# <int> <list> <dbl>
# 1 1 <tibble [1 x 5]> 301.
# 2 2 <tibble [1 x 5]> 301.
# 3 3 <tibble [1 x 5]> 232.
# 4 4 <tibble [1 x 5]> 398.
# 5 5 <tibble [1 x 5]> 565.
# 6 6 <tibble [1 x 5]> 357.
# 7 7 <tibble [1 x 5]> 631.
# 8 8 <tibble [1 x 5]> 241.
# 9 9 <tibble [1 x 5]> 267.
# 10 10 <tibble [1 x 5]> 320.
# # ... with 22 more rows
The output here is a data frame containing the row id (id), the data used at each row (data) and the calculated sum (SUM).
You can get a vector of the calculated SUM if you add ... %>% pull(SUM).
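For comparison, when only the row totals are needed, base R's rowSums can be applied to the selected columns directly, without nesting. A minimal sketch on the same mtcars columns (the name row_totals is only illustrative):

```r
# columns of interest, as in the example above
cols <- c("mpg", "cyl", "disp", "hp", "drat")

# rowSums works on a data frame of numeric columns and returns one value per row
row_totals <- rowSums(mtcars[cols])

# first few totals, named by the row names of mtcars
head(row_totals, 3)
```

This returns a named numeric vector rather than a tibble, so it is closer to what pull(SUM) gives than to the nested output.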
Sum across multiple columns with dplyr
dplyr >= 1.0.0 using across
sum up each row using rowSums (rowwise works for any aggregation, but is slower)
df %>%
  replace(is.na(.), 0) %>%
  mutate(sum = rowSums(across(where(is.numeric))))
sum down each column
df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
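As a quick check of the two patterns above, here is a small sketch on made-up data containing NA values. The data frame df and its columns a, b, c are hypothetical; na.rm = TRUE is used instead of replacing NA with 0 first, which yields the same totals.

```r
library(dplyr)

# Hypothetical toy data with missing values (not from the original question)
df <- tibble(a = c(1, NA, 3), b = c(4, 5, NA), c = c(7, 8, 9))

# Row sums: across() selects the numeric columns, rowSums() adds them per row
row_totals <- df %>%
  mutate(sum = rowSums(across(where(is.numeric)), na.rm = TRUE))

# Column sums: one summary value per column
col_totals <- df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

row_totals
col_totals
```

Note that everything() only makes sense here because all columns are numeric; with mixed column types, where(is.numeric) would be the safer selector.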
dplyr < 1.0.0
sum up each row
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))
sum down each column using superseded summarise_all:
df %>%
  replace(is.na(.), 0) %>%
  summarise_all(funs(sum))
How to sum multiple columns in two data frames in r
Here's a base R option:
tmp <- cbind(df1, df2)
data.frame(sapply(split.default(tmp, names(tmp)), rowSums))
# V1 V2 V3 V4 V5
#1 4 8 5 5 4
#2 6 10 7 7 0
data
df1 <- structure(list(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L)),
class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L,
7L)), class = "data.frame", row.names = c(NA, -2L))
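To see why this works, it can help to look at the intermediate list that split.default produces: one element per distinct column name, each holding every column that shares that name. A short sketch using the same data (the names parts and result are only illustrative):

```r
df1 <- data.frame(V1 = 2:3, V2 = 4:5, V3 = c(5L, 7L))
df2 <- data.frame(V1 = 2:3, V5 = c(4L, 0L), V2 = 4:5, V4 = c(5L, 7L))

# cbind keeps the duplicated column names (V1 and V2 appear twice)
tmp <- cbind(df1, df2)

# split the columns into groups that share a name
parts <- split.default(tmp, names(tmp))
names(parts)      # one group per distinct column name
ncol(parts$V1)    # V1 is present in both inputs, so this group has two columns

# sum across the columns of each group, row by row
result <- data.frame(sapply(parts, rowSums))
result
```

Columns that exist in only one data frame (V3, V4, V5) form single-column groups, so rowSums simply returns them unchanged.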
sum across multiple columns of a data frame based on multiple patterns R
We can reshape to 'long' format with pivot_longer, and get the sum while reshaping back to 'wide'
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = starts_with("X"), names_to = "name1") %>%
mutate(name1 = str_remove(name1, "\\.\\d+$")) %>%
pivot_wider(names_from = name1, values_from = value,
values_fn = ~ sum(.x, na.rm = TRUE))
-output
# A tibble: 4 × 3
name X1990 X1991
<chr> <dbl> <dbl>
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
Or in base R, use split.default to split the data into a list of datasets based on the column name pattern, get the rowSums and cbind with the first column
cbind(df[1], sapply(split.default(df[-1],
trimws(names(df)[-1], whitespace = "\\.\\d+")), rowSums, na.rm = TRUE))
name X1990 X1991
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
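Since the input df is not shown, here is a self-contained sketch of the base R variant on made-up data. The column names and values below are hypothetical, and sub() is used to strip the numeric suffix, which for names like these has the same effect as the trimws call above.

```r
# Hypothetical input: repeated year columns carry a ".<number>" suffix
df <- data.frame(name    = c("name1", "name2"),
                 X1990   = c(1, 2),
                 X1990.1 = c(10, 20),
                 X1991   = c(5, NA),
                 X1991.1 = c(6, 7))

# Grouping key: column names with the trailing ".<digits>" removed
key <- sub("\\.\\d+$", "", names(df)[-1])

# Split the value columns by key, sum each group row-wise, re-attach the names
res <- cbind(df[1],
             sapply(split.default(df[-1], key), rowSums, na.rm = TRUE))
res
```

Here key is c("X1990", "X1990", "X1991", "X1991"), so the four value columns collapse into two summed columns.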
How to sum data from multiple rows and columns according to unique variable in R?
dat <- tibble::tribble(
~Employee, ~Day, ~Car_Sales, ~Van_Sales, ~Truck_Sales,
"Tim", "1/1", 5, 2, 3,
"Tim", "1/2", 4, 2, 3,
"Tim", "1/3", 7, 1, 3,
"Craig", "1/1", 2, 6, 1,
"Craig", "1/2", 5, 7, 2,
"Samantha", "1/1", 4, 3, 2)
library(dplyr)
dat %>%
  mutate(sales = Car_Sales + Van_Sales + Truck_Sales) %>%
  group_by(Employee) %>%
  summarise(Avg = mean(sales))
Output should look like this:
Employee | Avg |
---|---|
Craig | 11.5 |
Samantha | 9.0 |
Tim | 10.0 |
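If the per-employee total is wanted rather than the average, the same idea works with sum in place of mean. A rough base R sketch using the same figures (rebuilt here as a plain data frame without the Day column; the name totals is illustrative):

```r
dat <- data.frame(
  Employee    = c("Tim", "Tim", "Tim", "Craig", "Craig", "Samantha"),
  Car_Sales   = c(5, 4, 7, 2, 5, 4),
  Van_Sales   = c(2, 2, 1, 6, 7, 3),
  Truck_Sales = c(3, 3, 3, 1, 2, 2)
)

# total sales per row, across the three vehicle columns
dat$sales <- rowSums(dat[c("Car_Sales", "Van_Sales", "Truck_Sales")])

# then sum those row totals within each employee
totals <- aggregate(sales ~ Employee, data = dat, FUN = sum)
totals
```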
Sum every 3 columns of a dataframe to form new columns
In base R you can do something like this:
num_cols <- df[-c(1:2)]
cbind(df[1:2], do.call(cbind,
lapply(setNames(seq(1,length(num_cols), 3),
paste0("sum", seq(length(num_cols)/3))), \(a) {
apply(num_cols[a:(a + 2)], 1, \(b) sum(as.numeric(gsub(",", "", b))))
})))
Because the values contain commas, I used gsub to remove them; setNames gives each new column a dynamic name, and apply is used within lapply to sum each row
ID. Type. sum1 sum2
1 ob1, 1, 6 15
2 ob1, 2, 12 14
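The original input is not shown, so the following is only a sketch of the same idea on made-up data: character columns with stray commas, summed in blocks of three. The data frame, the column names V1 to V6 and the helper clean are all hypothetical.

```r
# Hypothetical input: two id columns followed by six comma-laden value columns
df <- data.frame(ID = c("ob1", "ob1"), Type = c(1, 2),
                 V1 = c("1,", "3,"), V2 = c("2,", "4,"), V3 = c("3,", "5,"),
                 V4 = c("4,", "4,"), V5 = c("5,", "5,"), V6 = c("6", "5"),
                 stringsAsFactors = FALSE)

num_cols <- df[-(1:2)]

# strip commas and convert to numeric
clean <- function(col) as.numeric(gsub(",", "", col))

# starting index of each block of three columns
starts <- seq(1, ncol(num_cols), by = 3)

# add up the three cleaned columns of each block, element by element
sums <- lapply(starts, function(a) {
  Reduce(`+`, lapply(num_cols[a:(a + 2)], clean))
})
names(sums) <- paste0("sum", seq_along(starts))

res <- cbind(df[1:2], as.data.frame(sums))
res
```

This assumes the number of value columns is a multiple of three; otherwise the last block would index past the end of num_cols.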
sum two columns in R
The sum function will add all numbers together to produce a single number, not a vector (well, at least not a vector of length greater than 1).
It looks as though at least one of your columns is a factor. You could convert them into numeric vectors, but check the result first:
head(as.numeric(data$col1)) # make sure this gives you the right output
And if that looks right, do
data$col1 <- as.numeric(data$col1)
data$col2 <- as.numeric(data$col2)
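You might have to convert them into characters first, because as.numeric() on a factor returns the internal level codes rather than the displayed values. In which case do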
data$col1 <- as.numeric(as.character(data$col1))
data$col2 <- as.numeric(as.character(data$col2))
It's hard to tell which you should do without being able to see your data.
Once the columns are numeric, you just have to do
data$col3 <- data$col1 + data$col2
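A small sketch of the pitfall, on made-up data where both columns were read in as factors (the values are hypothetical; codes and total are illustrative names):

```r
# Hypothetical data where numbers ended up stored as factors
data <- data.frame(col1 = factor(c("10", "20", "5")),
                   col2 = factor(c("1", "2", "3")))

# as.numeric() on a factor returns the level codes, not the labels
codes <- as.numeric(data$col1)

# going through character first recovers the actual numbers
data$col1 <- as.numeric(as.character(data$col1))
data$col2 <- as.numeric(as.character(data$col2))

# element-wise addition: one value per row
data$col3 <- data$col1 + data$col2

# sum() instead collapses everything into a single number
total <- sum(data$col1, data$col2)
```

Comparing codes with the converted col1 is the check that head(as.numeric(data$col1)) is meant to perform.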
R: How to sum multiple columns of data frames in a list?
You should not name your list ls, because ls is a function.
lapply(myList, function(x) data.frame(c = x$c, new = rowSums(x[, c("a", "b", "d")], na.rm = TRUE)))
Here is a solution where you specify the dropped columns only (after edit):
dropped <- c("a", "b", "d")
lapply(myList, function(x) {
  x$new <- rowSums(x[, dropped], na.rm = TRUE)
  x[!names(x) %in% dropped]
})
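For a runnable check, here is the second variant applied to a made-up list. The contents of myList are hypothetical; only the column names a, b, c, d follow the question.

```r
# Hypothetical list of data frames sharing the columns a, b, c, d
myList <- list(
  first  = data.frame(a = 1:2, b = c(3, NA), c = c("x", "y"), d = 5:6),
  second = data.frame(a = 7:8, b = 9:10, c = c("z", "w"), d = c(NA, 1))
)

dropped <- c("a", "b", "d")

res <- lapply(myList, function(x) {
  # row-wise sum of the columns to be dropped, ignoring NA
  x$new <- rowSums(x[, dropped], na.rm = TRUE)
  # keep everything except the summed columns
  x[!names(x) %in% dropped]
})

res$first
res$second
```

Each element of res keeps column c plus the new total, and the list names are preserved by lapply.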