Sum Across Multiple Columns With Dplyr

Sum across multiple columns with dplyr

dplyr >= 1.0.0 using across

sum up each row using rowSums (rowwise works for any aggreation, but is slower)

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))

sum down each column

df %>%
summarise(across(everything(), ~ sum(., is.na(.), 0)))

dplyr < 1.0.0

sum up each row

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))

sum down each column using superseeded summarise_all:

df %>%
replace(is.na(.), 0) %>%
summarise_all(funs(sum))

Conditional sum across multiple columns using dplyr?

Get the data in long format, for each location, season and Species sum the values and remove rows which have 0 values.

library(dplyr)

df %>%
tidyr::pivot_longer(cols = cat:dog, names_to = 'Species') %>%
group_by(location, season, Species) %>%
summarise(value = sum(value)) %>%
ungroup %>%
filter(value > 0)

# location season Species value
# <chr> <chr> <chr> <dbl>
# 1 A 2 cat 2
# 2 A 3 cat 1
# 3 A 3 dog 1
# 4 A 4 cat 1
# 5 A 4 dog 1
# 6 B 2 dog 1
# 7 B 3 cat 1
# 8 B 3 dog 1
# 9 C 1 cat 1
#10 C 2 cat 1
#11 C 2 dog 1
#12 D 4 cat 1
#13 D 4 dog 2

R: Summing a sequence of columns row-wise with dplyr

You can use rowSums to do that:

# option 1
df_abc %>% mutate(sum_1 = rowSums(.[3:6]))
# option 2
df_abc %>% mutate(sum_1 = rowSums(select(.,orfOiRFj:DJHhhjhF)))

The result:

# A tibble: 100 x 9
FJDFjdfF FfdfFxfj orfOiRFj xDGHdj jfdIDFF DJHhhjhF KhjhjFlFLF IgiGJIJFG sum_1
<int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 1 1 1 1 1 1 1 1 4
2 2 2 2 2 2 2 2 2 8
3 3 3 3 3 3 3 3 3 12
4 4 4 4 4 4 4 4 4 16
5 5 5 5 5 5 5 5 5 20
6 6 6 6 6 6 6 6 6 24
7 7 7 7 7 7 7 7 7 28
8 8 8 8 8 8 8 8 8 32
9 9 9 9 9 9 9 9 9 36
10 10 10 10 10 10 10 10 10 40
# ... with 90 more rows

sum of multiple columns using group_by function

You can use across() inside summarise instead of summarise_at.

From ?summarize_at we can see that:

Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details.

library(dplyr)

CPUE_deelgebied %>%
group_by(Gebied, Datum) %>%
summarise(across(c(`Som vangtuigen`, `Som van Aantal`), sum))

# A tibble: 24 x 4
# Groups: Gebied [2]
Gebied Datum `Som vangtuigen` `Som van Aantal`
<chr> <date> <dbl> <dbl>
1 Oost 2021-02-25 100 321.
2 Oost 2021-03-25 100 267
3 Oost 2021-05-07 92 139
4 Oost 2021-06-02 100 245
5 Oost 2021-06-25 96 256.
6 Oost 2021-07-23 100 213
7 Oost 2021-08-27 100 295.
8 Oost 2021-09-24 96 335.
9 Oost 2021-10-29 98 264
10 Oost 2021-11-23 100 151.
# ... with 14 more rows

Add multiple columns with the same group and sum

If I have understood you well, this will solve your problem:

narc_auth_total <- 
narc_auth %>%
group_by(Full.Name) %>%
summarise(
`2019_words` = sum(`2019`),
`2020_words` = sum(`2020`)
) %>%
left_join(totaltweetsyear, ., by = "Full.Name")

Mutate across multiple columns using dplyr

Two possibilities using dplyr:

library(dplyr)

mtcars %>%
rowwise() %>%
mutate(varmean = mean(c_across(mpg:vs)))

This returns

# A tibble: 32 x 12
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb varmean
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 40.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 40.1
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 31.7
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 52.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 73.2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 47.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 81.2
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 33.1
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 36.7
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 42.8
# ... with 22 more rows

and without rowwise() and using base Rs rowMeans():

mtcars %>% 
mutate(varmean = rowMeans(across(mpg:vs)))

returns

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  varmean
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 39.99750
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 40.09938
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 31.69750
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 52.76687
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 73.16375
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 47.69250
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 81.24000
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 33.12250
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 36.69625
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 42.80750

summing multiple columns in an R data-frame quickly

Here's an alternative approach using tidyverse:

library(tidyverse)

# input columns of interest
cols = c("mpg", "cyl", "disp", "hp", "drat")

mtcars %>%
group_by(id = row_number()) %>% # for each row
nest(cols) %>% # nest selected columns
mutate(SUM = map_dbl(data, sum)) # calculate the sum of those columns

# # A tibble: 32 x 3
# id data SUM
# <int> <list> <dbl>
# 1 1 <tibble [1 x 5]> 301.
# 2 2 <tibble [1 x 5]> 301.
# 3 3 <tibble [1 x 5]> 232.
# 4 4 <tibble [1 x 5]> 398.
# 5 5 <tibble [1 x 5]> 565.
# 6 6 <tibble [1 x 5]> 357.
# 7 7 <tibble [1 x 5]> 631.
# 8 8 <tibble [1 x 5]> 241.
# 9 9 <tibble [1 x 5]> 267.
# 10 10 <tibble [1 x 5]> 320.
# # ... with 22 more rows

The output here is a data frame containing the row id (id), the data used at each row (data) and the calculated sum (SUM).

You can get a vector of the calculated SUM if you add ... %>% pull(SUM).



Related Topics



Leave a reply



Submit