Sum across multiple columns with dplyr
dplyr >= 1.0.0 using across
sum up each row using rowSums
(rowwise
works for any aggreation, but is slower)
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))
sum down each column
df %>%
summarise(across(everything(), ~ sum(., is.na(.), 0)))
dplyr < 1.0.0
sum up each row
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))
sum down each column using superseeded summarise_all
:
df %>%
replace(is.na(.), 0) %>%
summarise_all(funs(sum))
Conditional sum across multiple columns using dplyr?
Get the data in long format, for each location
, season
and Species
sum
the values and remove rows which have 0 values.
library(dplyr)
df %>%
tidyr::pivot_longer(cols = cat:dog, names_to = 'Species') %>%
group_by(location, season, Species) %>%
summarise(value = sum(value)) %>%
ungroup %>%
filter(value > 0)
# location season Species value
# <chr> <chr> <chr> <dbl>
# 1 A 2 cat 2
# 2 A 3 cat 1
# 3 A 3 dog 1
# 4 A 4 cat 1
# 5 A 4 dog 1
# 6 B 2 dog 1
# 7 B 3 cat 1
# 8 B 3 dog 1
# 9 C 1 cat 1
#10 C 2 cat 1
#11 C 2 dog 1
#12 D 4 cat 1
#13 D 4 dog 2
R: Summing a sequence of columns row-wise with dplyr
You can use rowSums
to do that:
# option 1
df_abc %>% mutate(sum_1 = rowSums(.[3:6]))
# option 2
df_abc %>% mutate(sum_1 = rowSums(select(.,orfOiRFj:DJHhhjhF)))
The result:
# A tibble: 100 x 9
FJDFjdfF FfdfFxfj orfOiRFj xDGHdj jfdIDFF DJHhhjhF KhjhjFlFLF IgiGJIJFG sum_1
<int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 1 1 1 1 1 1 1 1 4
2 2 2 2 2 2 2 2 2 8
3 3 3 3 3 3 3 3 3 12
4 4 4 4 4 4 4 4 4 16
5 5 5 5 5 5 5 5 5 20
6 6 6 6 6 6 6 6 6 24
7 7 7 7 7 7 7 7 7 28
8 8 8 8 8 8 8 8 8 32
9 9 9 9 9 9 9 9 9 36
10 10 10 10 10 10 10 10 10 40
# ... with 90 more rows
sum of multiple columns using group_by function
You can use across()
inside summarise
instead of summarise_at
.
From ?summarize_at
we can see that:
Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details.
library(dplyr)
CPUE_deelgebied %>%
group_by(Gebied, Datum) %>%
summarise(across(c(`Som vangtuigen`, `Som van Aantal`), sum))
# A tibble: 24 x 4
# Groups: Gebied [2]
Gebied Datum `Som vangtuigen` `Som van Aantal`
<chr> <date> <dbl> <dbl>
1 Oost 2021-02-25 100 321.
2 Oost 2021-03-25 100 267
3 Oost 2021-05-07 92 139
4 Oost 2021-06-02 100 245
5 Oost 2021-06-25 96 256.
6 Oost 2021-07-23 100 213
7 Oost 2021-08-27 100 295.
8 Oost 2021-09-24 96 335.
9 Oost 2021-10-29 98 264
10 Oost 2021-11-23 100 151.
# ... with 14 more rows
Add multiple columns with the same group and sum
If I have understood you well, this will solve your problem:
narc_auth_total <-
narc_auth %>%
group_by(Full.Name) %>%
summarise(
`2019_words` = sum(`2019`),
`2020_words` = sum(`2020`)
) %>%
left_join(totaltweetsyear, ., by = "Full.Name")
Mutate across multiple columns using dplyr
Two possibilities using dplyr
:
library(dplyr)
mtcars %>%
rowwise() %>%
mutate(varmean = mean(c_across(mpg:vs)))
This returns
# A tibble: 32 x 12
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb varmean
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 40.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 40.1
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 31.7
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 52.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 73.2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 47.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 81.2
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 33.1
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 36.7
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 42.8
# ... with 22 more rows
and without rowwise()
and using base R
s rowMeans()
:
mtcars %>%
mutate(varmean = rowMeans(across(mpg:vs)))
returns
mpg cyl disp hp drat wt qsec vs am gear carb varmean
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 39.99750
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 40.09938
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 31.69750
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 52.76687
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 73.16375
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 47.69250
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 81.24000
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 33.12250
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 36.69625
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 42.80750
summing multiple columns in an R data-frame quickly
Here's an alternative approach using tidyverse
:
library(tidyverse)
# input columns of interest
cols = c("mpg", "cyl", "disp", "hp", "drat")
mtcars %>%
group_by(id = row_number()) %>% # for each row
nest(cols) %>% # nest selected columns
mutate(SUM = map_dbl(data, sum)) # calculate the sum of those columns
# # A tibble: 32 x 3
# id data SUM
# <int> <list> <dbl>
# 1 1 <tibble [1 x 5]> 301.
# 2 2 <tibble [1 x 5]> 301.
# 3 3 <tibble [1 x 5]> 232.
# 4 4 <tibble [1 x 5]> 398.
# 5 5 <tibble [1 x 5]> 565.
# 6 6 <tibble [1 x 5]> 357.
# 7 7 <tibble [1 x 5]> 631.
# 8 8 <tibble [1 x 5]> 241.
# 9 9 <tibble [1 x 5]> 267.
# 10 10 <tibble [1 x 5]> 320.
# # ... with 22 more rows
The output here is a data frame containing the row id (id
), the data used at each row (data
) and the calculated sum (SUM
).
You can get a vector of the calculated SUM
if you add ... %>% pull(SUM)
.
Related Topics
Convert Multiple Columns of Numeric Data to Dates in R
Easier Way to Use Grepl and Ifelse Across Multiple Columns
Minimum (Or Maximum) Value of Each Row Across Multiple Columns
Quickly Reading Very Large Tables as Dataframes
Apply Several Summary Functions on Several Variables by Group in One Call
Relative Frequencies/Proportions With Dplyr
Replace Values in a Dataframe Based on Lookup Table
How Can Two Strings Be Concatenated
Removing Duplicate Combinations (Irrespective of Order)
Force R Not to Use Exponential Notation (E.G. E+10)
Adding Some Space Between the X-Axis and the Bars, in Ggplot
Splitting a Large Data Frame into Smaller Segments
How to Generate the First N Terms in the Series:
Faster Ways to Calculate Frequencies and Cast from Long to Wide
Pass a Data.Frame Column Name to a Function
Replace Missing Values (Na) With Most Recent Non-Na by Group
Geographic/Geospatial Distance Between 2 Lists of Lat/Lon Points (Coordinates)
Order Data Frame Rows According to Vector With Specific Order