How to Sum a Variable by Group

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
Category x
1 First 30
2 Second 5
3 Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(embedding @thelatemail comment), aggregate has a formula interface too

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
First Second Third
30 5 34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))

R sum a variable by two groups

You can group_by ID and Year then use sum within summarise

library(dplyr)

txt <- "ID Year Amount
3 2000 45
3 2000 55
3 2002 10
3 2002 10
3 2004 30
4 2000 25
4 2002 40
4 2002 15
4 2004 45
4 2004 50"

df <- read.table(text = txt, header = TRUE)

df %>%
group_by(ID, Year) %>%
summarise(Total = sum(Amount, na.rm = TRUE))
#> # A tibble: 6 x 3
#> # Groups: ID [?]
#> ID Year Total
#> <int> <int> <int>
#> 1 3 2000 100
#> 2 3 2002 20
#> 3 3 2004 30
#> 4 4 2000 25
#> 5 4 2002 55
#> 6 4 2004 95

If you have more than one Amount column & want to apply more than one function, you can use either summarise_if or summarise_all

df %>% 
group_by(ID, Year) %>%
summarise_if(is.numeric, funs(sum, mean))
#> # A tibble: 6 x 4
#> # Groups: ID [?]
#> ID Year sum mean
#> <int> <int> <int> <dbl>
#> 1 3 2000 100 50
#> 2 3 2002 20 10
#> 3 3 2004 30 30
#> 4 4 2000 25 25
#> 5 4 2002 55 27.5
#> 6 4 2004 95 47.5

df %>%
group_by(ID, Year) %>%
summarise_all(funs(sum, mean, max, min))
#> # A tibble: 6 x 6
#> # Groups: ID [?]
#> ID Year sum mean max min
#> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 3 2000 100 50 55 45
#> 2 3 2002 20 10 10 10
#> 3 3 2004 30 30 30 30
#> 4 4 2000 25 25 25 25
#> 5 4 2002 55 27.5 40 15
#> 6 4 2004 95 47.5 50 45

Created on 2018-09-19 by the reprex package (v0.2.1.9000)

How to sum a variable on other aggregated variables, whilst keeping remaining variables in R?

It works for me when literally specifying that you want the first value, i.e.:

library(tidyverse)
df %>%
group_by(set1, set2) %>%
summarize(y = sum(y),
row = row[1],
set3 = set3[1])

A tibble: 5 x 5
# Groups: set1 [3]
set1 set2 y row set3
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 3 1 1
2 1 2 6 4 2
3 2 1 6 7 4
4 2 2 3 9 5
5 3 1 4 10 5

Edit: To keep every other column without specifying, you can make use of across() and indicate that you want to apply this aggregation to every column except one.

df %>%
group_by(set1, set2) %>%
summarize(
across(!y, ~ .x[1]),
y = sum(y)
)

# A tibble: 5 x 5
# Groups: set1 [3]
set1 set2 row set3 y
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 3
2 1 2 4 2 6
3 2 1 7 4 6
4 2 2 9 5 3
5 3 1 10 5 4

How to sum a variable by group with NA?

One option is to convert the blanks to NA, then fill replace the NA elements with non-NA adjacent elements above in 'ID', grouped by 'ID', get the sum of 'Number'

library(tidyverse)
df1 %>%
mutate(ID = na_if(ID, "")) %>%
fill(ID) %>%
group_by(ID) %>%
summarise(Number = sum(Number))
# A tibble: 4 x 2
# ID Number
# <chr> <int>
#1 104 61
#2 153 120
#3 254 31
#4 266 31

Or without using fill, create a grouping variable with a logical expression and cumsum, and then do the sum

df1 %>% 
group_by(grp = cumsum(ID != "")) %>%
summarise(ID = first(ID), Number = sum(Number)) %>%
select(-grp)

data

df1 <- structure(list(ID = c("153", "", "", "", "104", "", "254", "266"
), Number = c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L)), row.names = c(NA,
-8L), class = "data.frame")


Related Topics



Leave a reply



Submit