How to Sum a Variable by Group

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(embedding @thelatemail comment), aggregate has a formula interface too

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")), 
                    Frequency=c(10,15,5,2,14,20,3))

R sum a variable by two groups

You can group_by ID and Year then use sum within summarise

library(dplyr)

txt <- "ID Year Amount
3 2000 45
3 2000 55
3 2002 10
3 2002 10
3 2004 30
4 2000 25
4 2002 40
4 2002 15
4 2004 45
4 2004 50"

df <- read.table(text = txt, header = TRUE)

df %>% 
  group_by(ID, Year) %>% 
  summarise(Total = sum(Amount, na.rm = TRUE))
#> # A tibble: 6 x 3
#> # Groups:   ID [?]
#>      ID  Year Total
#>   <int> <int> <int>
#> 1     3  2000   100
#> 2     3  2002    20
#> 3     3  2004    30
#> 4     4  2000    25
#> 5     4  2002    55
#> 6     4  2004    95

If you have more than one Amount column & want to apply more than one function, you can use either summarise_if or summarise_all

df %>% 
  group_by(ID, Year) %>% 
  summarise_if(is.numeric, funs(sum, mean))
#> # A tibble: 6 x 4
#> # Groups:   ID [?]
#>      ID  Year   sum  mean
#>   <int> <int> <int> <dbl>
#> 1     3  2000   100  50  
#> 2     3  2002    20  10  
#> 3     3  2004    30  30  
#> 4     4  2000    25  25  
#> 5     4  2002    55  27.5
#> 6     4  2004    95  47.5

df %>% 
  group_by(ID, Year) %>% 
  summarise_all(funs(sum, mean, max, min))
#> # A tibble: 6 x 6
#> # Groups:   ID [?]
#>      ID  Year   sum  mean   max   min
#>   <int> <int> <int> <dbl> <dbl> <dbl>
#> 1     3  2000   100  50      55    45
#> 2     3  2002    20  10      10    10
#> 3     3  2004    30  30      30    30
#> 4     4  2000    25  25      25    25
#> 5     4  2002    55  27.5    40    15
#> 6     4  2004    95  47.5    50    45

^{Created on 2018-09-19 by the reprex package (v0.2.1.9000)}

How to sum a variable on other aggregated variables, whilst keeping remaining variables in R?

It works for me when literally specifying that you want the first value, i.e.:

library(tidyverse)
df %>%
  group_by(set1, set2) %>%
  summarize(y = sum(y),
            row = row[1],
            set3 = set3[1])

 A tibble: 5 x 5
# Groups:   set1 [3]
   set1  set2     y   row  set3
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     1     1
2     1     2     6     4     2
3     2     1     6     7     4
4     2     2     3     9     5
5     3     1     4    10     5

Edit: To keep every other column without specifying, you can make use of across() and indicate that you want to apply this aggregation to every column except one.

df %>%
  group_by(set1, set2) %>%
  summarize(
    across(!y, ~ .x[1]), 
    y = sum(y)
  )

# A tibble: 5 x 5
# Groups:   set1 [3]
   set1  set2   row  set3     y
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     3
2     1     2     4     2     6
3     2     1     7     4     6
4     2     2     9     5     3
5     3     1    10     5     4