How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
Category x
1 First 30
2 Second 5
3 Third 34
In the example above, multiple grouping dimensions can be specified in the by list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
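As a minimal sketch of both ideas together, the call below groups by two dimensions and sums two metrics at once. The data frame x2 and its Region and Metric2 columns are hypothetical, made up for illustration; they are not part of the data used in this answer.

```r
# Hypothetical data: Region and Metric2 are illustrative additions
x2 <- data.frame(
  Category  = c("First", "First", "Second", "Second"),
  Region    = c("East", "West", "East", "East"),
  Frequency = c(10, 15, 2, 3),
  Metric2   = c(1, 2, 3, 4)
)

# Two grouping dimensions in the by list, two metrics bound with cbind
res <- aggregate(
  cbind(Frequency = x2$Frequency, Metric2 = x2$Metric2),
  by  = list(Category = x2$Category, Region = x2$Region),
  FUN = sum
)
print(res)
```

Only the Category/Region combinations that occur in the data appear in the result, so this hypothetical input gives three rows rather than four.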
(Embedding @thelatemail's comment:) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you can use the . notation (this works for one column too):
aggregate(. ~ Category, x, sum)
or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
First Second Third
30 5 34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
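As a quick sanity check, the three approaches can be run side by side on this data to confirm that they return the same group totals. This is only a sketch; the variable names by_list, by_formula and by_tapply are chosen here for illustration.

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Same sums, three different interfaces
by_list    <- aggregate(x$Frequency, by = list(Category = x$Category), FUN = sum)
by_formula <- aggregate(Frequency ~ Category, x, sum)
by_tapply  <- tapply(x$Frequency, x$Category, FUN = sum)

# aggregate() returns a data frame; tapply() returns a named array
all.equal(by_list$x, by_formula$Frequency)
all.equal(by_list$x, unname(as.vector(by_tapply)))
```

The main practical difference is the return type: aggregate gives a data frame with one row per group, while tapply gives a named array that is handy for lookups by group name.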
R: how to sum columns grouped by a factor?
You can use dplyr for this:
library(dplyr)
df = data.frame(
user = c("a", "a", "b", "b", "c"),
v1 = c(1, 1, 1, 2, 1),
v2 = c(0, 0, 0, 0, 1),
v3 = c(0, 1, 0, 3, 1))
group_by(df, user) %>%
summarize(v1_sum = sum(v1),
v2_sum = sum(v2),
v3_sum = sum(v3))
If you're not familiar with the %>% notation, it is basically like piping in bash: it takes the output from group_by() and passes it into summarize(). The same thing can be accomplished this way:
by_user = group_by(df, user)
df_summarized = summarize(by_user,
v1_sum = sum(v1),
v2_sum = sum(v2),
v3_sum = sum(v3))
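If typing one sum() per column gets tedious, the same summary can probably be written more compactly with across(), which is available in dplyr 1.0.0 and later. This is a sketch of an alternative, not part of the original answer; it assumes the same df as above.

```r
library(dplyr)

df <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1 = c(1, 1, 1, 2, 1),
  v2 = c(0, 0, 0, 0, 1),
  v3 = c(0, 1, 0, 3, 1))

# Sum v1..v3 per user; .names appends "_sum" to each column name
df_summarized <- df %>%
  group_by(user) %>%
  summarise(across(v1:v3, sum, .names = "{.col}_sum"))

df_summarized
```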
How to group by factor levels from two columns and output new column that shows sum of each level in R?
Instead of grouping by 'RawDate', group by 'ID' and 'YEAR' and take the sum of a logical vector (summing a logical vector counts its TRUE values):
library(dplyr)
complete_df %>%
group_by(ID, YEAR) %>%
mutate(TotalWon = sum(Renewal == 'WON'), TotalLost = sum(Renewal == 'LOST'))
If we need a summarised output (one row per group), use summarise instead of mutate.
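A minimal sketch of the summarise variant follows. The contents of complete_df are not shown in the question, so the small data frame below is a hypothetical stand-in with the same column names; the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for complete_df (the real data is not shown)
complete_df <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  YEAR    = c(2019, 2019, 2020, 2019, 2019),
  Renewal = c("WON", "LOST", "WON", "LOST", "LOST")
)

# One row per ID/YEAR pair, counting rows where the comparison is TRUE
totals <- complete_df %>%
  group_by(ID, YEAR) %>%
  summarise(TotalWon  = sum(Renewal == "WON"),
            TotalLost = sum(Renewal == "LOST"),
            .groups = "drop")

totals
```

With mutate the counts are repeated on every original row; with summarise each ID/YEAR pair is collapsed to a single row.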
Summing values in a column and grouping by another column in R
summarise_all is your friend here.
summarise_all(group_by(df, Dept), sum)
# # A tibble: 2 x 4
# Dept Mike Steve Tom
# <chr> <dbl> <dbl> <dbl>
# 1 Dept1 2 2 1
# 2 Dept2 0 3 2
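For a runnable version, the sketch below builds a small data frame and uses across(everything(), sum), which in dplyr 1.0.0 and later plays the same role as summarise_all. The input rows are hypothetical: the question's data is not reproduced here, so these values were simply chosen to give the totals shown above.

```r
library(dplyr)

# Hypothetical input rows, chosen only to reproduce the totals above
df <- data.frame(
  Dept  = c("Dept1", "Dept1", "Dept2", "Dept2"),
  Mike  = c(1, 1, 0, 0),
  Steve = c(2, 0, 1, 2),
  Tom   = c(0, 1, 1, 1)
)

# everything() covers all non-grouping columns
by_dept <- df %>%
  group_by(Dept) %>%
  summarise(across(everything(), sum))

by_dept
```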
R sum a variable by two groups
You can group_by ID and Year, then use sum within summarise:
library(dplyr)
txt <- "ID Year Amount
3 2000 45
3 2000 55
3 2002 10
3 2002 10
3 2004 30
4 2000 25
4 2002 40
4 2002 15
4 2004 45
4 2004 50"
df <- read.table(text = txt, header = TRUE)
df %>%
group_by(ID, Year) %>%
summarise(Total = sum(Amount, na.rm = TRUE))
#> # A tibble: 6 x 3
#> # Groups: ID [?]
#> ID Year Total
#> <int> <int> <int>
#> 1 3 2000 100
#> 2 3 2002 20
#> 3 3 2004 30
#> 4 4 2000 25
#> 5 4 2002 55
#> 6 4 2004 95
If you have more than one Amount column and want to apply more than one function, you can use either summarise_if or summarise_all:
df %>%
group_by(ID, Year) %>%
summarise_if(is.numeric, funs(sum, mean))
#> # A tibble: 6 x 4
#> # Groups: ID [?]
#> ID Year sum mean
#> <int> <int> <int> <dbl>
#> 1 3 2000 100 50
#> 2 3 2002 20 10
#> 3 3 2004 30 30
#> 4 4 2000 25 25
#> 5 4 2002 55 27.5
#> 6 4 2004 95 47.5
df %>%
group_by(ID, Year) %>%
summarise_all(funs(sum, mean, max, min))
#> # A tibble: 6 x 6
#> # Groups: ID [?]
#> ID Year sum mean max min
#> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 3 2000 100 50 55 45
#> 2 3 2002 20 10 10 10
#> 3 3 2004 30 30 30 30
#> 4 4 2000 25 25 25 25
#> 5 4 2002 55 27.5 40 15
#> 6 4 2004 95 47.5 50 45
Created on 2018-09-19 by the reprex package (v0.2.1.9000)
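Note that funs() was deprecated in dplyr 0.8.0, and the scoped verbs (summarise_if, summarise_all) were superseded by across() in dplyr 1.0.0. On a current dplyr, a roughly equivalent sketch uses a named list of functions. The column names differ from the output above: across() prefixes them with the source column, giving Amount_sum and Amount_mean.

```r
library(dplyr)

txt <- "ID Year Amount
3 2000 45
3 2000 55
3 2002 10
3 2002 10
3 2004 30
4 2000 25
4 2002 40
4 2002 15
4 2004 45
4 2004 50"
df <- read.table(text = txt, header = TRUE)

# Named list of functions replaces funs(sum, mean)
res <- df %>%
  group_by(ID, Year) %>%
  summarise(across(Amount, list(sum = sum, mean = mean)),
            .groups = "drop")

res
```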
R: Sum specific columns grouped by a particular column
1. Minimal reproducible example data:
df <- structure(list(Col1 = c(10L, 10L, 30L, 45L, 45L),
Col2 = c("A", "A", "B", "C", "C"),
Col3 = c(5L, 6L, 2L, 5L, 2L),
Col4 = c(4L, 3L, 7L, 1L, 1L)),
row.names = c(NA, -5L), class = "data.frame")
2. Solution using dplyr:
library(dplyr)
df %>%
group_by(Col1, Col2) %>%
summarise(Col3 = sum(Col3),
Col4 = sum(Col4))
Returns:
Col1 Col2 Col3 Col4
<int> <chr> <int> <int>
1 10 A 11 7
2 30 B 2 7
3 45 C 7 2
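For comparison, the same result can be sketched in base R with the formula interface of aggregate, without loading dplyr. This is an alternative added for illustration, using the example data from step 1.

```r
df <- structure(list(Col1 = c(10L, 10L, 30L, 45L, 45L),
                     Col2 = c("A", "A", "B", "C", "C"),
                     Col3 = c(5L, 6L, 2L, 5L, 2L),
                     Col4 = c(4L, 3L, 7L, 1L, 1L)),
                row.names = c(NA, -5L), class = "data.frame")

# Left side: columns to sum; right side: grouping columns
res <- aggregate(cbind(Col3, Col4) ~ Col1 + Col2, data = df, FUN = sum)
res
```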
How to sum total of elements in several columns of factor type that are not empty?
What am I doing wrong with dplyr's code block?
It's because there are NAs. Try
library(dplyr)
df2 = df %>%
select(Group, A_n, B_n) %>%
group_by(Group) %>%
summarise_all(sum, na.rm=TRUE)
instead.
Output on my machine:
# A tibble: 2 x 3
Group A_n B_n
<fctr> <dbl> <dbl>
1 Group1 2 1
2 Group2 1 1
I'm afraid my approach ... is too verbose and maybe overkill
You can just do this:
df <- data.frame(list(Group = c("Group1", "Group1", "Group2", "Group2"),
A=c("Some text", "Text here too", "Some other text", NA),
B=c(NA, "Some random text", NA, "Random here too")))
library(dplyr)
df2 = df %>%
group_by(Group) %>%
summarise_all(.funs=function(x) length(na.omit(x)))
Output on my machine:
# A tibble: 2 x 3
Group A B
<fctr> <int> <int>
1 Group1 2 1
2 Group2 1 1
A little explanation
If you look at help(summarise_all), you'll see its arguments are .tbl, .funs, and ... (we won't worry about the ellipsis for now). So, we feed df into group_by() using the pipe %>%, then feed that into summarise_all(), again using the pipe %>%. That takes care of the .tbl argument. The .funs argument is how you specify which function(s) should be used to summarise all non-grouping columns in .tbl. Here we want to know how many elements of each column are not NA, which we can do (as one approach) by applying length(na.omit(x)) to each non-grouping column x in .tbl.
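As another way to express the same count, sum(!is.na(x)) adds up the TRUE values of the "not missing" test and gives the same numbers as length(na.omit(x)). The sketch below uses across(), which assumes dplyr 1.0.0 or later; the name counts is chosen here for illustration.

```r
library(dplyr)

df <- data.frame(Group = c("Group1", "Group1", "Group2", "Group2"),
                 A = c("Some text", "Text here too", "Some other text", NA),
                 B = c(NA, "Some random text", NA, "Random here too"))

# Count non-missing entries in every non-grouping column
counts <- df %>%
  group_by(Group) %>%
  summarise(across(everything(), function(x) sum(!is.na(x))))

counts
```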
My best suggestion for a resource to learn about dplyr is Chapter 5 of R for Data Science, a book by Hadley Wickham, who wrote the dplyr package (among many others).