How to use Aggregate function in R
aggregate(FLOWS ~ SPORT, dat, function(x) sum(as.numeric(x)))
where dat is the name of your matrix.
Here, the function as.numeric is necessary to convert the FLOWS values into numbers before they are summed (a matrix that holds any text stores every column as character).
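Below is a minimal sketch of the same call on a small, hypothetical data frame whose FLOWS values are stored as text. The sports and numbers are invented for illustration. The sketch also shows a related pitfall. If the column is a factor rather than character, as.numeric() returns the internal level codes, so as.numeric(as.character(x)) is the safer conversion in that case.

```r
# Hypothetical data: FLOWS arrives as text, e.g. after reading a file.
dat <- data.frame(SPORT = c("golf", "golf", "tennis"),
                  FLOWS = c("10", "5", "7"),
                  stringsAsFactors = FALSE)

# Convert inside the aggregating function, then sum per SPORT.
res <- aggregate(FLOWS ~ SPORT, dat, function(x) sum(as.numeric(x)))
print(res)

# Pitfall: on a factor, as.numeric() gives level codes, not the values.
f <- factor(c("10", "5"))
codes  <- as.numeric(f)                # 1 2 (positions in levels(f))
values <- as.numeric(as.character(f))  # 10 5 (the actual numbers)
```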
How to use aggregate and summary function to get unique columns in a dataframe?
Since aggregate's simplify parameter defaults to TRUE, it simplifies the results of calling the function (here, summary) to a matrix. The result therefore has a single matrix column x instead of six separate columns. You can reconstruct the data.frame by coercing that matrix column into its own data.frame:
df <- data.frame(Result = c(1,1,2,100,50,30,45,20, 10, 8),
Location = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha"))
Agg <- aggregate(df$Result, list(df$Location), summary)
data.frame(Location = Agg$Group.1, Agg$x)
#> Location Min. X1st.Qu. Median Mean X3rd.Qu. Max.
#> 1 Alpha 1 6.25 26.5 38.50000 58.75 100
#> 2 Beta 1 10.50 20.0 23.66667 35.00 50
#> 3 Gamma 2 6.00 10.0 14.00000 20.00 30
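The sketch below shows what the simplification does. It reuses the df above, redefined so the block runs on its own, and checks that Agg has only two columns, the second being a 3 x 6 matrix. As an alternative to rebuilding the data frame by hand, do.call(data.frame, Agg) is a common base-R idiom that flattens the matrix column into ordinary columns. The flattened column names get an x. prefix.

```r
df <- data.frame(Result = c(1, 1, 2, 100, 50, 30, 45, 20, 10, 8),
                 Location = c("Alpha", "Beta", "Gamma", "Alpha", "Beta",
                              "Gamma", "Alpha", "Beta", "Gamma", "Alpha"))
Agg <- aggregate(df$Result, list(df$Location), summary)

# Only two columns: Group.1 and x, where x is a matrix of summary stats.
print(ncol(Agg))
print(is.matrix(Agg$x))

# Flatten the matrix column into six ordinary columns.
flat <- do.call(data.frame, Agg)
print(names(flat))
```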
Alternatively, dplyr's summarise family of functions handles multiple summary statistics well:
library(dplyr)
df %>% group_by(Location) %>% summarise_all(list(min = min, median = median, max = max))
#> # A tibble: 3 x 4
#> Location min median max
#> <fct> <dbl> <dbl> <dbl>
#> 1 Alpha 1. 26.5 100.
#> 2 Beta 1. 20.0 50.
#> 3 Gamma 2. 10.0 30.
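For comparison, here is a base-R sketch of the same three statistics using aggregate() itself. The function returns a named vector per group. aggregate() stores those vectors as a matrix column, and cbind() then spreads it into min, median and max columns. The df from above is redefined so the block is self-contained.

```r
df <- data.frame(Result = c(1, 1, 2, 100, 50, 30, 45, 20, 10, 8),
                 Location = c("Alpha", "Beta", "Gamma", "Alpha", "Beta",
                              "Gamma", "Alpha", "Beta", "Gamma", "Alpha"))

# One named value per statistic; aggregate() collects them into a matrix.
agg <- aggregate(Result ~ Location, data = df,
                 FUN = function(x) c(min = min(x), median = median(x), max = max(x)))

# agg$Result is a 3 x 3 matrix; cbind() turns its columns into separate ones.
res <- cbind(agg["Location"], agg$Result)
print(res)
```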
If you really want all of summary, you can use broom::tidy to turn each group's results into a data frame in a list column, which can then be unnested:
df %>%
group_by(Location) %>%
summarise(x = list(broom::tidy(summary(Result)))) %>%
tidyr::unnest(cols = x)
#> # A tibble: 3 x 7
#> Location minimum q1 median mean q3 maximum
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Alpha 1. 6.25 26.5 38.5 58.8 100.
#> 2 Beta 1. 10.5 20.0 23.7 35.0 50.
#> 3 Gamma 2. 6.00 10.0 14.0 20.0 30.
Aggregate multiple columns at once
We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while . represents all the other variables in df1. From the example, we assume that you need the mean of every column except the grouping ones. Then specify the dataset and the function (mean):
aggregate(.~id1+id2, df1, mean)
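Here is a quick base-R check of how the formula method expands the . (using the df1 data given at the end of this answer, redefined so the block runs on its own). Listing the value columns explicitly with cbind() on the left-hand side gives the same result. In the output, the first grouping variable (id1) varies fastest.

```r
df1 <- data.frame(id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
                  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
                  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L),
                  stringsAsFactors = FALSE)

# '.' means every column not used on the right-hand side: val1 and val2.
r1 <- aggregate(. ~ id1 + id2, data = df1, FUN = mean)

# Equivalent call that names the value columns explicitly.
r2 <- aggregate(cbind(val1, val2) ~ id1 + id2, data = df1, FUN = mean)
print(r1)
```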
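Or we can use summarise_all from dplyr after grouping with group_by. summarise_all replaces the deprecated summarise_each, and a bare function replaces the deprecated funs():
library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_all(mean)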
Or use summarise with across (available in released dplyr since version 1.0.0):
df1 %>%
group_by(id1, id2) %>%
summarise(across(starts_with('val'), mean))
Another option is data.table. We convert the data.frame to a data.table with setDT(df1) and group by id1 and id2. We then loop over the columns of the data.table subset (.SD) with lapply and take the mean of each:
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
data
df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b",
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
val1 = c(1L,
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L,
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
R - Aggregate variables with min function and find index related to the min value
Consider using ave to compute each ID's minimum Value, then subset the data frame to all rows whose Value matches that minimum:
Data[Data$Value == ave(Data$Value, Data$ID, FUN=min),]
# ID Position Value
# 1 A 10 0
# 5 B 6 1
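The following self-contained sketch uses a small, hypothetical Data frame. The values are invented so that the minima fall in rows 1 and 5. Wrapping the same comparison in which() also returns the row indices of the minima, which the question asks for. If several rows tie for an ID's minimum, all of them are kept.

```r
# Hypothetical data for illustration.
Data <- data.frame(ID = c("A", "A", "A", "B", "B"),
                   Position = c(10, 3, 7, 2, 6),
                   Value = c(0, 4, 2, 5, 1))

# TRUE where a row's Value equals the minimum Value of its ID.
is_min <- Data$Value == ave(Data$Value, Data$ID, FUN = min)

print(Data[is_min, ])  # the matching rows
idx <- which(is_min)   # their row indices
print(idx)
```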
R won't compute means correctly with aggregate function
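Update:
There is no need for an anonymous function (credit to Gregor Thomas, see comments). We could use:
summarise(across(where(is.numeric), mean, na.rm = TRUE))
Note that in dplyr 1.1.0 and later, passing extra arguments such as na.rm through the ... of across() is deprecated and triggers a warning. The anonymous-function form in the first answer is therefore the currently recommended spelling.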
First answer:
As Gregor Thomas pointed out, colMeans won't work here.
We could use summarise and across from the dplyr package:
library(dplyr)
df %>%
group_by(cultivar) %>%
summarise(across(where(is.numeric),~ mean(., na.rm = TRUE)))
Output:
cultivar replication width height
<chr> <dbl> <dbl> <dbl>
1 BOF 2.5 11 14.5
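aggregate() itself can also appear to compute wrong means. One likely cause, assuming the question used the formula interface, is that its na.action defaults to na.omit. That setting drops every row with an NA in any column before the means are taken, so even na.rm = TRUE cannot bring those rows back. The hypothetical data below shows the difference. Passing na.action = na.pass keeps the rows and lets na.rm = TRUE work column by column.

```r
# Hypothetical data with NAs in different rows.
df <- data.frame(cultivar = rep("BOF", 4),
                 replication = 1:4,
                 width = c(10, NA, 12, 11),
                 height = c(14, 16, NA, 15))

# Default na.action = na.omit: rows 2 and 3 are dropped from every column.
omit <- aggregate(. ~ cultivar, data = df, FUN = mean, na.rm = TRUE)

# na.pass keeps all rows; na.rm = TRUE then skips NAs column by column.
pass <- aggregate(. ~ cultivar, data = df, FUN = mean, na.rm = TRUE,
                  na.action = na.pass)
print(omit)
print(pass)
```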
How to select which variables to drop using aggregate function in R
You can use complete from tidyr:
library(dplyr)
df %>%
select(-Donor) %>%
group_by(Recipient, time) %>%
tidyr::complete(location = unique(df$location))
# Recipient time location value
# <chr> <dbl> <chr> <dbl>
# 1 r1 2000 in 2
# 2 r1 2000 out 5
# 3 r1 2000 undefined NA
# 4 r1 2002 in 4
# 5 r1 2002 out NA
# 6 r1 2002 undefined NA
# 7 r2 2002 in NA
# 8 r2 2002 out 3
# 9 r2 2002 undefined 1
#10 r3 2004 in 4
#11 r3 2004 out 3
#12 r3 2004 undefined NA
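For comparison, here is a base-R sketch of what complete() does here. It uses hypothetical data invented to match the shape of the output above, since the real df is not shown. The steps are: take the observed Recipient/time pairs, cross them with every location using merge(by = NULL), then left-join the values back so that missing combinations get NA.

```r
# Hypothetical data for illustration.
df <- data.frame(Donor = c("d1", "d1", "d2", "d2", "d3", "d3", "d3"),
                 Recipient = c("r1", "r1", "r1", "r2", "r2", "r3", "r3"),
                 time = c(2000, 2000, 2002, 2002, 2002, 2004, 2004),
                 location = c("in", "out", "in", "out", "undefined", "in", "out"),
                 value = c(2, 5, 4, 3, 1, 4, 3))

# Observed Recipient/time pairs (the groups in the dplyr version).
pairs <- unique(df[c("Recipient", "time")])

# Cartesian product of those pairs with every location.
grid <- merge(pairs, data.frame(location = unique(df$location)), by = NULL)

# Left join: combinations absent from df get value = NA.
res <- merge(grid, df[c("Recipient", "time", "location", "value")], all.x = TRUE)
res <- res[order(res$Recipient, res$time, res$location), ]
print(res)
```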
Multiple functions in aggregate
With dplyr, you could do this:
library(dplyr)
group_by(d,Branch) %>%
summarize(Number_of_loans = n(),
Loan_Amount = sum(Loan_Amount),
TAT = sum(TAT))
output
Source: local data frame [2 x 4]
Branch Number_of_loans Loan_Amount TAT
(fctr) (int) (int) (dbl)
1 A 3 520 15.0
2 B 2 350 3.5
data
d <- read.table(text="Branch Loan_Amount TAT
A 100 2.0
A 120 4.0
A 300 9.0
B 150 1.5
B 200 2.0", header = TRUE)
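Since the question asks about aggregate(), here is a base-R sketch that builds the same table. One aggregate() call sums both columns via cbind() on the left-hand side. A second call counts loans per branch with length, and merge() joins the two results. Two calls are needed because the count and the sums use different functions.

```r
d <- read.table(text = "Branch Loan_Amount TAT
A 100 2.0
A 120 4.0
A 300 9.0
B 150 1.5
B 200 2.0", header = TRUE)

# Sum both value columns per Branch in one call.
sums <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = d, FUN = sum)

# Count rows per Branch; length() of any column gives the group size.
counts <- aggregate(Loan_Amount ~ Branch, data = d, FUN = length)
names(counts)[2] <- "Number_of_loans"

res <- merge(counts, sums, by = "Branch")
print(res)
```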
How to show multiple value columns in aggregate function in R
As an alternative to the aggregate() solution provided by Taufi, here is a dplyr solution that calculates the sum of all numeric columns:
library(dplyr)
TableA %>%
group_by(Product) %>%
summarise(across(where(is.numeric), ~sum(.x, na.rm = TRUE)))
Output:
Product East West North South
<chr> <int> <int> <int> <int>
1 Airpod 8 16 36 54
2 iPhone 8 16 36 54
3 Macbook 12 24 54 81
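For a base-R version, here is a sketch with a small, hypothetical TableA that passes only the numeric columns to the data-frame method of aggregate(). Unlike the formula method, the data-frame method does not drop rows that contain NA, so na.rm = TRUE is applied column by column.

```r
# Hypothetical data for illustration.
TableA <- data.frame(Product = c("iPhone", "iPhone", "Macbook"),
                     East = c(1L, NA, 4L),
                     West = c(2L, 3L, 5L))

# Keep only the numeric columns as the values to summarise.
num <- vapply(TableA, is.numeric, logical(1))

# Data-frame method: group by Product, sum each numeric column, skip NAs.
res <- aggregate(TableA[num], by = TableA["Product"], FUN = sum, na.rm = TRUE)
print(res)
```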
Error for NA using group_by or aggregate function [aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate]
Here is a way to create the wanted data.frame. I think your expected output has one error in row 2 (Sheep): with the NA removed, the mean of NA and 10 (mean(c(NA, 10), na.rm = TRUE)) is 10, not 5.
library(dplyr)
Using aggregate
Data %>%
aggregate(.~Year+Farms,., FUN=mean, na.rm=TRUE, na.action=NULL) %>%
arrange(Farms, desc(Year)) %>%
as.data.frame() %>%
mutate_at(names(.), ~replace(., is.nan(.), NA))
Using summarize
Data %>%
group_by(Year, Farms) %>%
summarize(MeanCow = mean(Cow, na.rm=TRUE),
MeanDuck = mean(Duck, na.rm=TRUE),
MeanChicken = mean(Chicken, na.rm=TRUE),
MeanSheep = mean(Sheep, na.rm=TRUE),
MeanHorse = mean(Horse, na.rm=TRUE)) %>%
arrange(Farms, desc(Year)) %>%
as.data.frame() %>%
mutate_at(names(.), ~replace(., is.nan(.), NA))
Output (identical for both approaches):
Year Farms Cow Duck Chicken Sheep Horse
1 2020 Farm 1 22.0 12.0 110 25.0 22.5
2 2019 Farm 1 14.0 6.0 65 10.0 13.5
3 2018 Farm 1 8.0 NA 10 14.5 12.0
4 2020 Farm 2 31.0 20.5 29 15.0 14.0
5 2019 Farm 2 11.5 40.5 43 18.5 42.5
6 2018 Farm 2 36.5 26.5 28 30.0 11.0
7 2020 Farm 3 38.5 9.0 37 30.0 42.0
8 2019 Farm 3 NA 10.5 NA 20.0 11.5
9 2018 Farm 3 NA 7.0 24 38.0 42.0
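The error in the title can be reproduced with a small, hypothetical Data frame. The formula method of aggregate() uses na.action = na.omit by default. If every row has an NA in at least one column, no rows are left and aggregate() stops with "no rows to aggregate". The answer above passes na.action = NULL to keep those rows; the sketch uses na.action = na.pass for the same purpose. A group whose values are all NA then yields NaN (the mean of zero values), which is why the NaN-to-NA replacement step is needed.

```r
# Hypothetical data: every row has an NA somewhere.
Data <- data.frame(Year = c(2019, 2019, 2020),
                   Farms = c("Farm 1", "Farm 1", "Farm 1"),
                   Cow = c(NA, 4, 6),
                   Duck = c(2, NA, 3),
                   Sheep = c(1, 2, NA))

# Default na.omit removes all three rows, so aggregate() signals an error.
err <- tryCatch(aggregate(. ~ Year + Farms, data = Data, FUN = mean),
                error = conditionMessage)
print(err)

# Keep the rows and let mean() skip NAs within each group.
ok <- aggregate(. ~ Year + Farms, data = Data, FUN = mean,
                na.rm = TRUE, na.action = na.pass)

# An all-NA group gives NaN; turn it into NA in every numeric column.
num <- vapply(ok, is.numeric, logical(1))
ok[num] <- lapply(ok[num], function(v) replace(v, is.nan(v), NA))
print(ok)
```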