How to Use Aggregate Function in R

How to use Aggregate function in R

aggregate(FLOWS ~ SPORT, dat, function(x) sum(as.numeric(x)))

where dat is the name of your matrix.

Here, the function is.numeric is necessary to transform the second column into numbers.

How to use aggregate and summary function to get unique columns in a dataframe?

Since aggregate's simplify parameter defaults to TRUE, it's simplifying the results of calling the function (here, summary) to a matrix. You can reconstruct the data.frame, coercing the column into its own data.frame:

df <- data.frame(Result = c(1,1,2,100,50,30,45,20, 10, 8),
Location = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha"))

Agg <- aggregate(df$Result, list(df$Location), summary)

data.frame(Location = Agg$Group.1, Agg$x)
#> Location Min. X1st.Qu. Median Mean X3rd.Qu. Max.
#> 1 Alpha 1 6.25 26.5 38.50000 58.75 100
#> 2 Beta 1 10.50 20.0 23.66667 35.00 50
#> 3 Gamma 2 6.00 10.0 14.00000 20.00 30

Alternately, dplyr's summarise family of functions can handle multiple summary statistics well:

library(dplyr)

df %>% group_by(Location) %>% summarise_all(funs(min, median, max))
#> # A tibble: 3 x 4
#> Location min median max
#> <fct> <dbl> <dbl> <dbl>
#> 1 Alpha 1. 26.5 100.
#> 2 Beta 1. 20.0 50.
#> 3 Gamma 2. 10.0 30.

If you really want all of summary, you can use broom::tidy to turn each group's results into a data frame in a list column, which can be unnested:

df %>% 
group_by(Location) %>%
summarise(x = list(broom::tidy(summary(Result)))) %>%
tidyr::unnest()
#> # A tibble: 3 x 7
#> Location minimum q1 median mean q3 maximum
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Alpha 1. 6.25 26.5 38.5 58.8 100.
#> 2 Beta 1. 10.5 20.0 23.7 35.0 50.
#> 3 Gamma 2. 6.00 10.0 14.0 20.0 30.

Aggregate multiple columns at once

We can use the formula method of aggregate. The variables on the 'rhs' of ~ are the grouping variables while the . represents all other variables in the 'df1' (from the example, we assume that we need the mean for all the columns except the grouping), specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)

Or we can use summarise_each from dplyr after grouping (group_by)

library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_each(funs(mean))

Or using summarise with across (dplyr devel version - ‘0.8.99.9000’)

df1 %>% 
group_by(id1, id2) %>%
summarise(across(starts_with('val'), mean))

Or another option is data.table. We convert the 'data.frame' to 'data.table' (setDT(df1), grouped by 'id1' and 'id2', we loop through the subset of data.table (.SD) and get the mean.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", 
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
val1 = c(1L,
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L,
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))

R - Aggregate variables with min function and find index related to the min value

Consider ave to subset dataframe and return all rows of corresponding Value matches:

Data[Data$Value == ave(Data$Value, Data$ID, FUN=min),]

# ID Position Value
# 1 A 10 0
# 5 B 6 1

R won't compute means correctly with aggregate function

Update:
There is no need for anonymous function (Credits to Gregor Thomas, see comments). We could use:

summarise(across(where(is.numeric), mean, na.rm = TRUE))

First answer:
Thanks to Gregor Thomas colMeanswon't work here.
We could use dplyr package summarise and across

library(dplyr)

df %>%
group_by(cultivar) %>%
summarise(across(where(is.numeric),~ mean(., na.rm = TRUE)))

Output:

  cultivar replication width height
<chr> <dbl> <dbl> <dbl>
1 BOF 2.5 11 14.5

How to select which variables to drop using aggregate function in r

You can use complete from tidyr :

library(dplyr)
df %>%
select(-Donor) %>%
group_by(Recipient, time) %>%
tidyr::complete(location = unique(df$location))

# Recipient time location value
# <chr> <dbl> <chr> <dbl>
# 1 r1 2000 in 2
# 2 r1 2000 out 5
# 3 r1 2000 undefined NA
# 4 r1 2002 in 4
# 5 r1 2002 out NA
# 6 r1 2002 undefined NA
# 7 r2 2002 in NA
# 8 r2 2002 out 3
# 9 r2 2002 undefined 1
#10 r3 2004 in 4
#11 r3 2004 out 3
#12 r3 2004 undefined NA

Multiple functions in aggregate

With dplyr, you could do this:

library(dplyr)
group_by(d,Branch) %>%
summarize(Number_of_loans = n(),
Loan_Amount = sum(Loan_Amount),
TAT = sum(TAT))

output

Source: local data frame [2 x 4]

Branch Number_of_loans Loan_Amount TAT
(fctr) (int) (int) (dbl)
1 A 3 520 15.0
2 B 2 350 3.5

data

d <- read.table(text="Branch Loan_Amount TAT
A 100 2.0
A 120 4.0
A 300 9.0
B 150 1.5
B 200 2.0",head=TRUE)

How to show multiple value columns in aggregate function in R

Alternativ to aggregate() (solution provided by Taufi). Here is a dplyr solution where you can calculate the sum of all numeric columns:

library(dplyr)
TableA %>%
group_by(Product) %>%
summarise(across(where(is.numeric), ~sum(.x, na.rm = TRUE)))

Output:

  Product  East  West North South
<chr> <int> <int> <int> <int>
1 Airpod 8 16 36 54
2 iPhone 8 16 36 54
3 Macbook 12 24 54 81

Error for NA using group_by or aggregate function [aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate]

Here is a way to create the wanted data.frame. I think your solution has one error in row 2 (Sheep), where mean(NA, 10) is equal to 5 and not 10.

library(dplyr)

Using aggregate

 Data %>% 
aggregate(.~Year+Farms,., FUN=mean, na.rm=T, na.action=NULL) %>%
arrange(Farms, desc(Year)) %>%
as.data.frame() %>%
mutate_at(names(.), ~replace(., is.nan(.), NA))

Using summarize

Data %>% 
group_by(Year, Farms) %>%
summarize(MeanCow = mean(Cow, na.rm=T),
MeanDuck = mean(Duck, na.rm=T),
MeanChicken = mean(Chicken, na.rm=T),
MeanSheep = mean(Sheep, na.rm=T),
MeanHorse = mean(Horse, na.rm=T)) %>%
arrange(Farms, desc(Year)) %>%
as.data.frame() %>%
mutate_at(names(.), ~replace(., is.nan(.), NA))

Solution for both

      Year  Farms  Cow Duck Chicken Sheep Horse
1 2020 Farm 1 22.0 12.0 110 25.0 22.5
2 2019 Farm 1 14.0 6.0 65 10.0 13.5
3 2018 Farm 1 8.0 NA 10 14.5 12.0
4 2020 Farm 2 31.0 20.5 29 15.0 14.0
5 2019 Farm 2 11.5 40.5 43 18.5 42.5
6 2018 Farm 2 36.5 26.5 28 30.0 11.0
7 2020 Farm 3 38.5 9.0 37 30.0 42.0
8 2019 Farm 3 NA 10.5 NA 20.0 11.5
9 2018 Farm 3 NA 7.0 24 38.0 42.0


Related Topics



Leave a reply



Submit