How to Use Aggregate Function in R

How to use Aggregate function in R

aggregate(FLOWS ~ SPORT, dat, function(x) sum(as.numeric(x)))

where dat is the name of your matrix.

Here, the function is.numeric is necessary to transform the second column into numbers.

How to use aggregate and summary function to get unique columns in a dataframe?

Since aggregate's simplify parameter defaults to TRUE, it's simplifying the results of calling the function (here, summary) to a matrix. You can reconstruct the data.frame, coercing the column into its own data.frame:

df <- data.frame(Result = c(1,1,2,100,50,30,45,20, 10, 8),
                 Location = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha"))

Agg <- aggregate(df$Result, list(df$Location), summary)

data.frame(Location = Agg$Group.1, Agg$x)
#>   Location Min. X1st.Qu. Median     Mean X3rd.Qu. Max.
#> 1    Alpha    1     6.25   26.5 38.50000    58.75  100
#> 2     Beta    1    10.50   20.0 23.66667    35.00   50
#> 3    Gamma    2     6.00   10.0 14.00000    20.00   30

Alternately, dplyr's summarise family of functions can handle multiple summary statistics well:

library(dplyr)

df %>% group_by(Location) %>% summarise_all(funs(min, median, max))
#> # A tibble: 3 x 4
#>   Location   min median   max
#>   <fct>    <dbl>  <dbl> <dbl>
#> 1 Alpha       1.   26.5  100.
#> 2 Beta        1.   20.0   50.
#> 3 Gamma       2.   10.0   30.

If you really want all of summary, you can use broom::tidy to turn each group's results into a data frame in a list column, which can be unnested:

df %>% 
    group_by(Location) %>% 
    summarise(x = list(broom::tidy(summary(Result)))) %>% 
    tidyr::unnest()
#> # A tibble: 3 x 7
#>   Location minimum    q1 median  mean    q3 maximum
#>   <fct>      <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 Alpha         1.  6.25   26.5  38.5  58.8    100.
#> 2 Beta          1. 10.5    20.0  23.7  35.0     50.
#> 3 Gamma         2.  6.00   10.0  14.0  20.0     30.

Aggregate multiple columns at once

We can use the formula method of aggregate. The variables on the 'rhs' of ~ are the grouping variables while the . represents all other variables in the 'df1' (from the example, we assume that we need the mean for all the columns except the grouping), specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)

Or we can use summarise_each from dplyr after grouping (group_by)

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))

Or using summarise with across (dplyr devel version - ‘0.8.99.9000’)

df1 %>% 
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))

Or another option is data.table. We convert the 'data.frame' to 'data.table' (setDT(df1), grouped by 'id1' and 'id2', we loop through the subset of data.table (.SD) and get the mean.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", 
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"), 
val1 = c(1L, 
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

R - Aggregate variables with min function and find index related to the min value

Consider ave to subset dataframe and return all rows of corresponding Value matches:

Data[Data$Value == ave(Data$Value, Data$ID, FUN=min),]

#   ID Position Value
# 1  A       10     0
# 5  B        6     1

R won't compute means correctly with aggregate function

Update:
There is no need for anonymous function (Credits to Gregor Thomas, see comments). We could use:

summarise(across(where(is.numeric), mean, na.rm = TRUE))

First answer:
Thanks to Gregor Thomas colMeanswon't work here.
We could use dplyr package summarise and across

library(dplyr)

df %>% 
  group_by(cultivar) %>% 
  summarise(across(where(is.numeric),~ mean(., na.rm = TRUE)))

Output:

  cultivar replication width height
  <chr>          <dbl> <dbl>  <dbl>
1 BOF              2.5    11   14.5

How to select which variables to drop using aggregate function in r

You can use complete from tidyr :

library(dplyr)
df %>%
  select(-Donor) %>%
  group_by(Recipient, time) %>%
  tidyr::complete(location = unique(df$location))

#   Recipient  time location  value
#   <chr>     <dbl> <chr>     <dbl>
# 1 r1         2000 in            2
# 2 r1         2000 out           5
# 3 r1         2000 undefined    NA
# 4 r1         2002 in            4
# 5 r1         2002 out          NA
# 6 r1         2002 undefined    NA
# 7 r2         2002 in           NA
# 8 r2         2002 out           3
# 9 r2         2002 undefined     1
#10 r3         2004 in            4
#11 r3         2004 out           3
#12 r3         2004 undefined    NA

Multiple functions in aggregate

With dplyr, you could do this:

library(dplyr)
group_by(d,Branch) %>% 
  summarize(Number_of_loans = n(),
            Loan_Amount = sum(Loan_Amount),
            TAT = sum(TAT))

output

Source: local data frame [2 x 4]

  Branch Number_of_loans Loan_Amount   TAT
  (fctr)           (int)       (int) (dbl)
1      A               3         520  15.0
2      B               2         350   3.5

data

d <- read.table(text="Branch Loan_Amount TAT
A         100 2.0
A         120 4.0
A         300 9.0
B         150 1.5
B         200 2.0",head=TRUE)

How to show multiple value columns in aggregate function in R

Alternativ to aggregate() (solution provided by Taufi). Here is a dplyr solution where you can calculate the sum of all numeric columns:

library(dplyr)
TableA %>% 
  group_by(Product) %>% 
  summarise(across(where(is.numeric), ~sum(.x, na.rm = TRUE)))

Output:

  Product  East  West North South
  <chr>   <int> <int> <int> <int>
1 Airpod      8    16    36    54
2 iPhone      8    16    36    54
3 Macbook    12    24    54    81

Error for NA using group_by or aggregate function [aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate]

Here is a way to create the wanted data.frame. I think your solution has one error in row 2 (Sheep), where mean(NA, 10) is equal to 5 and not 10.

library(dplyr)

Using aggregate

 Data %>% 
  aggregate(.~Year+Farms,., FUN=mean, na.rm=T, na.action=NULL) %>% 
  arrange(Farms, desc(Year)) %>% 
  as.data.frame() %>%  
  mutate_at(names(.), ~replace(., is.nan(.), NA))

Using summarize

Data %>% 
  group_by(Year, Farms) %>% 
  summarize(MeanCow = mean(Cow, na.rm=T),
            MeanDuck =  mean(Duck, na.rm=T),
            MeanChicken = mean(Chicken, na.rm=T),
            MeanSheep = mean(Sheep, na.rm=T),
            MeanHorse = mean(Horse, na.rm=T)) %>% 
  arrange(Farms, desc(Year)) %>% 
  as.data.frame() %>% 
  mutate_at(names(.), ~replace(., is.nan(.), NA))

Solution for both

      Year  Farms  Cow Duck Chicken Sheep Horse
1 2020 Farm 1 22.0 12.0     110  25.0  22.5
2 2019 Farm 1 14.0  6.0      65  10.0  13.5
3 2018 Farm 1  8.0   NA      10  14.5  12.0
4 2020 Farm 2 31.0 20.5      29  15.0  14.0
5 2019 Farm 2 11.5 40.5      43  18.5  42.5
6 2018 Farm 2 36.5 26.5      28  30.0  11.0
7 2020 Farm 3 38.5  9.0      37  30.0  42.0
8 2019 Farm 3   NA 10.5      NA  20.0  11.5
9 2018 Farm 3   NA  7.0      24  38.0  42.0

How to Use Aggregate Function in R