R: How to Aggregate Some Columns While Keeping Other Columns

How to aggregate some columns while keeping other columns in R?

Assuming that your data frame is named df:

aggregate(no~id+age, df, sum)
#   id age no
# 1  1  23  9
# 2  3  23  7
# 3  2  25  5
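
As a self-contained sketch, the call can be tried on a small made-up data frame (the values below are hypothetical, chosen only so that the sums match the output shown). The formula method of aggregate() returns only the columns named in the formula, so any column you want to keep has to appear on the right-hand side as a grouping variable:

```r
# Hypothetical data; only the column names come from the answer above
df <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  age = c(23, 23, 25, 25, 23, 23),
  no  = c(4, 5, 2, 3, 3, 4)
)

# Sum `no` within each id/age combination; `age` survives because it is a grouping variable
res <- aggregate(no ~ id + age, data = df, FUN = sum)
res
```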

R: How to aggregate some columns while keeping other columns

I understand that you are looking for a base R solution, but in the meantime, here is a dplyr one:

library(dplyr)

data %>% 
  group_by(Date, Exercise) %>% 
  slice(which.max(EstMax))

# # A tibble: 6 x 8
# # Groups:   Date, Exercise [6]
#      Date    Exercise Category Weight  Reps   EstMax RepxWeight          Note
#    <fctr>      <fctr>   <fctr>  <int> <int>    <dbl>     <fctr>        <fctr>
# 1 4/18/16    Deadlift     Legs    155     8 196.2920      8x155             …
# 2  4/2/16 Bench Press    Chest    135     2 143.9910      2x135      not hard
# 3  4/2/16    Deadlift     Legs    135     7 166.4685      7x135          easy
# 4  4/9/16 Bench Press    Chest    135     2 143.9910      2x135 a little hard
# 5  5/8/16 Bench Press    Chest    115     4 130.3180      4x115          easy
# 6  5/8/16    Deadlift     Legs    185     3 203.4815      3x185      good day

Edit

data.table is not my forte, but for the sake of completeness, here's my attempt at it:

library(data.table)

setDT(data)[, .SD[which.max(EstMax)], by = .(Date, Exercise)]

#       Date    Exercise Category Weight Reps   EstMax RepxWeight          Note
# 1:  4/2/16    Deadlift     Legs    135    7 166.4685      7x135          easy
# 2:  4/2/16 Bench Press    Chest    135    2 143.9910      2x135      not hard
# 3:  4/9/16 Bench Press    Chest    135    2 143.9910      2x135 a little hard
# 4: 4/18/16    Deadlift     Legs    155    8 196.2920      8x155             …
# 5:  5/8/16    Deadlift     Legs    185    3 203.4815      3x185      good day
# 6:  5/8/16 Bench Press    Chest    115    4 130.3180      4x115          easy
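
Since the question asked for base R, here is a minimal sketch of the same idea with split() and which.max(). The data frame below is hypothetical (a few invented rows reusing some of the column names above), and like slice(which.max(...)) it keeps only the first row when a group has tied maxima:

```r
# Hypothetical training log; column names follow the output above
data <- data.frame(
  Date     = c("4/2/16", "4/2/16", "4/2/16", "4/9/16"),
  Exercise = c("Deadlift", "Deadlift", "Bench Press", "Bench Press"),
  Weight   = c(135, 115, 135, 135),
  Reps     = c(7, 5, 2, 2),
  EstMax   = c(166.4685, 134.1667, 143.9910, 143.9910),
  stringsAsFactors = FALSE
)

# Split into one data frame per Date/Exercise pair (drop = TRUE skips empty pairs),
# keep the row with the largest EstMax in each piece, then stack the pieces again
pieces <- split(data, list(data$Date, data$Exercise), drop = TRUE)
best   <- do.call(rbind, lapply(pieces, function(g) g[which.max(g$EstMax), ]))
rownames(best) <- NULL
best
```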

Aggregate by multiple columns, sum one column and keep other columns? Create new column based on aggregated values?

In data.table:

library(data.table)

setDT(df)[, .(Amount = sum(Amount, na.rm = TRUE),
              UniqueStores = uniqueN(Store, na.rm = TRUE)), 
          by = .(ProductID, Day, Product)
          ]

Output:

   ProductID       Day Product Amount UniqueStores
1:         1    Monday    Food     10            1
2:         1   Tuesday    Food     10            2
3:         2 Wednesday    Toys     15            2
4:         2    Friday    Toys      7            1
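
For comparison, a dplyr version of the same aggregation might look like the sketch below. The input rows are made up so that the totals match the output above, and n_distinct() stands in for uniqueN(). One visible difference is that group_by() sorts the groups, whereas data.table's by keeps them in order of first appearance:

```r
library(dplyr)

# Hypothetical input; only the column names are taken from the answer above
df <- data.frame(
  ProductID = c(1, 1, 1, 2, 2, 2),
  Day       = c("Monday", "Tuesday", "Tuesday", "Wednesday", "Wednesday", "Friday"),
  Product   = c("Food", "Food", "Food", "Toys", "Toys", "Toys"),
  Store     = c("A", "A", "B", "A", "B", "B"),
  Amount    = c(10, 5, 5, 5, 10, 7)
)

res <- df %>%
  # Product is constant within ProductID here, so grouping by it simply carries it along
  group_by(ProductID, Day, Product) %>%
  summarise(
    Amount       = sum(Amount, na.rm = TRUE),
    UniqueStores = n_distinct(Store, na.rm = TRUE),
    .groups      = "drop"
  )
res
```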

How to keep other columns when using dplyr?

You have to specify how to summarize the variable b:

df %>%
  group_by(a) %>%
  summarise(max = max(c), sum = sum(c), b = max(b[c == max(c)]))

# # A tibble: 2 x 4
#   a       max   sum     b
#   <chr> <dbl> <dbl> <dbl>
# 1 a        10    15   400
# 2 b         4     6   300
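
A self-contained sketch of that idea, on made-up values chosen to match the output above. It uses which.max() to pick the b that sits on the row where c is largest, which is an alternative to max(b[c == max(c)]) when ties in c do not matter. The column names max_c and sum_c are illustrative, chosen to avoid shadowing max() and sum():

```r
library(dplyr)

# Hypothetical data; only the column names a, b, c come from the answer above
df <- data.frame(
  a = c("a", "a", "b", "b"),
  b = c(400, 200, 300, 100),
  c = c(10, 5, 4, 2)
)

out <- df %>%
  group_by(a) %>%
  summarise(
    max_c = max(c),
    sum_c = sum(c),
    b     = b[which.max(c)]  # value of b on the row where c is largest
  )
out
```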

How to sum a variable on other aggregated variables, whilst keeping remaining variables in R?

It works for me when I explicitly specify that I want the first value, i.e.:

library(tidyverse)
df %>%
  group_by(set1, set2) %>%
  summarize(y = sum(y),
            row = row[1],
            set3 = set3[1])

# A tibble: 5 x 5
# Groups:   set1 [3]
   set1  set2     y   row  set3
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     1     1
2     1     2     6     4     2
3     2     1     6     7     4
4     2     2     3     9     5
5     3     1     4    10     5

Edit: To keep every other column without naming each one, you can use across() and indicate that the aggregation should apply to every column except y.

df %>%
  group_by(set1, set2) %>%
  summarize(
    across(!y, ~ .x[1]), 
    y = sum(y)
  )

# A tibble: 5 x 5
# Groups:   set1 [3]
   set1  set2   row  set3     y
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     3
2     1     2     4     2     6
3     2     1     7     4     6
4     2     2     9     5     3
5     3     1    10     5     4
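
The same pattern as a runnable sketch on hypothetical data. dplyr's first() is used in place of ~ .x[1], and .groups = "drop" returns an ungrouped tibble instead of one that is still grouped by set1:

```r
library(dplyr)

# Hypothetical data with the same column names as above
df <- data.frame(
  row  = 1:6,
  set1 = c(1, 1, 1, 2, 2, 2),
  set2 = c(1, 1, 2, 1, 1, 1),
  set3 = c(1, 1, 2, 4, 4, 4),
  y    = c(1, 2, 6, 1, 2, 3)
)

out <- df %>%
  group_by(set1, set2) %>%
  summarise(
    across(!y, first),  # first value of every non-grouping column except y
    y = sum(y),
    .groups = "drop"
  )
out
```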

aggregate on multiple columns - keeping the original column names and structure

I would suggest using dplyr for such long chained operations. It has a lot of benefits.

You can do all the transformation, manipulation and reshaping in a single pipe without creating intermediate variables like x_agg_hist and x_agg_hist_agg_sum, so you don't have to remember or manage them.

The first few steps of your code can be translated as:

library(dplyr)

x %>%
  group_by(strand, name) %>%
  # One row per histogram bin and group; dplyr >= 1.1.0 prefers reframe() for multi-row results
  summarise(res = hist(value, breaks = seq(0.5, 6.5), plot = FALSE)$counts) %>%
  left_join(y, by = 'name') %>%
  mutate(division = factor(ifelse(value > 0.5, 1, 2))) %>%
  ungroup()

Then use pivot_wider() to cast the data into wide format, which keeps the original names of the data.
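
As a rough sketch of that last step, assuming the long result has one row per name and histogram bin. The bin column and the bin_ prefix are hypothetical and not taken from the original code:

```r
library(tidyr)

# Hypothetical long-format counts: one row per name and bin
long <- data.frame(
  name = c("a", "a", "b", "b"),
  bin  = c(1, 2, 1, 2),
  res  = c(3, 0, 1, 4)
)

# One row per name, one column per bin; column names are built from the bin values
wide <- pivot_wider(long, names_from = bin, values_from = res, names_prefix = "bin_")
wide
```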

How to use something like tapply but keeping other columns in R

We can slice after grouping by 'Date'

library(dplyr)
df1 %>%
   group_by(Date) %>%
   slice(which.min(Value1))

Or with filter

df1 %>%
  group_by(Date) %>%
  filter(Value1 == min(Value1))

In base R, this can be done with ave

df1[with(df1, Value1 == ave(Value1, Date, FUN = min)),]
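
One difference worth knowing: which.min() returns only the first minimum, so slice() yields exactly one row per Date, while the filter() and ave() versions keep every row that ties for the minimum. A small base R sketch on made-up data (the Other column is hypothetical) shows the tie being kept:

```r
# Hypothetical data; d2 has two rows tied for the minimum
df1 <- data.frame(
  Date   = c("d1", "d1", "d2", "d2"),
  Value1 = c(3, 1, 2, 2),
  Other  = c("w", "x", "y", "z")
)

# ave() returns the per-Date minimum aligned with the original rows
res <- df1[with(df1, Value1 == ave(Value1, Date, FUN = min)), ]
res
```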

How to aggregate and average various rows based on multiple groups while keeping other columns intact

It sounds like you want to collapse the rows in each group into a single row.

With data.table:

library(data.table)
setDT(df)[, .(
  Temp    = mean(Temp),
  DO_mgL  = mean(DO_mgL),
  secchi  = mean(secchi),
  d.lept  = sum(d.lept),
  d.byths = sum(d.byths),
  d.daph  = sum(d.daph)
), by=.(month, day, Site)]

##     month day Site    Temp DO_mgL secchi     d.lept     d.byths     d.daph
##  1:    11   4   11 12.8250 10.495   1.25 0.00000000 0.000000000 1.15013000
##  2:    11   4    6 14.2445 10.140   2.25 0.00000000 0.000000000 4.39251677
##  3:    11   5    9 12.9650 10.395   1.50 0.00000000 0.000000000 2.44219242
##  4:     7  20   10 23.8040  7.175   2.70 0.16327841 0.007392425 0.06098894
##  5:     7  27   13 23.8950  8.085   2.10 0.15374424 0.000000000 0.13673195
##  6:     7  27    2 24.4100  9.140   2.75 0.05392177 0.000000000 0.02450989
##  7:     7  27    3 24.0400  1.290   1.25 0.01239111 0.000000000 0.04956445
##  8:     8  16    4 23.9150  2.440   2.80 0.06933887 0.000000000 0.10400831
##  9:     8  16    5 24.0045  2.465   3.00 0.03602739 0.058286872 0.21616433
## 10:     8  16    6 23.9570  2.540   3.25 0.04666109 0.015553698 0.06221479

setDT(df) converts your df to a data.table by reference (no need for a tibble). The by = .(...) clause defines the groups, and the first .(...) clause does the aggregating.
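
If you would rather stay with dplyr, a roughly equivalent sketch uses across() so that the columns do not have to be listed one by one. The two input rows are invented for illustration, and starts_with("d.") assumes that all the columns to be summed share that prefix:

```r
library(dplyr)

# Hypothetical measurements: two rows for one month/day/Site group
df <- data.frame(
  month = c(7, 7), day = c(27, 27), Site = c(2, 2),
  Temp = c(24, 25), DO_mgL = c(9, 10), secchi = c(2.5, 3.0),
  d.lept = c(0.1, 0.2), d.byths = c(0, 0), d.daph = c(0.25, 0.5)
)

out <- df %>%
  group_by(month, day, Site) %>%
  summarise(
    across(c(Temp, DO_mgL, secchi), mean),  # average the environmental readings
    across(starts_with("d."), sum),         # add up the d.* columns
    .groups = "drop"
  )
out
```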