R: How to Aggregate Some Columns While Keeping Other Columns

How to aggregate some columns while keeping other columns in R?

Assuming your data frame is named df:

aggregate(no ~ id + age, df, sum)
#   id age no
# 1  1  23  9
# 2  3  23  7
# 3  2  25  5
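A self-contained sketch of the same call, using made-up data chosen so the sums match the output above. Every variable on the right-hand side of the formula becomes a grouping column. That means one way to keep an extra column is to list it there, provided it never varies within a group. The name column below is a hypothetical example of such a column.

```r
# Hypothetical data; `name` is an illustrative column that is constant per id
df <- data.frame(
  id   = c(1, 1, 3, 3, 2),
  name = c("ann", "ann", "cy", "cy", "bo"),
  age  = c(23, 23, 23, 23, 25),
  no   = c(4, 5, 3, 4, 5)
)

# Sum `no` within each id/age combination
res <- aggregate(no ~ id + age, data = df, FUN = sum)

# Keep `name` as well by adding it to the grouping side of the formula;
# this only works cleanly because `name` never varies within an id
res_named <- aggregate(no ~ id + name + age, data = df, FUN = sum)
print(res_named)
```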

R: How to aggregate some columns while keeping other columns

I understand that you are looking for a base R solution, but in the meantime, here is a dplyr one:

library(dplyr)

data %>%
  group_by(Date, Exercise) %>%
  slice(which.max(EstMax))

# # A tibble: 6 x 8
# # Groups: Date, Exercise [6]
# Date Exercise Category Weight Reps EstMax RepxWeight Note
# <fctr> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr>
# 1 4/18/16 Deadlift Legs 155 8 196.2920 8x155 …
# 2 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard
# 3 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy
# 4 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard
# 5 5/8/16 Bench Press Chest 115 4 130.3180 4x115 easy
# 6 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day

data.table is not my forte, but for the sake of completeness, here's my attempt at it:

library(data.table)

setDT(data)[, .SD[which.max(EstMax)], by = .(Date, Exercise)]

# Date Exercise Category Weight Reps EstMax RepxWeight Note
# 1: 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy
# 2: 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard
# 3: 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard
# 4: 4/18/16 Deadlift Legs 155 8 196.2920 8x155 …
# 5: 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day
# 6: 5/8/16 Bench Press Chest 115 4 130.3180 4x115 easy
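Since the question asked for base R, here is one possible base R sketch. It is a swapped-in idiom, not the method used in either answer above: sort by EstMax in decreasing order, then keep the first row of each Date/Exercise pair with duplicated(). The data frame is hypothetical and includes only a few of the columns above.

```r
# Hypothetical workout log (made-up values, not the asker's data)
data <- data.frame(
  Date     = c("4/2/16", "4/2/16", "4/2/16", "4/9/16", "4/9/16"),
  Exercise = c("Deadlift", "Deadlift", "Bench Press", "Bench Press", "Bench Press"),
  Weight   = c(135, 125, 135, 135, 125),
  Reps     = c(7, 8, 2, 2, 3),
  EstMax   = c(166.4685, 158.0, 143.9910, 143.9910, 137.0)
)

# Put the largest EstMax first...
ord <- data[order(data$EstMax, decreasing = TRUE), ]

# ...then keep only the first row seen for each Date/Exercise pair
best <- ord[!duplicated(ord[c("Date", "Exercise")]), ]
print(best)
```

If two rows in a group tie on EstMax, only one of them is kept, as with which.max().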

Aggregate by multiple columns, sum one column and keep other columns? Create new column based on aggregated values?

In data.table:

library(data.table)

setDT(df)[, .(Amount       = sum(Amount, na.rm = TRUE),
              UniqueStores = uniqueN(Store, na.rm = TRUE)),
          by = .(ProductID, Day, Product)]

Output:

   ProductID       Day Product Amount UniqueStores
1:         1    Monday    Food     10            1
2:         1   Tuesday    Food     10            2
3:         2 Wednesday    Toys     15            2
4:         2    Friday    Toys      7            1
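For comparison, here is a base R sketch of the same summary. It is not part of the original answer: it uses two aggregate() calls, one per statistic, merged on the grouping columns. The rows are hypothetical, made up so the results line up with the output above.

```r
# Hypothetical sales rows (not the asker's data)
df <- data.frame(
  ProductID = c(1, 1, 1, 2, 2, 2),
  Day       = c("Monday", "Tuesday", "Tuesday", "Wednesday", "Wednesday", "Friday"),
  Product   = c("Food", "Food", "Food", "Toys", "Toys", "Toys"),
  Store     = c("A", "A", "B", "A", "B", "C"),
  Amount    = c(10, 4, 6, 5, 10, 7)
)

keys <- c("ProductID", "Day", "Product")

# Total Amount per group
amount <- aggregate(Amount ~ ProductID + Day + Product, data = df, FUN = sum)

# Number of distinct stores per group
stores <- aggregate(Store ~ ProductID + Day + Product, data = df,
                    FUN = function(s) length(unique(s)))
names(stores)[names(stores) == "Store"] <- "UniqueStores"

# Join the two summaries back together on the grouping columns
res <- merge(amount, stores, by = keys)
print(res)
```

The formula method of aggregate() drops rows containing NA by default (na.action = na.omit), which plays roughly the role of na.rm = TRUE in the data.table version.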

How to keep other columns when using dplyr?

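You have to tell summarise() how to reduce b to a single value per group. Here it takes b from the row where c is largest, using max() again in case of ties:

library(dplyr)

df %>%
  group_by(a) %>%
  summarise(max = max(c), sum = sum(c), b = max(b[c == max(c)]))

# # A tibble: 2 x 4
#   a       max   sum     b
#   <chr> <dbl> <dbl> <dbl>
# 1 a        10    15   400
# 2 b         4     6   300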
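A base R sketch of the same idea, using split() and lapply() instead of dplyr (a swapped-in approach). The data are made up to reproduce the tibble above, and summarise_group() is a hypothetical helper name.

```r
# Hypothetical data (not the asker's)
df <- data.frame(
  a = c("a", "a", "b", "b"),
  b = c(100, 400, 200, 300),
  c = c(5, 10, 2, 4)
)

# Reduce one group's rows to a single summary row
summarise_group <- function(d) {
  data.frame(
    a   = d$a[1],
    max = max(d$c),
    sum = sum(d$c),
    # b from the row(s) where c hits its maximum; max() breaks ties
    b   = max(d$b[d$c == max(d$c)])
  )
}

# Split by group, summarise each piece, stack the results
res <- do.call(rbind, lapply(split(df, df$a), summarise_group))
print(res)
```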

How to sum a variable on other aggregated variables, whilst keeping remaining variables in R?

It works if you explicitly ask for the first value of each column you want to keep:

library(tidyverse)

df %>%
  group_by(set1, set2) %>%
  summarize(y    = sum(y),
            row  = row[1],
            set3 = set3[1])

# A tibble: 5 x 5
# Groups:   set1 [3]
   set1  set2     y   row  set3
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     1     1
2     1     2     6     4     2
3     2     1     6     7     4
4     2     2     3     9     5
5     3     1     4    10     5

To keep every other column without naming each one, use across() to apply the first-value rule to every column except y. The grouping columns are excluded from across() automatically.

df %>%
  group_by(set1, set2) %>%
  summarize(
    across(!y, ~ .x[1]),
    y = sum(y)
  )

# A tibble: 5 x 5
# Groups:   set1 [3]
   set1  set2   row  set3     y
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     3
2     1     2     4     2     6
3     2     1     7     4     6
4     2     2     9     5     3
5     3     1    10     5     4
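A base R sketch of the across() version, using hypothetical data shaped like the output above. It is not from the original answer. In aggregate()'s formula interface, a dot on the left-hand side stands for every column not used on the right. Dropping y first therefore applies the first-value rule to all remaining columns. y is then summed separately and merged back in.

```r
# Hypothetical data with the same shape as the output above
df <- data.frame(
  row  = 1:10,
  set1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3),
  set2 = c(1, 1, 1, 2, 2, 2, 1, 1, 2, 1),
  set3 = c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5),
  y    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)
)

# First value of every column except y, per set1/set2 group
firsts <- aggregate(. ~ set1 + set2, data = df[names(df) != "y"],
                    FUN = function(v) v[1])

# Sum of y per group, then join on the grouping columns
sums <- aggregate(y ~ set1 + set2, data = df, FUN = sum)
res  <- merge(firsts, sums, by = c("set1", "set2"))
print(res)
```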

aggregate on multiple columns - keeping the original column names and structure

I would suggest using dplyr for long chains of operations like this. You can do all of the transformation, manipulation and reshaping in a single pipe without creating intermediate variables such as x_agg_hist and x_agg_hist_agg_sum, so there is nothing extra to remember or manage.

The first few steps of your code can be translated as:

library(dplyr)

x %>%
  group_by(strand, name) %>%
  # each group returns one row per histogram bin; in dplyr >= 1.1.0,
  # reframe() is the recommended verb when a group returns several rows
  summarise(res = hist(value, breaks = seq(0.5, 6.5), plot = FALSE)$counts) %>%
  left_join(y, by = 'name') %>%
  mutate(division = factor(ifelse(value > 0.5, 1, 2))) %>%
  ungroup()

Then use pivot_wider() from the tidyr package to cast the data into wide format; it keeps the original column names.
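If you would rather stay in base R, the per-group histogram counts can go straight into wide format with split() and sapply(). This is a swapped-in sketch, not a translation of the full pipeline. It skips the join with y, whose contents aren't shown. The input is hypothetical, and the bin1 to bin6 column names are made up.

```r
# Hypothetical input; strand/name/value mirror the columns used above
x <- data.frame(
  strand = c("+", "+", "+", "-", "-", "-"),
  name   = c("g1", "g1", "g1", "g2", "g2", "g2"),
  value  = c(1, 2, 2, 5, 6, 6)
)

# One numeric vector of values per strand/name combination
groups <- split(x$value, list(x$strand, x$name), drop = TRUE)

# hist() with fixed breaks gives 6 bin counts per group; sapply() stacks
# them as columns, and t() turns that into one row per group (wide format)
breaks <- seq(0.5, 6.5)
counts <- t(sapply(groups, function(v) hist(v, breaks = breaks, plot = FALSE)$counts))
colnames(counts) <- paste0("bin", seq_len(ncol(counts)))
print(counts)
```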

How to use something like tapply but keeping other columns in R

We can slice() after grouping by Date:

library(dplyr)

df1 %>%
  group_by(Date) %>%
  slice(which.min(Value1))

Or with filter(), which keeps every row tied for the minimum instead of only the first one:

df1 %>%
  group_by(Date) %>%
  filter(Value1 == min(Value1))

In base R, this can be done with ave():

df1[with(df1, Value1 == ave(Value1, Date, FUN = min)),]
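A self-contained version of the ave() approach, with made-up data. Value2 is a hypothetical extra column included to show that whole rows are kept. The second date has a tie, so you can see that, like filter(), all tied rows are returned.

```r
# Hypothetical data: two dates, with a tie for the minimum on the second
df1 <- data.frame(
  Date   = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"),
  Value1 = c(3, 1, 4, 2, 2),
  Value2 = c("a", "b", "c", "d", "e")
)

# ave() returns each row's group minimum, aligned with the original rows,
# so comparing against it keeps whole rows (all columns) that hit the minimum
res <- df1[with(df1, Value1 == ave(Value1, Date, FUN = min)), ]
print(res)
```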

How to aggregate and average various rows based on multiple groups while keeping other columns intact

It sounds like you want to collapse the rows in each group into a single row.

With data.table:

library(data.table)

setDT(df)[, .(
  Temp    = mean(Temp),
  DO_mgL  = mean(DO_mgL),
  secchi  = mean(secchi),
  d.lept  = sum(d.lept),
  d.byths = sum(d.byths),
  d.daph  = sum(d.daph)
), by = .(month, day, Site)]

##     month day Site    Temp DO_mgL secchi     d.lept     d.byths     d.daph
##  1:    11   4   11 12.8250 10.495   1.25 0.00000000 0.000000000 1.15013000
##  2:    11   4    6 14.2445 10.140   2.25 0.00000000 0.000000000 4.39251677
##  3:    11   5    9 12.9650 10.395   1.50 0.00000000 0.000000000 2.44219242
##  4:     7  20   10 23.8040  7.175   2.70 0.16327841 0.007392425 0.06098894
##  5:     7  27   13 23.8950  8.085   2.10 0.15374424 0.000000000 0.13673195
##  6:     7  27    2 24.4100  9.140   2.75 0.05392177 0.000000000 0.02450989
##  7:     7  27    3 24.0400  1.290   1.25 0.01239111 0.000000000 0.04956445
##  8:     8  16    4 23.9150  2.440   2.80 0.06933887 0.000000000 0.10400831
##  9:     8  16    5 24.0045  2.465   3.00 0.03602739 0.058286872 0.21616433
## 10:     8  16    6 23.9570  2.540   3.25 0.04666109 0.015553698 0.06221479

setDT(df) converts df to a data.table in place (by reference), so there is no need for a tibble. The by = .(...) clause defines the groups, and the j clause .(...) computes one aggregated value per group for each listed column.
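If you instead want to keep every original row and attach the group-level summaries as new columns, one base R option is ave(). It returns a per-group result recycled to the length of the input. Here is a minimal sketch with hypothetical data; the new column names Temp_mean and d.lept_sum are also hypothetical.

```r
# Hypothetical measurements: two rows per month/day/Site group
df <- data.frame(
  month  = c(7, 7, 8, 8),
  day    = c(20, 20, 16, 16),
  Site   = c(10, 10, 4, 4),
  Temp   = c(23.5, 24.1, 23.8, 24.0),
  d.lept = c(0.1, 0.2, 0.05, 0.02)
)

# ave() computes the statistic per group but returns one value per row,
# so every original row and column is kept alongside the new columns
df$Temp_mean  <- ave(df$Temp, df$month, df$day, df$Site, FUN = mean)
df$d.lept_sum <- ave(df$d.lept, df$month, df$day, df$Site, FUN = sum)
print(df)
```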


