Using R Statistics: Add a Group Sum to Each Row

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
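The cbind call above is truncated in the source. As a minimal sketch of the idea, assuming a made-up Metric2 column that is not part of the original data, two numeric columns can be summed per category in one call:

```r
# Example data; Metric2 is a hypothetical column added only for illustration.
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)
)

# cbind() builds a two-column numeric matrix; naming the arguments
# gives the result readable column names.
res <- aggregate(cbind(Frequency = x$Frequency, Metric2 = x$Metric2),
                 by = list(Category = x$Category), FUN = sum)
print(res)
```

Because cbind produces a matrix, this only suits columns of the same type, which matches the "same data type" caveat above.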

(Embedding @thelatemail's comment:) aggregate has a formula interface too:

aggregate(Frequency ~ Category, x, sum)

Or, if you want to aggregate multiple columns, you could use the . notation (it works for one column too):

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")),
                Frequency=c(10,15,5,2,14,20,3))
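aggregate and tapply return one row (or element) per group. If the goal is instead to attach the group sum to every original row, one option in base R is ave. The sketch below reuses the data above; the column name GroupSum is only an illustrative choice:

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# ave() applies FUN within each Category and returns a vector
# as long as the input, so it can be assigned as a new column.
x$GroupSum <- ave(x$Frequency, x$Category, FUN = sum)
print(x)
```

Each row keeps its own Frequency and gains the total of its category.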

Sum of group but keep the same value for each row in R

As short as:

df$sumx <- with(df,ave(x,ID,Group,FUN = sum))
df$sumy <- with(df,ave(y,ID,Group,FUN = sum))
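The answer does not show df, so here is a small runnable sketch with invented values. The column names ID, Group, x and y follow the snippet above; the numbers are assumptions chosen only to make the grouping visible:

```r
# Hypothetical data: two IDs, each with groups "a" and "b".
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Group = c("a", "a", "b", "a", "a", "b"),
  x     = c(1, 2, 4, 8, 16, 32),
  y     = c(10, 20, 40, 80, 160, 320)
)

# Sum within each ID/Group combination, repeated on every row of that combination.
df$sumx <- with(df, ave(x, ID, Group, FUN = sum))
df$sumy <- with(df, ave(y, ID, Group, FUN = sum))
print(df)
```

Passing several grouping vectors to ave groups by their combination, so rows sharing both ID and Group receive the same sum.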

How to sum rows based on group and sub-group using dplyr in R?

First group by Country and then mutate with sum:

library(dplyr)

transportation %>%
  group_by(Country) %>%
  mutate(country_sum = sum(Energy))

  Country Mode  Energy country_sum
  <chr>   <chr>  <dbl>       <dbl>
1 A       Car    10000       39000
2 A       Train   9000       39000
3 A       Plane  20000       39000
4 B       Car   200000      810000
5 B       Train 160000      810000
6 B       Plane 450000      810000
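For comparison, roughly the same result can be obtained without dplyr. This is only a sketch: the transportation data frame is reconstructed from the printed output above, and the intermediate name sums is an illustrative choice:

```r
# Data reconstructed from the printed output above.
transportation <- data.frame(
  Country = c("A", "A", "A", "B", "B", "B"),
  Mode    = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
  Energy  = c(10000, 9000, 20000, 200000, 160000, 450000)
)

# One total per country, named by country.
sums <- tapply(transportation$Energy, transportation$Country, sum)

# Look up each row's country total by name and drop the names.
transportation$country_sum <-
  as.vector(sums[as.character(transportation$Country)])
print(transportation)
```

The lookup-by-name step is what spreads the per-group totals back over the original rows, which is the job mutate does after group_by.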

Group rows into a new row and sum in R

Here's one way you could do it:

library(tidyverse)

df <- df %>%
  group_by(Week) %>%
  arrange(desc(Total_Amount), .by_group = TRUE) %>%
  mutate(id = row_number()) %>%
  mutate(Person = case_when(id > 3 ~ "Others",
                            TRUE ~ as.character(Person)))

Then remove the $ sign so we can sum the Total_Amount:

df$Total_Amount <- as.numeric(gsub("\\$", "", df$Total_Amount))

Finally, sum the Total_Amount by group and add the $ sign to bring everything back:

df %>%
  group_by(Week, Person) %>%
  summarise(Total_Amount = sum(Total_Amount)) %>%
  mutate(Total_Amount = paste0("$", Total_Amount)) %>%
  select(Week, Total_Amount, Person)

Which returns:

# A tibble: 8 x 3
# Groups:   Week [2]
   Week Total_Amount Person
  <int> <chr>        <chr>
1     1 $5           A
2     1 $5           B
3     1 $4           C
4     1 $3           Others
5     2 $5           A
6     2 $5           C
7     2 $5           F
8     2 $5           Others

How to sum values in multiple rows to a new column in R?

Update II, at the OP's new request (sums are now computed within each Observation):

library(dplyr)

df %>%
  group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
                                        Topic %in% c(2,5,6) ~ 2,
                                        Topic %in% c(3,4) ~ 3)) %>%
  mutate(new_variable = sum(Gamma)) %>%
  ungroup %>%
  select(-grp)

   Observation Topic Gamma new_variable
   <chr>       <int> <dbl>        <dbl>
 1 Apple           1   0.1          0.1
 2 Apple           2   0.1          0.7
 3 Apple           3   0.2          0.4
 4 Apple           4   0.2          0.4
 5 Apple           5   0.1          0.7
 6 Apple           6   0.5          0.7
 7 Blueberry       1   0.2          0.2
 8 Blueberry       2   0.1          0.6
 9 Blueberry       3   0.3          0.8
10 Blueberry       4   0.5          0.8
11 Blueberry       5   0.4          0.6
12 Blueberry       6   0.1          0.6

Update, at the OP's new request. This solution is based entirely on PaulS's solution (credit to him):

library(dplyr)

df %>%
  group_by(grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>%
  mutate(new_variable = sum(Gamma)) %>%
  ungroup %>%
  select(-grp)

  Observation Topic Gamma new_variable
  <chr>       <int> <dbl>        <dbl>
1 Apple           1   0.1          0.1
2 Blueberry       2   0.1          0.7
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.4
5 Eggplant        5   0.1          0.7
6 Fruits          6   0.5          0.7
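The same topic grouping can also be sketched in base R with a named lookup vector in place of case_when. The six-row data frame below is reconstructed from the printed output above, and the names topic_group and grp are illustrative assumptions:

```r
# Data reconstructed from the six-row output above.
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant", "Fruits"),
  Topic       = 1:6,
  Gamma       = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5)
)

# Map each Topic to its group: 1 -> 1; 2, 5, 6 -> 2; 3, 4 -> 3.
topic_group <- c(`1` = 1, `2` = 2, `3` = 3, `4` = 3, `5` = 2, `6` = 2)
grp <- unname(topic_group[as.character(df$Topic)])

# Sum Gamma within each topic group and repeat the sum on every row.
df$new_variable <- ave(df$Gamma, grp, FUN = sum)
print(df)
```

A lookup vector keeps the topic-to-group mapping in one place, which may be easier to maintain than a long case_when when there are many topics.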

First answer:
We could sum Gamma separately over the odd and even rows, identifying them inside an ifelse statement.
In this case, row_number() could be replaced by Topic, since Topic runs from 1 to 5 in row order.

library(dplyr)

df %>%
  mutate(new_variable = ifelse(row_number() %% 2 == 1,
                               sum(Gamma[row_number() %% 2 == 1]), # odd rows 1, 3, 5
                               sum(Gamma[row_number() %% 2 == 0])) # even rows 2, 4
  )

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

data:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))

Microbenchmark: AndrewGB's base R is fastest


Ratio of row value to sum of rows in a group using R data.table

You can use prop.table to get each value's share of the total for its year and quarter.

library(data.table)

dt[, pct_byQtrYr := prop.table(value), .(year, quarter)]
dt

# ID year quarter value pct_byQtrYr
# 1: A 2020 4 4.0 0.1951220
# 2: B 2020 4 10.5 0.5121951
# 3: C 2020 4 6.0 0.2926829
# 4: A 2021 1 6.6 0.2933333
# 5: B 2021 1 15.0 0.6666667
# 6: C 2021 1 0.9 0.0400000
# 7: A 2021 2 6.2 0.1980831
# 8: B 2021 2 9.8 0.3130990
# 9: C 2021 2 15.3 0.4888179
#10: A 2021 3 5.0 0.5263158
#11: B 2021 3 3.4 0.3578947
#12: C 2021 3 1.1 0.1157895

This is equivalent to dividing each value by the sum of its group:

dt[, pct_byQtrYr := value/sum(value), .(year, quarter)]
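If data.table is not available, a similar within-group ratio can be sketched in base R with ave. The data frame below is reconstructed from the printed output above; the name dt_df is an illustrative assumption:

```r
# Data reconstructed from the printed output above, as a plain data frame.
dt_df <- data.frame(
  ID      = rep(c("A", "B", "C"), times = 4),
  year    = c(rep(2020, 3), rep(2021, 9)),
  quarter = rep(c(4, 1, 2, 3), each = 3),
  value   = c(4, 10.5, 6, 6.6, 15, 0.9, 6.2, 9.8, 15.3, 5, 3.4, 1.1)
)

# Within each year/quarter, divide every value by the group total.
dt_df$pct_byQtrYr <- ave(dt_df$value, dt_df$year, dt_df$quarter,
                         FUN = function(v) v / sum(v))
print(dt_df)
```

Within every year and quarter the ratios sum to 1, which is a quick sanity check on the grouping.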

