How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
Category x
1 First 30
2 Second 5
3 Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
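A minimal runnable sketch of the cbind approach, using the data from this answer plus a second metric column. The Metric2 values are hypothetical, made up only for illustration. Naming the cbind() arguments carries those names through to the output columns; unnamed, they would come out as V1, V2.

```r
# Data from this answer, plus a hypothetical second metric (Metric2)
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3),
                Metric2 = c(1, 2, 3, 4, 5, 6, 7))  # hypothetical values

# cbind() binds the metrics into a matrix; aggregate() sums each column per Category
res <- aggregate(cbind(Frequency = x$Frequency, Metric2 = x$Metric2),
                 by = list(Category = x$Category), FUN = sum)
res
```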
(Embedding @thelatemail's comment) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you could use the . notation (it works for one column too):
aggregate(. ~ Category, x, sum)
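To see what the . shorthand stands for, this sketch (again with a hypothetical Metric2 column) compares it with naming the columns explicitly in cbind() on the left-hand side of the formula. Both forms sum every non-grouping column per Category.

```r
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3),
                Metric2 = c(1, 2, 3, 4, 5, 6, 7))  # hypothetical metric

# "." means "every column not used on the right-hand side"
res_dot <- aggregate(. ~ Category, data = x, FUN = sum)

# Explicit equivalent: list the summed columns inside cbind()
res_cbind <- aggregate(cbind(Frequency, Metric2) ~ Category, data = x, FUN = sum)

res_dot
res_cbind
```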
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
First Second Third
30 5 34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
Sum by group but keep the same value for each row in R
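Use ave(), which returns each group's sum repeated on every row of that group (here, each ID and Group combination):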
df$sumx <- with(df,ave(x,ID,Group,FUN = sum))
df$sumy <- with(df,ave(y,ID,Group,FUN = sum))
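A self-contained sketch of the ave() approach. The question's df is not shown here, so the ID, Group, x and y values below are hypothetical.

```r
# Hypothetical data with the column names used in the answer
df <- data.frame(ID = c(1, 1, 1, 2, 2),
                 Group = c("a", "a", "b", "a", "a"),
                 x = c(1, 2, 3, 4, 5),
                 y = c(10, 20, 30, 40, 50))

# ave() applies FUN within each ID x Group combination and writes
# the result back onto every row of that combination
df$sumx <- with(df, ave(x, ID, Group, FUN = sum))
df$sumy <- with(df, ave(y, ID, Group, FUN = sum))
df
```

Because ave() returns a vector as long as its input, the row count never changes, which is what distinguishes it from aggregate().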
How to sum rows based on group and sub-group using dplyr in R?
First group by Country, then mutate with sum:
library(dplyr)
transportation %>%
group_by(Country) %>%
mutate(country_sum = sum(Energy))
Country Mode Energy country_sum
<chr> <chr> <dbl> <dbl>
1 A Car 10000 39000
2 A Train 9000 39000
3 A Plane 20000 39000
4 B Car 200000 810000
5 B Train 160000 810000
6 B Plane 450000 810000
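For comparison, a base R sketch that produces the same country_sum column without dplyr, using ave() instead of group_by() and mutate(). The transportation data frame is rebuilt from the output above.

```r
# Data reconstructed from the printed output
transportation <- data.frame(
  Country = c("A", "A", "A", "B", "B", "B"),
  Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
  Energy = c(10000, 9000, 20000, 200000, 160000, 450000)
)

# ave() returns each row's Country total, so every row is kept
transportation$country_sum <- ave(transportation$Energy,
                                  transportation$Country, FUN = sum)
transportation
```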
Group rows into a new row and sum in R
Here's one way you could do it:
library(tidyverse)
df <- df %>%
group_by(Week) %>%
# sort on the numeric amount: as text, "$12" would rank below "$5"
arrange(desc(as.numeric(gsub("\\$", "", Total_Amount))), .by_group = TRUE) %>%
mutate(id = row_number()) %>%
mutate(Person = case_when(id > 3 ~ "Others",
TRUE ~ as.character(Person)))
Then remove the $ sign so that Total_Amount can be summed:
df$Total_Amount <- as.numeric(gsub("\\$", "", df$Total_Amount))
Finally, sum Total_Amount by group and add the $ sign back:
df %>%
group_by(Week, Person) %>%
summarise(Total_Amount = sum(Total_Amount)) %>%
mutate(Total_Amount = paste0("$", Total_Amount)) %>%
select(Week, Total_Amount, Person)
Which returns:
# A tibble: 8 x 3
# Groups: Week [2]
Week Total_Amount Person
<int> <chr> <chr>
1 1 $5 A
2 1 $5 B
3 1 $4 C
4 1 $3 Others
5 2 $5 A
6 2 $5 C
7 2 $5 F
8 2 $5 Others
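The same steps can be sketched in base R, with aggregate() standing in for summarise(). The data below is hypothetical, since the question's df is not shown. The $ is stripped before ranking so that amounts are compared as numbers rather than as text.

```r
# Hypothetical data: amounts stored as "$"-prefixed strings
df <- data.frame(
  Week = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  Person = c("A", "B", "C", "D", "E", "A", "C", "F", "G"),
  Total_Amount = c("$5", "$12", "$4", "$2", "$1", "$5", "$5", "$5", "$3"),
  stringsAsFactors = FALSE
)

# Numeric amount, with the literal "$" removed
df$amount <- as.numeric(sub("$", "", df$Total_Amount, fixed = TRUE))

# Within each week, order by amount (largest first) and number the rows
df <- df[order(df$Week, -df$amount), ]
df$id <- ave(df$amount, df$Week, FUN = seq_along)

# Everyone outside the top 3 of their week is relabelled "Others"
df$Person[df$id > 3] <- "Others"

# Sum per Week and Person, then restore the "$" prefix
out <- aggregate(amount ~ Week + Person, data = df, FUN = sum)
out <- out[order(out$Week, out$Person), ]
out$Total_Amount <- paste0("$", out$amount)
out <- out[, c("Week", "Total_Amount", "Person")]
out
```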
How to sum values in multiple rows to a new column in R?
Update II, on a new request from the OP (the sums are now computed within each Observation):
library(dplyr)
df %>%
group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Apple 2 0.1 0.7
3 Apple 3 0.2 0.4
4 Apple 4 0.2 0.4
5 Apple 5 0.1 0.7
6 Apple 6 0.5 0.7
7 Blueberry 1 0.2 0.2
8 Blueberry 2 0.1 0.6
9 Blueberry 3 0.3 0.8
10 Blueberry 4 0.5 0.8
11 Blueberry 5 0.4 0.6
12 Blueberry 6 0.1 0.6
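A base R sketch of the Update II logic, with the data rebuilt from the output above. It swaps case_when() for a lookup vector indexed by Topic, which assumes Topic is an integer from 1 to 6.

```r
# Data reconstructed from the printed output
df <- data.frame(
  Observation = rep(c("Apple", "Blueberry"), each = 6),
  Topic = rep(1:6, times = 2),
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5,
            0.2, 0.1, 0.3, 0.5, 0.4, 0.1)
)

# Lookup: position = Topic, value = group (1 alone, 2/5/6 together, 3/4 together)
topic_group <- c(1, 2, 3, 3, 2, 2)
grp <- topic_group[df$Topic]

# Sum Gamma within each Observation x topic-group combination
df$new_variable <- ave(df$Gamma, df$Observation, grp, FUN = sum)
df
```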
Update, on a new request from the OP. This solution is fully inspired by PaulS's solution (credit to him):
library(dplyr)
df %>%
group_by(grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Blueberry 2 0.1 0.7
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.4
5 Eggplant 5 0.1 0.7
6 Fruits 6 0.5 0.7
First answer:
We could sum Gamma after identifying odd and even rows in an ifelse statement. In this case, row_number() could be replaced by Topic:
library(dplyr)
df %>%
mutate(new_variable = ifelse(row_number() %% 2 == 1,
sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
sum(Gamma[row_number() %% 2 == 0])) # even 2,4
)
Observation Topic Gamma new_variable
1 Apple 1 0.1 0.4
2 Blueberry 2 0.1 0.3
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.3
5 Eggplant 5 0.1 0.4
data:
structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
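Treating row parity as a grouping variable gives the same result with ave(), without writing out both branches of the ifelse. A short base R sketch on the data above:

```r
df <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))

# 1 for odd rows (1, 3, 5), 0 for even rows (2, 4)
parity <- seq_len(nrow(df)) %% 2

# Each row gets the Gamma total of its parity group
df$new_variable <- ave(df$Gamma, parity, FUN = sum)
df
```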
Microbenchmark: AndrewGB's base R solution is the fastest.
Ratio of row value to sum of rows in a group using r data.table
You can use prop.table to get the ratio of each value within its year and quarter:
library(data.table)
dt[, pct_byQtrYr := prop.table(value), .(year, quarter)]
dt
# ID year quarter value pct_byQtrYr
# 1: A 2020 4 4.0 0.1951220
# 2: B 2020 4 10.5 0.5121951
# 3: C 2020 4 6.0 0.2926829
# 4: A 2021 1 6.6 0.2933333
# 5: B 2021 1 15.0 0.6666667
# 6: C 2021 1 0.9 0.0400000
# 7: A 2021 2 6.2 0.1980831
# 8: B 2021 2 9.8 0.3130990
# 9: C 2021 2 15.3 0.4888179
#10: A 2021 3 5.0 0.5263158
#11: B 2021 3 3.4 0.3578947
#12: C 2021 3 1.1 0.1157895
This is equivalent to dividing value by the sum of its group:
dt[, pct_byQtrYr := value/sum(value), .(year, quarter)]
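For readers not using data.table, a base R sketch of the same calculation with ave() and prop.table(). Here dt is rebuilt as a plain data.frame from the printed output.

```r
# Data reconstructed from the printed output
dt <- data.frame(
  ID = rep(c("A", "B", "C"), times = 4),
  year = rep(c(2020, 2021, 2021, 2021), each = 3),
  quarter = rep(c(4, 1, 2, 3), each = 3),
  value = c(4.0, 10.5, 6.0, 6.6, 15.0, 0.9, 6.2, 9.8, 15.3, 5.0, 3.4, 1.1)
)

# prop.table(v) is v / sum(v); ave() applies it within each year x quarter
dt$pct_byQtrYr <- ave(dt$value, dt$year, dt$quarter, FUN = prop.table)
dt
```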