Take Sum of a Variable if Combination of Values in Two Other Columns are Unique
We could use base R and start by sorting the first two columns within each row. We use apply with MARGIN=1 to do that. We then transpose the output and convert it to a 'data.frame' to create 'df1'. Finally, we use the formula method of aggregate to get the sum of 'num_email' grouped by the first two columns of the transformed dataset.
df1 <- data.frame(t(apply(df[1:2], 1, sort)), df[3])
aggregate(num_email~., df1, FUN=sum)
# X1 X2 num_email
# 1 Beth Mable 2
# 2 Beth Susan 3
# 3 Mable Susan 1
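The sort-within-row idea can also be sketched in Python, the language of the pandas answer further down, using only the standard library. The question's data frame is not shown, so the rows below are hypothetical and were made up to reproduce the totals above. Sorting each (sender, receiver) pair gives one key per unordered pair. Python's sorted compares code points rather than using R's locale collation, which only matters for names that differ in case or accents.

```python
from collections import defaultdict

# Hypothetical rows (sender, receiver, num_email); the original df is not shown.
rows = [
    ("Beth", "Mable", 1),
    ("Mable", "Beth", 1),
    ("Susan", "Beth", 2),
    ("Beth", "Susan", 1),
    ("Mable", "Susan", 1),
]


def sum_unordered_pairs(rows):
    """Sum the third field for each unordered pair of the first two fields."""
    totals = defaultdict(int)
    for a, b, n in rows:
        # Sorting makes ("Mable", "Beth") and ("Beth", "Mable") the same key,
        # mirroring apply(df[1:2], 1, sort) in the R answer.
        key = tuple(sorted((a, b)))
        totals[key] += n
    # Return the groups in sorted key order, like aggregate's output.
    return dict(sorted(totals.items()))


for (x1, x2), total in sum_unordered_pairs(rows).items():
    print(x1, x2, total)
```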
Or, using data.table, we first convert the first two columns to character class. We unname them so that they get the default names 'V1' and 'V2', and then convert the result to a 'data.table'. Character columns are ordered lexicographically, so V1 > V2 gives a logical index for i. For the rows that meet that condition, we assign (:=) the two columns in swapped order (.(V2, V1)). Finally, we get the sum of 'num_email' grouped by 'V1', 'V2'.
library(data.table)
dt = do.call(data.table, c(lapply(unname(df[1:2]), as.character), df[3]))
dt[V1 > V2, c("V1", "V2") := .(V2, V1)]
dt[, .(num_email = sum(num_email)), by= .(V1, V2)]
# V1 V2 num_email
# 1: Beth Mable 2
# 2: Beth Susan 3
# 3: Mable Susan 1
Or, using dplyr, we convert the 'senders' and 'receivers' columns to character class with mutate and across. This replaces the deprecated mutate_each/funs and needs dplyr >= 1.0.0. Next, pmin puts the alphabetically smaller name in 'V1' and pmax puts the larger one in 'V2'. We then group by 'V1', 'V2' and get the sum of 'num_email'.
library(dplyr)
df %>%
  mutate(across(c(senders, receivers), as.character)) %>%
  mutate(V1 = pmin(senders, receivers),
         V2 = pmax(senders, receivers)) %>%
  group_by(V1, V2) %>%
  summarise(num_email = sum(num_email))
# V1 V2 num_email
# (chr) (chr) (dbl)
# 1 Beth Mable 2
# 2 Beth Susan 3
# 3 Mable Susan 1
NOTE: The data.table solution was updated by @Frank.
In R, take sum of multiple variables if combination of values in two other columns are unique
You can use dplyr::summarise with across after group_by.
library(dplyr)
df %>%
group_by(Locations, seasons) %>%
summarise(across(starts_with("ani"), ~sum(.x, na.rm = TRUE))) %>%
ungroup()
Another option is to reshape the data to long format using functions from the tidyr package. This avoids having to select columns 3 onwards explicitly.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -c(Locations, seasons)) %>%
group_by(Locations, seasons, name) %>%
summarise(Sum = sum(value, na.rm = TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = "name", values_from = "Sum")
Result:
# A tibble: 9 x 4
Locations seasons ani1 ani2
<chr> <int> <int> <int>
1 A 2 2 0
2 A 3 1 1
3 A 4 1 1
4 B 2 0 1
5 B 3 1 1
6 C 1 1 0
7 C 2 1 1
8 D 2 0 0
9 D 4 1 2
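Here is a standard-library Python sketch of the same per-group, multi-column sum, with None standing in for NA. The records are hypothetical. The column layout is assumed to be two key columns followed by value columns, like Locations, seasons, ani1, ani2. As with sum(..., na.rm = TRUE) in R, a group whose values are all missing sums to 0.

```python
# Hypothetical records: (Locations, seasons, ani1, ani2); None plays the role of NA.
records = [
    ("A", 2, 1, 0),
    ("A", 2, 1, None),
    ("B", 2, None, 1),
    ("A", 3, 1, 1),
]


def sum_by_group(records, n_keys=2):
    """Sum every value column per group, skipping None like na.rm = TRUE."""
    totals = {}
    for rec in records:
        key, values = rec[:n_keys], rec[n_keys:]
        # Start each new group with one zero per value column.
        acc = totals.setdefault(key, [0] * len(values))
        for i, v in enumerate(values):
            if v is not None:
                acc[i] += v
    return totals


for key, sums in sorted(sum_by_group(records).items()):
    print(*key, *sums)
```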
Sum values of column based on the unique values of another column
I believe you're looking for groupby; see the pandas documentation for DataFrame.groupby.
df.groupby('Column1')['Column2'].sum()
Column1 Column2
1 44
2 65
3 30
4 18
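groupby followed by sum is a split-apply-combine operation. Without pandas, the same totals can be sketched with itertools.groupby. That function only merges adjacent equal keys, so the pairs are sorted first. The (Column1, Column2) pairs below are hypothetical and were chosen to give the totals shown above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (Column1, Column2) pairs
pairs = [(1, 40), (2, 60), (1, 4), (3, 30), (2, 5), (4, 18)]


def group_sum(pairs):
    """Sum the second field for each distinct first field."""
    # itertools.groupby only merges *adjacent* equal keys, so sort by key first.
    ordered = sorted(pairs, key=itemgetter(0))
    return {k: sum(v for _, v in grp) for k, grp in groupby(ordered, key=itemgetter(0))}


for key, total in group_sum(pairs).items():
    print(key, total)
```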
Sum rows within a column for each unique combination in R
I suggest trying dplyr; it is quite a workhorse for data manipulation. From the desired output, you seem to want a cumulative sum within each Week.
df = read.table(text="Week Day Value
1 1 1
1 2 3
1 3 4
2 1 2
2 2 2
2 3 3", header=T)
library(dplyr)
df %>% group_by(Week) %>% mutate(Sum = cumsum(Value))
# you get
# Source: local data frame [6 x 4]
# Groups: Week
#
#   Week Day Value Sum
# 1    1   1     1   1
# 2    1   2     3   4
# 3    1   3     4   8
# 4    2   1     2   2
# 5    2   2     2   4
# 6    2   3     3   7
Or you could try data.table, another tool that is well suited to larger data because it is fast and memory-efficient.
library(data.table)
setDT(df)[, Sum := cumsum(Value), by = Week][]
Week Day Value Sum
1: 1 1 1 1
2: 1 2 3 4
3: 1 3 4 8
4: 2 1 2 2
5: 2 2 2 4
6: 2 3 3 7
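A grouped cumulative sum keeps one running total per Week and walks the rows in their original order. Below is a plain-Python sketch that uses the same data as the read.table call above. A dict of running totals also works when the weeks are not contiguous, which a plain itertools.groupby pass would not handle.

```python
# (Week, Day, Value) rows from the example above
rows = [(1, 1, 1), (1, 2, 3), (1, 3, 4), (2, 1, 2), (2, 2, 2), (2, 3, 3)]


def cumsum_by_group(rows):
    """Append a running total of Value per Week, keeping the original row order."""
    running = {}
    out = []
    for week, day, value in rows:
        running[week] = running.get(week, 0) + value
        out.append((week, day, value, running[week]))
    return out


for row in cumsum_by_group(rows):
    print(*row)
```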
Sum rows of each unique combination of variables in r
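We can use combn to compute rowSums over every combination of three columns and convert the result to a data.frame. We then set the names of 'output' from the same column combinations and cbind it with the original dataset.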
output <- as.data.frame(combn(ncol(df1), 3, FUN =function(x) rowSums(df1[x])))
names(output) <- paste0("sum_", combn(names(df1), 3, FUN = paste, collapse="_"))
cbind(df1, output)
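combn(ncol(df1), 3, FUN = ...) applies the function to every 3-column combination in lexicographic order. The sketch below does the same in Python with itertools.combinations, which enumerates combinations in that same order, so the generated names line up with the paste0/combn names above. The four-column df1 here is hypothetical, since the question's data is not shown.

```python
from itertools import combinations

# Hypothetical df1 as a dict of equal-length columns
df1 = {"a": [1, 2], "b": [10, 20], "c": [100, 200], "d": [1000, 2000]}


def combo_row_sums(cols, k=3):
    """Row sums for every k-column combination, named sum_<col>_<col>_..."""
    out = {}
    for combo in combinations(cols, k):  # same order as combn(names(df1), k)
        name = "sum_" + "_".join(combo)
        out[name] = [sum(vals) for vals in zip(*(cols[c] for c in combo))]
    return out


output = combo_row_sums(df1)
for name, values in output.items():
    print(name, values)
```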
Extracting unique column combination and finding sum and count in R
We can use dplyr
library(dplyr)
df1 %>%
group_by(Origin, Destination, Airline) %>%
dplyr::summarise(count = n(), TotalPassengers = sum(Passengers))
# Groups: Origin, Destination [2]
# Origin Destination Airline count TotalPassengers
# <chr> <chr> <chr> <int> <dbl>
#1 ABE ATL 9A 2 3
#2 ABE ATL DL 1 5
#3 NYC SFA AA 3 21
#4 NYC SFA DL 1 5
data
df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
Destination = rep(c("ATL", "SFA"), c(3, 4)),
Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
Passengers = c(2, 1, 5, 4, 10, 7, 5))
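group_by plus n() and sum() computes a count and a total per key. The standard-library sketch below runs the same aggregation over the df1 data above, keeping a count and a total for each (Origin, Destination, Airline) key.

```python
from collections import defaultdict

# The df1 data above as (Origin, Destination, Airline, Passengers) tuples
rows = [
    ("ABE", "ATL", "9A", 2), ("ABE", "ATL", "9A", 1), ("ABE", "ATL", "DL", 5),
    ("NYC", "SFA", "AA", 4), ("NYC", "SFA", "AA", 10), ("NYC", "SFA", "AA", 7),
    ("NYC", "SFA", "DL", 5),
]


def count_and_total(rows):
    """Return {(origin, dest, airline): (count, total_passengers)}."""
    stats = defaultdict(lambda: [0, 0])  # key -> [count, total]
    for origin, dest, airline, passengers in rows:
        s = stats[(origin, dest, airline)]
        s[0] += 1
        s[1] += passengers
    return {k: tuple(v) for k, v in stats.items()}


for key, (count, total) in count_and_total(rows).items():
    print(*key, count, total)
```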
Sum for unique combinations of variables in a data table
Use pmin and pmax:
library(data.table) # v1.9.6
dt = fread("Country1 Country2 Value Category
A A 4 1
A B 2 1
A C 9 1
B A 3 2
B D 4 1
C A 2 2
D C 7 2")
dt[, .(total = sum(Value)),
by=.(Country1 = pmin(Country1, Country2),
Country2 = pmax(Country1, Country2))]
# Country1 Country2 total
# 1: A A 4
# 2: A B 5
# 3: A C 11
# 4: B D 4
# 5: C D 7
If you want this within Category, just add it to by as well.
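pmin and pmax turn each (Country1, Country2) pair into a canonical (smaller, larger) key, so A-B and B-A fall into the same group. The Python sketch below applies the same keying to the data above. The by_category flag is a hypothetical parameter that illustrates the suggestion to add Category to by.

```python
from collections import defaultdict

# (Country1, Country2, Value, Category) rows from the fread example above
rows = [
    ("A", "A", 4, 1), ("A", "B", 2, 1), ("A", "C", 9, 1), ("B", "A", 3, 2),
    ("B", "D", 4, 1), ("C", "A", 2, 2), ("D", "C", 7, 2),
]


def total_by_pair(rows, by_category=False):
    """Sum Value per unordered country pair (optionally also per Category)."""
    totals = defaultdict(int)
    for c1, c2, value, category in rows:
        key = (min(c1, c2), max(c1, c2))  # like pmin / pmax
        if by_category:  # hypothetical switch mirroring "add Category to by"
            key += (category,)
        totals[key] += value
    return dict(totals)


for key, total in total_by_pair(rows).items():
    print(*key, total)
```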
SUM(DISTINCT) Based on Other Columns
select sum(rate)
from yourTable
group by first_name, last_name
Edit
If you want the total of all those per-name sums, that is simply the sum over the whole table:
select sum(rate) from yourTable
But if the two differ for some reason (for example, if you add a where clause to the grouped query) and you need the total of the grouped select above, wrap it in a subquery:
select sum(SumGrouped) from
( select sum(rate) as SumGrouped
  from yourTable
  group by first_name, last_name) T1
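To check the nested query, here is a small sketch that uses Python's built-in sqlite3 module with a hypothetical in-memory yourTable and made-up rows. With no where clause involved, the sum of the grouped sums equals sum(rate) over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (first_name TEXT, last_name TEXT, rate INTEGER)")
# Hypothetical rows: Ann Lee appears twice, Bob Ray once.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("Ann", "Lee", 10), ("Ann", "Lee", 5), ("Bob", "Ray", 7)],
)

# Per-name sums (the first query, with the grouping columns selected too)
per_person = conn.execute(
    "SELECT first_name, last_name, SUM(rate) FROM yourTable "
    "GROUP BY first_name, last_name ORDER BY first_name"
).fetchall()

# Total of the grouped sums (the nested query)
grand_total = conn.execute(
    "SELECT SUM(SumGrouped) FROM "
    "(SELECT SUM(rate) AS SumGrouped FROM yourTable "
    " GROUP BY first_name, last_name) AS T1"
).fetchone()[0]

# Plain sum over the whole table, for comparison
whole_table = conn.execute("SELECT SUM(rate) FROM yourTable").fetchone()[0]

print(per_person, grand_total, whole_table)
```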
sum columns with different combinations in R?
To count the rows where both columns of an X/Y pair are 1, we can use matrix multiplication:
xs = grep("X", names(df), value = T)
ys = grep("Y", names(df), value = T)
xm = as.matrix(df[xs])
ym = as.matrix(df[ys])
t(ym) %*% (xm)
# X_0 X_1 X_3 X_6 X_12
# Y_0 1 2 1 0 0
# Y_1 0 2 1 0 0
# Y_3 0 0 1 0 0
# Y_6 0 0 0 0 0
# Y_12 0 0 1 0 0
To count all 1s in each X/Y column pair (the two column totals added together):
xs = grep("X", names(df), value = T)
ys = grep("Y", names(df), value = T)
sums = colSums(df)
t(outer(setNames(xs, xs), setNames(ys, ys), FUN = function(x, y) sums[x] + sums[y]))
# X_0 X_1 X_3 X_6 X_12
# Y_0 11 12 11 10 10
# Y_1 8 9 8 7 7
# Y_3 7 8 7 6 6
# Y_6 4 5 4 3 3
# Y_12 4 5 4 3 3
Using this data:
df = read.table(text = 'X_0 X_1 X_3 X_6 X_12 Y_0 Y_1 Y_3 Y_6 Y_12
0 1 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 1 0 1
0 1 0 0 0 1 1 0 0 0
1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 1 1 1 1 0
0 0 0 0 0 1 1 1 1 0
0 0 0 0 0 1 0 1 1 1
0 0 1 0 0 1 1 1 0 1 ', header = T)
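Entry (Y, X) of t(ym) %*% xm is the dot product of two 0/1 columns, which is the number of rows where both are 1. The plain-Python sketch below uses the same data and reproduces both matrices above without a matrix library. It selects columns with startswith, which here picks the same names as grep("X", ...) and grep("Y", ...).

```python
header = ["X_0", "X_1", "X_3", "X_6", "X_12", "Y_0", "Y_1", "Y_3", "Y_6", "Y_12"]
data = [
    [0, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
]
cols = {name: [row[i] for row in data] for i, name in enumerate(header)}
xs = [n for n in header if n.startswith("X")]
ys = [n for n in header if n.startswith("Y")]


def co_occurrence(cols, ys, xs):
    """(Y^T X)[y][x]: number of rows where both the Y and the X column are 1."""
    return {y: [sum(a * b for a, b in zip(cols[y], cols[x])) for x in xs] for y in ys}


def pair_totals(cols, ys, xs):
    """All 1s in each (Y, X) column pair: the two column sums added."""
    sums = {name: sum(values) for name, values in cols.items()}
    return {y: [sums[y] + sums[x] for x in xs] for y in ys}


for y, counts in co_occurrence(cols, ys, xs).items():
    print(y, *counts)
```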