Take Sum of a Variable If Combination of Values in Two Other Columns Are Unique

One option is base R. First sort the first two columns within each row so that every pair appears in a consistent order: apply with MARGIN = 1 does the sorting, t transposes the result back to one row per observation, and data.frame rebuilds the dataset as 'df1'. Then use the formula method of aggregate to get the sum of 'num_email' grouped by the two sorted columns.

df1 <- data.frame(t(apply(df[1:2], 1, sort)), df[3])
aggregate(num_email~., df1, FUN=sum)

# X1 X2 num_email
# 1 Beth Mable 2
# 2 Beth Susan 3
# 3 Mable Susan 1

Or, using data.table: convert the first two columns to character class, unname them so that they get the default names 'V1' and 'V2', and build a 'data.table'. Relying on the lexicographic ordering of character columns, we select the rows where V1 > V2 in i, swap the two columns in those rows by assigning (:=) .(V2, V1), and then get the sum of 'num_email' grouped by 'V1' and 'V2'.

library(data.table)
dt = do.call(data.table, c(lapply(unname(df[1:2]), as.character), df[3]))
dt[V1 > V2, c("V1", "V2") := .(V2, V1)]
dt[, .(num_email = sum(num_email)), by= .(V1, V2)]

# V1 V2 num_email
# 1: Beth Mable 2
# 2: Beth Susan 3
# 3: Mable Susan 1

Or, using dplyr: use mutate with across to convert the two columns to character class, put each pair in a consistent order with pmin and pmax, group by 'V1' and 'V2', and get the sum of 'num_email'.

library(dplyr)
df %>%
  mutate(across(c(senders, receivers), as.character)) %>%
  mutate(V1 = pmin(senders, receivers),
         V2 = pmax(senders, receivers)) %>%
  group_by(V1, V2) %>%
  summarise(num_email = sum(num_email))

# V1 V2 num_email
# (chr) (chr) (dbl)
# 1 Beth Mable 2
# 2 Beth Susan 3
# 3 Mable Susan 1

NOTE: The data.table solution was updated by @Frank.
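
All three approaches above rest on the same idea: put each sender/receiver pair into one canonical order so that (Susan, Beth) and (Beth, Susan) fall into the same group. For readers who want to see that step without any R package, here is a minimal plain-Python sketch. The sample rows and the function name sum_unordered_pairs are made up for illustration; they are not the asker's data.

```python
from collections import defaultdict

def sum_unordered_pairs(rows):
    """Sum the value of each row, treating (a, b) and (b, a) as the same key."""
    totals = defaultdict(int)
    for a, b, value in rows:
        # min/max puts the pair in a fixed order, similar in spirit to pmin/pmax
        key = (min(a, b), max(a, b))
        totals[key] += value
    # Sort by key so the result is easy to read
    return dict(sorted(totals.items()))

# Hypothetical sender / receiver / num_email rows
emails = [
    ("Beth", "Mable", 1),
    ("Mable", "Beth", 1),
    ("Susan", "Beth", 2),
    ("Beth", "Susan", 1),
    ("Mable", "Susan", 1),
]

pair_totals = sum_unordered_pairs(emails)
for (first, second), total in pair_totals.items():
    print(first, second, total)
```

Note that Python compares strings by code point, while R's ordering depends on the locale, so the two may order some names differently; the grouping itself is unaffected as long as one rule is applied consistently.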

In R, take sum of multiple variables if combination of values in two other columns are unique

You can use dplyr::summarise and across after group_by.

library(dplyr)

df %>%
  group_by(Locations, seasons) %>%
  summarise(across(starts_with("ani"), ~sum(.x, na.rm = TRUE))) %>%
  ungroup()

Another option is to reshape the data to long format using functions from the tidyr package. This avoids the issue of having to select columns 3 onwards.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -c(Locations, seasons)) %>%
  group_by(Locations, seasons, name) %>%
  summarise(Sum = sum(value, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = "name", values_from = "Sum")

Result:

# A tibble: 9 x 4
Locations seasons ani1 ani2
<chr> <int> <int> <int>
1 A 2 2 0
2 A 3 1 1
3 A 4 1 1
4 B 2 0 1
5 B 3 1 1
6 C 1 1 0
7 C 2 1 1
8 D 2 0 0
9 D 4 1 2
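
To make the logic of the across/starts_with version explicit, here is a rough plain-Python sketch of the same computation: group by two key columns and sum every column whose name starts with a given prefix, skipping missing values the way na.rm = TRUE does. The records and the helper name sum_prefixed_columns are hypothetical; the asker's input data is not shown in the thread.

```python
from collections import defaultdict

def sum_prefixed_columns(records, group_cols, prefix):
    """Per group, sum every column whose name starts with `prefix`.

    None is treated as a missing value and contributes nothing to the sum.
    """
    totals = {}
    for rec in records:
        key = tuple(rec[c] for c in group_cols)
        sums = totals.setdefault(key, defaultdict(int))
        for col, val in rec.items():
            if col.startswith(prefix):
                # Adding 0 still registers the column, so an all-missing
                # column shows up as 0 rather than disappearing
                sums[col] += 0 if val is None else val
    return {key: dict(sums) for key, sums in sorted(totals.items())}

# Hypothetical rows with the column names used in the answer
records = [
    {"Locations": "A", "seasons": 2, "ani1": 1, "ani2": 0},
    {"Locations": "A", "seasons": 2, "ani1": 1, "ani2": None},
    {"Locations": "A", "seasons": 3, "ani1": 1, "ani2": 1},
    {"Locations": "B", "seasons": 2, "ani1": 0, "ani2": 1},
]

group_totals = sum_prefixed_columns(records, ["Locations", "seasons"], "ani")
for key, sums in group_totals.items():
    print(key, sums)
```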

Sum values of column based on the unique values of another column

I believe you're looking for groupby, which is covered in the pandas documentation for DataFrame.groupby. Group by 'Column1' and sum 'Column2':

df.groupby('Column1')['Column2'].sum()
Column1 Column2
1 44
2 65
3 30
4 18
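
If it helps to see what the groupby call computes, here is a small sketch of the same split-and-sum logic using only the Python standard library. The rows are hypothetical values chosen so that the totals match the output above, and group_sum is an illustrative helper, not a pandas function.

```python
from collections import defaultdict

def group_sum(rows, key_col, value_col):
    """Add up value_col for each distinct value of key_col."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_col]] += row[value_col]
    return dict(totals)

# Hypothetical input rows (the original DataFrame is not shown in the thread)
rows = [
    {"Column1": 1, "Column2": 40},
    {"Column1": 1, "Column2": 4},
    {"Column1": 2, "Column2": 65},
    {"Column1": 3, "Column2": 10},
    {"Column1": 3, "Column2": 20},
    {"Column1": 4, "Column2": 18},
]

column2_by_column1 = group_sum(rows, "Column1", "Column2")
print(column2_by_column1)
```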

Sum rows within a column for each unique combination in R

I suggest trying dplyr, which is a workhorse for data manipulation. Judging from the desired output, you want the cumulative sum of Value within each Week.

df = read.table(text="Week  Day  Value
1 1 1
1 2 3
1 3 4
2 1 2
2 2 2
2 3 3", header = TRUE)

library(dplyr)
df %>% group_by(Week) %>% mutate(Sum = cumsum(Value))

# you get
Source: local data frame [6 x 4]
Groups: Week

Week Day Value Sum
1 1 1 1 1
2 1 2 3 4
3 1 3 4 8
4 2 1 2 2
5 2 2 2 4
6 2 3 3 7

Or you could try data.table, which is fast and memory efficient and therefore well suited to larger data.

library(data.table)
setDT(df)[, Sum := cumsum(Value), by = Week][]
Week Day Value Sum
1: 1 1 1 1
2: 1 2 3 4
3: 1 3 4 8
4: 2 1 2 2
5: 2 2 2 4
6: 2 3 3 7
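
For comparison, the grouped cumulative sum can be sketched in plain Python with itertools. This uses the same six Week/Day/Value rows as above; the function name cumsum_by_group is just an illustrative choice. One difference to be aware of: this sketch sorts by Week before grouping, so if the input were not already ordered by Week, the rows would come back reordered.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (Week, Day, Value) rows from the example above
rows = [
    (1, 1, 1),
    (1, 2, 3),
    (1, 3, 4),
    (2, 1, 2),
    (2, 2, 2),
    (2, 3, 3),
]

def cumsum_by_group(rows):
    """Return (Week, Day, Value, Sum) where Sum restarts for every Week."""
    result = []
    # groupby only merges adjacent rows, so sort by Week first; the sort is
    # stable, which keeps the original row order inside each week
    ordered = sorted(rows, key=itemgetter(0))
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        running = accumulate(value for _, _, value in group)
        result.extend((w, d, v, s) for (w, d, v), s in zip(group, running))
    return result

with_sums = cumsum_by_group(rows)
for row in with_sums:
    print(*row)
```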

Sum rows of each unique combination of variables in r

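We can use combn to generate every combination of three columns and take the rowSums of each one, convert the result to a data.frame, set the names of the 'output' from the column names in each combination, and cbind it with the original dataset.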

output <- as.data.frame(combn(ncol(df1), 3, FUN =function(x) rowSums(df1[x])))
names(output) <- paste0("sum_", combn(names(df1), 3, FUN = paste, collapse="_"))
cbind(df1, output)
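
The same steps can be sketched in plain Python with itertools.combinations, which enumerates column triples in the same index order as combn. The table below is hypothetical (the asker's 'df1' is not shown), and add_combination_sums is a made-up helper name.

```python
from itertools import combinations

def add_combination_sums(table, size=3):
    """table maps column name -> list of values (all lists the same length).

    Returns a new dict with the original columns followed by one
    'sum_<a>_<b>_<c>' column per combination of `size` columns.
    """
    names = list(table)
    out = dict(table)  # shallow copy; original columns stay first
    for combo in combinations(names, size):
        columns = (table[name] for name in combo)
        # zip(*columns) walks the chosen columns row by row
        out["sum_" + "_".join(combo)] = [sum(values) for values in zip(*columns)]
    return out

# Hypothetical four-column table with two rows
df1 = {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]}

combined = add_combination_sums(df1)
for name, values in combined.items():
    print(name, values)
```

With four columns there are four triples, so four new columns are appended; the count grows quickly with the number of columns, which is worth keeping in mind for wide data.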

Extracting unique column combination and finding sum and count in R

We can use dplyr

library(dplyr)
df1 %>%
  group_by(Origin, Destination, Airline) %>%
  dplyr::summarise(count = n(), TotalPassengers = sum(Passengers))
# Groups: Origin, Destination [2]
# Origin Destination Airline count TotalPassengers
# <chr> <chr> <chr> <int> <dbl>
#1 ABE ATL 9A 2 3
#2 ABE ATL DL 1 5
#3 NYC SFA AA 3 21
#4 NYC SFA DL 1 5

data

df1 <- data.frame(Origin = rep(c("ABE", "NYC"), c(3, 4)),
                  Destination = rep(c("ATL", "SFA"), c(3, 4)),
                  Airline = c("9A", "9A", "DL", "AA", "AA", "AA", "DL"),
                  Passengers = c(2, 1, 5, 4, 10, 7, 5))
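
As a cross-check of the grouped count and sum, here is a short plain-Python sketch that walks the same seven rows once and keeps a running count and passenger total per (Origin, Destination, Airline) key. The function name count_and_total is an illustrative assumption.

```python
# Same rows as the data.frame above: Origin, Destination, Airline, Passengers
flights = [
    ("ABE", "ATL", "9A", 2),
    ("ABE", "ATL", "9A", 1),
    ("ABE", "ATL", "DL", 5),
    ("NYC", "SFA", "AA", 4),
    ("NYC", "SFA", "AA", 10),
    ("NYC", "SFA", "AA", 7),
    ("NYC", "SFA", "DL", 5),
]

def count_and_total(rows):
    """Count rows and sum passengers for each unique key combination."""
    summary = {}
    for origin, destination, airline, passengers in rows:
        key = (origin, destination, airline)
        entry = summary.setdefault(key, {"count": 0, "TotalPassengers": 0})
        entry["count"] += 1
        entry["TotalPassengers"] += passengers
    return summary

flight_summary = count_and_total(flights)
for key, entry in flight_summary.items():
    print(*key, entry["count"], entry["TotalPassengers"])
```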

Sum for unique combinations of variables in a data table

Use pmin and pmax to put each pair of countries into a consistent order, then group by the reordered pair:

require(data.table) # v1.9.6
dt = fread("Country1 Country2 Value Category
A A 4 1
A B 2 1
A C 9 1
B A 3 2
B D 4 1
C A 2 2
D C 7 2")
dt[, .(total = sum(Value)),
   by = .(Country1 = pmin(Country1, Country2),
          Country2 = pmax(Country1, Country2))]
# Country1 Country2 total
# 1: A A 4
# 2: A B 5
# 3: A C 11
# 4: B D 4
# 5: C D 7

If you want this within Category, just add it as well to by.
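
To illustrate what adding Category to the grouping changes, here is a plain-Python sketch over the same seven rows: the key becomes the ordered country pair plus the category, so A/B in category 1 and B/A in category 2 are no longer merged. The function name sum_pairs_within_category is hypothetical.

```python
from collections import defaultdict

# Country1, Country2, Value, Category rows from the fread call above
rows = [
    ("A", "A", 4, 1),
    ("A", "B", 2, 1),
    ("A", "C", 9, 1),
    ("B", "A", 3, 2),
    ("B", "D", 4, 1),
    ("C", "A", 2, 2),
    ("D", "C", 7, 2),
]

def sum_pairs_within_category(rows):
    """Sum Value per unordered country pair, separately for each Category."""
    totals = defaultdict(int)
    for c1, c2, value, category in rows:
        # Order the pair first, then extend the key with the category
        key = (min(c1, c2), max(c1, c2), category)
        totals[key] += value
    return dict(totals)

category_totals = sum_pairs_within_category(rows)
for key, total in sorted(category_totals.items()):
    print(*key, total)
```

Without the category in the key, A/B would total 5 and A/C would total 11, as in the output above; with it, each of those pairs splits into two groups.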

SUM(DISTINCT) Based on Other Columns


select sum (rate)
from yourTable
group by first_name, last_name

Edit

If you then add up all of those per-group sums, you simply get the sum over the whole table:

select sum(rate) from yourTable

If the two differ for some reason (for example, because the grouped query has a where clause) and you need the sum of the results of the grouped query above, wrap it in a subquery:

select sum(SumGrouped) from
( select sum(rate) as SumGrouped
from yourTable
group by first_name, last_name) T1
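
To try these queries without a database server, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory table. The table contents are hypothetical, and SQLite is only a stand-in for whatever database the question was about, so dialect details may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (first_name TEXT, last_name TEXT, rate REAL)")
# Hypothetical rows: one person appears twice
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", 10.0),
        ("Ann", "Lee", 5.0),
        ("Bob", "Ray", 7.0),
        ("Cy", "Orr", 3.0),
    ],
)

# One sum per (first_name, last_name) group
per_person = conn.execute(
    """
    SELECT first_name, last_name, SUM(rate)
    FROM yourTable
    GROUP BY first_name, last_name
    ORDER BY first_name, last_name
    """
).fetchall()

# Sum of the grouped sums, via a subquery
total_of_groups = conn.execute(
    """
    SELECT SUM(SumGrouped) FROM (
        SELECT SUM(rate) AS SumGrouped
        FROM yourTable
        GROUP BY first_name, last_name
    ) T1
    """
).fetchone()[0]

# Plain sum over the whole table, for comparison
overall = conn.execute("SELECT SUM(rate) FROM yourTable").fetchone()[0]
conn.close()

print(per_person)
print(total_of_groups, overall)
```

With no filtering, the sum of the grouped sums and the plain table sum agree, which is the point made in the edit above.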

sum columns with different combinations in R?

To count concurrent 1s in column pairs (rows where both the X column and the Y column are 1), we can use matrix multiplication:

xs = grep("X", names(df), value = T)
ys = grep("Y", names(df), value = T)

xm = as.matrix(df[xs])
ym = as.matrix(df[ys])
t(ym) %*% (xm)
# X_0 X_1 X_3 X_6 X_12
# Y_0 1 2 1 0 0
# Y_1 0 2 1 0 0
# Y_3 0 0 1 0 0
# Y_6 0 0 0 0 0
# Y_12 0 0 1 0 0

Counting all 1s in column pairs:

xs = grep("X", names(df), value = T)
ys = grep("Y", names(df), value = T)

sums = colSums(df)

t(outer(setNames(xs, xs), setNames(ys, ys), FUN = function(x, y) sums[x] + sums[y]))
# X_0 X_1 X_3 X_6 X_12
# Y_0 11 12 11 10 10
# Y_1 8 9 8 7 7
# Y_3 7 8 7 6 6
# Y_6 4 5 4 3 3
# Y_12 4 5 4 3 3

Using this data:

df = read.table(text = 'X_0 X_1 X_3 X_6 X_12 Y_0 Y_1 Y_3 Y_6 Y_12 
0 1 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 1 0 1
0 1 0 0 0 1 1 0 0 0
1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 1 1 1 1 0
0 0 0 0 0 1 1 1 1 0
0 0 0 0 0 1 0 1 1 1
0 0 1 0 0 1 1 1 0 1 ', header = T)
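
To see why the matrix product t(ym) %*% xm counts concurrent 1s, here is a plain-Python sketch that spells out the same sum over the ten rows above: entry (i, j) adds y_i * x_j for every row, which is 1 only when both columns are 1 in that row. The function name cooccurrence is an illustrative assumption.

```python
x_names = ["X_0", "X_1", "X_3", "X_6", "X_12"]
y_names = ["Y_0", "Y_1", "Y_3", "Y_6", "Y_12"]

# The ten rows from the read.table call: five X columns, then five Y columns
rows = [
    [0, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
]

def cooccurrence(rows, n_x):
    """Return a Y-by-X matrix counting rows where both columns equal 1."""
    n_y = len(rows[0]) - n_x
    counts = [[0] * n_x for _ in range(n_y)]
    for row in rows:
        xs, ys = row[:n_x], row[n_x:]
        for i, y in enumerate(ys):
            for j, x in enumerate(xs):
                # y * x is 1 only when both indicators are 1 in this row
                counts[i][j] += y * x
    return counts

counts = cooccurrence(rows, len(x_names))
for name, line in zip(y_names, counts):
    print(name, line)
```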

