Calculate Sum of a List of Variables by Group

Your test data doesn't match the example you gave, but regardless, you can take advantage of the fact that data.table has a special symbol named .SD, for "Subset of Data". So this should work:

x[, lapply(.SD, sum), by = ID]
#----
     ID Count Count2 Count3
1:  210    13      5      5
2: 3917     5      5      5

This is actually covered in the FAQ: type vignette("datatable-faq", package="data.table") or find it online.

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple grouping dimensions can be specified in the by list. Multiple aggregated metrics of the same data type can be combined via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(Embedding @thelatemail's comment:) aggregate has a formula interface too:

aggregate(Frequency ~ Category, x, sum)

Or, if you want to aggregate multiple columns, you could use the . notation (it works for one column too):

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")),
                Frequency=c(10,15,5,2,14,20,3))

Grouping list by variables and counting sums of object variables

Assuming your input is:

List<Product> products = Arrays.asList(
    new Product(100.0, 1.0),
    new Product(100.0, 1.0),
    new Product(100.0, 1.23),
    new Product(100.0, 1.23),
    new Product(100.0, 1.23),
    new Product(100.0, 1.08),
    new Product(100.0, 1.08)
);

You can do:

Map<Double, Product> vatToReducedProduct = products.stream()
    .collect(Collectors.toMap(Product::getVat,
        Function.identity(),
        (p1, p2) -> new Product(p1.getNetPrice() + p2.getNetPrice(), p1.getVat())));

Output:

{
1.0=Product{netPrice=200.0, vat=1.0, vatAmount=0.0, grossPrice=200.0},
1.23=Product{netPrice=300.0, vat=1.23, vatAmount=69.0, grossPrice=369.0},
1.08=Product{netPrice=200.0, vat=1.08, vatAmount=16.0, grossPrice=216.0}
}
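If only the summed net price per VAT rate is needed, rather than a merged Product, a grouping collector is one alternative. The following is a minimal sketch, not the answer's own method: the Product class here is a hypothetical stand-in with only the two fields used above, and VatTotals and netTotalsByVat are names made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical minimal Product; the real class in the answer has more fields.
class Product {
    private final double netPrice;
    private final double vat;

    Product(double netPrice, double vat) {
        this.netPrice = netPrice;
        this.vat = vat;
    }

    double getNetPrice() { return netPrice; }
    double getVat() { return vat; }
}

class VatTotals {
    // Sums net prices per VAT rate; the TreeMap keeps the rates sorted.
    static Map<Double, Double> netTotalsByVat(List<Product> products) {
        return products.stream()
                .collect(Collectors.groupingBy(
                        Product::getVat,
                        TreeMap::new,
                        Collectors.summingDouble(Product::getNetPrice)));
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product(100.0, 1.0),
                new Product(100.0, 1.0),
                new Product(100.0, 1.23),
                new Product(100.0, 1.23),
                new Product(100.0, 1.23),
                new Product(100.0, 1.08),
                new Product(100.0, 1.08));
        System.out.println(netTotalsByVat(products));
    }
}
```

Compared with toMap and a merge function, this keeps only the totals, so it fits when the derived fields of Product (VAT amount, gross price) are not needed per group.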

Group by multiple variables by shortest value and sum in Java

The SQL query seems to be missing the aggregate function MIN applied to orderId:

SELECT MIN(orderId), itemId, itemName, itemGenre, SUM(number) as number
FROM item
GROUP BY itemId, itemName, itemGenre;

To implement similar functionality with the Stream API, Collectors.toMap with a merge function should be used, where the merge function selects the minimum orderId and sums number. It may also be better to use LinkedHashMap to maintain insertion order.

Also, either a copy constructor should be implemented in the Item class, or the items from the items list should be cloned when selecting the value to place in the intermediate map, so that the merge function does not modify the original items.

Then the values of this map are converted into an ArrayList.

List<Item> summary = new ArrayList<>(items
    .stream()
    .collect(Collectors.toMap(
        // compound "group by" key using fields for brevity
        i -> String.join("|", i.itemId, i.itemName, i.itemGenre),
        i -> i.clone(), // or Item::new if copy constructor is implemented
                        // or verbose i -> new Item(i.orderId, i.itemId, ...)
        (i1, i2) -> {
            // keep the smaller orderId (MIN) and sum the numbers
            if (i1.orderId.compareToIgnoreCase(i2.orderId) > 0) {
                i1.setOrderId(i2.orderId);
            }
            i1.setNumber(i1.number + i2.number);
            return i1;
        },
        LinkedHashMap::new
    ))
    .values() // Collection<Item>
);

Or, a new object may be created in merge function:

List<Item> summary = new ArrayList<>(items
    .stream()
    .collect(Collectors.toMap(
        // compound "group by" key using fields for brevity
        i -> String.join("|", i.itemId, i.itemName, i.itemGenre),
        i -> i, // or Function.identity()
        (i1, i2) -> new Item( // merge function
            i1.orderId.compareToIgnoreCase(i2.orderId) <= 0 ? i1.orderId : i2.orderId,
            i1.itemId, i1.itemName, i1.itemGenre, // "group by" fields
            i1.number + i2.number
        ),
        LinkedHashMap::new
    ))
    .values() // Collection<Item>
);
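To try the second variant end to end, here is a self-contained sketch. The Item class is hypothetical: its field names follow the snippets above, but the types (String ids, int number) are assumptions, and ItemSummary, summarize and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Item; field names follow the snippets above, types are assumed.
class Item {
    final String orderId;
    final String itemId;
    final String itemName;
    final String itemGenre;
    final int number;

    Item(String orderId, String itemId, String itemName, String itemGenre, int number) {
        this.orderId = orderId;
        this.itemId = itemId;
        this.itemName = itemName;
        this.itemGenre = itemGenre;
        this.number = number;
    }
}

class ItemSummary {
    // Groups by (itemId, itemName, itemGenre), keeps the smallest orderId
    // and sums number; LinkedHashMap preserves first-seen group order.
    static List<Item> summarize(List<Item> items) {
        return new ArrayList<>(items.stream()
                .collect(Collectors.toMap(
                        i -> String.join("|", i.itemId, i.itemName, i.itemGenre),
                        i -> i,
                        (i1, i2) -> new Item(
                                i1.orderId.compareToIgnoreCase(i2.orderId) <= 0
                                        ? i1.orderId : i2.orderId,
                                i1.itemId, i1.itemName, i1.itemGenre,
                                i1.number + i2.number),
                        LinkedHashMap::new))
                .values());
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("A2", "i1", "Pen", "Office", 2),
                new Item("A1", "i1", "Pen", "Office", 3),
                new Item("A3", "i2", "Book", "Fiction", 1));
        for (Item i : summarize(items)) {
            System.out.println(i.orderId + " " + i.itemId + " " + i.number);
        }
    }
}
```

Because the merge function builds a new Item instead of mutating its arguments, the objects in the input list stay untouched, which is why no copy constructor or clone is needed in this variant.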

Divide group sum by total sum

You can use the following code:

df <- read.table(text="Group    count
A 20
A 10
B 30
B 35
C 50
C 60", header = TRUE)

library(dplyr)
df %>%
  group_by(Group) %>%
  summarise(avg = mean(count)) %>%
  ungroup() %>%
  mutate(prcnt_of_total = prop.table(avg))
#> # A tibble: 3 × 3
#>   Group   avg prcnt_of_total
#>   <chr> <dbl>          <dbl>
#> 1 A      15            0.146
#> 2 B      32.5          0.317
#> 3 C      55            0.537

Created on 2022-07-14 by the reprex package (v2.0.1)

How to sum rows based on group and sub-group using dplyr in R?

First group by Country and then mutate with sum:

library(dplyr)

transportation %>%
  group_by(Country) %>%
  mutate(country_sum = sum(Energy))

  Country Mode  Energy country_sum
  <chr>   <chr>  <dbl>       <dbl>
1 A       Car    10000       39000
2 A       Train   9000       39000
3 A       Plane  20000       39000
4 B       Car   200000      810000
5 B       Train 160000      810000
6 B       Plane 450000      810000

