Calculate sum of a list of variables by group
Your test data doesn't match the example you gave, but regardless, you can take advantage of the fact that data.table has a special symbol named .SD, short for "Subset of Data". So this should work:
x[, lapply(.SD, sum), by = ID]
#----
     ID Count Count2 Count3
1:  210    13      5      5
2: 3917     5      5      5
This is actually covered in the FAQ: type vignette("datatable-faq", package="data.table") or find it online.
How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(embedding @thelatemail's comment) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you could use the . notation (works for one column too):
aggregate(. ~ Category, x, sum)
or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
Grouping list by variables and counting sums of object variables
Assuming your input is:
List<Product> products = Arrays.asList(
new Product(100.0, 1.0),
new Product(100.0, 1.0),
new Product(100.0, 1.23),
new Product(100.0, 1.23),
new Product(100.0, 1.23),
new Product(100.0, 1.08),
new Product(100.0, 1.08)
);
You can do:
Map<Double, Product> vatToReducedProduct = products.stream()
.collect(Collectors.toMap(Product::getVat,
Function.identity(),
(p1, p2) -> new Product(p1.getNetPrice() + p2.getNetPrice(), p1.getVat())));
Output:
{
1.0=Product{netPrice=200.0, vat=1.0, vatAmount=0.0, grossPrice=200.0},
1.23=Product{netPrice=300.0, vat=1.23, vatAmount=69.0, grossPrice=369.0},
1.08=Product{netPrice=200.0, vat=1.08, vatAmount=16.0, grossPrice=216.0}
}
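The answer above does not show the Product class, so here is a minimal self-contained sketch of the same toMap-with-merge idea. The Product class, the VatGrouping wrapper and the sumByVat helper are hypothetical names introduced only for illustration, and the sketch leaves out the vatAmount and grossPrice fields seen in the output above. It also passes LinkedHashMap::new as a fourth argument so the keys come back in first-seen order, which the three-argument toMap does not promise.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class VatGrouping {

    // Hypothetical minimal Product; the real class is not shown in the answer.
    static final class Product {
        private final double netPrice;
        private final double vat;

        Product(double netPrice, double vat) {
            this.netPrice = netPrice;
            this.vat = vat;
        }

        double getNetPrice() { return netPrice; }
        double getVat() { return vat; }

        @Override
        public String toString() {
            return "Product{netPrice=" + netPrice + ", vat=" + vat + "}";
        }
    }

    // Groups products by VAT rate and sums their net prices.
    // The merge function runs whenever two products share the same key.
    static Map<Double, Product> sumByVat(List<Product> products) {
        return products.stream()
                .collect(Collectors.toMap(
                        Product::getVat,
                        Function.identity(),
                        (p1, p2) -> new Product(p1.getNetPrice() + p2.getNetPrice(), p1.getVat()),
                        LinkedHashMap::new)); // keeps first-seen key order
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product(100.0, 1.0),
                new Product(100.0, 1.0),
                new Product(100.0, 1.23),
                new Product(100.0, 1.23),
                new Product(100.0, 1.23),
                new Product(100.0, 1.08),
                new Product(100.0, 1.08));
        sumByVat(products).forEach((vat, p) -> System.out.println(vat + " -> " + p));
    }
}
```

Using a Double as the map key works here because the VAT rates are written as identical literals; rates that come out of arithmetic may differ in the last bits and end up in separate groups.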
Group by multiple variables by shortest value and sum in Java
The SQL query seems to be missing the aggregate function MIN applied to orderId:
SELECT MIN(orderId), itemId, itemName, itemGenre, SUM(number) as number
FROM item
GROUP BY itemId, itemName, itemGenre;
To implement similar functionality with the Stream API, Collectors.toMap with a merge function should be used, where the merge function selects the minimum orderId and sums up number. It may also be better to use LinkedHashMap to maintain insertion order.
Also, a copy constructor should be implemented in the Item class, or the items from the items list should be cloned when selecting a value to place in the intermediate map, so that merging does not mutate the original items.
Then the values of this map are converted into an ArrayList.
List<Item> summary = new ArrayList<>(items
.stream()
.collect(Collectors.toMap(
// compound "group by" key using fields for brevity
i -> String.join("|", i.itemId, i.itemName, i.itemGenre),
i -> i.clone(), // or Item::new if copy constructor is implemented
// or verbose i -> new Item(i.orderId, i.itemId, ...)
(i1, i2) -> {
if (i1.orderId.compareToIgnoreCase(i2.orderId) > 0) { // i2 has the smaller orderId, keep it
i1.setOrderId(i2.orderId);
}
i1.setNumber(i1.number + i2.number);
return i1;
},
LinkedHashMap::new
))
.values() // Collection<Item>
);
Or, a new object may be created in the merge function:
List<Item> summary = new ArrayList<>(items
.stream()
.collect(Collectors.toMap(
// compound "group by" key using fields for brevity
i -> String.join("|", i.itemId, i.itemName, i.itemGenre),
i -> i, // or Function.identity()
(i1, i2) -> new Item( // merge function
i1.orderId.compareToIgnoreCase(i2.orderId) <= 0 ? i1.orderId : i2.orderId,
i1.itemId, i1.itemName, i1.itemGenre, // "group by" fields
i1.number + i2.number
),
LinkedHashMap::new
))
.values() // Collection<Item>
);
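The snippets above rely on an Item class that the answer does not show. Below is a minimal runnable sketch of the second variant (a new object created in the merge function). The Item class with its field types, the ItemSummary wrapper, and the minOrderId and summarize helpers are all assumptions made for illustration; only the field names come from the snippets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ItemSummary {

    // Hypothetical Item: field names follow the snippets above, types are assumed.
    static final class Item {
        final String orderId;
        final String itemId;
        final String itemName;
        final String itemGenre;
        final int number;

        Item(String orderId, String itemId, String itemName, String itemGenre, int number) {
            this.orderId = orderId;
            this.itemId = itemId;
            this.itemName = itemName;
            this.itemGenre = itemGenre;
            this.number = number;
        }
    }

    // Lexicographically smaller order id, ignoring case; ties keep the first argument.
    static String minOrderId(String a, String b) {
        return a.compareToIgnoreCase(b) <= 0 ? a : b;
    }

    // Equivalent of: SELECT MIN(orderId), ..., SUM(number) GROUP BY itemId, itemName, itemGenre
    static List<Item> summarize(List<Item> items) {
        Map<String, Item> byKey = items.stream()
                .collect(Collectors.toMap(
                        i -> String.join("|", i.itemId, i.itemName, i.itemGenre), // compound key
                        i -> i,
                        (i1, i2) -> new Item(
                                minOrderId(i1.orderId, i2.orderId),
                                i1.itemId, i1.itemName, i1.itemGenre,
                                i1.number + i2.number),
                        LinkedHashMap::new)); // keeps first-seen group order
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("B2", "i1", "Pen", "Office", 2),
                new Item("a1", "i1", "Pen", "Office", 3),
                new Item("C3", "i2", "Book", "Fiction", 1));
        for (Item i : summarize(items)) {
            System.out.println(i.orderId + " " + i.itemId + " " + i.number);
        }
    }
}
```

Because the merge function builds a new Item each time, the objects in the input list are never modified, so no clone or copy constructor is needed in this variant. The "|" separator assumes that character does not occur in the key fields.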
Divide group sum by total sum
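You can use the following code. Note that it averages within each group; every group here has the same number of rows, so the resulting shares equal the shares of the group sums. Replace mean with sum to get group sum divided by total sum in general: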
df <- read.table(text="Group count
A 20
A 10
B 30
B 35
C 50
C 60", header = TRUE)
library(dplyr)
df %>%
group_by(Group) %>%
summarise(avg = mean(count)) %>%
ungroup() %>%
mutate(prcnt_of_total = prop.table(avg))
#> # A tibble: 3 × 3
#>   Group   avg prcnt_of_total
#>   <chr> <dbl>          <dbl>
#> 1 A      15            0.146
#> 2 B      32.5          0.317
#> 3 C      55            0.537
Created on 2022-07-14 by the reprex package (v2.0.1)
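For readers following the Java answers on this page, the same "group sum divided by total sum" calculation can be sketched with Collectors.groupingBy and summingDouble. This is only an illustration: the GroupShare class, the shareOfTotal helper and the parallel-array input are hypothetical choices, not part of the answer above, and the sketch assumes the grand total is non-zero.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GroupShare {

    // Returns each group's sum divided by the grand total, in first-seen group order.
    // groups[i] is the group label of counts[i]; both arrays must have the same length.
    static Map<String, Double> shareOfTotal(String[] groups, double[] counts) {
        Map<String, Double> sums = IntStream.range(0, groups.length)
                .boxed()
                .collect(Collectors.groupingBy(
                        i -> groups[i],
                        LinkedHashMap::new,
                        Collectors.summingDouble(i -> counts[i])));

        double total = sums.values().stream().mapToDouble(Double::doubleValue).sum();

        Map<String, Double> shares = new LinkedHashMap<>();
        sums.forEach((group, sum) -> shares.put(group, sum / total));
        return shares;
    }

    public static void main(String[] args) {
        String[] groups = {"A", "A", "B", "B", "C", "C"};
        double[] counts = {20, 10, 30, 35, 50, 60};
        shareOfTotal(groups, counts)
                .forEach((group, share) -> System.out.printf("%s %.3f%n", group, share));
    }
}
```

With the data from the question the group sums are 30, 65 and 110 out of a total of 205, so the shares agree with the prcnt_of_total column above.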
How to sum rows based on group and sub-group using dplyr in R?
First group by Country and then mutate with sum:
library(dplyr)
transportation %>%
group_by(Country) %>%
mutate(country_sum = sum(Energy))
  Country Mode  Energy country_sum
  <chr>   <chr>  <dbl>       <dbl>
1 A       Car    10000       39000
2 A       Train   9000       39000
3 A       Plane  20000       39000
4 B       Car   200000      810000
5 B       Train 160000      810000
6 B       Plane 450000      810000