R table function: how to sum instead of counting?
We can use xtabs from base R. By default, xtabs returns the sum of the left-hand-side variable for each combination of the factors.
xtabs(Profit~Category+Mode, df)
# Mode
#Category K L M
# X 36 11 11
# Y 17 26 28
# Z 0 8 15
Another base R option is tapply, which is more flexible because FUN can be any function. Note that it returns NA rather than 0 for combinations with no data.
with(df, tapply(Profit, list(Category, Mode), FUN=sum))
# K L M
#X 36 11 11
#Y 17 26 28
#Z NA 8 15
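The difference in how the two functions treat empty cells can be checked with a minimal sketch. The data frame below is hypothetical (the original df is not shown) and deliberately has no rows for the Z/K combination.

```r
# Hypothetical data; the combination Z/K is deliberately absent
df <- data.frame(
  Category = c("X", "X", "Y", "Z", "Z"),
  Mode     = c("K", "L", "K", "L", "L"),
  Profit   = c(10, 5, 7, 3, 4)
)

sums_xtabs  <- xtabs(Profit ~ Category + Mode, df)
sums_tapply <- with(df, tapply(Profit, list(Category, Mode), FUN = sum))

sums_xtabs["Z", "K"]   # xtabs fills the empty cell with 0
sums_tapply["Z", "K"]  # tapply leaves the empty cell as NA

# Replace the NA cells if zeros are preferred
sums_tapply[is.na(sums_tapply)] <- 0
```

Which behaviour is preferable depends on whether a missing combination should read as "no profit" or as "no data".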
Or we can use dcast to convert from 'long' to 'wide' format. It is more flexible, as we can set fun.aggregate to sum, mean, median, etc.
library(reshape2)
dcast(df, Category~Mode, value.var='Profit', sum)
# Category K L M
#1 X 36 11 11
#2 Y 17 26 28
#3 Z 0 8 15
If you need it in the 'long' format, here is one option with data.table. We convert the 'data.frame' to a 'data.table' with setDT(df), then, grouped by 'Category' and 'Mode', get the sum of 'Profit'.
library(data.table)
setDT(df)[, list(Profit= sum(Profit)) , by = .(Category, Mode)]
Is there any way that I can perform sum instead of count using cut or any other function in R?
I would probably try:
library(stringr)
df1$date_time <- as.character(df1$date_time)
# split "date time" into exactly two pieces: column 1 = date, column 2 = time
parts <- str_split_fixed(df1$date_time, " ", 2)
df1$date <- as.Date(parts[, 1], "%d/%m/%Y")
df1$time <- parts[, 2]
total_table <- aggregate(df1$value_column, by = list(df1$date, df1$time), FUN = sum)
This is a bit verbose, but it keeps both date and time available for any further analysis.
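Since the question mentions cut, here is a minimal base R sketch of summing within bins. The data frame and its hour column are hypothetical. The idea is that table() on the binned variable counts rows, while tapply() with sum totals a value column over the same bins.

```r
# Hypothetical data: a numeric variable to bin and a value to sum
df1 <- data.frame(
  hour         = c(1, 3, 7, 9, 13, 15, 20, 22),
  value_column = c(5, 2, 4, 1, 3, 6, 2, 8)
)

# Bin hours into four 6-hour intervals: [0,6), [6,12), [12,18), [18,24)
bins <- cut(df1$hour, breaks = c(0, 6, 12, 18, 24), right = FALSE)

# table(bins) would count rows per bin; tapply with sum totals the values
bin_totals <- tapply(df1$value_column, bins, sum)
bin_totals
```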
Add column sum to table
You could use cbind and rowSums afterwards:
tab <- table(df$Company,df$Marital)
tab <- cbind(tab, Total = rowSums(tab))
You can also use the built-in addmargins function:
tab <- addmargins(table(df$Company,df$Marital), 2)
(The 2 means to add a sum column but not a sum row; you can omit it and you'll get both.)
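A small sketch of both variants, using hypothetical data in place of the original df (which is not shown):

```r
# Hypothetical data standing in for df$Company and df$Marital
df <- data.frame(
  Company = c("A", "A", "B", "B", "B"),
  Marital = c("Single", "Married", "Single", "Single", "Married")
)

tab <- table(df$Company, df$Marital)

with_col  <- addmargins(tab, 2)  # adds a "Sum" column only
with_both <- addmargins(tab)     # adds a "Sum" row and a "Sum" column

with_col
with_both
```

Note that addmargins names the new margin "Sum" by default, whereas the cbind approach lets you choose the label (Total above).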
R data.table. If column x then row count, else sum
Here is one possibility. You could combine the c and lapply functions as follows (note that .N is the row count for each group):
df[, c(.(user=.N), lapply(.SD, sum)), by=date, .SDcols=c("turnover", "profit")]
# date user turnover profit
# 1: 1 2 5 3
# 2: 2 2 9 7
How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
Category x
1 First 30
2 Second 5
3 Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(Embedding @thelatemail's comment.) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or if you want to aggregate multiple columns, you could use the . notation (works for one column too):
aggregate(. ~ Category, x, sum)
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
First Second Third
30 5 34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
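As a sketch of combining cbind with the formula interface, the example below sums two columns per Category in one call. Metric2 is a hypothetical column added only for illustration; Category and Frequency are the data given above.

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)   # hypothetical second metric
)

# Sum both numeric columns per Category in one call
res <- aggregate(cbind(Frequency, Metric2) ~ Category, data = x, FUN = sum)
res
```

The result is a data frame with one row per Category and one column per summed metric.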
Use data.table to count and aggregate / summarize a column
The post you are referring to gives a method on how to apply one aggregation method to several columns. If you want to apply different aggregation methods to different columns, you can do:
dat[, .(count = .N, var = sum(VAR)), by = MNTH]
This results in:
MNTH count var
1: 201501 4 2
2: 201502 3 0
3: 201503 5 2
4: 201504 4 2
You can also add these values to your existing dataset by updating your dataset by reference:
dat[, `:=` (count = .N, var = sum(VAR)), by = MNTH]
This results in:
> dat
MNTH VAR count var
1: 201501 1 4 2
2: 201501 1 4 2
3: 201501 0 4 2
4: 201501 0 4 2
5: 201502 0 3 0
6: 201502 0 3 0
7: 201502 0 3 0
8: 201503 0 5 2
9: 201503 0 5 2
10: 201503 1 5 2
11: 201503 1 5 2
12: 201503 0 5 2
13: 201504 1 4 2
14: 201504 0 4 2
15: 201504 1 4 2
16: 201504 0 4 2
For further reading about how to use data.table syntax, see the Getting started guides on the GitHub wiki.
How do I sum recurring values according to a level in a column and output a table of counts?
We can use table to create a cross-tabulation of categories and animals, transpose it, convert it to a data.frame, group_by all categories, and count the frequency per combination:
library(dplyr)
library(tidyr)
as.data.frame.matrix(t(table(dat))) %>%
group_by_all() %>%
summarize(Count = n())
Result:
# A tibble: 5 x 4
# Groups: A, B [?]
A B C Count
<int> <int> <int> <int>
1 0 0 1 2
2 0 1 1 2
3 1 0 0 2
4 1 1 0 2
5 1 1 1 1
Edit (thanks to @C. Braun). Here is how to also include the zero A, B, C combinations:
as.data.frame.matrix(t(table(dat))) %>%
bind_rows(expand.grid(A = c(0,1), B = c(0,1), C = c(0,1))) %>%
group_by_all() %>%
summarize(Count = n()-1)
Or with complete, as suggested by @Ryan:
as.data.frame.matrix(t(table(dat))) %>%
mutate(non_missing = 1) %>%
complete(A, B, C) %>%
group_by(A, B, C) %>%
summarize(Count = sum(ifelse(is.na(non_missing), 0, 1)))
Result:
# A tibble: 8 x 4
# Groups: A, B [?]
A B C Count
<dbl> <dbl> <dbl> <dbl>
1 0 0 0 0
2 0 0 1 2
3 0 1 0 0
4 0 1 1 2
5 1 0 0 2
6 1 0 1 0
7 1 1 0 2
8 1 1 1 1
How can I aggregate data with sum, mean and count for each column respectively?
Such calculations are easy using dplyr:
library(dplyr)
df %>%
group_by(product) %>%
summarise(frequency = n(),
reorder_rate = sum(reorder)/frequency,
mean_sequence = sum(order_sequence)/frequency)
# A tibble: 3 x 4
# product frequency reorder_rate mean_sequence
# <fct> <int> <dbl> <dbl>
#1 egg 4 0.75 1.25
#2 fruit 4 0.75 2.75
#3 meat 2 0.5 3
However, you can also use data.table:
library(data.table)
setDT(df)[, .(frequency = .N, reorder_rate = sum(reorder)/.N,
mean_sequence = sum(order_sequence)/.N), by = product]