R Table Function: How to Sum Instead of Counting

R table function: how to sum instead of counting?

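We can use xtabs from base R. When the left-hand side of the formula is a numeric column, xtabs sums that column within each combination of the right-hand-side factors: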

xtabs(Profit~Category+Mode, df)
#         Mode
#Category  K  L  M
#       X 36 11 11
#       Y 17 26 28
#       Z  0  8 15
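The question's data is not shown, so the following is only a minimal sketch on a hypothetical df whose values were chosen to reproduce the sums above. It contrasts table (which counts rows) with xtabs (which sums Profit):

```r
# Hypothetical data (not from the original question), chosen so that
# the sums match the output shown above.
df <- data.frame(
  Category = c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Z", "Z"),
  Mode     = c("K", "K", "L", "M", "K", "L", "M", "M", "L", "M"),
  Profit   = c(20, 16, 11, 11, 17, 26, 10, 18, 8, 15)
)

# table() counts how many rows fall in each Category/Mode cell
counts <- table(df$Category, df$Mode)

# xtabs() with Profit on the left-hand side sums Profit per cell instead
sums <- xtabs(Profit ~ Category + Mode, data = df)

print(counts)
print(sums)
```

The Z/K cell has no rows, so it shows a count of 0 in the first table and a sum of 0 in the second.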

Another base R option is tapply, which is more flexible because FUN can be any summary function. Note that combinations with no rows come back as NA rather than 0.

with(df, tapply(Profit, list(Category, Mode), FUN=sum))
#   K  L  M
#X 36 11 11
#Y 17 26 28
#Z NA  8 15
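As a rough illustration of that flexibility, the sketch below swaps FUN between sum and mean and replaces the NA of the empty cell with 0. The df here is hypothetical, constructed only to match the sums printed above:

```r
# Hypothetical data, constructed to match the sums shown above
df <- data.frame(
  Category = c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Z", "Z"),
  Mode     = c("K", "K", "L", "M", "K", "L", "M", "M", "L", "M"),
  Profit   = c(20, 16, 11, 11, 17, 26, 10, 18, 8, 15)
)

# Sum per cell; the empty Z/K cell is NA
sums <- with(df, tapply(Profit, list(Category, Mode), FUN = sum))

# Treat "no rows" as a total of zero, if that suits the analysis
sums[is.na(sums)] <- 0

# Same call with a different summary function
means <- with(df, tapply(Profit, list(Category, Mode), FUN = mean))

print(sums)
print(means)
```

Whether an empty cell should read as 0 or NA depends on the question being asked; for a mean, leaving it as NA is usually the safer choice.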

Or we can use dcast to convert from 'long' to 'wide' format. It is more flexible as we can specify the fun.aggregate to sum, mean, median etc.

library(reshape2)
dcast(df, Category~Mode, value.var='Profit', fun.aggregate=sum)
#  Category  K  L  M
#1        X 36 11 11
#2        Y 17 26 28
#3        Z  0  8 15

If you need it in the 'long' format, here is one option with data.table. We convert the 'data.frame' to a 'data.table' with setDT(df) and then, grouped by 'Category' and 'Mode', take the sum of 'Profit'.

library(data.table)
setDT(df)[, list(Profit= sum(Profit)) , by = .(Category, Mode)]
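For comparison, a long-format result can also be sketched in base R alone. This is not the data.table approach above but a swapped-in alternative using aggregate and as.data.frame on an xtabs result; the df is hypothetical:

```r
# Hypothetical data, constructed to match the sums shown earlier
df <- data.frame(
  Category = c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Z", "Z"),
  Mode     = c("K", "K", "L", "M", "K", "L", "M", "M", "L", "M"),
  Profit   = c(20, 16, 11, 11, 17, 26, 10, 18, 8, 15)
)

# One row per observed Category/Mode combination (Z/K is absent)
long <- aggregate(Profit ~ Category + Mode, data = df, FUN = sum)

# One row per possible combination, with 0 for the empty Z/K cell
long_all <- as.data.frame(
  xtabs(Profit ~ Category + Mode, data = df),
  responseName = "Profit"
)

print(long)
print(long_all)
```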

Is there any way that I can perform a sum instead of a count using cut or any other function in R?

I would probably try:

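library(stringr)  # provides str_split_fixed

# make sure the timestamp is a character vector, not a factor
df1$date_time <- as.character(df1$date_time)

# split on the space: column 1 is the date, column 2 is the time
parts <- str_split_fixed(df1$date_time, " ", 2)
df1$date <- as.Date(parts[, 1], "%d/%m/%Y")
df1$time <- parts[, 2]

# sum the values within each date/time combination
total_table <- aggregate(df1$value_column, by = list(df1$date, df1$time), FUN = sum)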

This is probably a bit verbose, but it keeps both date and time available for any further analysis.
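Since the question asks about cut, here is a minimal base R sketch of that route. The df1 below is hypothetical, and the "%d/%m/%Y %H:%M" timestamp layout is an assumption; only the day/month/year part comes from the answer above.

```r
# Hypothetical data; the timestamp layout is an assumption
df1 <- data.frame(
  date_time    = c("01/02/2020 10:15", "01/02/2020 10:45",
                   "01/02/2020 11:05", "02/02/2020 09:30"),
  value_column = c(5, 7, 1, 4)
)

# Parse the full timestamp once
stamp <- as.POSIXct(df1$date_time, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Daily totals: sum instead of count, grouped by calendar day
df1$day <- as.Date(stamp)
daily <- aggregate(value_column ~ day, data = df1, FUN = sum)

# Hourly totals: cut() bins the timestamps, tapply() sums within each bin;
# droplevels() removes the hours in which nothing was recorded
hour_bin <- droplevels(cut(stamp, breaks = "hour"))
hourly <- tapply(df1$value_column, hour_bin, sum)

print(daily)
print(hourly)
```

Calling table(hour_bin) would give the count per bin; replacing it with tapply(..., sum) is what turns the count into a sum.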

Add column sum to table

You could use cbind and rowSums afterwards:

tab <- table(df$Company,df$Marital)
tab <- cbind(tab, Total = rowSums(tab))

You can also use the built-in addmargins function:

tab <- addmargins(table(df$Company,df$Marital), 2)

(The 2 means add a sum column but not a sum row; if you omit it, you get both.)
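Both approaches can be tried side by side on a small example. The df below is hypothetical (the question's data is not shown); note that addmargins names the new column "Sum" rather than "Total":

```r
# Hypothetical data standing in for the question's df
df <- data.frame(
  Company = c("A", "A", "B", "B", "B"),
  Marital = c("Married", "Single", "Single", "Single", "Married")
)

tab <- table(df$Company, df$Marital)

# Option 1: bind a manually computed row total
with_cbind <- cbind(tab, Total = rowSums(tab))

# Option 2: let addmargins() append the column margin (named "Sum")
with_margin <- addmargins(tab, 2)

# Omitting the margin argument adds both a "Sum" row and a "Sum" column
with_both <- addmargins(tab)

print(with_cbind)
print(with_margin)
print(with_both)
```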

R data.table. If column x then row count, else sum

Here is one possibility. You could combine the c and lapply functions together as follows (note that .N is the row count for each group):

df[, c(.(user=.N), lapply(.SD, sum)), by=date, .SDcols=c("turnover", "profit")]

#    date user turnover profit
# 1:    1    2        5      3
# 2:    2    2        9      7
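To make clear what the data.table call computes, here is a rough base R equivalent (a swapped-in technique, not the answer's method). The input df is hypothetical, built so that the result matches the output above:

```r
# Hypothetical input, chosen to reproduce the output shown above
df <- data.frame(
  date     = c(1, 1, 2, 2),
  user     = c("a", "b", "a", "c"),
  turnover = c(2, 3, 4, 5),
  profit   = c(1, 2, 3, 4)
)

# Row count per date for the 'user' column (what .N gives in data.table)
counts <- aggregate(list(user = df$user), by = list(date = df$date), FUN = length)

# Sums per date for the numeric columns (what lapply(.SD, sum) gives)
sums <- aggregate(df[c("turnover", "profit")], by = list(date = df$date), FUN = sum)

# Combine the two summaries on the grouping column
res <- merge(counts, sums, by = "date")
print(res)
```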

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(Embedding @thelatemail's comment:) aggregate has a formula interface too:

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")),
                Frequency=c(10,15,5,2,14,20,3))
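The cbind form mentioned earlier is only shown in truncated form, so here is one way it might look with the formula interface. Metric2 is a hypothetical column added purely for illustration; Category and Frequency are the data above.

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)  # hypothetical second metric
)

# Sum several numeric columns per Category in one call
by_cat <- aggregate(cbind(Frequency, Metric2) ~ Category, data = x, FUN = sum)
print(by_cat)
```

With only numeric columns besides Category, aggregate(. ~ Category, x, sum) should give the same result.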

Use data.table to count and aggregate / summarize a column

The post you are referring to gives a method on how to apply one aggregation method to several columns. If you want to apply different aggregation methods to different columns, you can do:

dat[, .(count = .N, var = sum(VAR)), by = MNTH]

This results in:

     MNTH count var
1: 201501     4   2
2: 201502     3   0
3: 201503     5   2
4: 201504     4   2

You can also add these values to your existing dataset by updating your dataset by reference:

dat[, `:=` (count = .N, var = sum(VAR)), by = MNTH]

This results in:

> dat
      MNTH VAR count var
 1: 201501   1     4   2
 2: 201501   1     4   2
 3: 201501   0     4   2
 4: 201501   0     4   2
 5: 201502   0     3   0
 6: 201502   0     3   0
 7: 201502   0     3   0
 8: 201503   0     5   2
 9: 201503   0     5   2
10: 201503   1     5   2
11: 201503   1     5   2
12: 201503   0     5   2
13: 201504   1     4   2
14: 201504   0     4   2
15: 201504   1     4   2
16: 201504   0     4   2
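If data.table is not available, a similar per-row result can be sketched in base R with ave. This is an analogue rather than the same mechanism (it copies columns instead of updating by reference). The dat below is rebuilt from the MNTH and VAR values printed above:

```r
# MNTH and VAR as they appear in the printed output above
dat <- data.frame(
  MNTH = rep(c(201501, 201502, 201503, 201504), times = c(4, 3, 5, 4)),
  VAR  = c(1, 1, 0, 0,  0, 0, 0,  0, 0, 1, 1, 0,  1, 0, 1, 0)
)

# ave() applies FUN within each group and repeats the result on every row
dat$count <- ave(dat$VAR, dat$MNTH, FUN = length)  # rows per month
dat$var   <- ave(dat$VAR, dat$MNTH, FUN = sum)     # sum of VAR per month

print(dat)
```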

For further reading about how to use data.table syntax, see the Getting started guides on the GitHub wiki.

How do I sum recurring values according to a level in a column and output a table of counts?

We can use table to create a cross-tabulation of categories and animals, transpose, convert to data.frame, group_by all categories and count the frequency per combination:

library(dplyr)
library(tidyr)

as.data.frame.matrix(t(table(dat))) %>%
  group_by_all() %>%
  summarize(Count = n())

Result:

# A tibble: 5 x 4
# Groups: A, B [?]
      A     B     C Count
  <int> <int> <int> <int>
1     0     0     1     2
2     0     1     1     2
3     1     0     0     2
4     1     1     0     2
5     1     1     1     1

Edit (thanks to @C. Braun). Here is how to also include the zero A, B, C combinations:

as.data.frame.matrix(t(table(dat))) %>%
  bind_rows(expand.grid(A = c(0,1), B = c(0,1), C = c(0,1))) %>%
  group_by_all() %>%
  summarize(Count = n()-1)

or with complete, as suggested by @Ryan:

as.data.frame.matrix(t(table(dat))) %>%
  mutate(non_missing = 1) %>%
  complete(A, B, C) %>%
  group_by(A, B, C) %>%
  summarize(Count = sum(ifelse(is.na(non_missing), 0, 1)))

Result:

# A tibble: 8 x 4
# Groups: A, B [?]
      A     B     C Count
  <dbl> <dbl> <dbl> <dbl>
1     0     0     0     0
2     0     0     1     2
3     0     1     0     0
4     0     1     1     2
5     1     0     0     2
6     1     0     1     0
7     1     1     0     2
8     1     1     1     1
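The same kind of table, including the zero combinations, can also be sketched in base R by cross-tabulating the wide 0/1 data a second time. This is an alternative to the dplyr pipelines above, not a restatement of them, and the dat below is a small hypothetical example (the question's data is not shown). It assumes every category column contains both a 0 and a 1.

```r
# Hypothetical data: one row per (category, animal) membership
dat <- data.frame(
  category = c("A", "A", "B", "C", "C", "A", "B"),
  animal   = c("cat", "dog", "dog", "fish", "bird", "cow", "cow")
)

# One row per animal, one 0/1 column per category
wide <- as.data.frame.matrix(t(table(dat)))

# Cross-tabulate the 0/1 columns; every A/B/C combination appears,
# with a count of 0 where no animal has that combination
combos <- as.data.frame(table(wide), responseName = "Count")
print(combos)
```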

How can I aggregate data with sum, mean and count for each column respectively?

Such calculations are easy using dplyr:

library(dplyr)

df %>%
  group_by(product) %>%
  summarise(frequency = n(),
            reorder_rate = sum(reorder)/frequency,
            mean_sequence = sum(order_sequence)/frequency)

# A tibble: 3 x 4
#  product frequency reorder_rate mean_sequence
#  <fct>       <int>        <dbl>         <dbl>
#1 egg             4         0.75          1.25
#2 fruit           4         0.75          2.75
#3 meat            2         0.5           3

However, you can also use data.table:

library(data.table)

setDT(df)[, .(frequency = .N, reorder_rate = sum(reorder)/.N,
              mean_sequence = sum(order_sequence)/.N), by = product]

