How to Get Xtabs to Calculate Means Instead of Sums in R

How can I get xtabs to calculate means instead of sums in R?

Compute the cell means with aggregate first, then pass the result to xtabs:

xtabs(hp ~ cyl + gear, aggregate(hp ~ cyl + gear, mtcars, mean))
   gear
cyl        3        4        5
  4  97.0000  76.0000 102.0000
  6 107.5000 116.5000 175.0000
  8 194.1667   0.0000 299.5000
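
Note that the 0.0000 for 8 cylinders and 4 gears does not reflect a mean of zero: mtcars has no such cars, and xtabs fills empty cells with 0. As a minimal sketch of an alternative (not part of the original answer), tapply can compute the same means while leaving the empty cell as NA:

```r
# Mean horsepower by cylinders and gears, using the built-in mtcars data.
# tapply applies FUN to each cell and returns NA for cells with no rows.
means <- with(mtcars, tapply(hp, list(cyl = cyl, gear = gear), mean))
print(means)
```

Whether 0 or NA is preferable depends on what the table feeds into; NA makes it harder to mistake a missing combination for a real value.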

R table function: how to sum instead of counting?

We can use xtabs from base R. With a variable on the left-hand side of the formula, xtabs sums that variable within each cell:

xtabs(Profit~Category+Mode, df)
#         Mode
#Category  K  L  M
#       X 36 11 11
#       Y 17 26 28
#       Z  0  8 15

Another base R option is tapply, which is more flexible because any function can be passed as FUN. Note that it returns NA rather than 0 for empty cells.

with(df, tapply(Profit, list(Category, Mode), FUN=sum))
#   K  L  M
#X 36 11 11
#Y 17 26 28
#Z NA  8 15
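
To illustrate swapping in a different FUN, here is a small sketch. The data frame below is hypothetical (the original df is not shown in the answer), so the numbers will not match the output above:

```r
# Hypothetical data, for illustration only
df <- data.frame(
  Category = c("X", "X", "Y", "Y", "Z"),
  Mode     = c("K", "L", "K", "K", "L"),
  Profit   = c(10, 4, 6, 8, 5)
)

# Same call as above, but with mean instead of sum
means <- with(df, tapply(Profit, list(Category = Category, Mode = Mode), FUN = mean))
print(means)
```

Combinations that never occur in the data (here Y/L and Z/K) come back as NA.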

Or we can use dcast to convert from 'long' to 'wide' format. It is more flexible as we can specify the fun.aggregate to sum, mean, median etc.

library(reshape2)
dcast(df, Category~Mode, value.var='Profit', sum)
#  Category  K  L  M
#1        X 36 11 11
#2        Y 17 26 28
#3        Z  0  8 15

If you need the result in 'long' format, here is one option with data.table. We convert the 'data.frame' to a 'data.table' with setDT(df), group by 'Category' and 'Mode', and take the sum of 'Profit' within each group.

library(data.table)
setDT(df)[, list(Profit= sum(Profit)) , by = .(Category, Mode)]

Calculating the mean of values in tables using formulae [R]

You can do it in one line using cast from the reshape package:

library(reshape)
cast(mtcars, cyl ~ gear, value = 'hp', fun.aggregate = mean)

Margin totals in xtabs

Aniko mentioned this in a comment, but it was never provided as an answer.

I found this independently and then noticed it was here in a comment, so credit to Aniko for getting it first.

addmargins is the answer:

For a given table one can specify which of the classifying factors to
expand by one or more levels to hold margins to be calculated. One may
for example form sums and means over the first dimension and medians
over the second. The resulting table will then have two extra levels
for the first dimension and one extra level for the second. The
default is to sum over all margins in the table. Other possibilities
may give results that depend on the order in which the margins are
computed. This is flagged in the printed output from the function.
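
As a rough sketch of what the documentation describes, the calls below apply addmargins to an xtabs table built from the built-in mtcars data. The choice of variables is only an example and is not taken from the original question:

```r
# Total horsepower by cylinders and gears
tab <- xtabs(hp ~ cyl + gear, data = mtcars)

# Default: add a "Sum" margin to every dimension
with_totals <- addmargins(tab)
print(with_totals)

# Expand only the first dimension: one extra row holding the column totals
col_totals <- addmargins(tab, margin = 1)
print(col_totals)

# Use a different function: one extra column holding the mean of each row
row_means <- addmargins(tab, margin = 2, FUN = mean)
print(row_means)
```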

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
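
A complete version of the cbind form might look like the sketch below. Metric2 is a hypothetical column invented for this example (it is not in the data used elsewhere in this answer), and the columns are named inside cbind so the result has readable headers:

```r
# Example data; Metric2 is a made-up second numeric column
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)
)

# Sum both numeric columns within each Category
sums <- aggregate(cbind(Frequency = x$Frequency, Metric2 = x$Metric2),
                  by = list(Category = x$Category), FUN = sum)
print(sums)
```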

As @thelatemail pointed out in a comment, aggregate has a formula interface too:

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")),
                Frequency=c(10,15,5,2,14,20,3))
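
The same calls work with any summary function. As a short sketch using the data above, replacing sum with mean gives the per-group average instead of the total:

```r
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))

# Mean Frequency per Category, via the formula interface
group_means <- aggregate(Frequency ~ Category, x, mean)
print(group_means)

# The tapply equivalent returns a named vector instead of a data frame
group_means_vec <- tapply(x$Frequency, x$Category, FUN = mean)
print(group_means_vec)
```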

Tabulating dataframe in R with summary statistics

You can do this using this code:

library(reshape2)

dcast(df,bg ~ sex,fun.aggregate = mean,value.var='income')

##   bg    F     M
##1  NW 70.5 110.0
##2   W 86.5 170.5

crosstab and xtabs generates zeros instead of NAs

xtabs aggregates with sum and reports 0 for any combination that has no rows. To get NA for those combinations instead, apply sum (with its default setting of na.rm=FALSE) through tapply, which leaves empty cells as NA:

with(WW1_Data, tapply(WW1_Pct_2, list(Site_Name, Year), sum))
                      1996          2000         2008         2009          2010 2011
Alnön         0.3076923077 0.26086956522           NA           NA 0.08333333333   NA
Ammarnäs      0.7500000000            NA           NA           NA            NA 0.80
Anjan                   NA            NA           NA 0.5200000000 0.50000000000   NA
Bäcksand                NA 0.08333333333           NA           NA 0.37500000000   NA
Fittjebodarna           NA            NA 0.4000000000 0.4230769231            NA   NA
Flatruet                NA            NA 0.8500000000 0.4838709677 0.56000000000 0.58
Glen                    NA            NA 0.7777777778 0.5555555556 0.52173913043   NA
Idre          0.4000000000            NA           NA           NA 0.00000000000   NA

There is an as.data.frame method for tables (which are a special sort of matrix). tapply itself returns a plain matrix, so wrap its result in as.table() if you want that method's long, one-row-per-cell layout. Your use of as.data.frame is superfluous since the result of reshape was already a data frame.
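
The difference between the two approaches can be seen in a small sketch. The surveys data frame and its column names below are hypothetical stand-ins, since WW1_Data is not shown in the answer:

```r
# Hypothetical data: not every site was surveyed in every year
surveys <- data.frame(
  Site = c("A", "A", "B", "C"),
  Year = c(1996, 2000, 1996, 2000),
  Pct  = c(0.30, 0.25, 0.75, 0.10)
)

# xtabs sums within cells and reports 0 for combinations with no rows
as_zero <- xtabs(Pct ~ Site + Year, data = surveys)
print(as_zero)

# tapply leaves combinations with no rows as NA
as_na <- with(surveys, tapply(Pct, list(Site = Site, Year = Year), sum))
print(as_na)

# Long format: one row per Site/Year cell, keeping the NA cells
long <- as.data.frame(as.table(as_na), responseName = "Pct")
print(long)
```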


