How to Get Xtabs to Calculate Means Instead of Sums in R

How can I get xtabs to calculate means instead of sums in R?

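Compute the group means with aggregate first, then pass the result to xtabs. Each cyl/gear combination then has a single row, so the sum that xtabs takes is just that mean. The combination with no data (cyl 8, gear 4) shows up as 0:

xtabs(hp ~ cyl + gear, aggregate(hp ~ cyl + gear, mtcars, mean))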
   gear
cyl        3        4        5
  4  97.0000  76.0000 102.0000
  6 107.5000 116.5000 175.0000
  8 194.1667   0.0000 299.5000
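
If the 0 in the empty cell (8 cylinders, 4 gears) is misleading, one alternative is to compute the means with tapply, which leaves empty combinations as NA. This is only a minimal sketch on the same mtcars data; the name hp_means is illustrative and not part of the original answer.

```r
# Mean horsepower for every cyl x gear combination.
# Unlike xtabs(), tapply() leaves combinations with no rows as NA.
hp_means <- with(mtcars, tapply(hp, list(cyl = cyl, gear = gear), mean))
print(round(hp_means, 4))
```

The result is a plain matrix rather than a table, so the empty cell can be told apart from a genuine mean of zero.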

R table function: how to sum instead of counting?

We can use xtabs from base R. By default, xtabs sums the left-hand side of the formula (here Profit) within each combination of the factors:

xtabs(Profit~Category+Mode, df)
#           Mode
#Category  K  L  M
#       X 36 11 11
#       Y 17 26 28
#       Z  0  8 15

Another base R option is tapply, which is more flexible because FUN can be any summary function. Note that it returns NA, not 0, for combinations with no data:

with(df, tapply(Profit, list(Category, Mode), FUN=sum))
#  K  L  M
#X 36 11 11
#Y 17 26 28
#Z NA  8 15

Or we can use dcast from reshape2 to convert from 'long' to 'wide' format. It is more flexible because fun.aggregate can be set to sum, mean, median, etc.

library(reshape2)
dcast(df, Category~Mode, value.var='Profit', sum)
# Category  K  L  M
#1        X 36 11 11
#2        Y 17 26 28
#3        Z  0  8 15

If you need the result in 'long' format, here is one option with data.table. We convert the data.frame to a data.table with setDT(df), group by 'Category' and 'Mode', and take the sum of 'Profit'.

library(data.table)
setDT(df)[, list(Profit = sum(Profit)), by = .(Category, Mode)]

Calculating the mean of values in tables using formulae [R]

You can do it in one line using cast from the reshape package:

library(reshape)
cast(mtcars, cyl ~ gear, value = 'hp', fun.aggregate = mean)

Margin totals in xtabs

Aniko mentioned this in a comment, but it was never provided as an answer.

I found this independently and then noticed it was here in a comment, so credit to Aniko for getting it first.

addmargins is the answer. From its help page (?addmargins):

For a given table one can specify which of the classifying factors to
expand by one or more levels to hold margins to be calculated. One may
for example form sums and means over the first dimension and medians
over the second. The resulting table will then have two extra levels
for the first dimension and one extra level for the second. The
default is to sum over all margins in the table. Other possibilities
may give results that depend on the order in which the margins are
computed. This is flagged in the printed output from the function.
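
As an illustration of the description above, here is a small sketch that adds margins to an xtabs result built from mtcars. The object names (hp_tab, with_totals, with_cyl_mean) are made up for this example.

```r
# Sum of horsepower by cylinders and gears (xtabs sums by default).
hp_tab <- xtabs(hp ~ cyl + gear, data = mtcars)

# Default: expand both dimensions by a "Sum" level (row and column totals).
with_totals <- addmargins(hp_tab)
print(with_totals)

# Expand only the first dimension (cyl), using mean instead of sum.
# This adds one extra row holding the mean of each gear column.
with_cyl_mean <- addmargins(hp_tab, margin = 1, FUN = mean)
print(with_cyl_mean)
```

Passing margin restricts which dimensions are expanded, and FUN replaces the default sum.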

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

aggregate also has a formula interface (incorporating @thelatemail's comment):

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")), 
                    Frequency=c(10,15,5,2,14,20,3))
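
The cbind form mentioned above is cut short in the original, so here is a hedged sketch of how it might look with the formula interface. Metric2 is a hypothetical column with made-up values, added only so that there is a second metric to aggregate.

```r
# Data from the answer above, plus a hypothetical second numeric column.
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)  # hypothetical values
)

# Sum several columns per group via cbind() on the formula's left-hand side.
totals <- aggregate(cbind(Frequency, Metric2) ~ Category, data = x, FUN = sum)
print(totals)
```

With the formula interface the result keeps the original column names, whereas the by= form labels a single aggregated column x.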

Tabulating dataframe in R with summary statistics

You can do this with dcast from reshape2, using mean as the aggregation function:

library(reshape2)

dcast(df, bg ~ sex, fun.aggregate = mean, value.var = 'income')

##  bg    F     M
##1 NW 70.5 110.0
##2  W 86.5 170.5

crosstab and xtabs generates zeros instead of NAs

xtabs aggregates with sum, so combinations with no data come out as 0. To get NA for those cells instead, apply sum (with its default na.rm=FALSE) through tapply, which leaves empty combinations as NA:

with(WW1_Data, tapply(WW1_Pct_2, list(Site_Name, Year), sum))
                      1996          2000         2008         2009          2010 2011
Alnön         0.3076923077 0.26086956522           NA           NA 0.08333333333   NA
Ammarnäs      0.7500000000            NA           NA           NA            NA 0.80
Anjan                   NA            NA           NA 0.5200000000 0.50000000000   NA
Bäcksand                NA 0.08333333333           NA           NA 0.37500000000   NA
Fittjebodarna           NA            NA 0.4000000000 0.4230769231            NA   NA
Flatruet                NA            NA 0.8500000000 0.4838709677 0.56000000000 0.58
Glen                    NA            NA 0.7777777778 0.5555555556 0.52173913043   NA
Idre          0.4000000000            NA           NA           NA 0.00000000000   NA

tapply returns a matrix, and tables are a special sort of matrix with their own as.data.frame method. To get the result in long format, convert it with as.table first so that this method is used. Your use of as.data.frame is superfluous since the result of reshape was already a data frame.
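
A minimal sketch of going from the tapply result to a long data frame. WW1_Data is not shown in the original, so obs below is a hypothetical stand-in with invented values; the column names are assumptions as well.

```r
# Hypothetical stand-in for the question's data (site, year, value).
obs <- data.frame(
  site  = c("A", "A", "B"),
  year  = c(1996, 2000, 2000),
  value = c(0.30, 0.26, 0.08)
)

# tapply() leaves site/year combinations with no rows as NA.
wide <- with(obs, tapply(value, list(site = site, year = year), sum))

# Convert the matrix to long format; as.table() is needed so that
# as.data.frame() dispatches to its method for tables.
long <- as.data.frame(as.table(wide), responseName = "value")
print(long)
```

The NA for the missing combination is carried through to the long data frame instead of being replaced by 0.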