How can I get xtabs to calculate means instead of sums in R?
Use aggregate to compute the group means first, then pass the result to xtabs:
xtabs(hp~cyl+gear,aggregate(hp~cyl+gear,mtcars,mean))
   gear
cyl        3        4        5
  4  97.0000  76.0000 102.0000
  6 107.5000 116.5000 175.0000
  8 194.1667   0.0000 299.5000
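Note that the 0.0000 for cyl 8 and gear 4 marks an empty cell (mtcars has no 8-cylinder cars with 4 gears), not a true mean of zero. A minimal sketch, assuming a plain matrix is acceptable in place of an xtabs object, computes the means directly with tapply and leaves the empty cell as NA:

```r
# Mean hp for each cyl/gear combination; empty combinations stay NA
m <- with(mtcars, tapply(hp, list(cyl = cyl, gear = gear), mean))
print(m)

# The xtabs-over-aggregate version, for comparison
xt <- xtabs(hp ~ cyl + gear, aggregate(hp ~ cyl + gear, mtcars, mean))

is.na(m["8", "4"])   # TRUE: no 8-cylinder cars with 4 gears
xt["8", "4"]         # 0: xtabs fills the empty cell with zero
```

Which form is preferable depends on whether a zero or an NA is the more honest label for "no observations" in your table.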
R table function: how to sum instead of counting?
We can use xtabs from base R. By default, xtabs computes the sum:
xtabs(Profit~Category+Mode, df)
#        Mode
#Category  K  L  M
#       X 36 11 11
#       Y 17 26 28
#       Z  0  8 15
Another base R option, which is more flexible because it accepts any FUN, is tapply:
with(df, tapply(Profit, list(Category, Mode), FUN=sum))
#   K  L  M
#X 36 11 11
#Y 17 26 28
#Z NA  8 15
Or we can use dcast to convert from 'long' to 'wide' format. It is more flexible, as we can set fun.aggregate to sum, mean, median, etc.
library(reshape2)
dcast(df, Category~Mode, value.var='Profit', sum)
#  Category  K  L  M
#1        X 36 11 11
#2        Y 17 26 28
#3        Z  0  8 15
If you need it in the 'long' format, here is one option with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'Category' and 'Mode', get the sum of 'Profit'.
library(data.table)
setDT(df)[, list(Profit= sum(Profit)) , by = .(Category, Mode)]
Calculating the mean of values in tables using formulae [R]
You can do it in one line using cast from the reshape package:
library(reshape)
cast(mtcars, cyl ~ gear, value = 'hp', fun = mean)
Margin totals in xtabs
Aniko mentioned this in a comment, but it was never provided as an answer.
I found this independently and then noticed it was here in a comment, so credit to Aniko for getting it first.
addmargins is the answer:
For a given table one can specify which of the classifying factors to
expand by one or more levels to hold margins to be calculated. One may
for example form sums and means over the first dimension and medians
over the second. The resulting table will then have two extra levels
for the first dimension and one extra level for the second. The
default is to sum over all margins in the table. Other possibilities
may give results that depend on the order in which the margins are
computed. This is flagged in the printed output from the function.
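As a rough illustration of the description quoted above, here is a sketch on the built-in mtcars data (the choice of data set and variables is an assumption for the example, not taken from the original question). It adds the default sum margins, then a single extra row of means over the first dimension:

```r
# Sum of hp by cylinder count and gear count
tab <- xtabs(hp ~ cyl + gear, data = mtcars)

# Default: add a "Sum" row and a "Sum" column
with_totals <- addmargins(tab)
print(with_totals)

# Expand only the first dimension (rows) with one extra level holding means
with_means <- addmargins(tab, margin = 1, FUN = mean)
print(with_means)
```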
How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(Embedding @thelatemail's comment:) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you could use the . notation (it works for one column too):
aggregate(. ~ Category, x, sum)
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")),
                Frequency=c(10,15,5,2,14,20,3))
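The cbind form that is elided above can be sketched with the formula interface. Metric2 is a hypothetical second numeric column added here purely for illustration; it is not part of the original data.

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)   # hypothetical column, illustration only
)

# Sum both numeric columns within each Category
res <- aggregate(cbind(Frequency, Metric2) ~ Category, data = x, FUN = sum)
print(res)
```

The same FUN is applied to every column inside cbind, so this works best when all the metrics should be summarised the same way.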
Tabulating dataframe in R with summary statistics
You can do this using this code:
library(reshape2)
dcast(df,bg ~ sex,fun.aggregate = mean,value.var='income')
##   bg    F     M
##1  NW 70.5 110.0
##2   W 86.5 170.5
crosstab and xtabs generates zeros instead of NAs
xtabs aggregates with sum, so cells with no data come out as 0. To keep them as NA, apply sum yourself (with its default setting of na.rm=FALSE) through tapply:
> with(WW1_Data, tapply(WW1_Pct_2, list(Site_Name, Year), sum ) )
                      1996          2000         2008         2009          2010 2011
Alnön         0.3076923077 0.26086956522           NA           NA 0.08333333333   NA
Ammarnäs      0.7500000000            NA           NA           NA            NA 0.80
Anjan                   NA            NA           NA 0.5200000000 0.50000000000   NA
Bäcksand                NA 0.08333333333           NA           NA 0.37500000000   NA
Fittjebodarna           NA            NA 0.4000000000 0.4230769231            NA   NA
Flatruet                NA            NA 0.8500000000 0.4838709677 0.56000000000 0.58
Glen                    NA            NA 0.7777777778 0.5555555556 0.52173913043   NA
Idre          0.4000000000            NA           NA           NA 0.00000000000   NA
There is an as.data.frame method for tables (which are a special sort of matrix); tapply returns a plain matrix, which as.table can convert to a table first. Your use of as.data.frame is superfluous since the result of reshape was already a data frame.
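The difference can be seen on a small made-up data frame (toy below is hypothetical, not the asker's WW1_Data): xtabs reports 0 for a site/year combination with no rows, while tapply leaves it as NA. Wrapping the tapply result in as.table before as.data.frame is one way, under these assumptions, to get a long format that keeps the NA:

```r
# Hypothetical data: site B has no observation for 2000
toy <- data.frame(
  Site = c("A", "A", "B"),
  Year = c(1996, 2000, 1996),
  Pct  = c(0.3, 0.25, 0.75)
)

xt <- xtabs(Pct ~ Site + Year, toy)
tp <- with(toy, tapply(Pct, list(Site = Site, Year = Year), sum))

xt["B", "2000"]          # 0
is.na(tp["B", "2000"])   # TRUE

# Long format; the empty combination carries NA in the Freq column
long <- as.data.frame(as.table(tp))
print(long)
```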