aggregate prints column, but does not save it to global environment
Your problem is that aggregate returns matrix columns when FUN returns more than one value per group, e.g. when applying multiple functions. You need to additionally wrap the result in do.call(data.frame, ...), that's all.
ag1 <- aggregate(age ~ group, dt, function(x) c(mean=mean(x), sd=sd(x)))
str(ag1)
# 'data.frame': 2 obs. of 2 variables:
# $ group: int 1 2
# $ age : num [1:2, 1:2] 9.06 11 3.28 4.8
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:2] "mean" "sd"
Make data frame:
res <- do.call(data.frame, ag1)
res
# group age.mean age.sd
# 1 1 9.061935 3.283173
# 2 2 10.998478 4.798354
str(res)
# 'data.frame': 2 obs. of 3 variables:
# $ group : int 1 2
# $ age.mean: num 9.06 11
# $ age.sd : num 3.28 4.8
All in one:
res <- do.call(data.frame, aggregate(age ~ group, dt, function(x)
  c(mean=mean(x), sd=sd(x))))
Data:
dt <- data.frame(age=rchisq(20,10), group=sample(1:2, 20, replace=TRUE))
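To check whether an aggregate result still carries matrix columns, each column can be inspected with is.matrix. A minimal sketch, assuming a small made-up data frame (the name demo and its values are illustrative and not taken from the question):

```r
# Hypothetical data: two groups, three ages each
demo <- data.frame(age = c(8, 10, 12, 5, 11, 17),
                   group = c(1, 1, 1, 2, 2, 2))

# FUN returns two values per group, so 'age' becomes a 2-column matrix
ag <- aggregate(age ~ group, demo, function(x) c(mean = mean(x), sd = sd(x)))
matrix_cols <- sapply(ag, is.matrix)   # group: FALSE, age: TRUE

# do.call() passes each column to data.frame(), which splits the matrix
flat <- do.call(data.frame, ag)
flat_cols <- names(flat)               # "group" "age.mean" "age.sd"
```

Before flattening, the mean is reached with ag$age[, "mean"]; afterwards flat$age.mean is an ordinary numeric column.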
Matrix Transformation in R - from aggregate output to outer-like matrix
You can try tapply
with(mtcars, tapply(disp, list(cyl, gear), FUN=mean))
# 3 4 5
#4 120.1000 102.625 107.7
#6 241.5000 163.800 145.0
#8 357.6167 NA 326.0
If you are looking to reshape the output of aggregate, you can use acast from reshape2:
library(reshape2)
d1 <- aggregate(disp ~ cyl + gear, data = mtcars, FUN = mean)
acast(d1, cyl~gear, value.var='disp')
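If reshape2 is not available, base R's xtabs can spread the aggregate output as well. The sketch below uses the built-in mtcars data. One difference is easy to miss: xtabs sums within cells and fills empty cells with 0, where tapply leaves NA.

```r
# Group means in long form, one row per cyl/gear pair that occurs
d1 <- aggregate(disp ~ cyl + gear, data = mtcars, FUN = mean)

# Each pair appears once in d1, so the sum inside xtabs() equals the mean
wide_xt <- xtabs(disp ~ cyl + gear, data = d1)

# tapply() on the raw data for comparison
wide_tp <- with(mtcars, tapply(disp, list(cyl, gear), FUN = mean))

# mtcars has no 8-cylinder cars with 4 gears:
empty_xt <- wide_xt["8", "4"]   # 0
empty_tp <- wide_tp["8", "4"]   # NA
```

Whether 0 or NA is the better marker for a missing combination depends on what the table is used for afterwards.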
rearrange the output of the aggregate function into a new table
df1 <- data.frame(location,p,y)
library(reshape2)
dcast(df1, p ~ location, value.var = "y")
## p site1 site2 site3
## 1 A 1 1 1
## 2 B 2 2 2
## 3 C 3 NA 3
## 4 D NA NA 4
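The location, p and y vectors from the question are not shown here. The following sketch uses made-up vectors chosen to be consistent with the table above, and reshapes them with base R's reshape() instead of dcast so that it runs without reshape2. The value columns come out as y.site1, y.site2, y.site3 rather than site1, site2, site3.

```r
# Hypothetical input, chosen to match the printed table
location <- c(rep("site1", 3), rep("site2", 2), rep("site3", 4))
p <- c("A", "B", "C", "A", "B", "A", "B", "C", "D")
y <- c(1, 2, 3, 1, 2, 1, 2, 3, 4)
df1 <- data.frame(location, p, y)

# One row per p, one y column per location; missing pairs become NA
wide <- reshape(df1, idvar = "p", timevar = "location", direction = "wide")
```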
Apply several summary functions on several variables by group in one call
You can do it all in one step and get proper labeling:
aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
# id1 id2 val1.mn val1.n val2.mn val2.n
# 1 a x 1.5 2.0 6.5 2.0
# 2 b x 2.0 2.0 8.0 2.0
# 3 a y 3.5 2.0 7.0 2.0
# 4 b y 3.0 2.0 6.0 2.0
This creates a dataframe with two id columns and two matrix columns:
str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame': 4 obs. of 4 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
$ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using do.call(data.frame, ...)
str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
)
'data.frame': 4 obs. of 6 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1.mn: num 1.5 2 3.5 3
$ val1.n : num 2 2 2 2
$ val2.mn: num 6.5 8 7 6
$ val2.n : num 2 2 2 2
This is the syntax for multiple variables on the LHS:
aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
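The data frame x is not shown in this answer. The sketch below assumes a small data frame whose values reproduce the printed means (this reconstruction is an assumption, as is the helper name mn_n) and compares the . form with the cbind form.

```r
# Assumed input: reconstructed to match the means shown above
x <- data.frame(id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
                id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
                val1 = c(1, 2, 3, 4, 1, 4, 3, 2),
                val2 = c(9, 4, 5, 9, 7, 4, 9, 8))

# Hypothetical helper: mean and count for one column within one group
mn_n <- function(v) c(mn = mean(v), n = length(v))

ag_dot   <- aggregate(. ~ id1 + id2, data = x, FUN = mn_n)
ag_cbind <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = mn_n)

# Compare both results, then flatten the matrix columns for plain access
same <- isTRUE(all.equal(ag_dot, ag_cbind))
flat <- do.call(data.frame, ag_cbind)
```

The cbind form is the one to reach for when only some of the non-grouping columns should be summarised.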
Aggregate randomly sampled columns for iteratively larger bin sizes
Depending on how many columns you want in your final output, the approach can be modified, but as written it returns all possible combinations.
#Get column names of the matrices
all_cols <- colnames(mat)
#Select bin value from 2:ncol(mat)
total_out <- lapply(seq_len(ncol(mat))[-1], function(j) {
#Create all combinations taking j items at a time
temp <- combn(all_cols, j, function(x)
#Take rowSums for the current combination
#Also paste column names to assign column names later
list(rowSums(mat[, x]), paste0(x, collapse = "_")), simplify = FALSE)
#Combine rowSums matrix
new_mat <- sapply(temp, `[[`, 1)
#Assign column names
colnames(new_mat) <- sapply(temp, `[[`, 2)
#Return new matrix
new_mat
})
The current output looks like
total_out
#[[1]]
# 1_2 1_3 1_4 1_5 1_6 2_3 2_4 2_5 2_6 3_4 3_5 3_6 4_5 4_6 5_6
#a 3 1 1 1 2 2 2 2 3 0 0 1 0 1 1
#c 0 0 1 0 1 0 1 0 1 1 0 1 1 2 1
#f 0 1 0 0 0 1 0 0 0 1 1 1 0 0 0
#h 0 1 0 0 0 1 0 0 0 1 1 1 0 0 0
#i 1 1 0 1 0 2 1 2 1 1 2 1 1 0 1
#j 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0
#l 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
#m 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
#p 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1
#q 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
#s 0 0 0 1 1 0 0 1 1 0 1 1 1 1 2
#t 0 0 0 0 2 0 0 0 2 0 0 2 0 2 2
#u 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1
#v 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
#x 3 2 2 2 2 1 1 1 1 0 0 0 0 0 0
#z 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
#...
#....
#....
#[[5]]
# 1_2_3_4_5_6
#a 4
#c 2
#f 1
#h 1
#i 3
#j 1
#l 1
#m 1
#p 1
#q 1
#s 2
#t 2
#u 1
#v 1
#x 3
#z 1
Note that there are 5 (ncol(mat) - 1) matrices in total_out, with the following numbers of columns:
length(total_out)
#[1] 5
sapply(total_out, ncol)
#[1] 15 20 15 6 1
Since we know that the last element in the list is always a one-column matrix, we can remove it and select a random ceiling(nc/2) columns from each of the remaining matrices.
total_out <- total_out[-length(total_out)]
lapply(total_out, function(x) {
nc <- ncol(x)
x[, sample(nc, ceiling(nc/2))]
})
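A small self-contained sketch of the same idea, assuming a made-up 3 x 4 count matrix named mat (the question's matrix is not reproduced here). It passes drop = FALSE when sampling columns, which is an addition to the code above: without it, a single sampled column would come back as a plain vector instead of a matrix.

```r
set.seed(1)  # only so the random column choice is repeatable

# Hypothetical input: 3 rows, 4 named columns (filled column by column)
mat <- matrix(c(1, 0, 2,
                0, 1, 1,
                3, 0, 0,
                1, 1, 0),
              nrow = 3,
              dimnames = list(c("a", "c", "f"), c("1", "2", "3", "4")))

# For each bin size j = 2..ncol(mat), sum every combination of j columns
total_out <- lapply(seq_len(ncol(mat))[-1], function(j) {
  temp <- combn(colnames(mat), j, function(cols)
    list(rowSums(mat[, cols]), paste0(cols, collapse = "_")),
    simplify = FALSE)
  new_mat <- sapply(temp, `[[`, 1)
  colnames(new_mat) <- sapply(temp, `[[`, 2)
  new_mat
})

n_cols <- sapply(total_out, ncol)   # choose(4, 2:4): 6 4 1

# Drop the single all-columns matrix, then keep a random half of the rest
picked <- lapply(total_out[-length(total_out)], function(m) {
  nc <- ncol(m)
  m[, sample(nc, ceiling(nc / 2)), drop = FALSE]
})
```

The number of combinations grows as choose(ncol(mat), j), so for wide matrices it may be preferable to sample column sets directly rather than enumerate them all first.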
Create matrix after aggregate a table in R
If your dataframe really does look like that then there is a serious mismatch between your column names and your code.
dom <- data.frame(tipoE=sample(c(letters[1:4],NA), 30, rep=TRUE),
mun=rep(c(3200102,3200106,3200310) , each=10),
x=runif(30, 100,200) )
dom
This reworking succeeds:
a = aggregate(dom$x,
              by = list(tipoE = addNA(dom$tipoE), mun = dom$mun),
              FUN = sum)
a
This use of xtabs then gives what you requested:
aT <- xtabs( x ~ tipoE + mun, a)
aT
mun
tipoE 3200102 3200106 3200310
a 340.7700 367.1412 180.0594
b 280.9851 485.8780 798.4880
c 280.7682 236.3637 165.2295
d 176.6967 125.0732 132.5339
<NA> 376.4278 117.1063 251.2514
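A deterministic version of the same steps, assuming made-up values in place of the random dom above so that the result can be checked by hand. Like the original, it keeps missing tipoE values as their own group via addNA.

```r
# Hypothetical data: fixed values instead of sample()/runif()
dom <- data.frame(tipoE = c("a", "a", "b", NA, "a", "b"),
                  mun   = c(3200102, 3200102, 3200102,
                            3200106, 3200106, 3200106),
                  x     = c(10, 20, 30, 40, 50, 60))

# Sum x within each tipoE/mun pair; addNA() turns NA into a factor level
a <- aggregate(dom$x,
               by = list(tipoE = addNA(dom$tipoE), mun = dom$mun),
               FUN = sum)

# Spread the sums into a tipoE-by-mun table
aT <- xtabs(x ~ tipoE + mun, a)
cell_a_102 <- aT["a", "3200102"]   # 10 + 20 = 30
```

The value column is called x in the aggregate output because the first argument of aggregate is itself named x, which is why the xtabs formula can refer to it.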
Aggregate multiple columns at once
We can use the formula method of aggregate. The variables on the 'rhs' of ~ are the grouping variables, while the . represents all other variables in 'df1' (from the example, we assume that we need the mean for all the columns except the grouping ones). We then specify the dataset and the function (mean).
aggregate(.~id1+id2, df1, mean)
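One thing to keep in mind with the formula method: its default na.action = na.omit drops a whole row if any variable in it is NA, which also changes the means of the other columns. A small sketch, assuming a made-up data frame named d with one missing value:

```r
# Hypothetical data: v2 is missing in the second row
d <- data.frame(g  = c("a", "a", "b", "b"),
                v1 = c(1, 3, 5, 7),
                v2 = c(10, NA, 30, 50))

# Default: row 2 is dropped entirely, so mean(v1) for "a" is 1, not 2
dflt <- aggregate(. ~ g, d, mean)

# Keep all rows and let mean() skip NAs column by column
kept <- aggregate(. ~ g, d, mean, na.rm = TRUE, na.action = na.pass)
```

With complete data, as in df1, both calls return the same result.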
Or we can use summarise_each from dplyr after grouping (group_by). Note that summarise_each and funs are deprecated in current dplyr releases.
library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_each(funs(mean))
Or using summarise with across (available since dplyr 1.0.0; at the time of writing, the devel version '0.8.99.9000')
df1 %>%
group_by(id1, id2) %>%
summarise(across(starts_with('val'), mean))
Another option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'id1' and 'id2', then loop through the Subset of Data.table (.SD) and get the mean of each column.
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
data
df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b",
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
val1 = c(1L,
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L,
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))