Aggregate() Puts Multiple Output Columns in a Matrix Instead

aggregate prints column, but does not save it to global environment

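Your problem is that aggregate() returns matrix columns whenever FUN returns more than one value per group, e.g. when computing several summaries at once. You need to additionally wrap the result in do.call(data.frame, ...) to turn those matrix columns into ordinary columns; that's all.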

ag1 <- aggregate(age ~ group, dt, function(x) c(mean=mean(x), sd=sd(x)))
str(ag1)
# 'data.frame': 2 obs. of 2 variables:
# $ group: int 1 2
# $ age : num [1:2, 1:2] 9.06 11 3.28 4.8
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:2] "mean" "sd"

Flatten it into an ordinary data frame:

res <- do.call(data.frame, ag1)
res
#   group  age.mean   age.sd
# 1     1  9.061935 3.283173
# 2     2 10.998478 4.798354

str(res)
# 'data.frame': 2 obs. of 3 variables:
# $ group : int 1 2
# $ age.mean: num 9.06 11
# $ age.sd : num 3.28 4.8

All in one:

res <- do.call(data.frame, aggregate(age ~ group, dt, function(x)
  c(mean=mean(x), sd=sd(x))))

Data:

dt <- data.frame(age = rchisq(20, 10), group = sample(1:2, 20, replace = TRUE))
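A self-contained version of the steps above, offered as a sketch. The set.seed(42) call is an arbitrary assumption added only to make the run reproducible, so the numbers will not match the output shown earlier. It makes the two key facts explicit: the summaries live in a single matrix column, and do.call() hands each column of ag1 to data.frame() as a separate argument, which splits the matrix into named columns.

```r
# Assumption: an arbitrary seed, added only to make the example reproducible
set.seed(42)
dt <- data.frame(age = rchisq(20, 10), group = sample(1:2, 20, replace = TRUE))

# FUN returns a named length-2 vector, so aggregate() stores the result
# as one 2-column matrix column called "age"
ag1 <- aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))
is.matrix(ag1$age)   # TRUE
ncol(ag1)            # 2: "group" plus the matrix column

# A data frame is a list of columns, so do.call() passes them as separate
# arguments; data.frame() then splits the matrix into age.mean and age.sd
res <- do.call(data.frame, ag1)
names(res)           # "group" "age.mean" "age.sd"
```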

Matrix Transformation in R - from aggregate output to outer-like matrix

You can try tapply:

with(mtcars, tapply(disp, list(cyl, gear), FUN=mean))
#         3       4     5
#4 120.1000 102.625 107.7
#6 241.5000 163.800 145.0
#8 357.6167      NA 326.0

If you are looking to reshape the output of aggregate, you can use acast from reshape2:

library(reshape2)
d1 <- aggregate(disp ~ cyl + gear, data = mtcars, FUN = mean)
acast(d1, cyl ~ gear, value.var = 'disp')
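If you would rather not load reshape2, here is a base-R sketch that builds the same matrix from the aggregate output by assigning into an empty matrix through a two-column (row, column) index matrix. The names cyls, gears and m are illustrative only; combinations missing from d1 stay NA, just as in the tapply result.

```r
d1 <- aggregate(disp ~ cyl + gear, data = mtcars, FUN = mean)

# Empty cyl x gear matrix; combinations absent from d1 will stay NA
cyls  <- sort(unique(d1$cyl))
gears <- sort(unique(d1$gear))
m <- matrix(NA_real_, nrow = length(cyls), ncol = length(gears),
            dimnames = list(as.character(cyls), as.character(gears)))

# Each row of the index matrix gives (row position, column position),
# so every row of d1 fills exactly one cell
m[cbind(match(d1$cyl, cyls), match(d1$gear, gears))] <- d1$disp
m
```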

rearrange the output of the aggregate function into a new table


# location, p and y are the vectors from the question (not shown here)
library(reshape2)
df1 <- data.frame(location, p, y)
dcast(df1, p ~ location, value.var = "y")

##   p site1 site2 site3
## 1 A     1     1     1
## 2 B     2     2     2
## 3 C     3    NA     3
## 4 D    NA    NA     4
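Because the question's location, p and y vectors are not shown, this sketch uses hypothetical long-format data chosen only so that it reproduces the wide table above. It shows a base-R tapply() equivalent of the dcast() call, assuming at most one y per p/location pair (with duplicates, FUN = sum would add them up).

```r
# Hypothetical data, chosen only to reproduce the wide table above
df1 <- data.frame(
  location = c("site1", "site2", "site3", "site1", "site2", "site3",
               "site1", "site3", "site3"),
  p = c("A", "A", "A", "B", "B", "B", "C", "C", "D"),
  y = c(1, 1, 1, 2, 2, 2, 3, 3, 4)
)

# Rows are the levels of p, columns the levels of location;
# p/location pairs with no observation come out as NA
wide <- with(df1, tapply(y, list(p, location), FUN = sum))
wide
```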

Apply several summary functions on several variables by group in one call

You can do it all in one step and get proper labeling:

aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
#   id1 id2 val1.mn val1.n val2.mn val2.n
# 1   a   x     1.5    2.0     6.5    2.0
# 2   b   x     2.0    2.0     8.0    2.0
# 3   a   y     3.5    2.0     7.0    2.0
# 4   b   y     3.0    2.0     6.0    2.0

This creates a data frame with two id columns and two matrix columns:

str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame': 4 obs. of 4 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
$ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"

As pointed out by @lord.garbage, this can be converted to a data frame with "simple" (non-matrix) columns by using do.call(data.frame, ...):

str(do.call(data.frame,
            aggregate(. ~ id1 + id2, data = x,
                      FUN = function(x) c(mn = mean(x), n = length(x)))))
'data.frame': 4 obs. of 6 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1.mn: num 1.5 2 3.5 3
$ val1.n : num 2 2 2 2
$ val2.mn: num 6.5 8 7 6
$ val2.n : num 2 2 2 2

This is the syntax for multiple variables on the LHS:

aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
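A self-contained sketch of the cbind() form combined with the do.call() flattening. The question's x is not shown, so this assumes hypothetical values (the same ones as df1 in the "Aggregate multiple columns at once" answer), which happen to reproduce the means printed above. The summary function's argument is named v only to avoid confusion with the data frame x.

```r
# Hypothetical data: reproduces the group means shown above
x <- data.frame(id1 = rep(c("a", "b"), each = 4),
                id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
                val1 = c(1, 2, 3, 4, 1, 4, 3, 2),
                val2 = c(9, 4, 5, 9, 7, 4, 9, 8))

# Two summaries per group: val1 and val2 each become a 2-column matrix column
agg <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x,
                 FUN = function(v) c(mn = mean(v), n = length(v)))

# Flatten the matrix columns into val1.mn, val1.n, val2.mn, val2.n
flat <- do.call(data.frame, agg)
flat
```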

Aggregate randomly sampled columns for iteratively larger bin sizes

The approach can be adapted to however many columns you want in the final output; as written, it returns all possible combinations.

# Get column names of the matrix
all_cols <- colnames(mat)

# Bin sizes j run from 2 to ncol(mat)
total_out <- lapply(seq_len(ncol(mat))[-1], function(j) {
  # Create all combinations taking j columns at a time
  temp <- combn(all_cols, j, function(x)
    # Take rowSums for the current combination, and paste the
    # column names together to use as the new column name later
    list(rowSums(mat[, x]), paste0(x, collapse = "_")), simplify = FALSE)
  # Combine the rowSums vectors into a matrix
  new_mat <- sapply(temp, `[[`, 1)
  # Assign column names
  colnames(new_mat) <- sapply(temp, `[[`, 2)
  # Return the new matrix
  new_mat
})

The output currently looks like this:

total_out
#[[1]]
#  1_2 1_3 1_4 1_5 1_6 2_3 2_4 2_5 2_6 3_4 3_5 3_6 4_5 4_6 5_6
#a   3   1   1   1   2   2   2   2   3   0   0   1   0   1   1
#c   0   0   1   0   1   0   1   0   1   1   0   1   1   2   1
#f   0   1   0   0   0   1   0   0   0   1   1   1   0   0   0
#h   0   1   0   0   0   1   0   0   0   1   1   1   0   0   0
#i   1   1   0   1   0   2   1   2   1   1   2   1   1   0   1
#j   0   0   1   0   0   0   1   0   0   1   0   0   1   1   0
#l   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0
#m   0   0   0   1   0   0   0   1   0   0   1   0   1   0   1
#p   0   0   0   0   1   0   0   0   1   0   0   1   0   1   1
#q   0   0   0   1   0   0   0   1   0   0   1   0   1   0   1
#s   0   0   0   1   1   0   0   1   1   0   1   1   1   1   2
#t   0   0   0   0   2   0   0   0   2   0   0   2   0   2   2
#u   0   0   0   0   1   0   0   0   1   0   0   1   0   1   1
#v   0   0   0   1   0   0   0   1   0   0   1   0   1   0   1
#x   3   2   2   2   2   1   1   1   1   0   0   0   0   0   0
#z   0   0   0   1   0   0   0   1   0   0   1   0   1   0   1
#...
#....
#....
#[[5]]
#  1_2_3_4_5_6
#a           4
#c           2
#f           1
#h           1
#i           3
#j           1
#l           1
#m           1
#p           1
#q           1
#s           2
#t           2
#u           1
#v           1
#x           3
#z           1

Note that total_out contains 5 (ncol(mat) - 1) matrices in total, with the following numbers of columns:

length(total_out)
#[1] 5

sapply(total_out, ncol)
#[1] 15 20 15 6 1
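These counts are binomial coefficients: assuming, as the output implies, that mat has 6 columns, bin size j yields choose(6, j) combined columns. A quick check:

```r
# Number of ways to pick j of the 6 columns, for bin sizes j = 2, ..., 6
choose(6, 2:6)
#[1] 15 20 15  6  1

# Total number of combined columns across all bin sizes: 2^6 - 6 - 1
sum(choose(6, 2:6))
#[1] 57
```

This is also why the approach grows quickly: the total number of columns roughly doubles with each extra column in mat.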

Since the last element of the list is always a one-column matrix (the sum over all columns), we can drop it and randomly select ceiling(nc/2) columns from each of the remaining matrices.

total_out <- total_out[-length(total_out)]

lapply(total_out, function(x) {
  nc <- ncol(x)
  x[, sample(nc, ceiling(nc/2))]
})
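The question's mat is not shown, so here is a self-contained sketch that wraps both steps in a hypothetical helper, random_bins(), and runs it on a small made-up 3 x 4 matrix. Two assumptions beyond the code above: drop = FALSE keeps every subset a matrix even if only one row or column is selected, and set.seed() inside the helper (which also resets the global RNG state) makes the column sampling reproducible.

```r
# Hypothetical helper combining the two steps above; mat needs column names
random_bins <- function(mat, seed = 1) {
  all_cols <- colnames(mat)
  total_out <- lapply(seq_len(ncol(mat))[-1], function(j) {
    temp <- combn(all_cols, j, function(x)
      list(rowSums(mat[, x, drop = FALSE]), paste0(x, collapse = "_")),
      simplify = FALSE)
    new_mat <- sapply(temp, `[[`, 1)
    colnames(new_mat) <- sapply(temp, `[[`, 2)
    new_mat
  })
  # Drop the last element: the single column summing all columns
  total_out <- total_out[-length(total_out)]
  set.seed(seed)  # reproducible sampling (changes the global RNG state)
  lapply(total_out, function(x) {
    nc <- ncol(x)
    x[, sample(nc, ceiling(nc / 2)), drop = FALSE]
  })
}

# Hypothetical toy input: 3 rows, 4 columns
mat <- matrix(c(1, 0, 2,
                0, 1, 1,
                3, 0, 0,
                1, 1, 0), nrow = 3,
              dimnames = list(c("a", "b", "c"), as.character(1:4)))
out <- random_bins(mat)
sapply(out, ncol)
```

With 4 columns there are choose(4, 2) = 6 pairs and choose(4, 3) = 4 triples, so the sampled matrices keep 3 and 2 columns respectively.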

Create matrix after aggregate a table in R

If your data frame really does look like that, then there is a serious mismatch between your column names and your code. Here is a reproducible example using the column names from your code:

dom <- data.frame(tipoE = sample(c(letters[1:4], NA), 30, replace = TRUE),
                  mun = rep(c(3200102, 3200106, 3200310), each = 10),
                  x = runif(30, 100, 200))
dom

This reworking succeeds:

a = aggregate(dom$x,
              by = list(tipoE = addNA(dom$tipoE), mun = dom$mun),
              FUN = sum)
a

This use of xtabs then gives your requests:

> aT <- xtabs(x ~ tipoE + mun, a)
> aT
       mun
tipoE   3200102  3200106  3200310
  a    340.7700 367.1412 180.0594
  b    280.9851 485.8780 798.4880
  c    280.7682 236.3637 165.2295
  d    176.6967 125.0732 132.5339
  <NA> 376.4278 117.1063 251.2514
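A sketch of a one-step alternative, offered as an option rather than a replacement: tapply() can build the tipoE x mun table directly from dom, skipping the intermediate aggregate() call. The seed is an arbitrary assumption, so the numbers differ from the output above. One difference to keep in mind: cells with no observations come out as NA here, whereas xtabs() shows them as 0.

```r
# Assumption: arbitrary seed; dom is rebuilt so the block runs on its own
set.seed(1)
dom <- data.frame(tipoE = sample(c(letters[1:4], NA), 30, replace = TRUE),
                  mun = rep(c(3200102, 3200106, 3200310), each = 10),
                  x = runif(30, 100, 200))

# Sum x for every tipoE/mun cell; addNA() turns missing tipoE into its own
# factor level, so those rows are kept instead of silently dropped
tab <- with(dom, tapply(x, list(tipoE = addNA(tipoE), mun = mun), sum))
tab
```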

Aggregate multiple columns at once

We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while . represents all other variables in 'df1' (from the example, we assume that you need the mean of every column except the grouping ones). Then we specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)
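For comparison, a sketch of the same summary through aggregate's non-formula (data.frame) interface, with df1 rebuilt from the data given at the end of this answer so the block runs on its own. One behavioural difference worth knowing: the formula method uses na.action = na.omit by default, dropping any row with an NA in the variables used, while this interface passes NAs through to FUN (extra arguments such as na.rm = TRUE can be supplied after FUN).

```r
df1 <- data.frame(id1 = rep(c("a", "b"), each = 4),
                  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
                  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
                  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L))

# Columns to summarise first, then the grouping columns as a list
res <- aggregate(df1[c("val1", "val2")], by = df1[c("id1", "id2")], FUN = mean)
res
```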

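Or we can use summarise_each() from dplyr after grouping with group_by(). (summarise_each() and funs() have since been deprecated in dplyr; the across() version below is the current replacement.)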

library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_each(funs(mean))

Or use summarise() with across(), available since dplyr 1.0.0 (at the time of writing it was only in the development version, 0.8.99.9000):

df1 %>% 
group_by(id1, id2) %>%
summarise(across(starts_with('val'), mean))

Another option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'id1' and 'id2', loop through the Subset of Data.table (.SD), and get the mean of each column.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                      id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
                      val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
                      val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
                 .Names = c("id1", "id2", "val1", "val2"),
                 class = "data.frame",
                 row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

