Multiple Functions in a Single Tapply or Aggregate Statement

These versions, however, should work:

with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x) )}))

with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))
# Not a nice structure but the results are in there

with(my.Data, aggregate(weight ~ age + sex, FUN = function(x) c( SD = sd(x), MN= mean(x) ) ) )
    age    sex weight.SD weight.MN
1 adult female  3.535534 97.500000
2 young female        NA 80.000000
3 adult   male        NA 90.000000
4 young   male        NA 75.

The principle to be adhered to is to have your function return "one thing" which could be either a vector or a list but cannot be the successive invocation of two function calls.
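
As a rough sketch of that principle: the data frame below is made up for illustration (the original my.Data is not shown), and the helper name mean_sd is my own. The function returns one named vector, and do.call(data.frame, ...) then spreads the resulting matrix column into ordinary columns.

```r
# Hypothetical data; the original my.Data is not shown in the source.
my.Data <- data.frame(
  age    = c("adult", "adult", "young", "young", "adult", "young"),
  sex    = c("female", "female", "female", "male", "male", "male"),
  weight = c(95, 100, 80, 75, 90, 70)
)

# One function, one return value: a named numeric vector.
mean_sd <- function(x) c(MEAN = mean(x), SD = sd(x))

res <- aggregate(weight ~ age + sex, data = my.Data, FUN = mean_sd)

# aggregate() keeps both statistics in a single matrix column called weight;
# do.call(data.frame, ...) turns it into weight.MEAN and weight.SD.
flat <- do.call(data.frame, res)
print(flat)
```

Groups with a single observation get NA for SD, because sd() of one value is NA.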

Apply multiple functions to column using tapply

You can certainly do stuff like this using ddply from the plyr package:

library(plyr)

dat <- data.frame(x = rep(letters[1:3],3),y = 1:9)

ddply(dat,.(x),summarise,total = NROW(piece), count = sum(y))
  x total count
1 a     3    12
2 b     3    15
3 c     3    18

You can keep listing more summary functions, beyond just two, if you like. Note I'm being a little tricky here in calling NROW on an internal variable in ddply called piece. You could have just done something like length(y) instead. (And probably should; referencing the internal variable piece isn't guaranteed to work in future versions, I think. Do as I say, not as I do and just use length().)

Using multiple functions with the apply family, aggregate, with, etc.

each in package plyr does the trick for you too:

library(plyr)
df <- matrix(data=rnorm(50), 10, 5)
aaply(df, 2, each(min, mean, max, median, sum))

If you want another input/output format, you can play with the other functions from plyr (adply, alply, and so on).
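
If plyr is not available, the same idea can be sketched in base R. This is not what each() does internally, only an assumed equivalent: one anonymous function returns a named vector, and apply() stacks those vectors into a matrix with one column per input column.

```r
set.seed(1)                        # only so the sketch is reproducible
m <- matrix(rnorm(50), 10, 5)

# Each column yields a named vector of three statistics;
# apply() binds them into a 3 x 5 matrix.
stats <- apply(m, 2, function(col) {
  c(min = min(col), mean = mean(col), max = max(col))
})
print(stats)
```

Note the orientation: statistics are rows here, so use t(stats) if you prefer one row per column of m.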

Grouping functions (tapply, by, aggregate) and the *apply family

R has many *apply functions which are ably described in the help files (e.g. ?apply). There are enough of them, though, that beginning useRs may have difficulty deciding which one is appropriate for their situation or even remembering them all. They may have a general sense that "I should be using an *apply function here", but it can be tough to keep them all straight at first.

Despite the fact (noted in other answers) that much of the functionality of the *apply family is covered by the extremely popular plyr package, the base functions remain useful and worth knowing.

This answer is intended to act as a sort of signpost for new useRs to help direct them to the correct *apply function for their particular problem. Note, this is not intended to simply regurgitate or replace the R documentation! The hope is that this answer helps you to decide which *apply function suits your situation and then it is up to you to research it further. With one exception, performance differences will not be addressed.

  • apply - When you want to apply a function to the rows or columns
    of a matrix (and higher-dimensional analogues); not generally advisable for data frames as it will coerce to a matrix first.

     # Two dimensional matrix
    M <- matrix(seq(1,16), 4, 4)

    # apply min to rows
    apply(M, 1, min)
    [1] 1 2 3 4

    # apply max to columns
    apply(M, 2, max)
    [1] 4 8 12 16

    # 3 dimensional array
    M <- array( seq(32), dim = c(4,4,2))

    # Apply sum across each M[*, , ] - i.e Sum across 2nd and 3rd dimension
    apply(M, 1, sum)
    # Result is one-dimensional
    [1] 120 128 136 144

    # Apply sum across each M[*, *, ] - i.e Sum across 3rd dimension
    apply(M, c(1,2), sum)
    # Result is two-dimensional
         [,1] [,2] [,3] [,4]
    [1,]   18   26   34   42
    [2,]   20   28   36   44
    [3,]   22   30   38   46
    [4,]   24   32   40   48

    If you want row/column means or sums for a 2D matrix, be sure to
    investigate the highly optimized, lightning-quick colMeans,
    rowMeans, colSums, rowSums.

  • lapply - When you want to apply a function to each element of a
    list in turn and get a list back.

    This is the workhorse of many of the other *apply functions. Peel
    back their code and you will often find lapply underneath.

     x <- list(a = 1, b = 1:3, c = 10:100) 
    lapply(x, FUN = length)
    $a
    [1] 1
    $b
    [1] 3
    $c
    [1] 91
    lapply(x, FUN = sum)
    $a
    [1] 1
    $b
    [1] 6
    $c
    [1] 5005
  • sapply - When you want to apply a function to each element of a
    list in turn, but you want a vector back, rather than a list.

    If you find yourself typing unlist(lapply(...)), stop and consider
    sapply.

     x <- list(a = 1, b = 1:3, c = 10:100)
    # Compare with above; a named vector, not a list
    sapply(x, FUN = length)
    a b c
    1 3 91

    sapply(x, FUN = sum)
    a b c
    1 6 5005

    In more advanced uses of sapply it will attempt to coerce the
    result to a multi-dimensional array, if appropriate. For example, if our function returns vectors of the same length, sapply will use them as columns of a matrix:

     sapply(1:5,function(x) rnorm(3,x))

    If our function returns a 2 dimensional matrix, sapply will do essentially the same thing, treating each returned matrix as a single long vector:

     sapply(1:5,function(x) matrix(x,2,2))

    Unless we specify simplify = "array", in which case it will use the individual matrices to build a multi-dimensional array:

     sapply(1:5,function(x) matrix(x,2,2), simplify = "array")

    Each of these behaviors is of course contingent on our function returning vectors or matrices of the same length or dimension.

  • vapply - When you want to use sapply but perhaps need to
    squeeze some more speed out of your code or want more type safety.

    For vapply, you basically give R an example of what sort of thing
    your function will return, which can save some time coercing returned
    values to fit in a single atomic vector.

     x <- list(a = 1, b = 1:3, c = 10:100)
    #Note that since the advantage here is mainly speed, this
    # example is only for illustration. We're telling R that
    # everything returned by length() should be an integer of
    # length 1.
    vapply(x, FUN = length, FUN.VALUE = 0L)
    a b c
    1 3 91
  • mapply - For when you have several data structures (e.g.
    vectors, lists) and you want to apply a function to the 1st elements
    of each, and then the 2nd elements of each, etc., coercing the result
    to a vector/array as in sapply.

    This is multivariate in the sense that your function must accept
    multiple arguments.

     #Sums the 1st elements, the 2nd elements, etc. 
    mapply(sum, 1:5, 1:5, 1:5)
    [1] 3 6 9 12 15
    #To do rep(1,4), rep(2,3), etc.
    mapply(rep, 1:4, 4:1)
    [[1]]
    [1] 1 1 1 1

    [[2]]
    [1] 2 2 2

    [[3]]
    [1] 3 3

    [[4]]
    [1] 4
  • Map - A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.

     Map(sum, 1:5, 1:5, 1:5)
    [[1]]
    [1] 3

    [[2]]
    [1] 6

    [[3]]
    [1] 9

    [[4]]
    [1] 12

    [[5]]
    [1] 15
  • rapply - For when you want to apply a function to each element of a nested list structure, recursively.

    To give you some idea of how uncommon rapply is, I forgot about it when first posting this answer! Obviously, I'm sure many people use it, but YMMV. rapply is best illustrated with a user-defined function to apply:

    # Append ! to string, otherwise increment
    myFun <- function(x){
        if(is.character(x)){
            return(paste(x,"!",sep=""))
        }
        else{
            return(x + 1)
        }
    }

    #A nested list structure
    l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"),
              b = 3, c = "Yikes",
              d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))

    # Result is named vector, coerced to character
    rapply(l, myFun)

    # Result is a nested list like l, with values altered
    rapply(l, myFun, how="replace")
  • tapply - For when you want to apply a function to subsets of a
    vector and the subsets are defined by some other vector, usually a
    factor.

    The black sheep of the *apply family, of sorts. The help file's use of
    the phrase "ragged array" can be a bit confusing, but it is actually
    quite simple.

    A vector:

     x <- 1:20

    A factor (of the same length!) defining groups:

     y <- factor(rep(letters[1:5], each = 4))

    Add up the values in x within each subgroup defined by y:

     tapply(x, y, sum)  
    a b c d e
    10 26 42 58 74

    More complex examples can be handled where the subgroups are defined
    by the unique combinations of a list of several factors. tapply is
    similar in spirit to the split-apply-combine functions that are
    common in R (aggregate, by, ave, ddply, etc.). Hence its
    black sheep status.
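
To make the "several factors" case for tapply concrete, here is a small sketch. The second factor g2 and the variable names are my own additions, not part of the original answer. With one value per cell, tapply returns a matrix; when the function returns a vector, each cell holds that vector instead.

```r
x  <- 1:20
g1 <- factor(rep(letters[1:5], each = 4))
g2 <- factor(rep(c("u", "v"), times = 10))   # hypothetical second factor

# Two grouping factors: a 5 x 2 matrix of sums, one cell per combination.
sums <- tapply(x, list(g1, g2), sum)
print(sums)

# A function returning several values: the result is a 5 x 2 array
# whose cells are vectors, so use [[ ]] to pull one cell out.
stats <- tapply(x, list(g1, g2), function(v) c(mean = mean(v), n = length(v)))
print(stats[["a", "u"]])
```

This list-of-cells structure is the "not a nice structure" result mentioned at the top of the page; aggregate is usually the more convenient choice when a flat table is wanted.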

Apply several summary functions on several variables by group in one call

You can do it all in one step and get proper labeling:

aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
#   id1 id2 val1.mn val1.n val2.mn val2.n
# 1   a   x     1.5    2.0     6.5    2.0
# 2   b   x     2.0    2.0     8.0    2.0
# 3   a   y     3.5    2.0     7.0    2.0
# 4   b   y     3.0    2.0     6.0    2.0

This creates a dataframe with two id columns and two matrix columns:

str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame': 4 obs. of 4 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"
$ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mn" "n"

As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using do.call(data.frame, ...):

str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) )
'data.frame': 4 obs. of 6 variables:
$ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
$ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
$ val1.mn: num 1.5 2 3.5 3
$ val1.n : num 2 2 2 2
$ val2.mn: num 6.5 8 7 6
$ val2.n : num 2 2 2 2

This is the syntax for multiple variables on the LHS:

aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
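
Putting the pieces together as a runnable sketch: the data frame x below is an assumption, chosen so that it reproduces the group means shown above (the original x is not included on this page), and mean_n is just a name I gave the summary function.

```r
# Assumed input; picked to reproduce the means printed above.
x <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1, 2, 3, 4, 1, 2, 3, 4),
  val2 = c(9, 4, 5, 9, 7, 4, 9, 8)
)

# One function returning both statistics as a named vector.
mean_n <- function(v) c(mn = mean(v), n = length(v))

# cbind() on the left-hand side selects the variables to summarise.
agg <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = mean_n)

# Flatten the two matrix columns into four ordinary columns.
flat <- do.call(data.frame, agg)
print(flat)
```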

Apply multiple functions within a single function to a DataFrame

We can place the sapply inside the function and pass the dataset as argument

# media = mean, desv.tip = standard deviation, fischer = skewness coefficient
multi.fun <- function(dat) {
  sapply(dat, function(x) c(media = mean(x), desv.tip = sd(x),
                            fischer = sum((x-mean(x))^3)/(nrow(dat)*(sd(x))^3)))
}

multi.fun(df)
#                 x1        x2        x3
#media    19.0500000 1.5700000 9.7300000
#desv.tip  7.9560250 0.9967168 4.5660705
#fischer   0.9549109 0.5209099 0.4954127

Applying multiple functions via sapply

We can use mgsub from library(qdap) to replace multiple patterns. Here, I loop over the first and second columns with lapply and assign the results back to crs_mat[,1:2]. Note that I am using lapply instead of sapply because lapply keeps the structure intact.

library(qdap)
crs_mat[,1:2] <- lapply(crs_mat[,1:2], mgsub,
pattern=c('mpg', 'gear'), replacement=c('MPG', 'GeArr'))
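
If qdap is not installed, a base R stand-in can be sketched with a loop over gsub. This is a substitute technique, not how mgsub works internally, and it applies the replacements strictly in the order given. The helper name multi_gsub and the sample crs_mat are made up for the example.

```r
# Hypothetical stand-in for the crs_mat used above.
crs_mat <- data.frame(
  a = c("mpg vs gear", "gear"),
  b = c("mpg", "other"),
  stringsAsFactors = FALSE
)

# Apply each pattern/replacement pair in turn (literal matching).
multi_gsub <- function(text, patterns, replacements) {
  for (i in seq_along(patterns)) {
    text <- gsub(patterns[i], replacements[i], text, fixed = TRUE)
  }
  text
}

# lapply returns a list of columns, which assigns back cleanly.
crs_mat[, 1:2] <- lapply(crs_mat[, 1:2], multi_gsub,
                         patterns = c("mpg", "gear"),
                         replacements = c("MPG", "GeArr"))
print(crs_mat)
```

Because the replacements run sequentially, a later pattern can match text produced by an earlier one, so results may differ from mgsub when patterns overlap.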

Multiple functions in aggregate

With dplyr, you could do this:

library(dplyr)
group_by(d,Branch) %>%
  summarize(Number_of_loans = n(),
            Loan_Amount = sum(Loan_Amount),
            TAT = sum(TAT))

output

Source: local data frame [2 x 4]

  Branch Number_of_loans Loan_Amount   TAT
  (fctr)           (int)       (int) (dbl)
1      A               3         520  15.0
2      B               2         350   3.5

data

d <- read.table(text="Branch Loan_Amount TAT
A 100 2.0
A 120 4.0
A 300 9.0
B 150 1.5
B 200 2.0", header = TRUE)
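
For comparison, roughly the same table can be built with base aggregate. This is one possible sketch rather than the answer's method: the sums and the counts need different functions, so they are computed in two calls and merged. The names sums, counts and res are my own.

```r
d <- read.table(text = "Branch Loan_Amount TAT
A 100 2.0
A 120 4.0
A 300 9.0
B 150 1.5
B 200 2.0", header = TRUE)

# Sum both numeric columns per branch.
sums <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = d, FUN = sum)

# Count rows per branch, then rename the count column.
counts <- aggregate(Loan_Amount ~ Branch, data = d, FUN = length)
names(counts)[2] <- "Number_of_loans"

# Combine the two summaries on the grouping column.
res <- merge(counts, sums, by = "Branch")
print(res)
```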

