Why Doesn't Outer Work the Way I Think It Should (In R)

Why doesn't outer work the way I think it should (in R)?

outer(0:5, 0:6, sum) don't work because sum is not "vectorized" (in the sense of returning a vector of the same length as its two arguments). This example should explain the difference:

 sum(1:2,2:3)
8
1:2 + 2:3
[1] 3 5

You can vectorize sum using mapply for example:

identical(outer(0:5, 0:6, function(x,y)mapply(sum,x,y)),
outer(0:5, 0:6,'+'))
TRUE

PS: Generally before using outer I use browser to create my function in the debug mode:

outer(0:2, 1:3, function(x,y)browser())
Called from: FUN(X, Y, ...)
Browse[1]> x
[1] 0 1 2 0 1 2 0 1 2
Browse[1]> y
[1] 1 1 1 2 2 2 3 3 3
Browse[1]> sum(x,y)
[1] 27 ## this give an error
Browse[1]> x+y
[1] 1 2 3 2 3 4 3 4 5 ## this is vectorized

R: Use outer() on user-defined function

To use outer, some basic "requirements":

  1. the function will take two vectors all at once (as shown below); whether it chooses to do vectorized work on them or work on them individually is up to you;

  2. it must return a vector of the same length as x (and y); and

  3. you must expect the output as a matrix of dimensions length(x),length(y).

Interpreting that these are not all true for you, we move on. "The right function" depends on how you want the model to be run. A companion function to outer is expand.grid (and tidyr::crossing, the tidyverse version), in that it creates the same combinations of the supplied vectors. For instance:

outer(c(30,60,90), c(30, 60, 100), function(x,y) {browser();1;})
# Called from: FUN(X, Y, ...)
# Browse[2]>
x
# [1] 30 60 90 30 60 90 30 60 90
# Browse[2]>
y
# [1] 30 30 30 60 60 60 100 100 100

and

eg <- expand.grid(x=c(30,60,90), y=c(30, 60, 100))
eg
# x y
# 1 30 30
# 2 60 30
# 3 90 30
# 4 30 60
# 5 60 60
# 6 90 60
# 7 30 100
# 8 60 100
# 9 90 100

(which you can then access as eg$x and eg$y).

Some options:

  1. if you want your function to be called once (as with outer) with two arguments, and it will figure out what to do:

    eg <- expand.grid(x=c(30,60,90), y=c(30, 60, 100))
    do.call("myfunc", eg)

    Note that if given character arguments, it will (similar to data.frame) create factors by default. It does accept the stringsAsFactors=FALSE argument.

  2. if you want your function to be called for each pair of the vectors (so 9 times in this example), then do one either

    myfunc(eg$x, eg$y)

    if the number of vectors is known. If not, then using eg from above, then

    do.call("mapply", c(myfunc, eg))

    should work. Depending on the output, you can preclude it from "simplifying" the output (i.e., force a list output) with

    do.call("mapply", c(myfunc, eg, SIMPLIFY=FALSE))

Why is outer recycling a vector that should go unused and not throwing a warning?

This is expected behaviour based on R's recycling rules. It has nothing to do with outer as such, though it might be a surprise if you think outer is somehow applying a function across margins.

Instead, outer takes two vectors X and Y as its first two arguments. It takes Xand replicates it length(Y) times. Similarly, it takes Y and replicates it length(X) times. Then it just runs your function FUN on these two long vectors, passing the long X as the first argument and the long Y as the second argument. Any other arguments to FUN have to be passed directly as arguments to outer via ... (as you have done with c = 1:3).

The result is a single long vector which is turned into a matrix by writing its dim attribute as the original values of length(X) by length(Y).

Now, in the specific example you gave, X has 5 elements (1:5) and Y has 6 (5:10). Therefore your anonymous function is called on two length-30 vectors and a single length-3 vector. R's recycling rules dictate that if the recycled vector fits neatly into the longer vector without partial recycling, no warning is emitted.

To see this, take your anonymous function and try it outside outer with two length-30 vectors and one length-3 vector:

f <- function(a, b, c) 10*a + 100*b + 1000*c

f(1:30, 1:30, 1:3)
#> [1] 1110 2220 3330 1440 2550 3660 1770 2880 3990 2100 3210 4320 2430
#> [14] 3540 4650 2760 3870 4980 3090 4200 5310 3420 4530 5640 3750 4860
#> [27] 5970 4080 5190 6300

3 recycles nicely into 30, so there is no warning.

Conversely, if the product of the length of the two vectors you pass to outer is not a multiple of 3, you will get a warning:

outer(1:5,6:10,c=1:3,function(a,b,c) 10*a + 100*b + 1000*c)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1610 3710 2810 1910 4010
#> [2,] 2620 1720 3820 2920 2020
#> [3,] 3630 2730 1830 3930 3030
#> [4,] 1640 3740 2840 1940 4040
#> [5,] 2650 1750 3850 2950 2050
#> Warning message:
#> In 10 * a + 100 * b + 1000 * c :
#> longer object length is not a multiple of shorter object length

dims [product xx] do not match the length of object [xx] error in using R function `outer`

I often explain outer(x, y, FUN) when both x and y are vectors with the following:

xx <- rep(x, times = length(y))
yy <- rep(y, each = length(x))
zz <- FUN(xx, yy)
stopifnot(length(zz) == length(x) * length(y)) ## length = product?
z <- matrix(zz, length(x), length(y))

funError fails because zz has length 1, while funNoError does not because "recycling rule" has been applied when you paste a (a vector with length > 1) and class(a) (a length-1 vector).

This is illustrative as you will see why outer(1:5, 1:5, "+") works but outer(1:5, 1:5, sum) fails. Basically, FUN must be able to process xx and yy element-wise. Otherwise, wrap FUN with a sugar function called Vectorize. More details are given later.

Note that "list" is also a valid mode of a vector. So outer could be used to some non-standard things like How to perform pairwise operation like `%in%` and set operations for a list of vectors.


You can pass matrices / arrays to outer, too. Given that they are just vectors with an "dim" attribute (optionally with "dimnames"), how outer works does not change.

x <- matrix(1:4, 2, 2)  ## has "dim"
y <- matrix(1:9, 3, 3) ## has "dim"

xx <- rep(x, times = length(y)) ## xx <- rep(c(x), times = length(y))
yy <- rep(y, each = length(x)) ## yy <- rep(c(y), each = length(x))
zz <- "*"(xx, yy)
stopifnot(length(zz) == length(x) * length(y)) ## length = product?
z <- "dim<-"( zz, c(dim(x), dim(y)) )

z0 <- outer(x, y, "*")
all.equal(z, z0)
#[1] TRUE

?outer explains the code above in plain words.

 ‘X’ and ‘Y’ must be suitable arguments for ‘FUN’.  Each will be
extended by ‘rep’ to length the products of the lengths of ‘X’ and
‘Y’ before ‘FUN’ is called.

‘FUN’ is called with these two extended vectors as arguments (plus
any arguments in ‘...’). It must be a vectorized function (or the
name of one) expecting at least two arguments and returning a
value with the same length as the first (and the second).

Where they exist, the [dim]names of ‘X’ and ‘Y’ will be copied to
the answer, and a dimension assigned which is the concatenation of
the dimensions of ‘X’ and ‘Y’ (or lengths if dimensions do not
exist).

The word "vectorized" is NOT the most discussed one in R on performance. It means "vectorizing the action of a function":

## for FUN with a single argument
FUN( c(x1, x2, x3, x4) ) = c( FUN(x1), FUN(x2), FUN(x3), FUN(x4) )

## for FUN with two arguments
FUN( c(x1, x2, x3, x4), c(y1, y2, y3, y4) )
= c( FUN(x1, y1), FUN(x2, y2), FUN(x3, y3), FUN(x4, y4) )

Some functions say "+", "*", paste behave like this, but many others don't, say class, sum, prod. The *apply family functions in R are there to help you to vectorize function action, or you can write your own loop to achieve the same effect.


Another worth reading good-quality Q & A: Why doesn't outer work the way I think it should (in R)?

Using outer() with predict()

The problem is that outer expects its function to be vectorized (i.e., it will call predict ONCE with a vector of all the arguments it wants executed). Therefore, when predict is called once, returning its result (which happens to be of length 4), outer complains because it doesn't equal the expected 40.

One way to fix this is to use Vectorize. Untested code:

outer(X=alpha, Y=beta, FUN=Vectorize(predict, vectorize.args=c("alpha", "beta")), object=classifier.out, data=validation.data)


Related Topics



Leave a reply



Submit