Why doesn't outer work the way I think it should (in R)?
outer(0:5, 0:6, sum)
don't work because sum
is not "vectorized" (in the sense of returning a vector of the same length as its two arguments). This example should explain the difference:
sum(1:2,2:3)
8
1:2 + 2:3
[1] 3 5
You can vectorize sum
using mapply
for example:
identical(outer(0:5, 0:6, function(x,y)mapply(sum,x,y)),
outer(0:5, 0:6,'+'))
TRUE
PS: Generally before using outer
I use browser
to create my function in the debug mode:
outer(0:2, 1:3, function(x,y)browser())
Called from: FUN(X, Y, ...)
Browse[1]> x
[1] 0 1 2 0 1 2 0 1 2
Browse[1]> y
[1] 1 1 1 2 2 2 3 3 3
Browse[1]> sum(x,y)
[1] 27 ## this give an error
Browse[1]> x+y
[1] 1 2 3 2 3 4 3 4 5 ## this is vectorized
R: Use outer() on user-defined function
To use outer
, some basic "requirements":
the function will take two vectors all at once (as shown below); whether it chooses to do vectorized work on them or work on them individually is up to you;
it must return a vector of the same length as
x
(andy
); andyou must expect the output as a
matrix
of dimensionslength(x),length(y)
.
Interpreting that these are not all true for you, we move on. "The right function" depends on how you want the model to be run. A companion function to outer
is expand.grid
(and tidyr::crossing
, the tidyverse version), in that it creates the same combinations of the supplied vectors. For instance:
outer(c(30,60,90), c(30, 60, 100), function(x,y) {browser();1;})
# Called from: FUN(X, Y, ...)
# Browse[2]>
x
# [1] 30 60 90 30 60 90 30 60 90
# Browse[2]>
y
# [1] 30 30 30 60 60 60 100 100 100
and
eg <- expand.grid(x=c(30,60,90), y=c(30, 60, 100))
eg
# x y
# 1 30 30
# 2 60 30
# 3 90 30
# 4 30 60
# 5 60 60
# 6 90 60
# 7 30 100
# 8 60 100
# 9 90 100
(which you can then access as eg$x
and eg$y
).
Some options:
if you want your function to be called once (as with
outer
) with two arguments, and it will figure out what to do:eg <- expand.grid(x=c(30,60,90), y=c(30, 60, 100))
do.call("myfunc", eg)Note that if given
character
arguments, it will (similar todata.frame
) createfactor
s by default. It does accept thestringsAsFactors=FALSE
argument.if you want your function to be called for each pair of the vectors (so 9 times in this example), then do one either
myfunc(eg$x, eg$y)
if the number of vectors is known. If not, then using
eg
from above, thendo.call("mapply", c(myfunc, eg))
should work. Depending on the output, you can preclude it from "simplifying" the output (i.e., force a
list
output) withdo.call("mapply", c(myfunc, eg, SIMPLIFY=FALSE))
Why is outer recycling a vector that should go unused and not throwing a warning?
This is expected behaviour based on R's recycling rules. It has nothing to do with outer
as such, though it might be a surprise if you think outer
is somehow applying a function across margins.
Instead, outer
takes two vectors X
and Y
as its first two arguments. It takes X
and rep
licates it length(Y)
times. Similarly, it takes Y
and rep
licates it length(X)
times. Then it just runs your function FUN
on these two long vectors, passing the long X
as the first argument and the long Y
as the second argument. Any other arguments to FUN
have to be passed directly as arguments to outer via ...
(as you have done with c = 1:3
).
The result is a single long vector which is turned into a matrix by writing its dim
attribute as the original values of length(X)
by length(Y)
.
Now, in the specific example you gave, X
has 5 elements (1:5) and Y
has 6 (5:10). Therefore your anonymous function is called on two length-30 vectors and a single length-3 vector. R's recycling rules dictate that if the recycled vector fits neatly into the longer vector without partial recycling, no warning is emitted.
To see this, take your anonymous function and try it outside outer
with two length-30 vectors and one length-3 vector:
f <- function(a, b, c) 10*a + 100*b + 1000*c
f(1:30, 1:30, 1:3)
#> [1] 1110 2220 3330 1440 2550 3660 1770 2880 3990 2100 3210 4320 2430
#> [14] 3540 4650 2760 3870 4980 3090 4200 5310 3420 4530 5640 3750 4860
#> [27] 5970 4080 5190 6300
3 recycles nicely into 30, so there is no warning.
Conversely, if the product of the length of the two vectors you pass to outer
is not a multiple of 3, you will get a warning:
outer(1:5,6:10,c=1:3,function(a,b,c) 10*a + 100*b + 1000*c)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1610 3710 2810 1910 4010
#> [2,] 2620 1720 3820 2920 2020
#> [3,] 3630 2730 1830 3930 3030
#> [4,] 1640 3740 2840 1940 4040
#> [5,] 2650 1750 3850 2950 2050
#> Warning message:
#> In 10 * a + 100 * b + 1000 * c :
#> longer object length is not a multiple of shorter object length
dims [product xx] do not match the length of object [xx] error in using R function `outer`
I often explain outer(x, y, FUN)
when both x
and y
are vectors with the following:
xx <- rep(x, times = length(y))
yy <- rep(y, each = length(x))
zz <- FUN(xx, yy)
stopifnot(length(zz) == length(x) * length(y)) ## length = product?
z <- matrix(zz, length(x), length(y))
funError
fails because zz
has length 1, while funNoError
does not because "recycling rule" has been applied when you paste a
(a vector with length > 1) and class(a)
(a length-1 vector).
This is illustrative as you will see why outer(1:5, 1:5, "+")
works but outer(1:5, 1:5, sum)
fails. Basically, FUN
must be able to process xx
and yy
element-wise. Otherwise, wrap FUN
with a sugar function called Vectorize
. More details are given later.
Note that "list" is also a valid mode of a vector. So outer
could be used to some non-standard things like How to perform pairwise operation like `%in%` and set operations for a list of vectors.
You can pass matrices / arrays to outer
, too. Given that they are just vectors with an "dim" attribute (optionally with "dimnames"), how outer
works does not change.
x <- matrix(1:4, 2, 2) ## has "dim"
y <- matrix(1:9, 3, 3) ## has "dim"
xx <- rep(x, times = length(y)) ## xx <- rep(c(x), times = length(y))
yy <- rep(y, each = length(x)) ## yy <- rep(c(y), each = length(x))
zz <- "*"(xx, yy)
stopifnot(length(zz) == length(x) * length(y)) ## length = product?
z <- "dim<-"( zz, c(dim(x), dim(y)) )
z0 <- outer(x, y, "*")
all.equal(z, z0)
#[1] TRUE
?outer
explains the code above in plain words.
‘X’ and ‘Y’ must be suitable arguments for ‘FUN’. Each will be
extended by ‘rep’ to length the products of the lengths of ‘X’ and
‘Y’ before ‘FUN’ is called.
‘FUN’ is called with these two extended vectors as arguments (plus
any arguments in ‘...’). It must be a vectorized function (or the
name of one) expecting at least two arguments and returning a
value with the same length as the first (and the second).
Where they exist, the [dim]names of ‘X’ and ‘Y’ will be copied to
the answer, and a dimension assigned which is the concatenation of
the dimensions of ‘X’ and ‘Y’ (or lengths if dimensions do not
exist).
The word "vectorized" is NOT the most discussed one in R on performance. It means "vectorizing the action of a function":
## for FUN with a single argument
FUN( c(x1, x2, x3, x4) ) = c( FUN(x1), FUN(x2), FUN(x3), FUN(x4) )
## for FUN with two arguments
FUN( c(x1, x2, x3, x4), c(y1, y2, y3, y4) )
= c( FUN(x1, y1), FUN(x2, y2), FUN(x3, y3), FUN(x4, y4) )
Some functions say "+"
, "*"
, paste
behave like this, but many others don't, say class
, sum
, prod
. The *apply
family functions in R are there to help you to vectorize function action, or you can write your own loop to achieve the same effect.
Another worth reading good-quality Q & A: Why doesn't outer work the way I think it should (in R)?
Using outer() with predict()
The problem is that outer expects its function to be vectorized (i.e., it will call predict ONCE with a vector of all the arguments it wants executed). Therefore, when predict is called once, returning its result (which happens to be of length 4), outer complains because it doesn't equal the expected 40.
One way to fix this is to use Vectorize
. Untested code:
outer(X=alpha, Y=beta, FUN=Vectorize(predict, vectorize.args=c("alpha", "beta")), object=classifier.out, data=validation.data)
Related Topics
How to Define More Line Types for Graphs in R (Custom Linetype)
How to Coerce a List Object to Type 'Double'
Pattern Matching Using a Wildcard
Add a Horizontal Line to Plot and Legend in Ggplot2
Combine Points with Lines with Ggplot2
Make Readline Wait for Input in R
Create Column with Grouped Values Based on Another Column
Subset Xts Object by Time of Day
Add Text to Horizontal Barplot in R, Y-Axis at Different Scale
Fixing Cluttered Titles on Graphs
How to Delete Groups Containing Less Than 3 Rows of Data in R
Object Not Found Error with Ddply Inside a Function
How to Determine the Namespace of a Function
Join R Data.Tables Where Key Values Are Not Exactly Equal--Combine Rows with Closest Times