Under What Circumstances Does R Recycle

Recycling works in your example:

> x <- seq(5)
> y <- seq(11)
> x+y
 [1]  2  4  6  8 10  7  9 11 13 15 12
Warning message:
In x + y : longer object length is not a multiple of shorter object length
> v <- 2*x +y +1 
Warning message:
In 2 * x + y :
  longer object length is not a multiple of shorter object length
> v
 [1]  4  7 10 13 16  9 12 15 18 21 14

The "error" that you reported is in fact a warning: R is telling you that the longer length is not a multiple of the shorter one, but it recycles anyway. You may have options(warn=2) set, which converts warnings into errors.

In general, avoid relying on recycling. If you get in the habit of ignoring the warnings, some day it will bite you and your code will fail in some very hard to diagnose way.
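As a rough illustration of when the warning fires, the sketch below checks a whole-multiple and a non-multiple pair of lengths, and then shows the effect of options(warn = 2). The helper name warns is hypothetical, introduced only for this example.

```r
# Hypothetical helper: TRUE if evaluating `expr` raises a warning.
warns <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

x <- 1:5
silent_case <- warns(x + 1:10)   # 10 is a multiple of 5: recycled silently
noisy_case  <- warns(x + 1:11)   # 11 is not a multiple of 5: warning

# With options(warn = 2) the same warning is raised as an error.
old <- options(warn = 2)
res <- try(x + 1:11, silent = TRUE)
options(old)                     # restore the previous setting
is_error <- inherits(res, "try-error")
```

Here silent_case should be FALSE, while noisy_case and is_error should both be TRUE.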

Implementation of standard recycling rules

I've used this in the past,

expand_args <- function(...){
  dots <- list(...)
  max_length <- max(sapply(dots, length))
  lapply(dots, rep, length.out = max_length)
}
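A quick usage sketch, repeating the definition so it runs on its own. The argument names a, b and c are made up for the example. Note that rep() with length.out recycles partially without any warning, so this helper is more permissive than base arithmetic.

```r
expand_args <- function(...) {
  dots <- list(...)
  max_length <- max(sapply(dots, length))
  lapply(dots, rep, length.out = max_length)
}

# Illustrative inputs of lengths 2, 6 and 1.
args <- expand_args(a = 1:2, b = 1:6, c = 10)
lengths(args)   # every element now has length 6
args$a          # 1 2 1 2 1 2
```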

Vector recycling concept in R

You can use:

rep(c(2,4,6), 2) * rep(c(1,2), each=3)
#[1]  2  4  6  4  8 12

or with auto recycling:

c(2,4,6) * rep(c(1,2), each=3)
#[1]  2  4  6  4  8 12

Alternative outer could be used:

c(outer(c(2,4,6), c(1,2)))
#[1]  2  4  6  4  8 12

Also crossprod could be used:

c(crossprod(t(c(2,4,6)), c(1,2)))
#[1]  2  4  6  4  8 12

Or %*%:

c(c(2,4,6) %*% t(c(1,2)))
#[1]  2  4  6  4  8 12

r list force to recycle items

You want ifelse:

mylist <- list(a = 3, b = c(2,8), c = c(1,4))
sumit<-function(v){x<-v$a + v$b + v$c}
x<-sumit(mylist)

sumit<-function(v){
  x<-v$a + v$b + v$c
  x <- ifelse (v$c ==1, x*100,x)
}
x<-sumit(mylist)
x
# [1]  600 15

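if expects a single logical value. Older versions of R used only the first element of a longer condition (with a warning), and since R 4.2.0 a condition of length greater than one is an error. ifelse, by contrast, works element-wise on vectors.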
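A minimal side-by-side sketch of the two, using the same numbers as above. The variable names are illustrative only.

```r
v <- c(1, 4)
x <- c(6, 15)

# ifelse() tests every element and returns a vector of the same length.
scaled <- ifelse(v == 1, x * 100, x)
scaled   # 600 15

# if() needs a single TRUE/FALSE, so reduce the test with any() or all().
msg <- if (any(v == 1)) "at least one match" else "no match"
```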

Recycling and assignment functions (`split<-`)

The split gives the values dat[c(1,2,4)] and dat[c(3,5,6)] from the vector.

The assignment is equivalent to dat[c(1,2,4)] <- 100 ; dat[c(3,5,6)] <- 300 and this is where the recycling takes place.

Edited

As for what happens, and why a vector assignment results, see page 21 of the language definition manual (http://cran.r-project.org/doc/manuals/R-lang.pdf). The call:

split(def, f) <- Z

is interpreted as:

`*tmp*` <- def
def <- "split<-"(`*tmp*`, f, value=Z)
rm(`*tmp*`)

Note that split<-.default returns the modified vector.
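A small sketch of that equivalence. The values of dat and the grouping f are assumptions, chosen so that the groups are elements 1, 2, 4 and 3, 5, 6 as described above.

```r
dat <- 1:6
f <- c(1, 1, 2, 1, 2, 2)   # assumed grouping: elements 1,2,4 vs 3,5,6

# Replacement-function syntax.
by_call <- dat
split(by_call, f) <- list(100, 300)

# The same thing as an explicit call to the replacement function.
by_hand <- `split<-`(dat, f, value = list(100, 300))

by_call                      # 100 100 300 100 300 300
identical(by_call, by_hand)  # TRUE
```

Each single replacement value is recycled across all positions in its group.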

R matrix values recycling?

I increased n_times to 10000 and can find no evidence of recycling. That doesn't mean it isn't happening, but without a clear setup we are unfortunately unable to reproduce the problem. So my suggestions here are unproven.

Option 1

Given that you found one scenario that ends with all agents$state == "e", I'll suggest a trick that will always find at least one "s" (actually, one of each value that you know about):

  out[k,] <- table(c("e", "s", agents$state)) - 1

I'm assuming that the only possible values are "e" and "s"; if there are others, this technique relies completely on the premise that we ensure every possible value is seen at least once, and then decrement everything. Since we "add one observation" for each possible value, subtracting one from the table is safe. With this trick, your check should then be

table(agents$state)
#       e 
#     100 
table(c("e", "s", agents$state))
#       e       s 
#     101       1
table(c("e", "s", agents$state)) - 1
#       e       s 
#     100       0

And therefore recycling should not be a factor.
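A different route to the same fixed-length result, not part of the trick above, is to tabulate a factor with explicit levels. The sketch below uses a toy state vector as a stand-in for agents$state and assumes the only states are "e" and "s".

```r
state <- rep("e", 100)   # toy stand-in for agents$state

# Levels that never occur are still counted, as 0.
counts <- table(factor(state, levels = c("e", "s")))
names(counts)       # "e" "s"
as.vector(counts)   # 100 0
```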

Option 2

Another technique which is more robust (i.e., does not need to include all possible values) is to force the length, assuming we know with certainty what it should be (which I think we do here):

z <- table(agents$state)
z
#   s 
# 100 
length(z) <- 2
z
#   s     
# 100  NA

Since you "know" that the length should always be 2, you can hard-code the 2 in there.

Option 3

This method is even a little more robust in that you don't need to know the absolute length: every result is extended to the length of the longest one.

First, reproducible sample data:

set.seed(2021)
agents <- data.frame(agent_no = 1,
                     state = "e",
                     mixing = runif(1,0,1))
# specify agent population
pop_size <- 100
# fill agent data
for(i in 2:pop_size){
  agent <- data.frame(agent_no = i,
                      state = "s",
                      mixing = runif(1,0,1))
  agents <- rbind(agents, agent)
}
head(agents)
#   agent_no state    mixing
# 1        1     e 0.4512674
# 2        2     s 0.7837798
# 3        3     s 0.7096822
# 4        4     s 0.3817443
# 5        5     s 0.6363238
# 6        6     s 0.7013460

Replace your for loop:

for (k in 1:n_times) {
}

with

out <- lapply(seq_len(n_times), function(k) {
  for(i in 1:pop_size){
    # likelihood to meet others
    likelihood <- agents$mixing[i]
    # how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
    connect_with <- round(likelihood * 3, 0) + 1 
    # which agents will they probably meet (list of agents)
    which_others <- sample(1:pop_size, 
                           connect_with, 
                           replace = TRUE, 
                           prob = agents$mixing)
    for(j in seq_along(which_others)){
      contacts <- agents[which_others[j],]
      # if exposed, change state
      if(contacts$state == "e"){
        urand <- runif(1,0,1)
        # control probability of state change
        if(urand < 0.5){
          agents$state[i] <- "e"
        }
      }
    }
  }
  table(agents$state)
})

At this point, you have a list, likely of length-2 vectors:

out[1:3]
# [[1]]
#  e  s 
#  1 99 
# [[2]]
#  e  s 
#  2 98 
# [[3]]
#  e  s 
#  3 97

Note that we can determine the length of all of them with

lengths(out)
#  [1] 2 2 2 2 2 2 2 2 2 2

Similar to option 2 where we force the length of a vector, we can do the same here:

maxlen <- max(lengths(out))
out <- lapply(out, `length<-`, maxlen)
## or more verbosely
out <- lapply(out, function(vec) { length(vec) <- maxlen; vec; })

You can confirm that they are all the same length with table(lengths(out)), which should show a single length (2) with a count equal to n_times (10 here).

From here, we can combine all of these vectors into a matrix with

out <- do.call(rbind, out)
out
#        e  s
#  [1,]  1 99
#  [2,]  2 98
#  [3,]  3 97
#  [4,]  2 98
#  [5,]  1 99
#  [6,] 20 80
#  [7,] 12 88
#  [8,]  1 99
#  [9,]  2 98
# [10,]  1 99
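One thing that may be worth checking: padding with `length<-` fills by position, not by name, so a run that saw only "s" would put its count in the first column. The sketch below uses made-up results (res) to show this, followed by a name-based alternative that assumes the full set of states is known in advance.

```r
# Toy results: the second run saw only state "s".
res <- list(c(e = 1, s = 99), c(s = 100))

# Positional padding, as in option 3.
maxlen <- max(lengths(res))
padded <- lapply(res, `length<-`, maxlen)
m <- do.call(rbind, padded)
m[2, ]   # 100 sits in column 1, NA in column 2

# Name-based alignment (assumes the states are known up front).
states <- c("e", "s")
aligned <- t(sapply(res, function(v) {
  out <- v[states]        # missing states come back as NA
  out[is.na(out)] <- 0    # treat "never seen" as a count of 0
  names(out) <- states
  out
}))
aligned[2, ]   # e = 0, s = 100
```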

Why is outer recycling a vector that should go unused and not throwing a warning?

This is expected behaviour based on R's recycling rules. It has nothing to do with outer as such, though it might be a surprise if you think outer is somehow applying a function across margins.

Instead, outer takes two vectors X and Y as its first two arguments. It takes X and repeats it as a whole length(Y) times. Similarly, it takes Y and repeats each of its elements length(X) times. Then it just runs your function FUN once on these two long vectors, passing the long X as the first argument and the long Y as the second argument. Any other arguments to FUN have to be passed directly as arguments to outer via ... (as you have done with c = 1:3).

The result is a single long vector which is turned into a matrix by writing its dim attribute as the original values of length(X) by length(Y).
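The steps above can be sketched by hand and compared with outer itself. This is a simplified reconstruction for plain vectors, not the actual source of outer; the function f is an illustrative stand-in for the anonymous function in the question.

```r
X <- 1:5
Y <- 5:10
f <- function(a, b, c) 10 * a + 100 * b + 1000 * c

long_x <- rep(X, times = length(Y))   # X repeated as a whole
long_y <- rep(Y, each = length(X))    # each element of Y repeated

# One call on two length-30 vectors; c = 1:3 is recycled 10 times.
by_hand <- f(long_x, long_y, c = 1:3)
dim(by_hand) <- c(length(X), length(Y))

same <- isTRUE(all.equal(by_hand, outer(X, Y, f, c = 1:3)))
same   # TRUE
```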

Now, in the specific example you gave, X has 5 elements (1:5) and Y has 6 (5:10). Therefore your anonymous function is called on two length-30 vectors and a single length-3 vector. R's recycling rules dictate that if the recycled vector fits neatly into the longer vector without partial recycling, no warning is emitted.

To see this, take your anonymous function and try it outside outer with two length-30 vectors and one length-3 vector:

f <- function(a, b, c) 10*a + 100*b + 1000*c

f(1:30, 1:30, 1:3)
#>  [1] 1110 2220 3330 1440 2550 3660 1770 2880 3990 2100 3210 4320 2430
#> [14] 3540 4650 2760 3870 4980 3090 4200 5310 3420 4530 5640 3750 4860
#> [27] 5970 4080 5190 6300

3 recycles nicely into 30, so there is no warning.

Conversely, if the product of the lengths of the two vectors you pass to outer is not a multiple of 3, you will get a warning (here 5 * 5 = 25):

outer(1:5,6:10,c=1:3,function(a,b,c) 10*a + 100*b + 1000*c)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,] 1610 3710 2810 1910 4010
#> [2,] 2620 1720 3820 2920 2020
#> [3,] 3630 2730 1830 3930 3030
#> [4,] 1640 3740 2840 1940 4040
#> [5,] 2650 1750 3850 2950 2050
#> Warning message:
#> In 10 * a + 100 * b + 1000 * c :
#>   longer object length is not a multiple of shorter object length

Vector recycling is not working when assigning to data.frame

It is because 2000 is not divisible by 7. Partial recycling doesn't work for data frame columns:

d <- data.frame(x=1:10)
d$x <- 1
d$x <- 1:2
d$x <- 1:3
# Error in `$<-.data.frame`(`*tmp*`, "x", value = 1:3) : 
#  replacement has 3 rows, data has 10

From the relevant help text ?`[<-.data.frame`, in the Arguments section:

"value: A suitable replacement value: it will be repeated a whole number of times if necessary"

Partial recycling works for vectors though:

x <- d$x
x[] <- 1:3
# Warning message:
# In x[] <- 1:3 :
#   number of items to replace is not a multiple of replacement length

x
# [1] 1 2 3 1 2 3 1 2 3 1

You can do the assignment to your data frame similarly (if you're sure it's what you want to do):

d$x[] <- 1:3
# Warning message:
# In d$x[] <- 1:3 :
#   number of items to replace is not a multiple of replacement length
d
#    x
# 1  1
# 2  2
# 3  3
# 4  1
# 5  2
# 6  3
# 7  1
# 8  2
# 9  3
# 10 1
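If partial recycling really is what you want, one way to make it explicit, and avoid the warning, is to build the full-length vector yourself with rep() and length.out. A minimal sketch on the same toy data frame:

```r
d <- data.frame(x = 1:10)

# Spell out the recycling so the intent is visible in the code.
d$x <- rep(1:3, length.out = nrow(d))
d$x   # 1 2 3 1 2 3 1 2 3 1
```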
