Convert Vector to Matrix Without Recycling


You can't turn recycling off, but you can manipulate the vector before forming the matrix. We can extend the length of the vector to match what the dimensions of the matrix will be: the length<- replacement function pads the vector with NA up to the desired length.

x <- 1:11
length(x) <- prod(dim(matrix(x, ncol = 2)))
## the matrix() call above warns that the data length (11) is not a
## multiple of the number of rows; wrap it in suppressWarnings() to silence it
matrix(x, ncol = 2, byrow = TRUE)
# [,1] [,2]
# [1,] 1 2
# [2,] 3 4
# [3,] 5 6
# [4,] 7 8
# [5,] 9 10
# [6,] 11 NA
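
If the throwaway matrix() call is only there to compute the padded length, a minimal sketch that avoids it (and its warning) rounds the length up to the next multiple of the column count directly:

ncols <- 2
x <- 1:11
length(x) <- ncols * ceiling(length(x) / ncols)  # pad to 12 with NA
matrix(x, ncol = ncols, byrow = TRUE)            # same result, no warning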

R: How to convert a vector into matrix without replicating the vector?

You could subset your vector out to a multiple of the number of columns, so as to include all the elements; indexing past the end of the vector pads it with the necessary number of NAs. Then convert to a matrix.

x = 1:15
matrix(x[1:(4 * ceiling(length(x)/4))], ncol = 4)
# [,1] [,2] [,3] [,4]
#[1,] 1 5 9 13
#[2,] 2 6 10 14
#[3,] 3 7 11 15
#[4,] 4 8 12 NA

If you want to replace NA with 0, you can do so using is.na() in a separate step.
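
A small sketch of that follow-up step:

m <- matrix(x[1:(4 * ceiling(length(x)/4))], ncol = 4)
m[is.na(m)] <- 0  # replace every NA with 0 in place
m
# [,1] [,2] [,3] [,4]
#[1,] 1 5 9 13
#[2,] 2 6 10 14
#[3,] 3 7 11 15
#[4,] 4 8 12 0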

Converting a vector into a matrix (in R)

Try this:

> v <- c("state 4", "state 7")
> states <- c("state 1", "state 2", "state 3", "state 4",
+ "state 5", "state 6", "state 7", "state 8")
> m <- matrix(states, byrow = TRUE, nrow = 2, ncol = 8)
> m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] "state 1" "state 2" "state 3" "state 4" "state 5" "state 6" "state 7" "state 8"
# [2,] "state 1" "state 2" "state 3" "state 4" "state 5" "state 6" "state 7" "state 8"
> v == m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE

In R, a matrix is basically a vector under the hood. When m is created above, the matrix function "recycles" its data argument states because it needs 16 elements to fill the matrix. In other words, the following two function calls produce the same result:

> matrix(states, byrow = TRUE, nrow = 2, ncol = 8)
> matrix(rep(states, 2), byrow = TRUE, nrow = 2, ncol = 8)
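
A quick check (not in the original answer) confirms the two calls are interchangeable:

> identical(matrix(states, byrow = TRUE, nrow = 2, ncol = 8),
+           matrix(rep(states, 2), byrow = TRUE, nrow = 2, ncol = 8))
# [1] TRUE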

Similarly, when v and m are compared for equality, v is recycled 8 times to produce a vector of length 16. In other words, the following two equality comparisons produce the same results:

> v == m
> rep(v, 8) == m

You can think of the above two comparisons as happening between two vectors, where the matrix m is converted back into a vector by stacking the columns. You can use as.vector to see the vector that m corresponds to:

> as.vector(m)
# [1] "state 1" "state 1" "state 2" "state 2" "state 3" "state 3" "state 4" "state 4" "state 5"
# [10] "state 5" "state 6" "state 6" "state 7" "state 7" "state 8" "state 8"

Print a vector to file with predefined number of columns, without recycling

We can use stri_list2matrix from stringi after splitting the vector v1 into a list (lst) of groups of three successive elements. The grouping can be done with gl or with %/% (i.e. (seq_along(v1) - 1) %/% 3 + 1).

library(stringi)
lst <- split(v1, as.numeric(gl(length(v1), 3, length(v1))))
stri_list2matrix(lst, byrow=TRUE, fill='')
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" "h" "i"
#[4,] "l" "m" "n"
#[5,] "o" "" ""
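
The %/% grouping mentioned above is a drop-in replacement for the gl() call; a minimal sketch:

lst <- split(v1, (seq_along(v1) - 1) %/% 3 + 1)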

Or, using base R, we can pad NAs into those list elements that have fewer elements than the longest one:

t(sapply(lst, `length<-`, max(sapply(lst, length))))

data

 v1 <- letters[c(1:9,12:15)]
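
Since the question asks to print the result to a file, here is a hedged sketch using base write.table (the file name out.txt is purely illustrative):

m <- stri_list2matrix(lst, byrow = TRUE, fill = '')
write.table(m, file = "out.txt", quote = FALSE,
            row.names = FALSE, col.names = FALSE)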

R matrix values recycling?

I increased n_times to 10000 and can find no evidence of recycling. While that doesn't mean it isn't happening, without a clear reproducible setup we are unable to reproduce the problem, so the suggestions here are unproven.

Option 1

Given that you found one such scenario that ends with all agents$state == "e", then I'll suggest a trick that will always find at least one "s" (actually, one of each value that you know about):

  out[k,] <- table(c("e", "s", agents$state)) - 1

I'm assuming that the only possible values are "e" and "s"; if there are others, this technique relies completely on the premise that we ensure every possible value is seen at least once, and then decrement everything. Since we "add one observation" for each possible value, subtracting one from the table is safe. With this trick, your check should then be

table(agents$state)
# e
# 100
table(c("e", "s", agents$state))
# e s
# 101 1
table(c("e", "s", agents$state)) - 1
# e s
# 100 0

And therefore recycling should not be a factor.
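
An alternative worth noting (not from the original answer): fixing the factor levels up front makes table() report a zero count for any absent state, with no decrementing needed. A sketch, assuming the only possible states are "e" and "s":

table(factor(agents$state, levels = c("e", "s")))
# e s
# 100 0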

Option 2

Another technique which is more robust (i.e., does not need to include all possible values) is to force the length, assuming we know with certainty what it should be (which I think we do here):

z <- table(agents$state)
z
# e
# 100
length(z) <- 2
z
# e
# 100 NA

Since you "know" that the length should always be 2, you can hard-code the 2 in there.

Option 3

This method is a little more robust still, in that you don't need to know the absolute length: all results will be extended to the length of the longest return.

First, reproducible sample data:

set.seed(2021)
agents <- data.frame(agent_no = 1,
                     state = "e",
                     mixing = runif(1, 0, 1))
# specify agent population
pop_size <- 100
# fill agent data
for (i in 2:pop_size) {
  agent <- data.frame(agent_no = i,
                      state = "s",
                      mixing = runif(1, 0, 1))
  agents <- rbind(agents, agent)
}
head(agents)
# agent_no state mixing
# 1 1 e 0.4512674
# 2 2 s 0.7837798
# 3 3 s 0.7096822
# 4 4 s 0.3817443
# 5 5 s 0.6363238
# 6 6 s 0.7013460

Replace your for loop:

for (k in 1:n_times) {
}

with

out <- lapply(seq_len(n_times), function(k) {
  for (i in 1:pop_size) {
    # likelihood to meet others
    likelihood <- agents$mixing[i]
    # how many agents will they meet (integer); add 1 to make sure everybody meets somebody
    connect_with <- round(likelihood * 3, 0) + 1
    # which agents will they probably meet (list of agents)
    which_others <- sample(1:pop_size,
                           connect_with,
                           replace = TRUE,
                           prob = agents$mixing)
    for (j in 1:length(which_others)) {
      contacts <- agents[which_others[j], ]
      # if exposed, change state
      if (contacts$state == "e") {
        urand <- runif(1, 0, 1)
        # control probability of state change
        if (urand < 0.5) {
          agents$state[i] <- "e"
        }
      }
    }
  }
  table(agents$state)
})

At this point, you have a list, likely of length-2 vectors:

out[1:3]
# [[1]]
# e s
# 1 99
# [[2]]
# e s
# 2 98
# [[3]]
# e s
# 3 97

Note that we can determine the length of all of them with

lengths(out)
# [1] 2 2 2 2 2 2 2 2 2 2

Similar to option 2 where we force the length of a vector, we can do the same here:

maxlen <- max(lengths(out))
out <- lapply(out, `length<-`, maxlen)
## or more verbosely
out <- lapply(out, function(vec) { length(vec) <- maxlen; vec; })

You can confirm that they are all the same length with table(lengths(out)); it should report n_times (here 10) vectors of length 2.

From here, we can combine all of these vectors into a matrix with

out <- do.call(rbind, out)
out
# e s
# [1,] 1 99
# [2,] 2 98
# [3,] 3 97
# [4,] 2 98
# [5,] 1 99
# [6,] 20 80
# [7,] 12 88
# [8,] 1 99
# [9,] 2 98
# [10,] 1 99

Combining multiple character vectors of different lengths into single matrix without recycling

Using base R, we can do this in a few steps.

First, let's create a sample dataset with 4 vectors:

a <- rnorm(10)
b <- rnorm(5)
c <- rnorm(7)
d <- rnorm(20)

Then we can put them in a list as:

f <- list(a,b,c,d)

Then we need to find the length of the longest vector:

max_len <- max(sapply(f, length))

Then we need to make all vectors of length max_len by padding the gap with NAs (so if max_len = 20 and the current vector has length(current) = 10, the last 10 values need to be NA):

f1 <- lapply(f, function(x) c(x, rep(NA, max_len - length(x))))

Then you can turn this into a matrix as:

matrix(unlist(f1), ncol = length(f1), byrow = F)

which results in

             [,1]       [,2]       [,3]       [,4]
 [1,] -0.53487289 -1.8570456  0.8304454 -0.6440267
 [2,]  0.04283173 -1.2541836  0.9579962 -1.1664334
 [3,] -1.31686110 -0.6789986  0.9424487  0.4073388
 [4,] -0.54987484 -0.4326257 -1.5165032  0.1990406
 [5,]  0.31529161 -0.2712977  0.1347272 -0.2479010
 [6,] -1.08465865         NA  0.7442857 -1.1319033
 [7,]  1.11283161         NA -0.8397640  0.2636702
 [8,]  0.08882676         NA         NA -0.1332037
 [9,]  0.76028752         NA         NA  0.1607880
[10,] -2.68513818         NA         NA -2.3300150
[11,]          NA         NA         NA -0.3356175
[12,]          NA         NA         NA  0.8115210
[13,]          NA         NA         NA  1.1668857
[14,]          NA         NA         NA  0.5538027
[15,]          NA         NA         NA -0.8910439
[16,]          NA         NA         NA -1.4056796
[17,]          NA         NA         NA -1.6713585
[18,]          NA         NA         NA  0.2557690
[19,]          NA         NA         NA -0.5970861
[20,]          NA         NA         NA  0.1851019
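
As an aside, the padding step can also reuse the length<- idiom shown earlier; sapply pads each vector and simplifies the result straight to a matrix with one column per vector:

sapply(f, `length<-`, max_len)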

Fastest way to recycle vector along matrix rows

library(microbenchmark)

byrow.speed.benchmark = function(ncol, nrow) {
  mat = matrix(rnorm(nrow * ncol), nrow = nrow, ncol = ncol)
  vec = colSums(mat)

  microbenchmark(
    aperm(aperm(mat) - vec),
    t(t(mat) - vec),
    mat - matrix(vec, ncol = ncol(mat), nrow = nrow(mat), byrow = TRUE),
    sweep(mat, 2, vec),
    mat - rep(vec, each = nrow(mat)),
    # mat %*% diag(vec),
    mat - vec[col(mat)],
    mat - vec,
    times = 300
  )
}

byrow.speed.benchmark(10, 10)

Comparing several methods of applying a vector across matrix rows, we find that indexing with a pre-expanded vector (mat - vec[col(mat)]) is the fastest row-wise method; the plain mat - vec at the bottom is the column-recycling control.

Unit: nanoseconds
                                                             expr   min    lq      mean median    uq   max neval
                                          aperm(aperm(mat) - vec)  8642  9283 10214.287   9923 10243 80344   300
                                                  t(t(mat) - vec)  6722  7362  7950.130   8002  8323 27208   300
 mat - matrix(vec, ncol = ncol(mat), nrow = nrow(mat), byrow = T)  3201  3841  4282.947   4161  4482 20486   300
                                               sweep(mat, 2, vec) 26888 28489 30016.310  29448 30089 85145   300
                                 mat - rep(vec, each = nrow(mat))  2560  3201  3481.630   3521  3841 10883   300
                                              mat - vec[col(mat)]  1600  2241  2594.970   2561  2881  6081   300
                                                        mat - vec     0   320   389.530    320   321  1921   300
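
To see why mat - vec[col(mat)] lines up correctly, note that col(mat) holds each cell's column index, so vec[col(mat)] expands vec into a full-length vector matching mat's column-major storage. A tiny sketch:

mat <- matrix(1:6, nrow = 2)  # columns (1,2), (3,4), (5,6)
vec <- c(10, 20, 30)          # one value per column
mat - vec[col(mat)]
#      [,1] [,2] [,3]
# [1,]   -9  -17  -25
# [2,]   -8  -16  -24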

How does this scale?

ncols = floor(10^((4:12)/4))
nrows = floor(10^((4:12)/4))

results = cbind(expand.grid(ncols, nrows),
                aperm = NA, t = NA, alloc = NA, sweep = NA,
                rep = NA, indices = NA, control = NA)

for (i in seq(nrow(results))) {
  df = byrow.speed.benchmark(results[i, 1], results[i, 2])
  results[i, 3:9] = sapply(split(df$time, as.numeric(df$expr)), mean)
}

library(ggplot2)

df = reshape2::melt(results, id.vars= c("Var1", "Var2"))

colnames(df) = c("ncol", "nrow", "method", "meantime")

ggplot(subset(df, ncol==1000)) + geom_point(aes(x = log10(ncol*nrow), y=meantime, colour = method))+ geom_line(aes(x = log10(ncol*nrow), y=meantime, colour = method)) + ggtitle("Scaling with cell number.") + coord_cartesian(ylim = c(0, 1E6))
ggplot(subset(df, ncol==1000)) + geom_point(aes(x = log10(ncol*nrow), y=meantime, colour = method))+ geom_line(aes(x = log10(ncol*nrow), y=meantime, colour = method)) + ggtitle("Scaling with cell number.") #+ coord_cartesian(ylim = c(0, 5E7))

ggplot(subset(df, ncol==1000)) + geom_point(aes(x = log10(ncol*nrow), y=meantime, colour = method))+ geom_line(aes(x = log10(ncol*nrow), y=meantime, colour = method)) + coord_cartesian(ylim = c(0, 3E7)) + ggtitle("Scaling with a wide matrix (1000 columns)")
ggplot(subset(df, nrow==1000)) + geom_point(aes(x = log10(ncol*nrow), y=meantime, colour = method))+ geom_line(aes(x = log10(ncol*nrow), y=meantime, colour = method)) + coord_cartesian(ylim = c(0, 3E7)) + ggtitle("Scaling with a tall matrix (1000 rows)")

The pink line is the case where we apply the vector over the columns with built-in recycling. Allocating a matrix with matrix(vec, byrow = TRUE) scales the best of our options.

[Plot: scaling with cell number]
[Plot: scaling with cell number, small number of cells]

On the off chance that the matrix dimensions affected this here is scaling for a wide and a tall matrix.

[Plot: wide matrix]
[Plot: tall matrix]

Edit: It's worth noting that (as expected) the matrix allocation does not scale as well as vector recycling. The above plots are slightly misleading in that regard.

[Plot: matrix vs recycling]


