Large Matrices in R: Long Vectors Not Supported Yet

Large Matrices in R: long vectors not supported yet

A matrix is just an atomic vector with a dimension attribute that allows R to index it as a matrix. Your matrix is a vector of length 4000 * 9000000, i.e. 3.6e+10 elements, which is well beyond the largest integer value (approx. 2.147e+9). However, subsetting atomic vectors beyond that limit is supported as long vector subsetting, so just treat your matrix as a long vector.

If we remember that by default R fills matrices column-wise, then to retrieve, say, the value at test[ 2701 , 850000 ] in this 4000 x 9000000 matrix, we compute the linear index as (column - 1) * nrow + row and access it via:

i <- ( 850000 - 1 ) * 4000 + 2701
test[i]
#[1] 1

Note that this really is long vector subsetting because:

2701L * 850000L
#[1] NA
#Warning message:
#In 2701L * 850000L : NAs produced by integer overflow
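The same column-major arithmetic can be checked on a small matrix, where both forms of subsetting work (a minimal sketch; linear_index is a hypothetical helper, not part of base R, and the dimensions here are illustrative):

```r
# Sketch: verify column-major linear indexing on a small matrix.
m <- matrix(seq_len(12), nrow = 4)   # 4 x 3, filled column-wise

# Linear index of element [row, col] in a column-major matrix:
linear_index <- function(row, col, nrow) (col - 1) * nrow + row

i <- linear_index(3, 2, nrow = 4)
m[i] == m[3, 2]   # TRUE: vector subsetting reaches the same element
```

For a genuinely long vector the arithmetic must be done in doubles (as above), since the product of two integers past 2^31 - 1 overflows to NA.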

long vectors not supported yet error in Rmd but not in R Script

I also ran into this today, and fixed it by using cache.lazy = FALSE in the setup chunk in my .Rmd.

The first chunk in my R Markdown file now looks like this:

library(knitr)
knitr::opts_chunk$set(cache = TRUE, warning = FALSE,
                      message = FALSE, cache.lazy = FALSE)

Error during wrapup: long vectors not supported yet: in glm() function

In order to close this question, I have to mention that @Axeman's answer is the only feasible approach for my problem. The whole issue is that there is not enough memory to handle such a huge design matrix.

Therefore, running a probit regression with the biglm package and its bigglm() function is the only solution I have found so far.

Nevertheless, I realize that, due to how the biglm package works (taking chunks of the data iteratively), using factor() variables on the right-hand side is problematic whenever a factor level is not represented in a chunk. In other words, if a factor variable has 5 levels but only 4 of them appear in a given data chunk, the estimation throws an error.
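One base-R way to work around that chunk problem is to fix the full set of factor levels on every chunk before it is handed to the model, so the design matrix keeps the same columns even when a level is absent (a minimal sketch; fix_levels, the grp column, and the example data are hypothetical, not part of biglm):

```r
# Sketch: keep factor levels consistent across data chunks so the
# design matrix has the same columns in every chunk.
all_levels <- c("a", "b", "c", "d", "e")   # full set of 5 levels

fix_levels <- function(chunk, levels) {
  chunk$grp <- factor(chunk$grp, levels = levels)
  chunk
}

# A chunk in which level "e" happens not to appear:
chunk <- data.frame(y   = c(1, 0, 1, 0),
                    grp = c("a", "b", "c", "d"))
chunk <- fix_levels(chunk, all_levels)

# The model matrix still has one column per non-reference level,
# plus the intercept; the column for the absent level is all zeros.
ncol(model.matrix(~ grp, data = chunk))   # 5
```

Within a single chunk the dummy column for the missing level is simply all zeros; once later chunks do contain that level, the accumulated fit should be able to estimate its coefficient.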

There are several questions and comments about this on Stack Overflow.

Long Vector Not Supported Yet Error in R Windows 64bit version

Looking at the source of size.c and unique.c, it looks like the hashing used to improve object.size doesn't support long vectors yet:

/* Use hashing to improve object.size. Here we want equal CHARSXPs,
not equal contents. */

and

/*  Currently the hash table is implemented as a (signed) integer
array. So there are two 31-bit restrictions, the length of the
array and the values. The values are initially NIL (-1). 0-based
indices are inserted by isDuplicated, and invalidated by setting
to NA_INTEGER.
*/

Therefore, it is object.size that is choking. How about calling numeric(2^36) to see if you can create such a large object in the first place (that would be 512 GB of doubles).
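Before attempting an allocation like that, it is worth computing how much memory such a vector would need, since each double takes 8 bytes (a minimal sketch; bytes_needed and gib are hypothetical helpers):

```r
# Sketch: estimate the memory a numeric (double) vector of length n
# would need before trying to allocate it.
bytes_needed <- function(n) n * 8        # 8 bytes per double

gib <- function(bytes) bytes / 2^30      # bytes -> GiB

gib(bytes_needed(2^36))                  # 512 GiB: far beyond most machines
gib(bytes_needed(4000 * 9000000))        # ~268 GiB for the matrix above
```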


