Large Matrices in R: long vectors not supported yet
A matrix is just an atomic vector with a dimension attribute, which allows R to access it as a matrix. Your matrix is a vector of length 4000*9000000, which is 3.6e+10 elements (the largest integer value is approx. 2.147e+9). Subsetting a long vector is supported for atomic vectors (i.e. accessing elements beyond the 2.147e+9 limit). Just treat your matrix as a long vector.
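Remember that by default R fills matrices column-wise, so the element at row r and column c of a matrix with n rows sits at position (c - 1) * n + r in the underlying vector. For a matrix with 850000 rows, for example, the value at test[ 2701 , 2701 ] can be retrieved via: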
i <- ( 2701 - 1 ) * 850000 + 2701
test[i]
#[1] 1
Note that this really is long vector subsetting because:
2701L * 850000L
#[1] NA
#Warning message:
#In 2701L * 850000L : NAs produced by integer overflow
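The column-wise indexing rule can be checked on a small matrix before trusting it on a huge one. The following is a minimal sketch; `linear_index` is a hypothetical helper (not part of base R), and the 850000-row size in the second half is an assumption used only for the arithmetic, so no large object is allocated.

```r
# Hypothetical helper: column-major position of element [r, c]
# in a matrix with `n_rows` rows. Everything is converted to double
# so the result can exceed .Machine$integer.max without overflowing.
linear_index <- function(r, c, n_rows) {
  (as.numeric(c) - 1) * as.numeric(n_rows) + as.numeric(r)
}

# Small check: both forms of subsetting pick the same element.
m <- matrix(seq_len(12), nrow = 3, ncol = 4)
i <- linear_index(2, 3, nrow(m))
same_element <- m[i] == m[2, 3]

# Large case (arithmetic only): the index is beyond the integer range,
# which is why it has to be a double rather than an integer.
big_i <- linear_index(2701, 2701, 850000)
beyond_int <- big_i > .Machine$integer.max
```

Computing the index in double precision sidesteps the integer overflow; doubles represent whole numbers exactly up to 2^53, which is far above any vector length R allows.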
long vectors not supported yet error in Rmd but not in R Script
I also ran into this today, and fixed it by using cache.lazy = FALSE in the setup chunk in my .Rmd.
The first chunk in my R Markdown file now looks like this:
library(knitr)
knitr::opts_chunk$set(cache = TRUE, warning = FALSE,
message = FALSE, cache.lazy = FALSE)
Error during wrapup: long vectors not supported yet: in glm() function
To close this question: @Axeman's answer is the only feasible approach for my problem. The underlying issue is that there is not enough memory to handle such a huge design matrix.
Therefore, running the probit regression with the bigglm() function from the biglm package is the only solution I have found so far.
Nevertheless, because biglm processes the data iteratively in chunks, factor() variables on the right-hand side are problematic whenever a factor level is not represented in a chunk. In other words, if a factor variable has 5 levels but only 4 of them appear in a given chunk, the estimation fails with an error.
There are several questions and comments about this on Stack Overflow.
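The missing-level problem can be reproduced in base R without biglm. The sketch below uses made-up toy data and shows one commonly suggested workaround, declaring the full set of levels explicitly; whether this is enough for a particular bigglm() call is an assumption, not something confirmed here.

```r
# Toy data (hypothetical): a grouping variable with 5 levels.
d <- data.frame(g = c("a", "b", "c", "d", "e", "a", "b", "c"),
                stringsAsFactors = FALSE)

# A "chunk" in which only 3 of the 5 levels appear.
chunk <- d[6:8, , drop = FALSE]

# factor() applied per chunk only sees the levels present in that chunk,
# so the design matrix has fewer columns than for the full data.
n_cols_chunk <- ncol(model.matrix(~ factor(g), data = chunk))

# Declaring all levels up front keeps the design matrix the same width
# in every chunk (unused levels become all-zero columns).
all_levels <- c("a", "b", "c", "d", "e")
n_cols_fixed <- ncol(model.matrix(~ factor(g, levels = all_levels),
                                  data = chunk))
n_cols_full <- ncol(model.matrix(~ factor(g, levels = all_levels),
                                 data = d))
```

With the levels fixed, every chunk yields a design matrix of the same shape, which is what a chunk-wise fitting procedure needs in order to combine results across chunks.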
Long Vector Not Supported Yet Error in R Windows 64bit version
Looking at the source of size.c and unique.c, it looks like the hashing used to improve object.size doesn't support long vectors yet:
/* Use hashing to improve object.size. Here we want equal CHARSXPs,
not equal contents. */
and
/* Currently the hash table is implemented as a (signed) integer
array. So there are two 31-bit restrictions, the length of the
array and the values. The values are initially NIL (-1). 0-based
indices are inserted by isDuplicated, and invalidated by setting
to NA_INTEGER.
*/
Therefore, it is object.size that is choking. How about calling numeric(2^36) to see if you can create such a large object? (Note that 2^36 doubles at 8 bytes each come to 512 GiB; a 64 GiB vector would be numeric(2^33).)
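Before attempting an allocation of that size, it may be worth estimating the memory it needs. This is a rough sketch, assuming 8 bytes per double and ignoring the small per-object header; `size_gib` is a hypothetical helper, and nothing large is actually allocated.

```r
# Hypothetical helper: approximate size in GiB of a numeric (double)
# vector of length n, assuming 8 bytes per element.
size_gib <- function(n) n * 8 / 2^30

gib_2_36 <- size_gib(2^36)   # size of numeric(2^36)
gib_2_33 <- size_gib(2^33)   # size of numeric(2^33)

# Both lengths are beyond the classic integer limit, so either one
# would be a long vector.
is_long_2_36 <- 2^36 > .Machine$integer.max
is_long_2_33 <- 2^33 > .Machine$integer.max
```

If the estimate exceeds the available RAM, the allocation itself will fail regardless of whether object.size supports long vectors.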