Maximum Size of a Matrix in R

Maximum size of a matrix in R

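The classic theoretical limit on the length of a vector in R is 2147483647 (2^31 - 1) elements; 64-bit builds since R 3.0.0 also support longer vectors. Under the classic limit, a two-column matrix tops out at about 1 billion rows.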

...but that amount of data does not fit in 4 GB of memory, and especially not with strings in a character vector. Each string takes at least 96 bytes (object.size('a') == 96 here; the exact figure depends on the R version and platform), and each element in your matrix is a pointer (8 bytes) to such a string, although only one instance of each unique string is stored.
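The arithmetic behind that claim can be sketched as below. This is only a rough estimate: the helper name estimate_gib is made up for illustration, and it assumes 8 bytes per element (a double, or a pointer to a string on a 64-bit build) while ignoring the object header and the strings themselves.

```r
# Rough memory estimate for a matrix.
# Assumption: 8 bytes per element (double, or character pointer on 64-bit R);
# the small object header and any string payloads are ignored.
estimate_gib <- function(nrow, ncol, bytes_per_element = 8) {
  nrow * ncol * bytes_per_element / 2^30
}

# About 1 billion rows by 2 columns: roughly 15 GiB for the pointers alone
estimate_gib(1e9, 2)

# Sanity check on a small matrix: object.size() reports the element bytes
# plus a small header
small <- matrix(0, nrow = 1000, ncol = 10)
as.numeric(object.size(small))
```

Comparing the estimate with object.size() on a small matrix is a cheap way to confirm the per-element cost on your own machine before scaling up.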

So what typically happens is that the machine runs out of physical memory and starts swapping. Heavy swapping typically kills all hope of ever finishing in this century - especially on Windows.

But if you are using a package (igraph?) and you're asking it to produce the matrix, it probably does a lot of internal work and creates lots of auxiliary objects. So even if you're nowhere near the memory limit for the single result matrix, the algorithm used to produce it can run out of memory. It can also be non-linear (quadratic or worse) in time, which would again kill all hope of ever finishing in this century...

A good way to investigate could be to time it on a small graph (e.g. using system.time), and then again after doubling the graph size a couple of times. Then you can see if the time is linear or quadratic, and you can estimate how long it will take to complete your big graph. If the prediction says a week, well then you know ;-)
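A minimal sketch of that doubling experiment follows. The function build_matrix is a hypothetical stand-in for whatever call produces your matrix (here it just builds an n-by-n outer product), and the sizes are arbitrary; substitute your own call and graph sizes.

```r
# Hypothetical stand-in for the expensive call being investigated
build_matrix <- function(n) {
  outer(seq_len(n), seq_len(n))
}

# Double the problem size a few times and record the elapsed time
sizes <- c(250, 500, 1000, 2000)
elapsed <- sapply(sizes, function(n) {
  system.time(build_matrix(n))[["elapsed"]]
})

# Ratio of each timing to the previous one:
# near 2 suggests linear growth, near 4 suggests quadratic growth
timings <- data.frame(
  n       = sizes,
  elapsed = elapsed,
  ratio   = c(NA, elapsed[-1] / elapsed[-length(elapsed)])
)
timings
```

Very small timings are noisy (and can be reported as 0), so the ratios only become meaningful once each run takes a noticeable fraction of a second.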

Is there a limit on working with matrix in R with Rcpp?

To repeat more succinctly:

  1. You can have more than 2^31-1 elements in a vector.

  2. Matrices are vectors with dim attributes.

  3. You can have more than 2^31-1 elements in a matrix (i.e. n times k).

  4. Your row and column indices are each still limited to 2^31-1.

Example of a big vector:

R> n <- .Machine$integer.max + 100
R> tmpVec <- 1:n
R> length(tmpVec)
[1] 2147483747
R> newVec <- sqrt(tmpVec)
R>
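Points 2 and 4 can be checked on a tiny object without allocating gigabytes. The snippet below is only an illustration; the variable name v is arbitrary.

```r
# A matrix is a vector with a dim attribute
v <- 1:6
dim(v) <- c(2L, 3L)
is.matrix(v)     # TRUE

# The dims are stored as integers, so each one is capped at integer.max
typeof(dim(v))   # "integer"
.Machine$integer.max == 2^31 - 1   # TRUE
```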

Practical limits of R data frame

R is suited for large data sets, but you may have to change your way of working somewhat from what the introductory textbooks teach you. I did a post on Big Data for R which crunches a 30 GB data set and which you may find useful for inspiration.

The usual sources of information to get started are the High-Performance Computing Task View and the R-SIG HPC mailing list.

The main limit you have to work around is a historic limit on the length of a vector to 2^31-1 elements which wouldn't be so bad if R did not store matrices as vectors. (The limit is for compatibility with some BLAS libraries.)

We regularly analyse telco call data records and marketing databases with multi-million customers using R, so would be happy to talk more if you are interested.

Select the n rows with the highest combined value from a matrix in R

Update (Recursive approach, sub-optimal solution)

You can define a recursive function f (nested inside the function thomas2), which works for any number of rows k (1 <= k <= nrow(mat)):

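thomas2 <- function(mat, k) {
  # f returns the indices of k rows, adding one row per recursion level
  f <- function(mat, k) {
    if (k == 1) {
      # base case: the single row with the largest row sum
      return(which.max(rowSums(mat)))
    }
    p <- f(mat, k - 1)                             # rows chosen so far
    q <- seq(nrow(mat))[-p]                        # remaining candidate rows
    rmax <- apply(mat[p, , drop = FALSE], 2, max)  # column-wise max of chosen rows
    # add the candidate that maximises the sum of column-wise maxima
    c(p, q[which.max(sapply(q, function(i) sum(pmax(rmax, mat[i, ]))))])
  }
  row.names(mat)[sort(f(mat, k))]
}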

For example

> thomas2(mat, 2)
[1] "10" "14"

> thomas2(mat, 3)
[1] "10" "12" "14"

> thomas2(mat, 4)
[1] "9" "10" "12" "14"

> thomas2(mat, 5)
[1] "9" "10" "11" "12" "14"

> thomas2(mat, 6)
[1] "9" "10" "11" "12" "13" "14"


Previous answer (Brute-force approach, inefficient)

Your algorithm is a greedy one, which cannot always guarantee the global maximum. Thus, a brute-force search might be a straightforward workaround to reach your goal.

Maybe you can try the following brute-force method

rs <- combn(nrow(mat), 3)
row.names(mat)[rs[, which.max(apply(rs, 2, function(k) sum(do.call(pmax, data.frame(t(mat[k, ]))))))]]

which gives

[1] "10" "12" "14"
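As a self-contained sketch of the same brute-force idea for an arbitrary k, the snippet below wraps it in a helper. The name best_rows and the matrix toy are made up for illustration and are not the data from the question; the score of a row subset is taken to be the sum of its column-wise maxima, as in the code above.

```r
# Hypothetical toy data (not the matrix from the question)
toy <- rbind(a = c(9, 0, 0),
             b = c(0, 9, 0),
             c = c(5, 5, 0),
             d = c(0, 0, 1))

# Exhaustive search: score every k-row subset by the sum of its
# column-wise maxima and return the names of the best subset
best_rows <- function(mat, k) {
  combos <- combn(nrow(mat), k)
  scores <- apply(combos, 2, function(idx) {
    sum(apply(mat[idx, , drop = FALSE], 2, max))
  })
  rownames(mat)[combos[, which.max(scores)]]
}

best_rows(toy, 2)
# [1] "a" "b"
# Rows a and b score 18. A greedy pass that starts from the largest
# row sum (row c, sum 10) can reach at most 14 on this matrix.
```

The search visits choose(nrow(mat), k) subsets, so it is only practical for small matrices or small k.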

Keep one maximum value per row in a matrix in R

Or you could use max.col with matrix indexing, which would be faster:

ret[cbind(seq_len(nrow(mat2)), max.col(mat2, "first"))] <- 1
ret
#      [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    1    0    0
# [3,]    0    0    1

data

mat1 <- matrix(c(0, 1, 0, 1, 1, 0, 0, 0, 1), ncol = 3)
mat2 <- matrix(c(11, 16, 19, 32, 16, 18, 12, 14, 27), ncol = 3)
# all-zero result matrix with the same shape as mat1
ret <- matrix(0, nrow(mat1), ncol(mat1))
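For convenience, here is the same approach as one self-contained snippet that runs top to bottom, using mat2 from the data above. The last step, multiplying the indicator matrix by mat2 to keep the maximum values themselves, is an assumption about what the question wants rather than part of the original answer.

```r
mat2 <- matrix(c(11, 16, 19, 32, 16, 18, 12, 14, 27), ncol = 3)

# Column index of the maximum in each row; ties go to the first column.
# Note that max.col() breaks ties at random unless ties.method is set.
idx <- max.col(mat2, ties.method = "first")

# Indicator matrix with a single 1 per row at the position of the maximum
ret <- matrix(0, nrow(mat2), ncol(mat2))
ret[cbind(seq_len(nrow(mat2)), idx)] <- 1

rowSums(ret)   # every row holds exactly one 1

# Assumption: to keep the maximum values rather than a 0/1 flag,
# multiply element-wise by the original matrix
kept <- ret * mat2
```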

