Creating (and Accessing) a Sparse Matrix with NA Default Entries

Yes, Thierry's answer is definitely correct; I can say that as co-author of the 'Matrix' package...

To your other question: why is accessing "M" slower than "Y"?
The main answer is that "M" is much, much sparser than "Y", hence much smaller, and (depending on the sizes involved and the RAM of your platform) access time is faster for much smaller objects, notably when indexing into them.
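
A rough way to see the size effect is to compare the memory footprint of two sparse matrices with the same dimensions but different densities. The sketch below is only an illustration: the names M and Y mirror the question, but the dimensions, the densities, and the helper make_sparse are assumptions, not the original data.

```r
library(Matrix)

set.seed(42)
n <- 2000

# Hypothetical helper: an n x n sparse matrix with exactly nnz stored values
make_sparse <- function(n, nnz) {
  idx <- sample(n * n, nnz)              # distinct linear positions
  sparseMatrix(i = (idx - 1) %% n + 1,   # row of each position
               j = (idx - 1) %/% n + 1,  # column of each position
               x = runif(nnz),
               dims = c(n, n))
}

# Hypothetical stand-ins: M stores about 0.1% of its cells, Y about 10%
M <- make_sparse(n, nnz = 4000)
Y <- make_sparse(n, nnz = 400000)

# Only stored entries cost memory, so M is far smaller than Y
size_M <- object.size(M)
size_Y <- object.size(Y)
print(size_M, units = "Kb")
print(size_Y, units = "Kb")
```

Timings (for example with system.time) depend on the platform and on the sizes involved, so none are shown here.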

R: Generating sparse matrix with all elements as rows and columns

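We can convert the columns to factors with levels 1 through 6 and then use xtabs:

# make U1 and U2 factors so that all six levels appear as rows and columns
df1[1:2] <- lapply(df1[1:2], factor, levels = 1:6)
# as.matrix() turns the sparse result into an ordinary dense matrix for display
as.matrix(xtabs(T ~ U1 + U2, df1, sparse = TRUE))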
#   U2
#U1  1 2 3 4 5 6
#  1 0 0 1 0 0 1
#  2 0 0 0 1 0 0
#  3 0 0 0 0 1 0
#  4 0 0 0 0 0 0
#  5 0 0 0 0 0 0
#  6 0 0 0 0 0 0

Another option is to expand the index to all combinations, fill the missing values with 0, and then use sparseMatrix:

library(tidyverse)
library(Matrix)
df2 <- crossing(U1 = 1:6, U2 = 1:6) %>% 
          left_join(df1) %>% 
          mutate(T = replace(T, is.na(T), 0))
sparseMatrix(i = df2$U1, j = df2$U2, x = df2$T)

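Or use spread (superseded by pivot_wider in current tidyr), which gives a wide data frame rather than a sparse matrix: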

spread(df2, U2, T)
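
For the sparseMatrix route, the zero-filling step is arguably unnecessary: the dims argument fixes the shape, and any position not listed is an implicit zero. A minimal sketch, assuming a hypothetical df1 whose contents are reconstructed from the printed xtabs result above (the original data frame is not shown):

```r
library(Matrix)

# Hypothetical input, chosen to reproduce the table printed above
df1 <- data.frame(U1 = c(1, 1, 2, 3),
                  U2 = c(3, 6, 4, 5),
                  T  = c(1, 1, 1, 1))

# dims forces a 6 x 6 result; cells that are not listed are implicit zeros
sm6 <- sparseMatrix(i = df1$U1, j = df1$U2, x = df1$T, dims = c(6, 6))
sm6
```

If zeros do end up stored explicitly, drop0() from the Matrix package removes them.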

How do I set the replaced value in a sparse matrix to NA rather than 0?

There are two separate questions, actually. The first one is how to display zeroes. It is easy to solve by looking for the exact method that is used after dispatch:

Matrix::printSpMatrix(toy, zero.print="0")

[1,]  0  0  0
[2,]  1  1  1
[3,] NA NA NA
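
One way to find the method that is used after dispatch is to ask the S4 machinery directly. The sketch below assumes a hypothetical toy matrix built to match the output above (the original definition is not shown) and uses selectMethod() from the methods package to look up the show method chosen for its class:

```r
library(Matrix)

# Hypothetical reconstruction of `toy`: row 1 zeros, row 2 ones, row 3 NA
toy <- sparseMatrix(i = c(2, 2, 2, 3, 3, 3),
                    j = c(1, 2, 3, 1, 2, 3),
                    x = c(1, 1, 1, NA, NA, NA),
                    dims = c(3, 3))

# Auto-printing an S4 object calls show(); look up the method selected
# for the class of `toy`, including inherited methods
shown_by <- selectMethod("show", class(toy))
shown_by   # printing the method reveals which print helper it calls
```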

The second question is whether the NA output can be replaced with some other character. Well, it is not directly possible: there is no suitable parameter for that.

However, modifying the source is always an option. Beware: this is a hack, which may lead to unforeseen consequences!

toy_print <- function (x, digits = NULL, maxp = getOption("max.print"), cld = getClassDef(class(x)), 
                       zero.print = ".", col.names, note.dropping.colnames = TRUE, 
                       col.trailer = "", align = c("fancy", "right")) 
{
    stopifnot(extends(cld, "sparseMatrix"))
    x.orig <- x
    # Matrix:: prefix so the call also works when the package is not attached
    cx <- Matrix::formatSpMatrix(x, digits = digits, maxp = maxp, cld = cld, 
                                 zero.print = zero.print, col.names = col.names, note.dropping.colnames = note.dropping.colnames, 
                                 align = align)
    if (col.trailer != "") 
        cx <- cbind(cx, col.trailer, deparse.level = 0)
    # here's the NA hack
    cx[cx=="NA"] <- "."
    print(cx, quote = FALSE, right = TRUE, max = maxp)
    invisible(x.orig)
}

toy_print(toy, zero.print="0")

[1,]  0  0  0
[2,]  1  1  1
[3,]  .  .  .

Create a sparse matrix from lines of entries in R

It depends on how you want to deal with combinations of values in lemma and doc that do not appear. You mention they are "not defined" and suggest that "(no value)" should appear in the answer.

Here is a more complete toy example:

set.seed(1)
(dfr <- data.frame(lemma = rep(c("foo", "bar", "baz"), each = 2),
           mi = runif(6),
           doc = rep(c("mary", "jane", "mary", "bruce", "dolly", "zizz")),
           stringsAsFactors = FALSE))
#>   lemma        mi   doc
#> 1   foo 0.2655087  mary
#> 2   foo 0.3721239  jane
#> 3   bar 0.5728534  mary
#> 4   bar 0.9082078 bruce
#> 5   baz 0.2016819 dolly
#> 6   baz 0.8983897  zizz

If it makes sense for the number 0 to appear in such cases, you can just use xtabs as follows:

xtabs(mi ~ lemma + doc, dfr, sparse = TRUE)

#> 3 x 5 sparse Matrix of class "dgCMatrix"
#>      doc
#> lemma     bruce     dolly      jane      mary      zizz
#>   bar 0.9082078 .         .         0.5728534 .        
#>   baz .         0.2016819 .         .         0.8983897
#>   foo .         .         0.3721239 0.2655087 .

If you want the values to be missing in the sense of NA, then this is the best I can do, using tapply:

Matrix::Matrix(with(dfr, tapply(mi, list(lemma, doc), sum), sparse = TRUE))

#> 3 x 5 Matrix of class "dgeMatrix"
#>         bruce     dolly      jane      mary      zizz
#> bar 0.9082078        NA        NA 0.5728534        NA
#> baz        NA 0.2016819        NA        NA 0.8983897
#> foo        NA        NA 0.3721239 0.2655087        NA

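Note, however, that the result is a dense dgeMatrix rather than a sparse one: sparse = TRUE sits inside the with() call, where it is silently ignored, so Matrix::Matrix never sees it. Moving it into Matrix::Matrix would not save anything either, because each NA would have to be stored explicitly.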

Bear in mind that sparse matrices are useful when they are big and don't have many non-zero entries, and that NA is not 0.
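
The practical consequence of "NA is not 0" is that every NA has to be stored explicitly, so an NA-filled matrix gains nothing from the sparse format. A small sketch using the dfr example (rebuilt here so the block is self-contained); the names sp_zero, sp_na and tab are illustrative, and counting the length of the x slot is just one assumed way to measure storage:

```r
library(Matrix)

set.seed(1)
dfr <- data.frame(lemma = rep(c("foo", "bar", "baz"), each = 2),
                  mi = runif(6),
                  doc = c("mary", "jane", "mary", "bruce", "dolly", "zizz"),
                  stringsAsFactors = FALSE)

# Zero as the default: only the observed values are stored
sp_zero <- xtabs(mi ~ lemma + doc, dfr, sparse = TRUE)

# NA as the default: sparse = TRUE is passed to Matrix() itself here,
# but the NA cells still have to be stored alongside the observed values
tab <- with(dfr, tapply(mi, list(lemma, doc), sum))
sp_na <- Matrix(tab, sparse = TRUE)

length(sp_zero@x)   # stored entries with 0 as the default
length(sp_na@x)     # stored entries with NA as the default
```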

How to access a few elements of a sparse matrix from R Matrix library?

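You can extract the stored values of the Matrix directly with the S4 slot operator @, without converting it to an ordinary matrix first. Note that the x slot holds only the stored entries (for a column-compressed matrix such as a dgCMatrix, in column-major order), so an index into it is not a position in the full matrix. For example,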

big@x[1]
big@x[random.idx]

In fact, you can extract the other slots in the same way. See str(big).
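
To make the difference between slot access and positional access concrete, here is a small sketch. The matrix big below is a hypothetical stand-in (the original is not shown), built as a column-compressed matrix with only three stored values:

```r
library(Matrix)

# Hypothetical 1000 x 1000 matrix with three stored values
big <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 4), x = c(10, 20, 30),
                    dims = c(1000, 1000))

big@x         # stored values only, column by column
big@i         # 0-based row index of each stored value
head(big@p)   # 0-based pointers to where each column starts in @x

# Positional access to a few elements: a two-column (row, column) index matrix
idx <- cbind(c(3, 2, 5), c(1, 4, 5))
vals <- big[idx]   # the third position is not stored, so it reads as 0
vals
```

Indexing with a (row, column) matrix avoids converting the whole object to a dense matrix, while still addressing elements by their position rather than by their place in the x slot.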

Creating a sparse matrix in r with a set number of integer values per row

You can try to build the sparse matrix up from the row (i), column (j) and value (x) components. This involves sampling subject to your row and value constraints.

# constraints
values <- 1:4
maxValuesPerRow <- 10
nrow <- 80
ncol <- 80

# sample values : how many values should each row get but <= 10 values
set.seed(1)
nValuesForEachRow <- sample(maxValuesPerRow, nrow, replace=TRUE)

# create matrix
library(Matrix)
i <- rep(seq_len(nrow), nValuesForEachRow)                       # row
j <- unlist(lapply(nValuesForEachRow, sample, x=seq_len(ncol)))  # which columns
x <- sample(values, sum(nValuesForEachRow), replace=TRUE)        # values
sm <- sparseMatrix(i=i, j=j, x=x, dims=c(nrow, ncol))            # dims keeps 80 x 80 even if the last column is never sampled

Check:

dim(sm)
table(rowSums(sm>0))
table(as.vector(sm))

Note: you can't just sample all the columns in one go, as below, because this can produce duplicate (i, j) pairs; hence the columns are sampled row by row with lapply above.

j <- sample(seq_len(ncol), sum(nValuesForEachRow), replace=TRUE)
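
The reason duplicates matter: according to the sparseMatrix documentation, the x values of repeated (i, j) pairs are added together, which could push entries outside the allowed 1:4 range and leave rows with fewer filled columns than intended. A minimal sketch of that behaviour, with illustrative values only (dup and n_per_row are hypothetical names):

```r
library(Matrix)

# The same cell (1, 2) is listed twice; the two values are summed
dup <- sparseMatrix(i = c(1, 1), j = c(2, 2), x = c(3, 4), dims = c(2, 2))
dup[1, 2]

# Per-row sampling without replacement never repeats a column within a row
set.seed(1)
n_per_row <- c(3, 5, 2)                                   # hypothetical row counts
i <- rep(seq_along(n_per_row), n_per_row)
j <- unlist(lapply(n_per_row, sample, x = seq_len(10)))
anyDuplicated(cbind(i, j))                                # 0 means no repeated pair
```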
