Creating (and Accessing) a Sparse Matrix with NA default entries
Yes, Thierry's answer is correct; I can say that as co-author of the 'Matrix' package.
To your other question: Why is accessing "M" slower than "Y"?
The main answer is that "M" is much sparser than "Y" and hence much smaller; depending on the sizes involved and the RAM of your platform, access time is faster for much smaller objects, notably when indexing into them.
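As a rough illustration of the size difference, here is a minimal sketch. The matrices below are hypothetical stand-ins for "M" and "Y" (the originals are not shown), and the dimensions and densities are assumptions chosen only for the example.

```r
library(Matrix)

set.seed(42)
# Hypothetical stand-ins: M is very sparse, Y is much denser
M <- rsparsematrix(1000, 1000, density = 0.001)
Y <- rsparsematrix(1000, 1000, density = 0.2)

# Only the non-zero entries (plus index vectors) are stored,
# so the sparser matrix takes far less memory
object.size(M)
object.size(Y)
```

Actual access times depend on the platform and on how the matrix is indexed, so it is worth timing your own objects rather than relying on this sketch.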
R: Generating sparse matrix with all elements as rows and columns
We can convert the columns to factor with levels as 1 through 6 and then use xtabs
df1[1:2] <- lapply(df1[1:2], factor, levels = 1:6)
as.matrix(xtabs(T~U1+U2,df1,sparse = TRUE))
# U2
#U1 1 2 3 4 5 6
# 1 0 0 1 0 0 1
# 2 0 0 0 1 0 0
# 3 0 0 0 0 1 0
# 4 0 0 0 0 0 0
# 5 0 0 0 0 0 0
# 6 0 0 0 0 0 0
Or another option is to get the expanded index filled with 0s and then use sparseMatrix
library(tidyverse)
library(Matrix)
df2 <- crossing(U1 = 1:6, U2 = 1:6) %>%
left_join(df1) %>%
mutate(T = replace(T, is.na(T), 0))
sparseMatrix(i = df2$U1, j = df2$U2, x = df2$T)
Or use spread
spread(df2, U2, T)
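If the goal is only a 6 x 6 sparse matrix, a further option (not from the answer above) is to pass dims to sparseMatrix so that no expanded index is needed. This is a sketch: the df1 below is an assumption reconstructed from the printed xtabs table, since the original data frame is not shown.

```r
library(Matrix)

# Assumed input, reconstructed from the xtabs output above
df1 <- data.frame(U1 = c(1, 1, 2, 3),
                  U2 = c(3, 6, 4, 5),
                  T  = c(1, 1, 1, 1))

# dims fixes the shape, so rows and columns without entries are kept
m <- sparseMatrix(i = df1$U1, j = df1$U2, x = df1$T, dims = c(6, 6))
m
```

Unlike the crossing() approach, this stores only the four non-zero entries rather than explicit zeros.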
How do I set the replaced value in a sparse matrix to NA rather than 0?
There are two separate questions, actually. The first one is how to display zeroes. It is easy to solve by looking for the exact method that is used after dispatch:
Matrix::printSpMatrix(toy, zero.print="0")
[1,] 0 0 0
[2,] 1 1 1
[3,] NA NA NA
The second question is whether the NA output can be replaced with some other character. Well, it is not directly possible: there is no suitable parameter for that.
However, modifying the source is always an option. Beware: this is a hack, which may lead to unforeseen consequences!
toy_print <- function (x, digits = NULL, maxp = getOption("max.print"), cld = getClassDef(class(x)),
zero.print = ".", col.names, note.dropping.colnames = TRUE,
col.trailer = "", align = c("fancy", "right"))
{
stopifnot(extends(cld, "sparseMatrix"))
x.orig <- x
cx <- formatSpMatrix(x, digits = digits, maxp = maxp, cld = cld,
zero.print = zero.print, col.names = col.names, note.dropping.colnames = note.dropping.colnames,
align = align)
if (col.trailer != "")
cx <- cbind(cx, col.trailer, deparse.level = 0)
# here's the NA hack
cx[cx=="NA"] <- "."
print(cx, quote = FALSE, right = TRUE, max = maxp)
invisible(x.orig)
}
toy_print(toy, zero.print="0")
[1,] 0 0 0
[2,] 1 1 1
[3,] . . .
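For small matrices, a less invasive sketch is to convert to an ordinary matrix and substitute the NA cells before printing. The toy matrix here is an assumption built to match the output shown above (the original toy is not defined in the answer), and this approach densifies the matrix, so it is not suitable for large sparse objects.

```r
library(Matrix)

# Assumed toy matrix: rows of 0s, 1s and NAs, filled column by column
toy <- Matrix(c(0, 1, NA, 0, 1, NA, 0, 1, NA), nrow = 3, sparse = TRUE)

# Dense copy; assigning a string coerces the whole matrix to character
out <- as.matrix(toy)
out[is.na(out)] <- "."

print(out, quote = FALSE, right = TRUE)
```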
Create a sparse matrix from lines of entries in R
It depends on how you want to deal with combinations of values in lemma and doc that do not appear. You mention they are "not defined" and suggest "(no value)" to appear in the answer.
Here is a more complete toy example:
set.seed(1)
(dfr <- data.frame(lemma = rep(c("foo", "bar", "baz"), each = 2),
mi = runif(6),
doc = rep(c("mary", "jane", "mary", "bruce", "dolly", "zizz")),
stringsAsFactors = FALSE))
#> lemma mi doc
#> 1 foo 0.2655087 mary
#> 2 foo 0.3721239 jane
#> 3 bar 0.5728534 mary
#> 4 bar 0.9082078 bruce
#> 5 baz 0.2016819 dolly
#> 6 baz 0.8983897 zizz
If it makes sense for the number 0 to appear in such cases you can just use xtabs as follows:
xtabs(mi ~ lemma + doc, dfr, sparse = TRUE)
#> 3 x 5 sparse Matrix of class "dgCMatrix"
#> doc
#> lemma bruce dolly jane mary zizz
#> bar 0.9082078 . . 0.5728534 .
#> baz . 0.2016819 . . 0.8983897
#> foo . . 0.3721239 0.2655087 .
If you want the values to be missing in the sense of NA then this is the best I can do, using tapply:
Matrix::Matrix(with(dfr, tapply(mi, list(lemma, doc), sum), sparse = TRUE))
#> 3 x 5 Matrix of class "dgeMatrix"
#> bruce dolly jane mary zizz
#> bar 0.9082078 NA NA 0.5728534 NA
#> baz NA 0.2016819 NA NA 0.8983897
#> foo NA NA 0.3721239 0.2655087 NA
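which builds the result with Matrix::Matrix. Note that, as written, sparse = TRUE is passed to with() rather than to Matrix::Matrix, which is why the result prints as a dense "dgeMatrix". Moving sparse = TRUE outside the with() call would request sparse storage, but every NA would still be stored explicitly.
Bear in mind that sparse matrices are useful when they are big and don't have many non-zero entries, and that NA is not 0.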
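To see what "NA is not 0" means for storage, here is a minimal sketch that reuses the dfr example and passes sparse = TRUE directly to Matrix::Matrix. The names tab and sp are introduced only for this illustration.

```r
library(Matrix)

set.seed(1)
dfr <- data.frame(lemma = rep(c("foo", "bar", "baz"), each = 2),
                  mi = runif(6),
                  doc = c("mary", "jane", "mary", "bruce", "dolly", "zizz"),
                  stringsAsFactors = FALSE)

# Cross-tabulate; combinations that never occur become NA
tab <- with(dfr, tapply(mi, list(lemma, doc), sum))

# sparse = TRUE is now an argument of Matrix(), not of with()
sp <- Matrix(tab, sparse = TRUE)

class(sp)
# NA counts as a non-zero entry, so all 15 cells are stored
length(sp@x)
sum(is.na(sp@x))
```

The result has a sparse class, but it saves no memory here because no cell is an actual zero.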
How to access a few elements of a sparse matrix from R Matrix library?
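You can extract the stored elements of the Matrix directly using the S4 slot operator @ without converting it to an ordinary matrix first. Note that the x slot holds only the stored (non-zero) values, so these index into that vector rather than into the full matrix. For example,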
big@x[1]
big@x[random.idx]
In fact, you can extract other slots as well. See str(big).
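A small sketch of what the slots contain, using a hypothetical 4 x 3 matrix in place of big (the original object is not shown). It also shows indexing by a two-column matrix of (row, column) pairs, which is an alternative way to pull out a few specific cells.

```r
library(Matrix)

# Hypothetical small stand-in for `big`
big <- sparseMatrix(i = c(2, 4, 1), j = c(1, 1, 3),
                    x = c(10, 20, 30), dims = c(4, 3))

big@x   # stored values in column-major order: 10 20 30
big@i   # zero-based row index of each stored value: 1 3 0
big@p   # column pointers into x and i: 0 2 2 3

# big@x[1] is the first stored value, not the cell big[1, 1]
big@x[1]
big[1, 1]

# Look up individual cells by (row, column) pairs
big[cbind(c(2, 1), c(1, 3))]
```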
Creating a sparse matrix in r with a set number of integer values per row
You can try to build the sparse matrix up using the row (i), column (j) and value (x) components. This involves sampling subject to your row and value constraints.
# constraints
values <- 1:4
maxValuesPerRow <- 10
nrow <- 80
ncol <- 80
# sample how many values each row should get (at most 10)
set.seed(1)
nValuesForEachRow <- sample(maxValuesPerRow, nrow, replace=TRUE)
# create matrix
library(Matrix)
i <- rep(seq_len(nrow), nValuesForEachRow) # row
j <- unlist(lapply(nValuesForEachRow, sample, x=seq_len(ncol))) # which columns
x <- sample(values, sum(nValuesForEachRow), replace=TRUE) # values
sm <- sparseMatrix(i=i, j=j, x=x)
# check
dim(sm)
table(rowSums(sm>0))
table(as.vector(sm))
Note: you can't simply sample all the columns in one call as below, because this can produce duplicate (row, column) pairs; hence the per-row sampling with lapply above.
j <- sample(seq_len(ncol), sum(nValuesForEachRow), replace=TRUE)
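The reason duplicates matter is that sparseMatrix sums the values of repeated (i, j) pairs by default, which could push a cell above the allowed values and leave the row with fewer entries than intended. A minimal sketch with a made-up 2 x 2 example:

```r
library(Matrix)

# The pair (1, 2) appears twice; its values 3 and 4 are added together
dup <- sparseMatrix(i = c(1, 1), j = c(2, 2), x = c(3, 4), dims = c(2, 2))

dup[1, 2]    # 7
nnzero(dup)  # 1
```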