R - Data Frame - Convert to Sparse Matrix

Sparse matrix to a data frame in R

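Calling summary() on a sparse matrix returns its non-zero entries in triplet form (row index i, column index j, value x); the indices can then be mapped back to the dimnames. Here is an example: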

library(Matrix)

mat <- Matrix(data = c(1, 0, 2, 0, 0, 3, 4, 0, 0), nrow = 3, ncol = 3,
              dimnames = list(Origin      = c("A", "B", "C"),
                              Destination = c("X", "Y", "Z")),
              sparse = TRUE)
mat
# 3 x 3 sparse Matrix of class "dgCMatrix"
# Destination
# X Y Z
# A 1 . 4
# B . . .
# C 2 3 .

summ <- summary(mat)
summ
# 3 x 3 sparse Matrix of class "dgCMatrix", with 4 entries
# i j x
# 1 1 1 1
# 2 3 1 2
# 3 3 2 3
# 4 1 3 4

data.frame(Origin      = rownames(mat)[summ$i],
           Destination = colnames(mat)[summ$j],
           Weight      = summ$x)
# Origin Destination Weight
# 1 A X 1
# 2 C X 2
# 3 C Y 3
# 4 A Z 4
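
Going the other way (the direction in this page's title), a long-format data frame like the one above can be turned back into a sparse matrix. The following is a minimal sketch, not part of the original answer: it assumes the full sets of row and column labels are known up front, and the names df, origin, dest and mat2 are made up for illustration.

```r
library(Matrix)

# Long-format data; values copied from the example output above
df <- data.frame(Origin      = c("A", "C", "C", "A"),
                 Destination = c("X", "X", "Y", "Z"),
                 Weight      = c(1, 2, 3, 4))

# Assumption: all labels are known, including "B", which has no non-zero entries
origin <- factor(df$Origin,      levels = c("A", "B", "C"))
dest   <- factor(df$Destination, levels = c("X", "Y", "Z"))

# The integer codes of the factors serve as 1-based row/column indices
mat2 <- sparseMatrix(i = as.integer(origin),
                     j = as.integer(dest),
                     x = df$Weight,
                     dims = c(nlevels(origin), nlevels(dest)),
                     dimnames = list(levels(origin), levels(dest)))
```

Passing dims explicitly keeps the empty row "B" in the result; without it, the dimensions are inferred from the largest index that actually occurs.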

R convert matrix or data frame to sparseMatrix

Here are two options, where regMat is an ordinary (dense) matrix; a data frame can be converted with as.matrix() first:

library(Matrix)

A <- as(regMat, "sparseMatrix") # see also `vignette("Intro2Matrix")`
B <- Matrix(regMat, sparse = TRUE) # Thanks to Aaron for pointing this out

identical(A, B)
# [1] TRUE
A
# 10 x 10 sparse Matrix of class "dgCMatrix"
#
# [1,] . . . . . 45 . . . .
# [2,] . . . . . . . 59 . .
# [3,] . . . . 95 . . . . .
# [4,] . . . . . . . . . .
# [5,] . . . . . . . . . .
# [6,] . . . . . . . . . .
# [7,] . . . 23 . . . . . .
# [8,] . . . 63 . . . . . .
# [9,] . . . . . . . . . .
# [10,] . . . . . . . . . .
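
For a data frame rather than a matrix, one possible route is to go through as.matrix() first. This is only a sketch under the assumption that every column is numeric; the data frame df below is made up for illustration and is not the regMat from the answer.

```r
library(Matrix)

# Hypothetical all-numeric data frame with mostly zero entries
df <- data.frame(a = c(0, 1, 0),
                 b = c(0, 0, 2),
                 c = c(3, 0, 0))

dense <- as.matrix(df)               # column names become the matrix colnames
sp    <- Matrix(dense, sparse = TRUE)
```

If the data frame contains factor or character columns, as.matrix() yields a character matrix, so those columns need to be encoded numerically beforehand.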

How to coerce a data.frame into a sparse matrix in R

Following user20650's comment, first coerce the CUI* columns to factors with the same levels, then use xtabs to create a sparse matrix, then make it symmetric with forceSymmetric (which by default mirrors the upper triangle).

txt <- '
CUI1 CUI2 Count
1 C0000699 C3894683 2
2 C0000699 C0101725 1
3 C0000699 C1882413 3
4 C0000699 C0245531 3
5 C0000699 C0068475 2
6 C0000699 C0538927 3
7 C0000699 C0724693 1
8 C0000699 C0216784 2
9 C0000699 C2248020 1
10 C0000699 C0069449 3
'
test <- read.table(textConnection(txt), header = TRUE)

library(Matrix)

levls <- Reduce(union, test[1:2])
test[1:2] <- lapply(test[1:2], factor, levels = levls)
res <- xtabs(Count ~ CUI1 + CUI2, data = test, sparse = TRUE)
res <- forceSymmetric(res)
class(res)
#> [1] "dsCMatrix"
#> attr(,"package")
#> [1] "Matrix"

Created on 2022-02-13 by the reprex package (v2.0.1)
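
The answer's code uses forceSymmetric, which by default keeps the upper triangle. When pairs can appear in either order, adding the transpose is one alternative. Below is a minimal sketch with made-up data (the data frame pairs and its columns are hypothetical), assuming each unordered pair occurs at most once and never as a self-pair; otherwise the counts would be summed or doubled.

```r
library(Matrix)

# Hypothetical pair counts where entries land on both sides of the diagonal
pairs <- data.frame(from = c("a", "b", "c"),
                    to   = c("b", "c", "a"),
                    n    = c(2, 1, 3))

# Same levels for both columns so the result is square
levs <- union(pairs$from, pairs$to)
pairs$from <- factor(pairs$from, levels = levs)
pairs$to   <- factor(pairs$to,   levels = levs)

m   <- xtabs(n ~ from + to, data = pairs, sparse = TRUE)
sym <- m + t(m)   # every count now appears at both (i, j) and (j, i)
```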

Convert large R data frame to dgCMatrix

You can try splitting the large data frame by rows (or columns), converting each piece to a dgCMatrix, and then joining the pieces back together. The example below splits by rows:

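library(Matrix)

nsplit <- 10
# Split the rows into nsplit chunks and convert each chunk separately
splitMxList <- lapply(split(my.M, cut(seq_len(nrow(my.M)), nsplit)), function(mx) {
  Matrix(as.matrix(mx), sparse = TRUE)
})
# Stack the sparse chunks back together row-wise
sparse.M <- Reduce(rbind, splitMxList)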
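
To try the chunked approach on something small, here is a self-contained sketch. The data frame my.M is randomly generated stand-in data, and the chunk count of 4 is an arbitrary choice for illustration.

```r
library(Matrix)

set.seed(1)
# Stand-in for the large data frame: 20 rows, 10 mostly-zero columns
my.M <- as.data.frame(matrix(rbinom(200, 1, 0.1), nrow = 20))

nsplit <- 4
chunks <- split(my.M, cut(seq_len(nrow(my.M)), nsplit))
parts  <- lapply(chunks, function(chunk) Matrix(as.matrix(chunk), sparse = TRUE))

# unname() so the chunk labels are not passed to rbind as argument names
sparse.M <- do.call(rbind, unname(parts))

# The chunked result should hold the same values as the original data
same_values <- all(as.matrix(sparse.M) == as.matrix(my.M))
```

Only one chunk is held in dense form at a time, which is where the memory saving comes from.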

XGB sparse matrix from a dataframe

The Matrix package provides sparse.model.matrix() for creating a sparse model matrix. It may help to remove NAs from your data before building the matrix, so that the dependent variable y has the same length as the number of rows in the sparse matrix when both are passed to the xgboost function.

I also tend to record the factor levels in my training data so that, when it comes to predicting on an unseen test dataset, I can make sure the test data has the same factor levels as the training data. This ensures the test matrix has the same columns as the training matrix.

Example from mtcars:

library(Matrix)

f <- mpg ~ hp + as.factor(cyl)
trainMatrix <- sparse.model.matrix(f, mtcars)
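
The point about recording factor levels can be sketched as follows. This is an illustration only: the train/test split is artificial (the "test" set is simply mtcars without the 6-cylinder cars), and the variable names are made up.

```r
library(Matrix)

train <- mtcars
test  <- mtcars[mtcars$cyl != 6, ]     # artificial test set lacking cyl == 6

# Record the levels seen in training and apply them to both sets
cyl_levels <- sort(unique(train$cyl))
train$cyl  <- factor(train$cyl, levels = cyl_levels)
test$cyl   <- factor(test$cyl,  levels = cyl_levels)

f <- mpg ~ hp + cyl
trainMatrix <- sparse.model.matrix(f, train)
testMatrix  <- sparse.model.matrix(f, test)

# Both matrices have the same columns even though level 6 is absent from test
same_cols <- identical(colnames(trainMatrix), colnames(testMatrix))
```

Had the formula used as.factor(cyl) on the test data directly, the levels would be inferred from the test data alone and the column for the missing level would not be created.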

Create Sparse Matrix from a data frame

The Matrix package has a constructor made especially for your type of data:

library(Matrix)
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)

Otherwise, you might like to know about a handy feature of the [ function known as matrix indexing. You could have tried:

buildUserMovieMatrix <- function(trainingData) {
  UIMatrix <- Matrix(0, nrow = max(trainingData$UserID),
                     ncol = max(trainingData$MovieID), sparse = TRUE)
  UIMatrix[cbind(trainingData$UserID,
                 trainingData$MovieID)] <- trainingData$Rating
  return(UIMatrix)
}

(but I would definitely recommend the sparseMatrix approach over this.)
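
To compare the two approaches on a small input, here is a sketch with made-up ratings. It assumes UserID and MovieID are positive integers that can be used directly as row and column indices; the variable names other than trainingData are chosen for illustration.

```r
library(Matrix)

# Made-up ratings; IDs are assumed to be usable as 1-based indices
trainingData <- data.frame(UserID  = c(1, 1, 2, 3),
                           MovieID = c(1, 4, 2, 3),
                           Rating  = c(5, 3, 4, 1))

n_users  <- max(trainingData$UserID)
n_movies <- max(trainingData$MovieID)

# Constructor approach
A <- sparseMatrix(i = trainingData$UserID,
                  j = trainingData$MovieID,
                  x = trainingData$Rating,
                  dims = c(n_users, n_movies))

# Matrix-indexing approach
B <- Matrix(0, nrow = n_users, ncol = n_movies, sparse = TRUE)
B[cbind(trainingData$UserID, trainingData$MovieID)] <- trainingData$Rating

same <- all(as.matrix(A) == as.matrix(B))
```

One difference to keep in mind: sparseMatrix sums the values of duplicated (i, j) pairs, whereas the indexing assignment overwrites them, so the two only agree when each user/movie pair appears once.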


