Sparse matrix to a data frame in R
Using summary(), here is an example:
library(Matrix)
mat <- Matrix(data = c(1, 0, 2, 0, 0, 3, 4, 0, 0), nrow = 3, ncol = 3,
              dimnames = list(Origin = c("A", "B", "C"),
                              Destination = c("X", "Y", "Z")),
              sparse = TRUE)
mat
# 3 x 3 sparse Matrix of class "dgCMatrix"
#   Destination
#   X Y Z
# A 1 . 4
# B . . .
# C 2 3 .
summ <- summary(mat)
summ
# 3 x 3 sparse Matrix of class "dgCMatrix", with 4 entries
#   i j x
# 1 1 1 1
# 2 3 1 2
# 3 3 2 3
# 4 1 3 4
data.frame(Origin = rownames(mat)[summ$i],
           Destination = colnames(mat)[summ$j],
           Weight = summ$x)
#   Origin Destination Weight
# 1      A           X      1
# 2      C           X      2
# 3      C           Y      3
# 4      A           Z      4
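To check that nothing was lost, the data frame can be turned back into a sparse matrix. This is only a sketch, assuming the same mat as above; the names df and rebuilt are made up for illustration.

```r
library(Matrix)

# Same example matrix as above
mat <- Matrix(data = c(1, 0, 2, 0, 0, 3, 4, 0, 0), nrow = 3, ncol = 3,
              dimnames = list(Origin = c("A", "B", "C"),
                              Destination = c("X", "Y", "Z")),
              sparse = TRUE)

# Sparse matrix -> long data frame, via the (i, j, x) triplets of summary()
summ <- summary(mat)
df <- data.frame(Origin = rownames(mat)[summ$i],
                 Destination = colnames(mat)[summ$j],
                 Weight = summ$x)

# Long data frame -> sparse matrix: map the labels back to row/column positions
rebuilt <- sparseMatrix(i = match(df$Origin, rownames(mat)),
                        j = match(df$Destination, colnames(mat)),
                        x = df$Weight,
                        dims = dim(mat),
                        dimnames = dimnames(mat))

# Compare the values of both matrices (attributes such as dimnames names ignored)
same_values <- isTRUE(all.equal(as.matrix(rebuilt), as.matrix(mat),
                                check.attributes = FALSE))
```

Passing dims explicitly matters here: row B has no non-zero entries, so it would not be recoverable from the data frame alone.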
R convert matrix or data frame to sparseMatrix
Here are two options:
library(Matrix)
# regMat is assumed to be an existing ordinary (dense) matrix
A <- as(regMat, "sparseMatrix")       # see also `vignette("Intro2Matrix")`
B <- Matrix(regMat, sparse = TRUE)    # Thanks to Aaron for pointing this out
identical(A, B)
# [1] TRUE
A
# 10 x 10 sparse Matrix of class "dgCMatrix"
#
#  [1,] . . .  .  . 45 .  . . .
#  [2,] . . .  .  .  . . 59 . .
#  [3,] . . .  . 95  . .  . . .
#  [4,] . . .  .  .  . .  . . .
#  [5,] . . .  .  .  . .  . . .
#  [6,] . . .  .  .  . .  . . .
#  [7,] . . . 23  .  . .  . . .
#  [8,] . . . 63  .  . .  . . .
#  [9,] . . .  .  .  . .  . . .
# [10,] . . .  .  .  . .  . . .
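Since regMat is not defined in this answer, here is a self-contained sketch with a made-up dense matrix. Its non-zero cells are copied from the printed output above; everything else about it is an assumption.

```r
library(Matrix)

# Hypothetical stand-in for regMat: a 10 x 10 dense matrix that is mostly zeros
regMat <- matrix(0, nrow = 10, ncol = 10)
regMat[1, 6] <- 45
regMat[2, 8] <- 59
regMat[3, 5] <- 95
regMat[7, 4] <- 23
regMat[8, 4] <- 63

# The two conversions from the answer
A <- as(regMat, "sparseMatrix")
B <- Matrix(regMat, sparse = TRUE)

# Both keep the same values as the dense original
a_ok <- isTRUE(all.equal(as.matrix(A), regMat, check.attributes = FALSE))
b_ok <- isTRUE(all.equal(as.matrix(B), regMat, check.attributes = FALSE))

# Number of stored non-zero entries
n_nonzero <- nnzero(A)
```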
How to coerce a data.frame into a sparse matrix in R
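Following user20650's comment, first coerce the CUI* columns to factors with the same levels, then use xtabs() to create a sparse matrix, and finally make it symmetric with forceSymmetric(). Because every count here sits in the upper triangle, mirroring it gives the same values as adding the transpose.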
txt <- '
CUI1 CUI2 Count
1 C0000699 C3894683 2
2 C0000699 C0101725 1
3 C0000699 C1882413 3
4 C0000699 C0245531 3
5 C0000699 C0068475 2
6 C0000699 C0538927 3
7 C0000699 C0724693 1
8 C0000699 C0216784 2
9 C0000699 C2248020 1
10 C0000699 C0069449 3
'
test <- read.table(textConnection(txt), header = TRUE)
library(Matrix)
levls <- Reduce(union, test[1:2])
test[1:2] <- lapply(test[1:2], factor, levels = levls)
res <- xtabs(Count ~ CUI1 + CUI2, data = test, sparse = TRUE)
res <- forceSymmetric(res)
class(res)
#> [1] "dsCMatrix"
#> attr(,"package")
#> [1] "Matrix"
Created on 2022-02-13 by the reprex package (v2.0.1)
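Convert large R data frame to dgCMatrix
You can try splitting the large data frame by rows (or, analogously, by columns), converting each chunk to a dgCMatrix, and then binding the chunks back together.
library(Matrix)
nsplit <- 10
# my.M is the large data frame; cut() assigns each row to one of nsplit chunks
splitMxList <- lapply(split(my.M, cut(seq_len(nrow(my.M)), nsplit)), function(mx) {
  Matrix(as.matrix(mx), sparse = TRUE)
})
# Stack the sparse chunks back into one matrix
sparse.M <- Reduce(rbind, splitMxList)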
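To try the chunked conversion without a real large data frame, a small made-up one can stand in for it. In this sketch the contents of my.M (200 rows, 5 columns of mostly zeros) are purely an assumption for illustration.

```r
library(Matrix)

# Hypothetical stand-in for the large data frame: 200 x 5, about 10% ones
set.seed(42)
my.M <- as.data.frame(matrix(rbinom(200 * 5, size = 1, prob = 0.1), nrow = 200))

# Split by rows into nsplit chunks and convert each chunk separately,
# so only one dense chunk exists in memory at a time
nsplit <- 10
chunk_id <- cut(seq_len(nrow(my.M)), nsplit)
splitMxList <- lapply(split(my.M, chunk_id), function(mx) {
  Matrix(as.matrix(mx), sparse = TRUE)
})

# Stack the sparse chunks back together in their original row order
sparse.M <- Reduce(rbind, splitMxList)

# The result holds the same values as the original data frame
same_values <- isTRUE(all.equal(as.matrix(sparse.M), as.matrix(my.M),
                                check.attributes = FALSE))
```

split() keeps the chunks in row order, so the rows of sparse.M line up with the rows of my.M.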
XGB sparse matrix from a dataframe
The Matrix package provides sparse.model.matrix() for creating a sparse model matrix. It may help to remove NAs from your data before creating the sparse matrix, so that the dependent variable y has as many elements as the sparse matrix has rows when you feed both into the xgboost function.
I also tend to record the factor levels in my training data so that, when it comes to predicting on an unseen test dataset, I can make sure the test data has the same factor levels as the training data. This ensures the test data matrix will have the same columns as the training matrix.
Example from mtcars:
library(Matrix)
f <- mpg ~ hp + as.factor(cyl)
trainMatrix <- sparse.model.matrix(f, mtcars)
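The point about recording factor levels could look roughly like this. It is a sketch only: the train/test split of mtcars and the name cyl_levels are assumptions made for illustration, not part of the original answer.

```r
library(Matrix)

# Hypothetical train/test split of mtcars
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Record the factor levels seen in the training data
cyl_levels <- sort(unique(train$cyl))

# Apply the recorded levels to both sets before building the matrices
train$cyl <- factor(train$cyl, levels = cyl_levels)
test$cyl  <- factor(test$cyl,  levels = cyl_levels)

f <- mpg ~ hp + cyl
trainMatrix <- sparse.model.matrix(f, train)
testMatrix  <- sparse.model.matrix(f, test)

# Both matrices now share the same columns, in the same order
same_columns <- identical(colnames(trainMatrix), colnames(testMatrix))
```

Converting cyl to a factor up front, rather than calling as.factor() inside the formula, is what lets the training levels be reused for the test data.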
Create Sparse Matrix from a data frame
The Matrix package has a constructor made especially for your type of data:
library(Matrix)
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)
Otherwise, you might like to know about a handy feature of the [ function known as matrix indexing. You could have tried:
buildUserMovieMatrix <- function(trainingData) {
  UIMatrix <- Matrix(0, nrow = max(trainingData$UserID),
                     ncol = max(trainingData$MovieID), sparse = TRUE)
  # A two-column (row, column) matrix indexes individual cells
  UIMatrix[cbind(trainingData$UserID,
                 trainingData$MovieID)] <- trainingData$Rating
  return(UIMatrix)
}
(but I would definitely recommend the sparseMatrix approach over this.)
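Both approaches can be compared on a tiny example. The trainingData below is made up for illustration (three users, four movies, IDs already 1-based integers); with real data the IDs may first need mapping to consecutive integers.

```r
library(Matrix)

# Hypothetical ratings data
trainingData <- data.frame(UserID  = c(1, 1, 2, 3),
                           MovieID = c(1, 3, 2, 4),
                           Rating  = c(5, 3, 4, 2))

# Approach 1: the sparseMatrix() constructor (dimensions inferred from the largest IDs)
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)

# Approach 2: matrix indexing into an empty sparse matrix
UIMatrix2 <- Matrix(0, nrow = max(trainingData$UserID),
                    ncol = max(trainingData$MovieID), sparse = TRUE)
UIMatrix2[cbind(trainingData$UserID,
                trainingData$MovieID)] <- trainingData$Rating

# Both give the same values
same_values <- isTRUE(all.equal(as.matrix(UIMatrix), as.matrix(UIMatrix2),
                                check.attributes = FALSE))
```

One thing to keep in mind: according to its documentation, sparseMatrix() adds up the x values of repeated (i, j) pairs by default, so duplicate user/movie rows should be resolved beforehand if that is not what you want.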