Transforming Dataset into value matrix
If you have many users and many movies, you can easily run out of memory building a dense matrix. For instance, with 1,000 users and 1,000 different movies you end up with a matrix of 1M entries, most of them missing (since not every user saw every movie).
If your dataset is big, a sparseMatrix from the Matrix package is the way to go. If both the user and movie ids are sequential (i.e. they run from 1 up to the number of distinct values), building it is straightforward. Using @StevenBeaupré's data:
library(Matrix)
# i = row (user id), j = column (movie id), x = the rating stored in that cell
mat <- sparseMatrix(df$userId, df$movieId, x = df$rating)
If the ids are not sequential, map them to consecutive integers with factor() first:
mat<-sparseMatrix(as.integer(factor(df$userId)),
as.integer(factor(df$movieId)),x=df$rating)
Most of the usual matrix operations (indexing, %*%, rowSums(), colMeans(), and so on) also work on a sparseMatrix.
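To see what the factor() trick does, here is a minimal base-R sketch (no Matrix package needed) that maps non-sequential ids to consecutive row and column indices and fills a small dense matrix with them. The toy df below is hypothetical; sparseMatrix() performs the same index mapping but stores only the non-missing entries.

```r
# Hypothetical ratings with non-sequential user and movie ids
df <- data.frame(userId  = c(10, 10, 42, 7),
                 movieId = c(3, 99, 3, 99),
                 rating  = c(4, 5, 3, 2))

# factor() sorts the distinct ids; as.integer() gives their positions 1..k
users  <- factor(df$userId)
movies <- factor(df$movieId)
i <- as.integer(users)   # row index of each rating
j <- as.integer(movies)  # column index of each rating

# Dense version for illustration only: unrated cells stay NA
dense <- matrix(NA_real_, nrow = nlevels(users), ncol = nlevels(movies),
                dimnames = list(levels(users), levels(movies)))
dense[cbind(i, j)] <- df$rating
print(dense)
```

Passing the same i, j and x to sparseMatrix() (optionally with dimnames = list(levels(users), levels(movies))) keeps the original ids as row and column names.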
How to transform a dataset into a presence/absence matrix?
Here's a tidy solution:
library(stringr)
library(dplyr)
library(tidyr)
dat <- data.frame(
species = c("species_1", "species_1, species_2", "species_2, species_3"),
year = c(2000, 2003, 2005)
)
dat %>%
rowwise() %>%
mutate(species = list(str_split(species, ",")[[1]])) %>%
unnest(species) %>%
mutate(species = trimws(species),
value=1) %>%
pivot_wider(names_from="species", values_fill = 0)
#> # A tibble: 3 × 4
#> year species_1 species_2 species_3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 1 0 0
#> 2 2003 1 1 0
#> 3 2005 0 1 1
Created on 2022-06-30 by the reprex package (v2.0.1)
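If you'd rather avoid the tidyverse dependencies, the same presence/absence matrix can be sketched in base R. This is an alternative to the answer above, not part of it, and it assumes entries are separated by commas as in the example data.

```r
dat <- data.frame(
  species = c("species_1", "species_1, species_2", "species_2, species_3"),
  year = c(2000, 2003, 2005)
)

# Split each comma-separated entry and strip surrounding spaces
sp_list <- lapply(strsplit(as.character(dat$species), ","), trimws)
all_sp  <- sort(unique(unlist(sp_list)))

# One row per record, one column per species: 1 = present, 0 = absent
pa <- t(vapply(sp_list, function(s) as.integer(all_sp %in% s),
               integer(length(all_sp))))
colnames(pa) <- all_sp
result <- data.frame(year = dat$year, pa)
print(result)
```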
Transform data from column vector to matrix
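Start by reading the data into a pandas DataFrame, split the 'Label-Label' column on '-', and then call pivot_table():
import pandas as pd
# df is assumed to already hold the 'Label-Label' and 'Data' columns
(df['Label-Label'].str.split('-', expand=True)   # column 0 = row label, column 1 = column label
   .assign(data=df.Data)
   .pivot_table(values='data', index=0, columns=1, fill_value=0))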
1 B D F I
0
A 1.0 4.0 0.0 0.0
B 0.0 0.0 10.0 0.0
C 0.0 5.0 0.0 0.0
H 0.0 0.0 0.0 12.0
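For readers working in R, the same split-then-pivot can be sketched in base R with strsplit() and xtabs(). The input below is hypothetical, chosen only so that its non-zero cells match the table above.

```r
# Hypothetical input, chosen so the non-zero cells match the table above
df <- data.frame(Label = c("A-B", "A-D", "B-F", "C-D", "H-I"),
                 Data  = c(1, 4, 10, 5, 12))

# Split "row-col" labels into two columns
parts <- do.call(rbind, strsplit(as.character(df$Label), "-", fixed = TRUE))
long  <- data.frame(from = parts[, 1], to = parts[, 2], value = df$Data)

# xtabs() sums `value` for each from/to pair; absent pairs become 0
wide <- xtabs(value ~ from + to, data = long)
print(wide)
```

Unlike pivot_table(), whose default aggfunc is the mean, xtabs() sums duplicate pairs; with unique label pairs, as here, the two agree.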
How to transform a directed Dataset into a Matrix with R
Another option is to model your directed dataset as a directed graph and extract its adjacency matrix.
library(igraph)
dat <- read.table(text='ID LinkedTo
2 1
3 1
4 3
5 4',header=TRUE)
gg <- graph_from_data_frame(dat)   # first two columns = edge list (from -> to)
as.matrix(as_adjacency_matrix(gg))
2 3 4 5 1
2 0 0 0 0 1
3 0 0 0 0 1
4 0 1 0 0 0
5 0 0 1 0 0
1 0 0 0 0 0
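Without igraph, the same adjacency matrix can be sketched in base R by indexing a zero matrix with (row, column) pairs. The node order below follows first appearance across both columns, which matches the vertex order in the output above.

```r
dat <- data.frame(ID = c(2, 3, 4, 5), LinkedTo = c(1, 1, 3, 4))

# Node order: first appearance across both columns (2, 3, 4, 5, 1)
nodes <- unique(c(dat$ID, dat$LinkedTo))
adj <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Row = source (ID), column = target (LinkedTo)
adj[cbind(match(dat$ID, nodes), match(dat$LinkedTo, nodes))] <- 1L
print(adj)
```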
Transform a matrix (or table) into a table-list?
Assuming you don't want the rows with a count of 0, this is just a two-step pivot and filter.
base R and reshape2
longdat <- reshape2::melt(dat, "station", variable.name = "sp", value.name = "number")
longdat
# station sp number
# 1 2 SP1 0
# 2 10 SP1 0
# 3 34 SP1 0
# 4 53 SP1 0
# 5 56 SP1 6
# 6 57 SP1 1
# 7 62 SP1 1
# 8 2 SP2 1
# 9 10 SP2 3
# 10 34 SP2 0
# 11 53 SP2 3
# 12 56 SP2 0
# 13 57 SP2 0
# 14 62 SP2 8
# 15 2 SP3 1
# 16 10 SP3 0
# 17 34 SP3 0
# 18 53 SP3 5
# 19 56 SP3 3
# 20 57 SP3 0
# 21 62 SP3 10
subset(longdat, number > 0)
# station sp number
# 5 56 SP1 6
# 6 57 SP1 1
# 7 62 SP1 1
# 8 2 SP2 1
# 9 10 SP2 3
# 11 53 SP2 3
# 14 62 SP2 8
# 15 2 SP3 1
# 18 53 SP3 5
# 19 56 SP3 3
# 21 62 SP3 10
dplyr
library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr
dat %>%
pivot_longer(-station, names_to = "sp", values_to = "number") %>%
dplyr::filter(number > 0)
# # A tibble: 11 x 3
# station sp number
# <int> <chr> <int>
# 1 2 SP2 1
# 2 2 SP3 1
# 3 10 SP2 3
# 4 53 SP2 3
# 5 53 SP3 5
# 6 56 SP1 6
# 7 56 SP3 3
# 8 57 SP1 1
# 9 62 SP1 1
# 10 62 SP2 8
# 11 62 SP3 10
data.table
(Effectively the same as reshape2.)
library(data.table)
data.table::melt(as.data.table(dat), "station", variable.name = "sp", value.name = "number"
)[ number > 0, ]
# station sp number
# <int> <fctr> <int>
# 1: 56 SP1 6
# 2: 57 SP1 1
# 3: 62 SP1 1
# 4: 2 SP2 1
# 5: 10 SP2 3
# 6: 53 SP2 3
# 7: 62 SP2 8
# 8: 2 SP3 1
# 9: 53 SP3 5
# 10: 56 SP3 3
# 11: 62 SP3 10
Data
dat <- structure(list(station = c(2L, 10L, 34L, 53L, 56L, 57L, 62L), SP1 = c(0L, 0L, 0L, 0L, 6L, 1L, 1L), SP2 = c(1L, 3L, 0L, 3L, 0L, 0L, 8L), SP3 = c(1L, 0L, 0L, 5L, 3L, 0L, 10L)), class = "data.frame", row.names = c(NA, -7L))
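For completeness, here is a sketch of the same pivot-and-filter in base R only, stacking the species columns one after another. It uses the dat above; rows come out in the same order as the reshape2 result because the columns are stacked in the same order.

```r
dat <- structure(list(station = c(2L, 10L, 34L, 53L, 56L, 57L, 62L),
                      SP1 = c(0L, 0L, 0L, 0L, 6L, 1L, 1L),
                      SP2 = c(1L, 3L, 0L, 3L, 0L, 0L, 8L),
                      SP3 = c(1L, 0L, 0L, 5L, 3L, 0L, 10L)),
                 class = "data.frame", row.names = c(NA, -7L))

sp_cols <- setdiff(names(dat), "station")

# Stack the species columns one after another (pivot to long format)
longdat <- data.frame(
  station = rep(dat$station, times = length(sp_cols)),
  sp      = rep(sp_cols, each = nrow(dat)),
  number  = unlist(dat[sp_cols], use.names = FALSE)
)

# Keep only non-zero counts
longdat <- longdat[longdat$number > 0, ]
print(longdat)
```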
R: How do I transform a data frame into an adjacency matrix if values meet a certain condition?
I only have a non-vectorized solution to offer: it iterates over every pair of sounds within each track.
track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
track_df <- data.frame(track, sound, start, end)
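# levels() returns NULL unless `sound` is a factor, which data.frame() no
# longer creates by default since R 4.0.0, so take the unique values instead
sound_names <- sort(unique(as.character(track_df$sound)))
m <- matrix(0, length(sound_names), length(sound_names),
            dimnames = list(sound_names, sound_names))
for (tr in split(track_df, track_df$track)) {
  n <- nrow(tr)
  # seq_len() avoids the 1:0 trap when a track has a single row
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    # two intervals overlap if each one starts before the other ends
    if (tr$start[i] < tr$end[j] && tr$start[j] < tr$end[i]) {
      a <- as.character(tr$sound[i]); b <- as.character(tr$sound[j])
      m[a, b] <- m[a, b] + 1
      if (a != b) m[b, a] <- m[b, a] + 1
    }
  }
}
print(m)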
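If the loop becomes slow, a loop-free alternative is to self-join the data on track with merge() and count the overlapping pairs with table(). This is a sketch rather than a benchmarked solution: the self-join materialises every within-track pair, so memory grows with the square of the largest track's size. On the example data it should give the same counts as the loop.

```r
track_df <- data.frame(
  track = c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B',
            'track1B', 'track1B', 'track1C', 'track1C', 'track1C'),
  sound = c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person',
            'dog', 'car', 'person'),
  start = c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700),
  end   = c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
)
track_df$id <- seq_len(nrow(track_df))

# Self-join: every pair of rows that share a track
pairs <- merge(track_df, track_df, by = "track", suffixes = c(".a", ".b"))

# Keep each unordered pair once, and only if the intervals overlap
pairs <- pairs[pairs$id.a < pairs$id.b &
               pairs$start.a < pairs$end.b &
               pairs$start.b < pairs$end.a, ]

sound_names <- sort(unique(track_df$sound))
counts <- unclass(table(factor(pairs$sound.a, levels = sound_names),
                        factor(pairs$sound.b, levels = sound_names)))

# Symmetrize; same-sound pairs sit on the diagonal and must not be doubled
m <- counts + t(counts)
diag(m) <- diag(counts)
print(m)
```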
Transform DF Column Values to Matrix in R
What you're looking for is dcast() from the reshape2 package:
library(reshape2)
dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")
This function aggregates your data according to the fun.aggregate argument (in your case length(), which counts the rows for each Date/Type combination).
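Since fun.aggregate = length just counts rows, a base-R cross-tabulation gives the same numbers without reshape2. The data below is hypothetical; note that the Dates end up as row names rather than as a column, unlike dcast().

```r
# Hypothetical data with one row per observation
dat <- data.frame(
  Date = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"),
  Type = c("A", "B", "A", "A", "C")
)

# Count rows per Date/Type combination; missing combinations become 0
counts <- xtabs(~ Date + Type, data = dat)

# Wide data frame, one row per Date and one column per Type
wide <- as.data.frame.matrix(counts)
print(wide)
```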
Related Topics
Rank a Vector Based on Order and Replace Ties with Their Average
R: Arranging Multiple Plots Together Using Gridextra
What Best Practices Do You Use for Programming in R
More Efficient Means of Creating a Corpus and Dtm with 4M Rows
Putting X-Axis at Top of Ggplot2 Chart
How to Change the Background Color of the Shiny Dashboard Body
Plot Every Column in a Data Frame as a Histogram on One Page Using Ggplot
How to Plot 3D Scatter Diagram Using Ggplot
R Partial Reshape Data from Long to Wide
R: Multiple Linear Regression Model and Prediction Model
How to Identify the Distribution of the Given Data Using R
Rmarkdown: Pandoc: PDFlatex Not Found
Writing Functions VS. Line-By-Line Interpretation in an R Workflow
Duplicate a Column in Data Frame and Rename It to Another Column Name