Transforming Dataset into Value Matrix

If you have many users and many movies, you can easily run out of memory when building a dense matrix. For instance, with 1,000 users and 1,000 distinct movies you end up with a matrix of 1M entries, most of which will be missing (since not every user has seen every movie).

If your dataset is big, a sparseMatrix from the Matrix package is the way to go. If both the user ids and the movie ids are sequential (i.e. they run from 1 to the number of distinct values), building it is straightforward. Using @StevenBeaupré's data (a data frame df with userId, movieId and rating columns):

library(Matrix)
# rows = users, columns = movies; only the observed ratings are stored
mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)

If the ids are not sequential, map them to consecutive integers first:

# factor() maps arbitrary ids to consecutive integer codes 1..k
mat <- sparseMatrix(i = as.integer(factor(df$userId)),
                    j = as.integer(factor(df$movieId)),
                    x = df$rating)

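Most of the usual matrix operations (arithmetic, %*%, crossprod(), rowSums(), subsetting) also work on a sparseMatrix. Keep in mind that unstored cells count as 0, so for example colMeans() averages the missing ratings in as zeros.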
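The id remapping plus "store only the observed cells" idea can be illustrated outside R. Below is a minimal, stdlib-only Python sketch under stated assumptions: the ratings are hypothetical, factor_codes is a made-up helper playing the role of as.integer(factor(x)) (0-based here), and a plain dict keyed by (row, col) stands in for sparse storage. It is not how the Matrix package stores data internally.

```python
# Hypothetical (user_id, movie_id, rating) triplets with non-sequential ids.
ratings = [(101, 7, 4.0), (205, 7, 3.5), (101, 42, 5.0), (333, 13, 2.0)]


def factor_codes(values):
    """Hypothetical helper: map each distinct value to a 0-based code in
    sorted order, like as.integer(factor(x)) - 1 in R."""
    levels = sorted(set(values))
    index = {v: k for k, v in enumerate(levels)}
    return [index[v] for v in values], levels


rows, users = factor_codes([u for u, _, _ in ratings])
cols, movies = factor_codes([m for _, m, _ in ratings])

# Only observed cells are stored, keyed by (row, col); missing ratings
# take no memory at all.
sparse = {(r, c): x for r, c, (_, _, x) in zip(rows, cols, ratings)}
shape = (len(users), len(movies))

print(f"shape={shape}, stored={len(sparse)}, dense cells={shape[0] * shape[1]}")
```

With 1,000 users and 1,000 movies a dense layout needs 1,000,000 cells (about 8 MB as doubles), while the dict grows only with the number of ratings. One difference: this dict keeps the last value for a repeated (user, movie) pair, whereas sparseMatrix() adds duplicate entries together by default.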

How to transform a dataset into a presence/absence matrix?

Here's a tidy solution:

library(stringr)
library(dplyr)
library(tidyr)

dat <- data.frame(
  species = c("species_1", "species_1, species_2", "species_2, species_3"),
  year = c(2000, 2003, 2005)
)

dat %>%
  rowwise() %>%
  mutate(species = list(str_split(species, ",")[[1]])) %>%
  unnest(species) %>%
  mutate(species = trimws(species),
         value = 1) %>%
  pivot_wider(names_from = "species", values_fill = 0)
#> # A tibble: 3 × 4
#> year species_1 species_2 species_3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 1 0 0
#> 2 2003 1 1 0
#> 3 2005 0 1 1

Created on 2022-06-30 by the reprex package (v2.0.1)
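The same split, trim and fill-with-zero steps can be sketched in stdlib-only Python, as an illustration of the logic rather than a replacement for the tidyverse pipeline. It reuses the toy data above. One assumption to note: columns are sorted alphabetically here, whereas pivot_wider() orders them by first appearance; for this data the two orders coincide.

```python
# Same toy data as the R example: one row per year, species as a
# comma-separated string.
dat = [
    (2000, "species_1"),
    (2003, "species_1, species_2"),
    (2005, "species_2, species_3"),
]

# Step 1: split on commas and trim whitespace (str_split + trimws).
long_rows = [(year, s.strip()) for year, field in dat for s in field.split(",")]

# Step 2: the distinct species become the columns.
species = sorted({s for _, s in long_rows})

# Step 3: one 0/1 row per year; absent combinations get 0
# (the role of values_fill = 0 in pivot_wider).
present = set(long_rows)
pa_matrix = [
    [year] + [1 if (year, s) in present else 0 for s in species]
    for year, _ in dat
]

print(["year"] + species)
for row in pa_matrix:
    print(row)
```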

Transform data from column vector to matrix

Read the data into a pandas DataFrame, split the combined label into its two parts, and then pivot with pivot_table:

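import pandas as pd

# df is assumed to hold a 'Label-Label' column (e.g. "A-B") and a 'Data' column
(df['Label-Label'].str.split('-', expand=True)   # new columns 0 and 1
   .assign(data=df.Data)
   .pivot_table(values='data', index=0, columns=1, fill_value=0))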

1 B D F I
0
A 1.0 4.0 0.0 0.0
B 0.0 0.0 10.0 0.0
C 0.0 5.0 0.0 0.0
H 0.0 0.0 0.0 12.0
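To show what the split and pivot_table steps compute, here is a stdlib-only sketch. The input records are hypothetical (the original data isn't shown), chosen so the result matches the table above. Two assumptions: each label contains a single '-', and each cell is averaged like pivot_table's default aggfunc='mean', which is why the pandas output shows floats.

```python
from collections import defaultdict

# Hypothetical input: one "row-col" label and one numeric value per record.
records = [("A-B", 1), ("A-D", 4), ("B-F", 10), ("C-D", 5), ("H-I", 12)]

# Split each label into (row, col) and collect all values per cell, so
# repeated labels can be averaged.
cells = defaultdict(list)
for label, value in records:
    row, col = label.split("-", 1)
    cells[(row, col)].append(value)

rows = sorted({r for r, _ in cells})
cols = sorted({c for _, c in cells})

# Mean per cell (pivot_table's default aggfunc); empty cells get 0 (fill_value=0).
table = {
    r: {
        c: (sum(cells[(r, c)]) / len(cells[(r, c)]) if (r, c) in cells else 0.0)
        for c in cols
    }
    for r in rows
}

for r in rows:
    print(r, [table[r][c] for c in cols])
```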

How to transform a directed Dataset into a Matrix with R

Another option is to model your directed dataset as a directed graph and extract its adjacency matrix:

library(igraph)

dat <- read.table(text='ID LinkedTo
2 1
3 1
4 3
5 4',header=TRUE)

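# graph_from_data_frame() and as_adjacency_matrix() are the current names
# of the older graph.data.frame() and get.adjacency()
gg <- graph_from_data_frame(dat, directed = TRUE)
as.matrix(as_adjacency_matrix(gg))
#   2 3 4 5 1
# 2 0 0 0 0 1
# 3 0 0 0 0 1
# 4 0 1 0 0 0
# 5 0 0 1 0 0
# 1 0 0 0 0 0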
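The adjacency matrix above can be reproduced directly from the edge list. This stdlib-only Python sketch assumes vertices are ordered by first appearance in the ID column and then the LinkedTo column, which matches the row and column order of the igraph output above; a cell holds the number of edges from the row vertex to the column vertex.

```python
# Edge list from the example: each ID points to its LinkedTo value.
edges = [(2, 1), (3, 1), (4, 3), (5, 4)]

# Vertex order: first appearance in the "from" column, then the "to" column.
order = []
for v in [a for a, _ in edges] + [b for _, b in edges]:
    if v not in order:
        order.append(v)
pos = {v: k for k, v in enumerate(order)}

# adj[i][j] counts edges from order[i] to order[j]; directed, so not symmetric.
adj = [[0] * len(order) for _ in order]
for a, b in edges:
    adj[pos[a]][pos[b]] += 1

print(" ", *order)
for v, row in zip(order, adj):
    print(v, *row)
```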

Transform a matrix (or table) into a table-list?

Assuming you don't want rows where the count is 0, this is a two-step job: pivot from wide to long, then filter.

base R and reshape2

longdat <- reshape2::melt(dat, "station", variable.name = "sp", value.name = "number")
longdat
# station sp number
# 1 2 SP1 0
# 2 10 SP1 0
# 3 34 SP1 0
# 4 53 SP1 0
# 5 56 SP1 6
# 6 57 SP1 1
# 7 62 SP1 1
# 8 2 SP2 1
# 9 10 SP2 3
# 10 34 SP2 0
# 11 53 SP2 3
# 12 56 SP2 0
# 13 57 SP2 0
# 14 62 SP2 8
# 15 2 SP3 1
# 16 10 SP3 0
# 17 34 SP3 0
# 18 53 SP3 5
# 19 56 SP3 3
# 20 57 SP3 0
# 21 62 SP3 10
subset(longdat, number > 0)
# station sp number
# 5 56 SP1 6
# 6 57 SP1 1
# 7 62 SP1 1
# 8 2 SP2 1
# 9 10 SP2 3
# 11 53 SP2 3
# 14 62 SP2 8
# 15 2 SP3 1
# 18 53 SP3 5
# 19 56 SP3 3
# 21 62 SP3 10

dplyr and tidyr

library(dplyr)
library(tidyr)   # pivot_longer() comes from tidyr, not dplyr
dat %>%
  pivot_longer(-station, names_to = "sp", values_to = "number") %>%
  dplyr::filter(number > 0)
# # A tibble: 11 x 3
# station sp number
# <int> <chr> <int>
# 1 2 SP2 1
# 2 2 SP3 1
# 3 10 SP2 3
# 4 53 SP2 3
# 5 53 SP3 5
# 6 56 SP1 6
# 7 56 SP3 3
# 8 57 SP1 1
# 9 62 SP1 1
# 10 62 SP2 8
# 11 62 SP3 10

data.table

(Effectively the same as reshape2.)

library(data.table)
data.table::melt(as.data.table(dat), id.vars = "station",
                 variable.name = "sp", value.name = "number")[number > 0, ]
# station sp number
# <int> <fctr> <int>
# 1: 56 SP1 6
# 2: 57 SP1 1
# 3: 62 SP1 1
# 4: 2 SP2 1
# 5: 10 SP2 3
# 6: 53 SP2 3
# 7: 62 SP2 8
# 8: 2 SP3 1
# 9: 53 SP3 5
# 10: 56 SP3 3
# 11: 62 SP3 10

Data

dat <- structure(list(station = c(2L, 10L, 34L, 53L, 56L, 57L, 62L), SP1 = c(0L, 0L, 0L, 0L, 6L, 1L, 1L), SP2 = c(1L, 3L, 0L, 3L, 0L, 0L, 8L), SP3 = c(1L, 0L, 0L, 5L, 3L, 0L, 10L)), class = "data.frame", row.names = c(NA, -7L))
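The pivot-then-filter idea behind all three versions fits in a few lines of stdlib-only Python. This sketch hard-codes the counts from the Data block above; the long rows come out species by species, the same row order as the reshape2 and data.table results.

```python
# The station-by-species counts from the Data block above.
stations = [2, 10, 34, 53, 56, 57, 62]
counts = {
    "SP1": [0, 0, 0, 0, 6, 1, 1],
    "SP2": [1, 3, 0, 3, 0, 0, 8],
    "SP3": [1, 0, 0, 5, 3, 0, 10],
}

# Step 1, pivot to long: one (station, sp, number) tuple per cell,
# species by species.
long_rows = [
    (st, sp, n) for sp, col in counts.items() for st, n in zip(stations, col)
]

# Step 2, filter: keep only the species actually present at a station.
table_list = [row for row in long_rows if row[2] > 0]

for row in table_list:
    print(*row)
```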

R: How do I transform a data frame into an adjacency matrix if values meet a certain condition?

I only have a non-vectorized solution to offer: it iterates over every pair of sounds within each track and counts the pairs whose time intervals overlap.

track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
track_df <- data.frame(track, sound, start, end)

# Since R 4.0, data.frame() keeps strings as character, so levels() would
# return NULL here; use the sorted unique sounds as row/column names instead
sound_names <- sort(unique(track_df$sound))
m <- matrix(0, length(sound_names), length(sound_names),
            dimnames = list(sound_names, sound_names))

for (trk in split(track_df, track_df$track)) {
  n <- nrow(trk)
  # visit every pair (i, j) with i < j within this track
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    # two events overlap when each one starts before the other ends
    if (trk$start[i] < trk$end[j] && trk$start[j] < trk$end[i]) {
      a <- trk$sound[i]
      b <- trk$sound[j]
      m[a, b] <- m[a, b] + 1
      m[b, a] <- m[a, b]   # keep the matrix symmetric
    }
  }
}
print(m)
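For comparison, here is the same pairwise overlap count as a stdlib-only Python sketch on the same data. It uses itertools.combinations for the i < j pairs and a Counter keyed by the sorted sound pair, which makes the result symmetric by construction. Like the R loop, it uses strict inequalities, so intervals that merely touch (one ends exactly when the other starts) are not counted.

```python
from collections import Counter
from itertools import combinations

# (track, sound, start, end) rows, same values as the R example.
events = [
    ("track1A", "car", 1000, 2000), ("track1A", "person", 1200, 1500),
    ("track1A", "car", 1500, 1700), ("track1A", "dog", 2300, 3000),
    ("track1B", "cat", 5000, 8000), ("track1B", "car", 5500, 8500),
    ("track1B", "car", 7500, 10000), ("track1B", "person", 8000, 9000),
    ("track1C", "dog", 1300, 1600), ("track1C", "car", 1500, 1800),
    ("track1C", "person", 1700, 2000),
]

# Group events by track.
by_track = {}
for track, sound, start, end in events:
    by_track.setdefault(track, []).append((sound, start, end))

# Count overlapping pairs within each track, keyed by the sorted sound pair.
pair_counts = Counter()
for rows in by_track.values():
    for (s1, a1, b1), (s2, a2, b2) in combinations(rows, 2):
        if a1 < b2 and a2 < b1:
            pair_counts[tuple(sorted((s1, s2)))] += 1

# Expand into a full symmetric matrix; missing pairs read as 0.
names = sorted({e[1] for e in events})
matrix = {a: {b: pair_counts[tuple(sorted((a, b)))] for b in names} for a in names}

for a in names:
    print(a, [matrix[a][b] for b in names])
```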

Transform DF Column Values to Matrix in R

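What you're looking for is dcast() (from reshape2 or data.table):

library(reshape2)
dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")

dcast() builds one row per Date and one column per Type, and fills each cell by applying fun.aggregate to the values that fall into it. With length(), that is simply the number of rows of each Type on each Date.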
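As an illustration of what fun.aggregate = length computes, here is a stdlib-only Python sketch of the same count pivot. The (Date, Type) rows are hypothetical, since the original data isn't shown. Combinations that never occur come out as 0, which is also what dcast() fills in when the aggregate is length().

```python
from collections import Counter

# Hypothetical (Date, Type) rows.
rows = [
    ("2020-01-01", "A"), ("2020-01-01", "B"), ("2020-01-01", "A"),
    ("2020-01-02", "B"), ("2020-01-03", "A"),
]

counts = Counter(rows)  # occurrences of each (Date, Type) pair
dates = sorted({d for d, _ in rows})
types = sorted({t for _, t in rows})

# One row per Date, one column per Type; each cell is a row count.
wide = [[d] + [counts[(d, t)] for t in types] for d in dates]

print(["Date"] + types)
for r in wide:
    print(r)
```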