Transforming Dataset into Value Matrix

If you have many users and many movies, you can easily run out of memory building a dense matrix. For instance, with 1,000 users and 1,000 different movies you end up with a matrix of 1M entries, most of which are missing (since not every user saw every movie).

If your dataset is big, a sparseMatrix from the Matrix package is the way to go. If both the user and the movie ids are sequential (i.e. they start at 1 and end at the number of distinct entries), building it is straightforward. Using @StevenBeaupré's data:

library(Matrix)
# rows = users, columns = movies, stored values = ratings
mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)

If the ids are not sequential, map them to consecutive integers with factor() first:

mat<-sparseMatrix(as.integer(factor(df$userId)), 
                  as.integer(factor(df$movieId)),x=df$rating)

Most matrix operations work on a sparseMatrix too.
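To make the id remapping concrete, here is a minimal sketch in Python (the other language used in this collection), using only the standard library. The ratings are hypothetical, and the helper `sequential_index` is an illustrative name, not part of any package. It mimics what as.integer(factor(x)) does: each distinct id gets its rank among the sorted distinct ids, and only the observed cells are stored.

```python
# Hypothetical (userId, movieId, rating) triples; the ids are deliberately not sequential.
ratings = [(10, 500, 4.0), (10, 731, 3.5), (42, 500, 5.0), (97, 888, 2.0)]


def sequential_index(values):
    """Map each distinct value to 1..k in sorted order, like as.integer(factor(values))."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


user_index = sequential_index(u for u, _, _ in ratings)
movie_index = sequential_index(m for _, m, _ in ratings)

# Sparse storage: keep only the observed (row, column) cells.
sparse = {(user_index[u], movie_index[m]): r for u, m, r in ratings}

n_rows, n_cols = len(user_index), len(movie_index)
print(f"{len(sparse)} stored entries instead of {n_rows * n_cols} dense cells")
```

With real data the gap between stored entries and dense cells is what keeps memory use manageable.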

How to transform a dataset into a presence/absence matrix?

Here's a tidy solution:

library(stringr)
library(dplyr)
library(tidyr)
dat <- data.frame(
  species = c("species_1", "species_1, species_2", "species_2, species_3"), 
  year = c(2000, 2003, 2005)
)
dat %>% 
  rowwise() %>% 
  mutate(species = list(str_split(species, ",")[[1]])) %>% 
  unnest(species) %>% 
  mutate(species = trimws(species), 
         value=1) %>% 
  pivot_wider(names_from="species", values_fill = 0)
#> # A tibble: 3 × 4
#>    year species_1 species_2 species_3
#>   <dbl>     <dbl>     <dbl>     <dbl>
#> 1  2000         1         0         0
#> 2  2003         1         1         0
#> 3  2005         0         1         1

Created on 2022-06-30 by the reprex package (v2.0.1)
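The same split, trim, and spread logic can be sketched without any packages. The following Python version is only an illustration of the steps (split on commas, strip whitespace, emit one 0/1 column per species); it reuses the example data from above and orders the columns by first appearance, which is an assumption made here.

```python
rows = [
    {"species": "species_1", "year": 2000},
    {"species": "species_1, species_2", "year": 2003},
    {"species": "species_2, species_3", "year": 2005},
]

# Split each comma-separated string and strip surrounding whitespace.
split_rows = [(r["year"], [s.strip() for s in r["species"].split(",")]) for r in rows]

# Collect the distinct species in order of first appearance.
columns = []
for _, names in split_rows:
    for name in names:
        if name not in columns:
            columns.append(name)

# One output row per year, with 1 where the species is present and 0 otherwise.
presence = [
    {"year": year, **{c: int(c in names) for c in columns}}
    for year, names in split_rows
]

for row in presence:
    print(row)
```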

Transform data from column vector to matrix

Start by reading the data into a pandas DataFrame, then split the label column on '-' and build a pivot_table:

(df['Label-Label'].str.split('-', expand=True)
                  .assign(data=df.Data)
                  .pivot_table(values='data', index=0, columns=1, fill_value=0))

1    B    D     F     I
0                      
A  1.0  4.0   0.0   0.0
B  0.0  0.0  10.0   0.0
C  0.0  5.0   0.0   0.0
H  0.0  0.0   0.0  12.0

How to transform a directed Dataset into a Matrix with R

Another option is to model your directed dataset as a directed graph and extract its adjacency matrix.

library(igraph)

dat <- read.table(text='ID LinkedTo
2  1
3  1
4  3
5  4',header=TRUE)

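# graph_from_data_frame() and as_adjacency_matrix() are the current names
# of graph.data.frame() and get.adjacency()
gg <- graph_from_data_frame(dat)
as.matrix(as_adjacency_matrix(gg))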
  2 3 4 5 1
2 0 0 0 0 1
3 0 0 0 0 1
4 0 1 0 0 0
5 0 0 1 0 0
1 0 0 0 0 0
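For intuition about what the adjacency matrix encodes, here is a small standard-library Python sketch that builds the same matrix by hand from the edge list. The vertex order (ids from the first column, then any ids seen only in the second) is chosen to match the output above; it is an assumption of this sketch rather than a statement about igraph's internals.

```python
# Edge list from the example: (ID, LinkedTo)
edges = [(2, 1), (3, 1), (4, 3), (5, 4)]

# Vertex order: sources first, then targets not seen yet, by first appearance.
order = []
for v in [src for src, _ in edges] + [dst for _, dst in edges]:
    if v not in order:
        order.append(v)
pos = {v: i for i, v in enumerate(order)}

# adjacency[r][c] is 1 when there is a directed edge order[r] -> order[c].
adjacency = [[0] * len(order) for _ in order]
for src, dst in edges:
    adjacency[pos[src]][pos[dst]] = 1

print("  " + " ".join(str(v) for v in order))
for v, row in zip(order, adjacency):
    print(str(v) + " " + " ".join(str(x) for x in row))
```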

Transform a matrix (or table) into a table-list?

Assuming you don't want the rows with a count of 0, this is just a two-step pivot and filter.

base R and reshape2

longdat <- reshape2::melt(dat, "station", variable.name = "sp", value.name = "number")
longdat
#    station  sp number
# 1        2 SP1      0
# 2       10 SP1      0
# 3       34 SP1      0
# 4       53 SP1      0
# 5       56 SP1      6
# 6       57 SP1      1
# 7       62 SP1      1
# 8        2 SP2      1
# 9       10 SP2      3
# 10      34 SP2      0
# 11      53 SP2      3
# 12      56 SP2      0
# 13      57 SP2      0
# 14      62 SP2      8
# 15       2 SP3      1
# 16      10 SP3      0
# 17      34 SP3      0
# 18      53 SP3      5
# 19      56 SP3      3
# 20      57 SP3      0
# 21      62 SP3     10
subset(longdat, number > 0)
#    station  sp number
# 5       56 SP1      6
# 6       57 SP1      1
# 7       62 SP1      1
# 8        2 SP2      1
# 9       10 SP2      3
# 11      53 SP2      3
# 14      62 SP2      8
# 15       2 SP3      1
# 18      53 SP3      5
# 19      56 SP3      3
# 21      62 SP3     10

dplyr and tidyr

library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr
dat %>%
  pivot_longer(-station, names_to = "sp", values_to = "number") %>%
  dplyr::filter(number > 0)
# # A tibble: 11 x 3
#    station sp    number
#      <int> <chr>  <int>
#  1       2 SP2        1
#  2       2 SP3        1
#  3      10 SP2        3
#  4      53 SP2        3
#  5      53 SP3        5
#  6      56 SP1        6
#  7      56 SP3        3
#  8      57 SP1        1
#  9      62 SP1        1
# 10      62 SP2        8
# 11      62 SP3       10

data.table

(Effectively the same as reshape2.)

library(data.table)
data.table::melt(as.data.table(dat), "station", variable.name = "sp", value.name = "number"
   )[ number > 0, ]
#     station     sp number
#       <int> <fctr>  <int>
#  1:      56    SP1      6
#  2:      57    SP1      1
#  3:      62    SP1      1
#  4:       2    SP2      1
#  5:      10    SP2      3
#  6:      53    SP2      3
#  7:      62    SP2      8
#  8:       2    SP3      1
#  9:      53    SP3      5
# 10:      56    SP3      3
# 11:      62    SP3     10

Data

dat <- structure(list(station = c(2L, 10L, 34L, 53L, 56L, 57L, 62L), SP1 = c(0L, 0L, 0L, 0L, 6L, 1L, 1L), SP2 = c(1L, 3L, 0L, 3L, 0L, 0L, 8L), SP3 = c(1L, 0L, 0L, 5L, 3L, 0L, 10L)), class = "data.frame", row.names = c(NA, -7L))
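All three variants perform the same two steps: reshape wide to long, then drop the zero counts. As a language-neutral illustration, here is a minimal Python sketch of those steps using only built-in types; the data is copied from the structure above, and the row order follows the reshape2 output (species by species).

```python
dat = {
    "station": [2, 10, 34, 53, 56, 57, 62],
    "SP1": [0, 0, 0, 0, 6, 1, 1],
    "SP2": [1, 3, 0, 3, 0, 0, 8],
    "SP3": [1, 0, 0, 5, 3, 0, 10],
}

# Step 1 (pivot/melt): one (station, sp, number) row per cell of the wide table.
species = [col for col in dat if col != "station"]
long_rows = [
    (station, sp, dat[sp][i])
    for sp in species
    for i, station in enumerate(dat["station"])
]

# Step 2 (filter): keep only the rows with a positive count.
nonzero = [row for row in long_rows if row[2] > 0]

for row in nonzero:
    print(row)
```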

R: How do I transform a data frame into an adjacency matrix if values meet a certain condition?

I can only offer a non-vectorized solution, which iterates over every pair of sounds within each track and counts the pairs whose time intervals overlap.

track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
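track_df <- data.frame(track, sound, start, end, stringsAsFactors = FALSE)
# Sorted unique sounds label the rows and columns; levels() would return
# NULL here because sound is stored as a character column.
sounds <- sort(unique(track_df$sound))
m <- matrix(0, length(sounds), length(sounds), dimnames = list(sounds, sounds))
for (tr in split(track_df, track_df$track)) {
    n <- nrow(tr)
    if (n < 2) next  # a track with a single sound has no pairs
    for (i in 1:(n - 1)) for (j in (i + 1):n) {
        # two intervals overlap when each one starts before the other ends
        if (tr$start[i] < tr$end[j] && tr$start[j] < tr$end[i]) {
            si <- tr$sound[i]
            sj <- tr$sound[j]
            # keep the matrix symmetric
            m[si, sj] <- m[sj, si] <- m[si, sj] + 1
        }
    }
}
print(m)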
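The pairwise overlap test is the core of this approach, so here is the same idea as a short Python sketch using the example data above. It relies only on the standard library; the nested dict `counts` stands in for the matrix and is simply an illustrative choice.

```python
from collections import defaultdict
from itertools import combinations

# (track, sound, start, end), copied from the example data
events = [
    ("track1A", "car", 1000, 2000), ("track1A", "person", 1200, 1500),
    ("track1A", "car", 1500, 1700), ("track1A", "dog", 2300, 3000),
    ("track1B", "cat", 5000, 8000), ("track1B", "car", 5500, 8500),
    ("track1B", "car", 7500, 10000), ("track1B", "person", 8000, 9000),
    ("track1C", "dog", 1300, 1600), ("track1C", "car", 1500, 1800),
    ("track1C", "person", 1700, 2000),
]

# Group the events by track so that only sounds on the same track are compared.
by_track = defaultdict(list)
for track, sound, start, end in events:
    by_track[track].append((sound, start, end))

sounds = sorted({sound for _, sound, _, _ in events})
counts = {a: {b: 0 for b in sounds} for a in sounds}

for items in by_track.values():
    for (s1, b1, e1), (s2, b2, e2) in combinations(items, 2):
        # Two intervals overlap when each one starts before the other ends.
        if b1 < e2 and b2 < e1:
            counts[s1][s2] += 1
            if s1 != s2:
                counts[s2][s1] += 1  # mirror the count to keep the table symmetric

for a in sounds:
    print(a, [counts[a][b] for b in sounds])
```

Intervals that merely touch (one ends exactly when the other starts) are not counted, because the comparison is strict.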

Transform DF Column Values to Matrix in R

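What you're looking for is dcast() from the reshape2 package (data.table provides one as well):

library(reshape2)
dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")

This function aggregates your data according to the fun.aggregate argument (in your case length(), which counts the rows for each Date and Type combination).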
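To show what counting per Date and Type amounts to, here is a minimal Python sketch with the standard library. The records are hypothetical, since the original `dat` is not shown, and the nested dict `table` is just one way to hold the result.

```python
from collections import Counter

# Hypothetical (Date, Type) records; the original data is not shown.
records = [
    ("2020-01-01", "A"),
    ("2020-01-01", "A"),
    ("2020-01-01", "B"),
    ("2020-01-02", "B"),
]

# Count how often each (Date, Type) combination occurs.
pair_counts = Counter(records)

dates = sorted({d for d, _ in records})
types = sorted({t for _, t in records})

# One row per Date, one column per Type, 0 where a combination never occurs.
table = {d: {t: pair_counts[(d, t)] for t in types} for d in dates}

for d in dates:
    print(d, table[d])
```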
