Convert a Matrix with Dimnames into a Long Format Data.Frame

Use melt from reshape2:

library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m

  Var1 Var2 value
1    1    a     1
2    2    a     2
3    3    a     3
4    4    a     4
...
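
If you would rather not load a package, the same long table can be sketched in base R. This is only an illustration, not part of the answer above: it relies on row() and col() returning the row and column index of every cell in the same column-major order that c() uses. One difference to be aware of is that Var1 comes out as character here, while melt shows it as a number.

```r
# Same fake data as above
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4

# row(x) and col(x) hold the row/column index of each cell, in the
# same column-major order in which c(x) flattens the values
x.long <- data.frame(
  Var1  = rownames(x)[row(x)],
  Var2  = colnames(x)[col(x)],
  value = c(x)
)
head(x.long, 4)
```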

Fastest conversion of matrix to long format data frame in R

So many options:

library(dplyr)
library(tidyr)
library(data.table)

library(microbenchmark)
library(ggplot2)

set.seed(1)
ex <- matrix(data = round(runif(100000), 1), nrow = 1000, ncol = 100)
rownames(ex) <- paste0("row", 1:nrow(ex))
colnames(ex) <- paste0("col", 1:ncol(ex))

comp <- microbenchmark(
  table = {
    df1 <- as.data.frame(as.table(ex))
  },

  reshape = {
    df2 <- reshape2::melt(ex)
  },

  dplyr = {
    df3 <- ex %>%
      as.data.frame() %>%
      tibble::rownames_to_column("Var1") %>%
      gather("Var2", "value", -Var1)
  },

  data.table = {
    dt <- melt(data.table(ex, keep.rownames = TRUE), id.vars = c("rn"))
  },

  # Note: uses the row index, not the row names, as the id column
  data.table2 = {
    melt(as.data.table(ex)[, rn := seq_len(.N)], id.vars = "rn")
  },

  # c(ex) flattens column by column, so the row names cycle fastest
  # and each column name is repeated nrow(ex) times
  data.table3 = {
    data.table(Var1 = rep(rownames(ex), times = ncol(ex)),
               Var2 = rep(colnames(ex), each = nrow(ex)),
               value = c(ex))
  }
)

autoplot(comp)

[Figure: autoplot(comp) plot of the benchmark timings for each method]
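
A benchmark like this is only meaningful if every candidate returns the same long table. As a rough sanity check, here is a base R sketch on a small matrix (the 5 x 4 size is an arbitrary choice for illustration) that compares a hand-built long table against as.data.frame(as.table()).

```r
set.seed(1)
ex <- matrix(round(runif(20), 1), nrow = 5, ncol = 4)
rownames(ex) <- paste0("row", 1:nrow(ex))
colnames(ex) <- paste0("col", 1:ncol(ex))

# Reference result
ref <- as.data.frame(as.table(ex), responseName = "value")

# Hand-built version: c(ex) is column-major, so the row names cycle
# fastest and each column name is repeated nrow(ex) times
manual <- data.frame(
  Var1  = rep(rownames(ex), times = ncol(ex)),
  Var2  = rep(colnames(ex), each = nrow(ex)),
  value = c(ex)
)

# The labels and values should line up row for row
stopifnot(
  all(as.character(ref$Var1) == as.character(manual$Var1)),
  all(as.character(ref$Var2) == as.character(manual$Var2)),
  isTRUE(all.equal(ref$value, manual$value))
)
```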

R, dplyr: Converting list of lists of differing lengths within dataframe into long format dataframe

Approach one:

First, we convert the matrices to data.frames, add the row names as a separate column called a, and gather each one into long format. Unnesting then gives one big data.frame, and complete fills in the missing combinations with NA:

library(tidyverse) # using dplyr, tidyr and purrr

df %>%
  mutate(Value = map(Value, as.data.frame),
         Value = map(Value, rownames_to_column, 'a'),
         Value = map(Value, ~gather(., b, value, -a))) %>%
  unnest(Value) %>%
  complete(Step, a, b)

Approach two:

Manually define the data.frame, then do the same:

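# c(.) flattens each matrix column by column, so the row names cycle
# fastest and each column name is repeated nrow(.) times
df %>%
  mutate(Value = map(Value,
                     ~tibble(value = c(.),
                             a = rep(rownames(.), times = ncol(.)),
                             b = rep(colnames(.), each = nrow(.))))) %>%
  unnest(Value) %>%
  complete(Step, a, b)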

Result:

Both give:

# A tibble: 30 × 4
Step a b value
<int> <chr> <chr> <dbl>
1 1 4 0.01 NA
2 1 4 0.021 NA
3 1 4 0.044 NA
4 1 4 0.094 0.932
5 1 4 0.2 0.232
6 1 5 0.01 NA
7 1 5 0.021 NA
8 1 5 0.044 NA
9 1 5 0.094 0.875
10 1 5 0.2 0.261
# ... with 20 more rows
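
The question's df is not shown here, so the following is only a base R sketch of the same idea on a hypothetical stand-in: a Step column plus a list-column of matrices with differing column sets (m1, m2 and their values are invented for illustration). It swaps in as.data.frame(as.table()) for gather, and merge against expand.grid for complete.

```r
# Hypothetical stand-in for `df`: two steps, matrices with different columns
m1 <- matrix(c(0.9, 0.8, 0.2, 0.3), nrow = 2,
             dimnames = list(c("4", "5"), c("0.094", "0.2")))
m2 <- matrix(c(0.7, 0.6, 0.5, 0.4, 0.1, 0.2), nrow = 2,
             dimnames = list(c("4", "5"), c("0.044", "0.094", "0.2")))
df <- data.frame(Step = 1:2)
df$Value <- list(m1, m2)

# Melt each matrix to long format and tag it with its Step
long <- do.call(rbind, Map(function(step, m) {
  d <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
  names(d) <- c("a", "b", "value")
  cbind(Step = step, d)
}, df$Step, df$Value))

# Fill in the combinations that are absent with NA
grid <- expand.grid(Step = unique(long$Step), a = unique(long$a),
                    b = unique(long$b), stringsAsFactors = FALSE)
full <- merge(grid, long, all.x = TRUE)
full
```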

Convert from n x m matrix to long matrix in R

If you need a single-column matrix:

matrix(m, dimnames=list(t(outer(colnames(m), rownames(m), FUN=paste)), NULL))
# [,1]
#a d 1
#a e 4
#b d 2
#b e 5
#c d 3
#c e 6

For a data.frame output, you can use melt from reshape2:

library(reshape2)
melt(m)
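
The answer does not show m. Judging from the printed output, it was presumably a 2 x 3 matrix with rows d, e and columns a, b, c; under that assumption, a self-contained sketch of the single-column version looks like this (the c() around the labels is an extra precaution to hand dimnames a plain character vector):

```r
# Assumed input, reconstructed from the output shown above
m <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("d", "e"), c("a", "b", "c")))

# Build "<column> <row>" labels in the column-major order in which
# matrix(m) flattens the values
labels <- c(t(outer(colnames(m), rownames(m), FUN = paste)))

long <- matrix(m, dimnames = list(labels, NULL))
long
```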

Melt logical matrix to long format

If the matrix is called mat, you can use which with arr.ind = TRUE to get row and column number of TRUE values. Use that to index rownames and colnames.

mat1 <- which(mat, arr.ind = TRUE)
data.frame(R = rownames(mat)[mat1[, 1]], C = colnames(mat)[mat1[, 2]])

# R C
#1 a x
#2 b x
#3 b y
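
Since mat is not defined above, here is a small runnable sketch with an assumed 2 x 2 logical matrix chosen to match the output shown. which(..., arr.ind = TRUE) returns a matrix with columns named row and col, so they can be indexed by name.

```r
# Assumed example input: TRUE at (a, x), (b, x) and (b, y)
mat <- matrix(c(TRUE, TRUE, FALSE, TRUE), nrow = 2,
              dimnames = list(c("a", "b"), c("x", "y")))

# Row/column positions of the TRUE cells, in column-major order
idx <- which(mat, arr.ind = TRUE)

res <- data.frame(R = rownames(mat)[idx[, "row"]],
                  C = colnames(mat)[idx[, "col"]])
res
```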

R: Reshape count matrix to long format with multiple entries

We could do this with base R. We convert the dimnames of 'm0' to a 'data.frame' with two columns using expand.grid, then replicate the rows of the dataset with the values in 'm0', order the rows and change the row names to NULL (if necessary).

d1 <- expand.grid(dimnames(m0))
d2 <- d1[rep(1:nrow(d1), c(m0)),]
res <- d2[order(d2$Var1),]
row.names(res) <- NULL
res
# Var1 Var2
#1 A A
#2 A B
#3 A B
#4 A B
#5 B A
#6 B A
#7 B B
#8 B B
#9 B B
#10 B B

Or with melt, we convert the 'm0' to 'long' format and then replicate the rows as before.

library(reshape2)
dM <- melt(m0)
dM[rep(1:nrow(dM), dM$value),1:2]

As @Frank mentioned, we can also use table with as.data.frame to create 'dM'

 dM <- as.data.frame(as.table(m0))
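
The count matrix m0 is not shown; from the output above it appears to be a 2 x 2 matrix with counts 1 to 4. Assuming that, this sketch runs the base R approach end to end and checks the result by tabulating the long data back into counts.

```r
# Assumed input, reconstructed from the output shown above
m0 <- matrix(1:4, nrow = 2, dimnames = list(c("A", "B"), c("A", "B")))

d1 <- expand.grid(dimnames(m0))                  # one row per cell
d2 <- d1[rep(seq_len(nrow(d1)), c(m0)), ]        # repeat each row 'count' times
res <- d2[order(d2$Var1), ]
row.names(res) <- NULL

# Round trip: tabulating the long data should give back the counts
back <- table(res$Var1, res$Var2)
stopifnot(all(back == m0))
```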

Reshape three column data frame to matrix (long to wide format)

There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.

tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
y=gl(3,1,6, labels=letters[1:3]),
z=c(1,2,3,3,3,2))

Using the tidyverse:

The cool new way to do this is with pivot_wider from tidyr 1.0.0. It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.

library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x a b c
## <fct> <dbl> <dbl> <dbl>
## 1 x 1 2 3
## 2 y 3 3 2

The old cool new way to do this is with spread from tidyr. It similarly returns a data frame.

library(tidyr)
spread(tmp, y, z)
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using reshape2:

One of the first steps toward the tidyverse was the reshape2 package.

To get a matrix use acast:

library(reshape2)
acast(tmp, x~y, value.var="z")
## a b c
## x 1 2 3
## y 3 3 2

Or to get a data frame, use dcast, as here: Reshape data for values in one column.

dcast(tmp, x~y, value.var="z")
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using plyr:

In between reshape2 and the tidyverse came plyr, with the daply function, as shown here: https://stackoverflow.com/a/7020101/210673

library(plyr)
daply(tmp, .(x, y), function(x) x$z)
## y
## x a b c
## x 1 2 3
## y 3 3 2

Using matrix indexing:

This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.

with(tmp, {
out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
dimnames=list(levels(x), levels(y)))
out[cbind(x, y)] <- z
out
})

Using xtabs:

xtabs(z~x+y, data=tmp)
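
xtabs sums z within each x/y cell and returns a table object. A closely related base R option, offered here as a sketch rather than as part of the original answer, is tapply, which returns a plain matrix. One difference: combinations that never occur come out as NA with tapply, whereas xtabs reports 0.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# One cell per x/y combination; sum() only matters if a pair repeats
wide <- with(tmp, tapply(z, list(x, y), sum))
wide
```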

Using a sparse matrix:

There's also sparseMatrix within the Matrix package, as seen here: R - convert BIG table into matrix by column names

library(Matrix)
with(tmp, sparseMatrix(i = as.numeric(x), j = as.numeric(y), x = z,
                       dimnames = list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
## a b c
## x 1 2 3
## y 3 3 2

Using reshape:

You can also use the base R function reshape, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove extra columns and get the names right (not shown).

reshape(tmp, idvar="x", timevar="y", direction="wide")
## x z.a z.b z.c
## 1 x 1 2 3
## 4 y 3 3 2

Converting a matrix into a tibble in R

as_tibble (the current name for the deprecated as.tibble) can convert the matrix's row names to a column via its rownames argument, and then you can use gather() to create the group2 column:

library(tidyverse)

m <- matrix(1:3, nrow = 3, dimnames = list(c("X","Y","Z"), c("A")))

newtib <- m %>%
  as_tibble(rownames = "group1") %>%
  gather('A', key = "group2", value = "value")

> newtib
# A tibble: 3 × 3
group1 group2 value
<chr> <chr> <int>
1 X A 1
2 Y A 2
3 Z A 3

> tibble::tribble(~group1, ~group2, ~value, "X", "A", 1, "Y", "A", 2, "Z", "A", 3)
# A tibble: 3 × 3
group1 group2 value
<chr> <chr> <dbl>
1 X A 1
2 Y A 2
3 Z A 3

Create dataframe from a matrix

If you change your time column into row names, then you can use as.data.frame(as.table(mat)) for simple cases like this.

Example:

data <- c(0.1, 0.2, 0.3, 0.3, 0.4, 0.5)
dimnames <- list(time=c(0, 0.5, 1), name=c("C_0", "C_1"))
mat <- matrix(data, ncol=2, nrow=3, dimnames=dimnames)
as.data.frame(as.table(mat))
time name Freq
1 0 C_0 0.1
2 0.5 C_0 0.2
3 1 C_0 0.3
4 0 C_1 0.3
5 0.5 C_1 0.4
6 1 C_1 0.5

In this case time and name are both factors. You may want to convert time back to numeric, or it may not matter.
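
If you do want time back as a number, one common idiom is to go through as.character first, since calling as.numeric directly on a factor returns the internal level codes. The sketch below also uses the responseName argument of as.data.frame to name the value column "value" instead of Freq; the name long is just chosen for illustration.

```r
data <- c(0.1, 0.2, 0.3, 0.3, 0.4, 0.5)
dimnames <- list(time = c(0, 0.5, 1), name = c("C_0", "C_1"))
mat <- matrix(data, ncol = 2, nrow = 3, dimnames = dimnames)

long <- as.data.frame(as.table(mat), responseName = "value")

# Convert the factor labels ("0", "0.5", "1") back to numbers
long$time <- as.numeric(as.character(long$time))
str(long)
```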


