Convert a matrix with dimnames into a long format data.frame
Use melt from reshape2:
library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m
Var1 Var2 value
1 1 a 1
2 2 a 2
3 3 a 3
4 4 a 4
...
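If reshape2 is not available, roughly the same long layout can be sketched in base R. This is an alternative under stated assumptions, not what melt does internally: it relies on row() and col() returning index matrices that line up with the column-major order of c(x). The name x.long is made up for illustration, and Var1 comes out as character here rather than integer.

```r
# Same fake data as above
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4

# row(x) and col(x) give the row/column index of every cell;
# c(x) flattens the values in the same column-major order
x.long <- data.frame(
  Var1  = rownames(x)[row(x)],
  Var2  = colnames(x)[col(x)],
  value = c(x)
)
head(x.long, 4)
```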
Fastest conversion of matrix to long format data frame in R
So many options:
library(dplyr)
library(tidyr)
library(data.table)
library(microbenchmark)
library(ggplot2)
set.seed(1)
ex <- matrix(data = round(runif(100000), 1), nrow = 1000, ncol = 100)
rownames(ex) <- paste0("row", 1:nrow(ex))
colnames(ex) <- paste0("col", 1:ncol(ex))
comp <- microbenchmark(
table = {
df1 <- as.data.frame(as.table(ex))
},
reshape = {
df2 <- reshape2::melt(ex)
},
dplyr = {
df3 <- ex %>%
as.data.frame() %>%
tibble::rownames_to_column("Var1") %>%
gather("Var2", "value", -Var1)
},
data.table = {
dt = melt(data.table(ex, keep.rownames = TRUE) , id.vars = c("rn"))
},
data.table2 = {
melt(as.data.table(ex)[, rn := seq_len(.N)], id.var = 'rn')
},
data.table3 = {
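# c(ex) is column-major: row names cycle fastest, each column name repeats nrow(ex) times
data.table(Var1 = rep(rownames(ex), times = ncol(ex)), Var2 = rep(colnames(ex), each = nrow(ex)), value = c(ex))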
}
)
autoplot(comp)
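Before reading too much into the timings, it may be worth checking on a small input that the candidates actually return the same rows. The sketch below compares only two of them in base R; the matrix small and the object names ref and manual are hypothetical stand-ins, not part of the benchmark above.

```r
# Small stand-in for `ex` (hypothetical 3 x 2 example)
small <- matrix(1:6, nrow = 3,
                dimnames = list(paste0("row", 1:3), paste0("col", 1:2)))

# Reference result via as.table()
ref <- as.data.frame(as.table(small), stringsAsFactors = FALSE)

# Manual construction: row names cycle fastest, column names repeat in blocks
manual <- data.frame(
  Var1 = rep(rownames(small), times = ncol(small)),
  Var2 = rep(colnames(small), each = nrow(small)),
  Freq = c(small),
  stringsAsFactors = FALSE
)

# Compare the label columns and the values
identical(ref$Var1, manual$Var1)
identical(ref$Var2, manual$Var2)
all(ref$Freq == manual$Freq)
```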
R, dplyr: Converting list of lists of differing lengths within dataframe into long format dataframe
Approach one:
First, we convert the matrices to data.frames, then we add the rownames as a separate column called a, and gather them all. By unnesting we get one big data.frame. Adding in the NA values is easy with complete:
library(tidyverse) # using dplyr, tidyr and purrr
df %>%
mutate(Value = map(Value, as.data.frame),
Value = map(Value, rownames_to_column, 'a'),
Value = map(Value, ~gather(., b, value, -a))) %>%
unnest(Value) %>%
complete(Step, a, b)
Approach two:
Manually define the data.frame, then do the same:
df %>%
mutate(Value = map(Value,
~tibble(value = c(.),  # c() flattens the matrix column by column
a = rep(rownames(.), times = ncol(.)),
b = rep(colnames(.), each = nrow(.))))) %>%
unnest(Value) %>%
complete(Step, a, b)
Result:
Both give:
# A tibble: 30 × 4
Step a b value
<int> <chr> <chr> <dbl>
1 1 4 0.01 NA
2 1 4 0.021 NA
3 1 4 0.044 NA
4 1 4 0.094 0.932
5 1 4 0.2 0.232
6 1 5 0.01 NA
7 1 5 0.021 NA
8 1 5 0.044 NA
9 1 5 0.094 0.875
10 1 5 0.2 0.261
# ... with 20 more rows
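To see what complete contributes in isolation, here is a minimal sketch on a made-up long data frame. The data and the names long and filled are assumptions for illustration and are not the df from the question.

```r
library(tidyr)

# Hypothetical long data where Step 2 lacks the (a = "5", b = "0.2") cell
long <- data.frame(
  Step  = c(1L, 1L, 2L),
  a     = c("4", "5", "4"),
  b     = c("0.2", "0.2", "0.2"),
  value = c(0.232, 0.261, 0.5)
)

# complete() adds one row per missing Step/a/b combination, with value = NA
filled <- complete(long, Step, a, b)
filled
```

With two Step values, two a values and one b value, the result should have four rows, one of which carries NA.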
Convert from n x m matrix to long matrix in R
If you need a single column matrix
matrix(m, dimnames=list(t(outer(colnames(m), rownames(m), FUN=paste)), NULL))
# [,1]
#a d 1
#a e 4
#b d 2
#b e 5
#c d 3
#c e 6
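The answer does not show m, so the sketch below assumes a 2 x 3 matrix that is consistent with the printed output. It also builds the labels with row() and col() instead of outer(), which is a swapped-in alternative that some may find easier to read. The names labels and long_m are made up.

```r
# Hypothetical input chosen to be consistent with the output above
m <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 2,
            dimnames = list(c("d", "e"), c("a", "b", "c")))

# Build "column row" labels cell by cell, in the same order as c(m)
labels <- paste(colnames(m)[col(m)], rownames(m)[row(m)])

# One column, one row per cell of m
long_m <- matrix(c(m), ncol = 1, dimnames = list(labels, NULL))
long_m
```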
For a data.frame output, you can use melt from reshape2:
library(reshape2)
melt(m)
Melt logical matrix to long format
If the matrix is called mat, you can use which with arr.ind = TRUE to get the row and column numbers of the TRUE values. Use that to index rownames and colnames.
mat1 <- which(mat, arr.ind = TRUE)
data.frame(R = rownames(mat)[mat1[, 1]], C = colnames(mat)[mat1[, 2]])
# R C
#1 a x
#2 b x
#3 b y
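For a self-contained run, the sketch below assumes a small logical matrix that is consistent with the output above (the original mat is not shown). It indexes the which() result by its "row" and "col" column names rather than by position; idx and long are made-up names.

```r
# Hypothetical logical matrix consistent with the output above
mat <- matrix(c(TRUE, TRUE, FALSE, TRUE), nrow = 2,
              dimnames = list(c("a", "b"), c("x", "y")))

# One row per TRUE cell, with columns "row" and "col"
idx <- which(mat, arr.ind = TRUE)

# Translate positions back into dimnames
long <- data.frame(R = rownames(mat)[idx[, "row"]],
                   C = colnames(mat)[idx[, "col"]])
long
```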
R: Reshape count matrix to long format with multiple entries
We could do this with base R. We convert the dimnames of 'm0' to a 'data.frame' with two columns using expand.grid, then replicate the rows of the dataset by the values in 'm0', order the rows and change the row names to NULL (if necessary).
d1 <- expand.grid(dimnames(m0))
d2 <- d1[rep(1:nrow(d1), c(m0)),]
res <- d2[order(d2$Var1),]
row.names(res) <- NULL
res
# Var1 Var2
#1 A A
#2 A B
#3 A B
#4 A B
#5 B A
#6 B A
#7 B B
#8 B B
#9 B B
#10 B B
Or with melt, we convert 'm0' to 'long' format and then replicate the rows as before.
library(reshape2)
dM <- melt(m0)
dM[rep(1:nrow(dM), dM$value),1:2]
As @Frank mentioned, we can also use as.table with as.data.frame to create 'dM'. The count column is then named Freq rather than value, so replicate the rows with dM$Freq:
dM <- as.data.frame(as.table(m0))
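A quick way to sanity-check the base R approach is to tabulate the long result and compare it with the input. The m0 below is an assumption, chosen to be consistent with the output shown earlier; back is a made-up name.

```r
# Hypothetical count matrix consistent with the output shown above
m0 <- matrix(1:4, nrow = 2, dimnames = list(c("A", "B"), c("A", "B")))

d1  <- expand.grid(dimnames(m0))            # one row per cell (Var1, Var2)
res <- d1[rep(seq_len(nrow(d1)), c(m0)), ]  # repeat each row by its count
res <- res[order(res$Var1), ]
row.names(res) <- NULL

# Tabulating the long data should recover the original counts
back <- table(res$Var1, res$Var2)
all(back == m0)
```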
Reshape three column data frame to matrix (long to wide format)
There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.
tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
y=gl(3,1,6, labels=letters[1:3]),
z=c(1,2,3,3,3,2))
Using the tidyverse:
The cool new way to do this is with pivot_wider from tidyr (version 1.0.0 or later). It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.
library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x a b c
## <fct> <dbl> <dbl> <dbl>
## 1 x 1 2 3
## 2 y 3 3 2
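The conversion to a true matrix is not shown in the answer. One possible sketch, assuming the first column holds the row identifiers, is to drop that column and reuse it as row names; wide and mat are made-up names.

```r
library(tidyr)

tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- pivot_wider(tmp, names_from = y, values_from = z)

# Drop the id column, then carry it over as row names
mat <- as.matrix(wide[, -1])
rownames(mat) <- as.character(wide$x)
mat
```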
The old cool new way to do this is with spread from tidyr (since superseded by pivot_wider). It similarly returns a data frame.
library(tidyr)
spread(tmp, y, z)
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2
Using reshape2:
One of the first steps toward the tidyverse was the reshape2 package.
To get a matrix use acast:
library(reshape2)
acast(tmp, x~y, value.var="z")
## a b c
## x 1 2 3
## y 3 3 2
Or to get a data frame, use dcast, as here: Reshape data for values in one column.
dcast(tmp, x~y, value.var="z")
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2
Using plyr:
In between reshape2 and the tidyverse came plyr, with the daply function, as shown here: https://stackoverflow.com/a/7020101/210673
library(plyr)
daply(tmp, .(x, y), function(x) x$z)
## y
## x a b c
## x 1 2 3
## y 3 3 2
Using matrix indexing:
This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.
with(tmp, {
out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
dimnames=list(levels(x), levels(y)))
out[cbind(x, y)] <- z
out
})
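To make the indexing step explicit, the sketch below spells out the two-column index matrix with as.integer() instead of relying on cbind() to reduce the factors to their codes. It is meant as an unpacked version of the same idea; idx is a made-up name.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# Factor codes double as row/column positions
idx <- cbind(as.integer(tmp$x), as.integer(tmp$y))

out <- matrix(NA_real_, nrow = nlevels(tmp$x), ncol = nlevels(tmp$y),
              dimnames = list(levels(tmp$x), levels(tmp$y)))
out[idx] <- tmp$z   # each row of idx addresses one (row, column) cell
out
```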
Using xtabs:
xtabs(z~x+y, data=tmp)
Using a sparse matrix:
There's also sparseMatrix within the Matrix package, as seen here: R - convert BIG table into matrix by column names
library(Matrix)
with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
dimnames=list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
## a b c
## x 1 2 3
## y 3 3 2
Using reshape:
You can also use the base R function reshape, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove extra columns and get the names right (not shown).
reshape(tmp, idvar="x", timevar="y", direction="wide")
## x z.a z.b z.c
## 1 x 1 2 3
## 4 y 3 3 2
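The cleanup is left out of the answer. One way it might look for this example, assuming the only changes needed are stripping the "z." prefix and resetting the row names, is sketched here; wide is a made-up name.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

names(wide) <- sub("^z\\.", "", names(wide))  # z.a -> a, z.b -> b, z.c -> c
rownames(wide) <- NULL                        # reset row names 1, 4 to 1, 2
attr(wide, "reshapeWide") <- NULL             # drop reshape's bookkeeping attribute
wide
```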
Converting a matrix into a tibble in R
as_tibble can convert the matrix's rownames to a column, and then you can use gather() to create the group2 column:
library(tidyverse)
m <- matrix(1:3, nrow = 3, dimnames = list(c("X","Y","Z"), c("A")))
newtib <- m %>%
as_tibble(rownames = "group1") %>%
gather('A', key = "group2", value = "value")
> newtib
# A tibble: 3 × 3
group1 group2 value
<chr> <chr> <int>
1 X A 1
2 Y A 2
3 Z A 3
> tibble::tribble(~group1, ~group2, ~value, "X", "A", 1, "Y", "A", 2, "Z", "A", 3)
# A tibble: 3 × 3
group1 group2 value
<chr> <chr> <dbl>
1 X A 1
2 Y A 2
3 Z A 3
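Since gather() has been superseded in tidyr, the same reshaping can also be sketched with pivot_longer(). This is a swapped-in alternative to the answer's approach, not a correction of it; it reuses the same m and column names.

```r
library(tibble)
library(tidyr)

m <- matrix(1:3, nrow = 3, dimnames = list(c("X", "Y", "Z"), "A"))

# Row names become group1; every other column is stacked into group2/value
newtib <- pivot_longer(as_tibble(m, rownames = "group1"),
                       cols = -group1,
                       names_to = "group2",
                       values_to = "value")
newtib
```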
Create dataframe from a matrix
If you change your time column into row names, then you can use as.data.frame(as.table(mat)) for simple cases like this.
Example:
data <- c(0.1, 0.2, 0.3, 0.3, 0.4, 0.5)
dimnames <- list(time=c(0, 0.5, 1), name=c("C_0", "C_1"))
mat <- matrix(data, ncol=2, nrow=3, dimnames=dimnames)
as.data.frame(as.table(mat))
time name Freq
1 0 C_0 0.1
2 0.5 C_0 0.2
3 1 C_0 0.3
4 0 C_1 0.3
5 0.5 C_1 0.4
6 1 C_1 0.5
In this case time and name are both factors. You may want to convert time back to numeric, or it may not matter.
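If numeric times are needed, a possible follow-up is sketched below. It goes through as.character() first, because as.numeric() applied directly to a factor returns the level codes rather than the labels. The responseName argument is used here only to get a column called value instead of Freq; long is a made-up name.

```r
mat <- matrix(c(0.1, 0.2, 0.3, 0.3, 0.4, 0.5), ncol = 2,
              dimnames = list(time = c(0, 0.5, 1), name = c("C_0", "C_1")))

long <- as.data.frame(as.table(mat), responseName = "value")

# Convert factor labels ("0", "0.5", "1") back to numbers
long$time <- as.numeric(as.character(long$time))
str(long)
```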