Reshape three column data frame to matrix (long to wide format)
There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.
tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
y=gl(3,1,6, labels=letters[1:3]),
z=c(1,2,3,3,3,2))
Using the tidyverse:
The new cool new way to do this is with pivot_wider
from tidyr 1.0.0
. It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.
library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x a b c
## <fct> <dbl> <dbl> <dbl>
## 1 x 1 2 3
## 2 y 3 3 2
The old cool new way to do this is with spread
from tidyr
. It similarly returns a data frame.
library(tidyr)
spread(tmp, y, z)
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2
Using reshape2:
One of the first steps toward the tidyverse was the reshape2 package.
To get a matrix use acast
:
library(reshape2)
acast(tmp, x~y, value.var="z")
## a b c
## x 1 2 3
## y 3 3 2
Or to get a data frame, use dcast
, as here: Reshape data for values in one column.
dcast(tmp, x~y, value.var="z")
## x a b c
## 1 x 1 2 3
## 2 y 3 3 2
Using plyr:
In between reshape2 and the tidyverse came plyr
, with the daply
function, as shown here: https://stackoverflow.com/a/7020101/210673
library(plyr)
daply(tmp, .(x, y), function(x) x$z)
## y
## x a b c
## x 1 2 3
## y 3 3 2
Using matrix indexing:
This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.
with(tmp, {
out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
dimnames=list(levels(x), levels(y)))
out[cbind(x, y)] <- z
out
})
Using xtabs
:
xtabs(z~x+y, data=tmp)
Using a sparse matrix:
There's also sparseMatrix
within the Matrix
package, as seen here: R - convert BIG table into matrix by column names
with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
dimnames=list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
## a b c
## x 1 2 3
## y 3 3 2
Using reshape
:
You can also use the base R function reshape
, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove an extra columns and get the names right (not shown).
reshape(tmp, idvar="x", timevar="y", direction="wide")
## x z.a z.b z.c
## 1 x 1 2 3
## 4 y 3 3 2
creating matrix from three column data frame in R
You may define the fun.aggregate=
.
library(reshape2)
acast(df1, state~season, value.var = 'val_1', fun.aggregate=sum)
# spring summer winter
# BOS 26 19 14
# NY 10 24 3
# WASH 66 42 99
How to reshape data from long to wide format
Using reshape
function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
Convert a three column dataframe into matrix
You could do
xtabs(freq~., df1)
# Rptname
#Gene Cha1 Cha2 Cha3 Cha4 Cha5 KI2 TH1
# Gene1 1 1 1 1 0 0 0
# Gene2 1 0 0 2 4 2 2
Or
library(reshape2)
acast(df1, Gene~Rptname, value.var='freq', fill=0)
# Cha1 Cha2 Cha3 Cha4 Cha5 KI2 TH1
#Gene1 1 1 1 1 0 0 0
#Gene2 1 0 0 2 4 2 2
Or use spread
to get a 'data.frame' output
library(tidyr)
spread(df1, Rptname, freq, fill=0)
If your original dataset is 'dat', this could be also done using table
table(dat[c('Gene', 'Rptname')])
Convert R dataframe from long to wide format, but with unequal group sizes, for use with qcc
You can create a sequence column ('.id') using getanID
from splitstackshape
and use dcast
from data.table
to convert the long format to wide format. The output of splitstackshape
is a data.table. When we load splitstackshape
, data.table will also be loaded. So, if you already have the devel version of data.table, then the dcast
from data.table
can be used as well.
library(splitstackshape)
dcast(getanID(df1, 'time'), time~.id, value.var='measure')
# time 1 2 3 4 5
#1: 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA
#2: 2001 Q2 -1.4810269 0.18150972 NA NA NA
#3: 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502
Update
As @snoram mentioned in the comments, function rowid
from data.table
makes it easier to use just data.table
alone
library(data.table)
dcast(setDT(df1), time ~ rowid(time), value.var = "measure")
Reshape 3 Column Data with ID
I'll take a stab at this. To start, we'll make a reproducible dataset from the example data you posted:
df <- structure(list(SOURCE_SUBREDDIT = c("rddtgaming", "xboxone",
"ps4", "fitnesscirclejerk", "fitnesscirclejerk", "fitnesscirclejerk",
"cancer", "jleague", "bestoftldr", "quityourbullshit"), TARGET_SUBREDDIT = c("rddtrust",
"battlefield_4", "battlefield_4", "leangains", "lifeprotips",
"leangains", "fuckcancer", "soccer", "tifu", "pics"), LINK_SENTIMENT = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, 10L), class = "data.frame")
Note that fitnesscirclejerk
is associated with leangains
twice, which is a feature you mentioned occurs in your data:
df
SOURCE_SUBREDDIT TARGET_SUBREDDIT LINK_SENTIMENT
1 rddtgaming rddtrust 1
2 xboxone battlefield_4 1
3 ps4 battlefield_4 1
4 fitnesscirclejerk leangains 1
5 fitnesscirclejerk lifeprotips 1
6 fitnesscirclejerk leangains 1
7 cancer fuckcancer 1
8 jleague soccer 1
9 bestoftldr tifu 1
10 quityourbullshit pics 1
Now, the goal is to spread this from long-format to wide-format, as in the example image you posted. As you already determined, the identical rows (rows 4 and 6) pose a problem when trying to spread:
tidyr::spread(df, key = TARGET_SUBREDDIT, value = LINK_SENTIMENT, fill = 0)
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 4, 6
Do you need to create unique ID with tibble::rowid_to_column()?
Since you want to keep the same number of rows when spreading, we can get around this by adding a unique ID to each row, so each row is unique. You do that with splitstackshape::getanID
, but we can also do that with tidyverse
packages:
df2 <- dplyr::mutate(df, rowid = dplyr::row_number())
df2 <- tibble::rowid_to_column(df)
Both of these give us this data.frame, which I am assuming is similar to your mydata_id
:
df2
rowid SOURCE_SUBREDDIT TARGET_SUBREDDIT LINK_SENTIMENT
1 1 rddtgaming rddtrust 1
2 2 xboxone battlefield_4 1
3 3 ps4 battlefield_4 1
4 4 fitnesscirclejerk leangains 1
5 5 fitnesscirclejerk lifeprotips 1
6 6 fitnesscirclejerk leangains 1
7 7 cancer fuckcancer 1
8 8 jleague soccer 1
9 9 bestoftldr tifu 1
10 10 quityourbullshit pics 1
Now, when we spread, the existence of the unique ID column keeps R from combining (or trying to combine) the rows with identical subreddit pairs:
df3 <- tidyr::spread(df2, key = TARGET_SUBREDDIT, value = LINK_SENTIMENT, fill = 0)
df3
rowid SOURCE_SUBREDDIT battlefield_4 fuckcancer leangains lifeprotips pics rddtrust soccer tifu
1 1 rddtgaming 0 0 0 0 0 1 0 0
2 2 xboxone 1 0 0 0 0 0 0 0
3 3 ps4 1 0 0 0 0 0 0 0
4 4 fitnesscirclejerk 0 0 1 0 0 0 0 0
5 5 fitnesscirclejerk 0 0 0 1 0 0 0 0
6 6 fitnesscirclejerk 0 0 1 0 0 0 0 0
7 7 cancer 0 1 0 0 0 0 0 0
8 8 jleague 0 0 0 0 0 0 1 0
9 9 bestoftldr 0 0 0 0 0 0 0 1
10 10 quityourbullshit 0 0 0 0 1 0 0 0
As you can see, the output of this mirrors the format of your desired output image and preserves both the order of the relationship and duplicate rows.
Reshape from long to wide according to the number of occurrence of one variable
You can try something like this:
df1 %>%
group_by(person, visitID) %>%
summarise(across(matches("v[0-9]+"), list)) %>%
group_by(person) %>%
mutate(visit = seq_len(n()) %>% str_c("visit.", .)) %>%
ungroup() %>%
pivot_wider(
id_cols = person,
names_from = visit,
values_from = c("visitID", matches("v[0-9]+"))
)
replace list
with ~str_c(.x, collapse = ",")
if you want to have it in character style.
Convert dataframe from wide format to long format with key stored in row R
Here is one implementation. We can split
the tibble into a list
of tibble based on the last row, loop through the list
with imap
, rename
the colums to same column names ('a', 'b'), mutate
to create the column 'c' with the list
name and bind the rows
library(tidyverse)
df %>%
slice(-n()) %>%
split.default(df %>%
slice(n()) %>%
flatten_chr) %>%
imap_dfr(~ .x %>%
rename_all(~ c('a', 'b')) %>%
mutate(c = .y))
# A tibble: 20 x 3
# a b c
# <chr> <chr> <chr>
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
#10 1 1 G2
#11 9 9 G3
#12 3 3 G3
#13 7 7 G3
#14 8 8 G3
#15 4 4 G3
#16 6 6 G4
#17 5 5 G4
#18 2 2 G4
#19 1 1 G4
#20 8 8 G4
Related Topics
How to Trim Leading and Trailing White Space
Dictionary Style Replace Multiple Items
Convert Date-Time String to Class Date
Filtering a Data Frame by Values in a Column
Lm' Summary Not Display All Factor Levels
How to Change the Spacing Between Legend Items in Ggplot2
Cleaning Up Factor Levels (Collapsing Multiple Levels/Labels)
How to Change Legend Title in Ggplot
Select Groups Based on Number of Unique/Distinct Values
Add Count of Unique/Distinct Values by Group to the Original Data
Transform Year/Week to Date Object
Can Dplyr Package Be Used For Conditional Mutating
How to Read Multiple .Txt Files into R
Splitting a Dataframe String Column into Multiple Different Columns