reshape vs. reshape2 in R
reshape2 let Hadley make a rebooted reshape that was way, way faster, while avoiding busting up people's dependencies and habits.
https://stat.ethz.ch/pipermail/r-packages/2010/001169.html
Reshape2 is a reboot of the reshape package. It's been over five years since the first release of the package, and in that time I've learned a tremendous amount about R programming, and how to work with data in R. Reshape2 uses that knowledge to make a new package for reshaping data that is much more focussed and much much faster.
This version improves speed at the cost of functionality, so I have renamed it to reshape2 to avoid causing problems for existing users. Based on user feedback I may reintroduce some of these features.
What's new in reshape2:
- considerably faster and more memory efficient thanks to a much better underlying algorithm that uses the power and speed of subsetting to the fullest extent, in most cases only making a single copy of the data.
- cast is replaced by two functions depending on the output type: dcast produces data frames, and acast produces matrices/arrays.
- multidimensional margins are now possible: grand_row and grand_col have been dropped: now the name of the margin refers to the variable that has its value set to (all).
- some features have been removed such as the | cast operator, and the ability to return multiple values from an aggregation function. I'm reasonably sure both these operations are better performed by plyr.
- a new cast syntax which allows you to reshape based on functions of variables (based on the same underlying syntax as plyr):
- better development practices like namespaces and tests.
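The split of cast into dcast and acast can be illustrated with a minimal sketch. The `long` data frame below is hypothetical and only serves to show the two return types.

```r
library(reshape2)

# Hypothetical long-format data: one row per (id, variable) pair.
long <- data.frame(
  id       = c(1, 1, 2, 2),
  variable = c("a", "b", "a", "b"),
  value    = c(10, 20, 30, 40)
)

# dcast() returns a data frame with an explicit id column.
wide_df <- dcast(long, id ~ variable, value.var = "value")

# acast() returns a matrix; the id values become row names.
wide_mat <- acast(long, id ~ variable, value.var = "value")

print(wide_df)
print(wide_mat)
```

The formula and value.var arguments are the same in both calls; only the container of the result differs.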
reshape a dataframe with tidyr or reshape2
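Using tidyr:
library(dplyr)  # needed for arrange()
library(tidyr)
input %>%
  gather(var, val, v1:c3) %>%                # wide to long: one row per ID and column
  separate(var, c("var", "T"), sep = 1) %>%  # split e.g. "v1" into "v" and "1"
  spread(var, val) %>%                       # one column per variable stem (c, v)
  arrange(T)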
# ID T c v
#1 1 1 -6 -3
#2 2 1 -11 -10
#3 3 1 5 4
#4 4 1 5 -6
#5 5 1 -12 -7
#6 1 2 -1 -11
#7 2 2 4 -4
#8 3 2 1 -4
#9 4 2 -1 0
#10 5 2 -11 12
#11 1 3 -1 -2
#12 2 3 6 -12
#13 3 3 -3 15
#14 4 3 8 -6
#15 5 3 11 6
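The `input` data frame is not shown in the answer. As a sketch, it can be rebuilt from the printed result above; the column order (v1 to v3, then c1 to c3) is an assumption chosen so that the range `v1:c3` covers all six measure columns.

```r
library(dplyr)
library(tidyr)

# Wide input reconstructed from the printed output (assumed column order).
input <- data.frame(
  ID = 1:5,
  v1 = c(-3, -10, 4, -6, -7),
  v2 = c(-11, -4, -4, 0, 12),
  v3 = c(-2, -12, 15, -6, 6),
  c1 = c(-6, -11, 5, 5, -12),
  c2 = c(-1, 4, 1, -1, -11),
  c3 = c(-1, 6, -3, 8, 11)
)

result <- input %>%
  gather(var, val, v1:c3) %>%
  separate(var, c("var", "T"), sep = 1) %>%
  spread(var, val) %>%
  arrange(T)

print(result)
```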
tidyr VS dplyr + reshape2
tidyr follows the tidyverse conventions, like dplyr:
- functions designed to work well with pipes (%>%)
- non-standard evaluation (NSE), which means you use unquoted column names rather than strings
- rlang tidy dots semantics, like other tidyverse packages, which means you can use !! and !!!, which are very powerful once you know how to use them
Of course, you can do the same without fancy syntax if you don't use functions with NSE... but if you already use dplyr you're already using NSE everywhere.
If you already use dplyr, your code may look more consistent if you also use tidyr for data reshaping.
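A minimal sketch of what NSE and the !! / !!! operators look like in practice, using dplyr verbs on a hypothetical data frame `df`. The helpers sym() and syms() come from rlang.

```r
library(dplyr)
library(rlang)

# Hypothetical data for illustration.
df <- data.frame(id = 1:3, score = c(2, 9, 5))

# NSE: the column is referred to by its bare, unquoted name.
by_name <- arrange(df, desc(score))

# When the column name is held in a string, turn it into a symbol
# and unquote it with !!.
col <- "score"
by_string <- arrange(df, desc(!!sym(col)))

# !!! splices a list of symbols into separate arguments.
cols <- c("score", "id")
reordered <- select(df, !!!syms(cols))

print(by_string)
print(reordered)
```

Both arrange() calls produce the same ordering; the second form is useful when column names are only known at run time.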
Besides, reshape2 focuses on reshaping data (melt/cast) while tidyr does this (gather/spread) and more, like manipulating columns (unite/separate/extract), creating and working with list-columns and nested data frames (nest/unnest), and dealing with missing values (complete/expand/fill).
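As a rough illustration of the extra tidyr verbs mentioned above, here is a short sketch of unite, separate and fill on a hypothetical data frame (the column names are made up for the example).

```r
library(tidyr)

# Hypothetical data with a missing value.
df <- data.frame(
  year  = c(2012, 2012, 2013),
  month = c(10, 11, 1),
  value = c(1.5, NA, 2.0)
)

# unite(): paste two columns into one.
united <- unite(df, "period", year, month, sep = "-")

# separate(): split the combined column again (the results are character columns).
split_again <- separate(united, period, into = c("year", "month"), sep = "-")

# fill(): carry the last non-missing value downwards.
filled <- fill(df, value)

print(united)
print(filled)
```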
I should also say that dplyr and tidyr are complementary, so I would challenge your frame (tidyr) VS (dplyr + reshape2). dplyr is indispensable whether you work with tidyr or reshape2.
Ultimately, melt/dcast is equivalent to gather/spread, so it is a personal preference until you need the other tidyr features, or if you want to follow the "tidyverse trend".
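The equivalence can be checked with a small round trip. The sketch below uses a hypothetical `wide` data frame; the column names key and val are arbitrary choices. Note that in tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), although they still work.

```r
library(reshape2)
library(tidyr)

# Hypothetical wide data.
wide <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4))

# Wide to long with each package.
long_m <- melt(wide, id.vars = "id", variable.name = "key", value.name = "val")
long_g <- gather(wide, key, val, x, y)

# Long back to wide with each package.
back_m <- dcast(long_m, id ~ key, value.var = "val")
back_g <- spread(long_g, key, val)

print(back_m)
print(back_g)
```

One visible difference is that melt() stores the key column as a factor while gather() keeps it as character by default.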
R using Reshape2 to do what reshape (stats package function) was designed for
This is just one of those times when reshape() is more straightforward to use.
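For comparison, here is a sketch of what the base reshape() call could look like. It assumes a plain data frame (named `wide_df` here to keep it apart from the data.table used later in this answer) with the same `<stub>.<time>` column names as the sample data at the end of the answer.

```r
set.seed(1)
# Plain data frame with the same layout as the sample data;
# check.names = FALSE keeps the "-" in the column names.
wide_df <- data.frame(
  id = 1:5,
  "A.2012-10" = runif(5), "A.2012-11" = runif(5),
  "B.2012-10" = runif(5), "B.2012-11" = runif(5),
  check.names = FALSE
)

# With sep = ".", reshape() splits each name at the first "." into
# the value name ("A", "B") and the time ("2012-10", "2012-11").
long <- reshape(
  wide_df,
  direction = "long",
  idvar     = "id",
  varying   = names(wide_df)[-1],
  sep       = "."
)

print(long)
```

The result has one row per id and time, with A and B as separate columns.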
The most direct approach using a combination of melt and dcast.data.table that I can think of is as follows:
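library(data.table)
# Current data.table versions ship their own melt() and dcast() methods, so
# reshape2 is not attached here; attaching it after data.table would mask them.
longtable <- melt(widetable, id.vars = "id")
# Split names such as "A.2012-10" into the stub ("A") and the time ("2012-10").
vars <- do.call(rbind, strsplit(as.character(longtable$variable), ".", fixed = TRUE))
longtable[, c("V1", "V2") := lapply(1:2, function(x) vars[, x])]
# dcast() dispatches to dcast.data.table for data.table input.
dcast(longtable, id + V2 ~ V1, value.var = "value")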
An alternative is to use merged.stack from my "splitstackshape" package, specifically the development version.
# library(devtools)
# install_github("mrdwab/splitstackshape", ref = "devel")
library(splitstackshape)
merged.stack(widetable, id.vars = "id", var.stubs = c("A", "B"), sep = "\\.")
# id .time_1 A B
# 1: 1 2012-10 0.26550866 0.2059746
# 2: 1 2012-11 0.89838968 0.4976992
# 3: 2 2012-10 0.37212390 0.1765568
# 4: 2 2012-11 0.94467527 0.7176185
# 5: 3 2012-10 0.57285336 0.6870228
# 6: 3 2012-11 0.66079779 0.9919061
# 7: 4 2012-10 0.90820779 0.3841037
# 8: 4 2012-11 0.62911404 0.3800352
# 9: 5 2012-10 0.20168193 0.7698414
# 10: 5 2012-11 0.06178627 0.7774452
The merged.stack function works differently from a simple melt because it starts by "stacking" different groups of columns in a list and then merging them together. This allows the function to:
- Work with column groups where each column group might be of a different type (character, numeric, and so on).
- Work with "unbalanced" column groups (where one group might have two measure columns and another might have three).
This answer is based on the following sample data:
set.seed(1) # Please use `set.seed()` when sharing an example with random numbers
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
"B.2012-10"=runif(5),"B.2012-11"=runif(5))
See also: What reshaping problems can melt/cast not solve in a single step?
Reshape DF from long to wide in R using Reshape2 without an aggregation function
We can use dcast from data.table, which can take multiple value.var columns. Convert the 'data.frame' to 'data.table' (setDT(df)), then use dcast with the formula and value.var specified.
library(data.table)
dcast(setDT(df), id~gid, value.var=names(df)[2:6])
NOTE: The data.table method would be faster compared to the reshape2 method.
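A small sketch of how multiple value.var columns behave, using a hypothetical `df` with two value columns (x and y) rather than the question's data:

```r
library(data.table)

# Hypothetical long data with two value columns.
df <- data.frame(
  id  = c(1, 1, 2, 2),
  gid = c(1, 2, 1, 2),
  x   = c(10, 20, 30, 40),
  y   = c("a", "b", "c", "d")
)

# setDT() converts df to a data.table by reference.
# Each value column is spread separately; new names are <value.var>_<gid>.
wide <- dcast(setDT(df), id ~ gid, value.var = c("x", "y"))

print(wide)
```

Because every id and gid combination occurs exactly once here, no aggregation function is needed.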
Base R reshape() versus tidyverse
I don't think there is a tidyverse solution with a single function call, but a good solution is not that complicated either. We need to gather first, then separate the time and keys, and then spread it back again.
DF %>%
  gather(key, val, -id, -trt) %>%
  separate(key, c('key', 'time')) %>%
  spread(key, val)
id trt time play talk total work
1 x1.1 tr T1 0.86472123 0.53559704 0.27548386 0.65165567
2 x1.1 tr T2 0.03188816 0.07557029 0.86138244 0.35432806
3 x1.10 cnt T1 0.35589774 0.50050323 0.80154700 0.83613414
4 x1.10 cnt T2 0.21913855 0.20795168 0.17015172 0.50528560
5 x1.2 cnt T1 0.61535242 0.09308813 0.22890394 0.56773775
6 x1.2 cnt T2 0.11446759 0.53442678 0.46439198 0.93643254
7 x1.3 cnt T1 0.77510990 0.16980304 0.01443391 0.11350898
8 x1.3 cnt T2 0.46893548 0.64135658 0.22286743 0.24586639
9 x1.4 tr T1 0.35556869 0.89983245 0.72896456 0.59592531
10 x1.4 tr T2 0.39698674 0.52573932 0.62354960 0.47314146
11 x1.5 cnt T1 0.40584997 0.42263761 0.24988047 0.35804998
12 x1.5 cnt T2 0.83361919 0.03928139 0.20364770 0.19156087
13 x1.6 cnt T1 0.70664691 0.74774647 0.16118328 0.42880942
14 x1.6 cnt T2 0.76112174 0.54585984 0.01967341 0.58322197
15 x1.7 cnt T1 0.83828767 0.82265258 0.01704265 0.05190332
16 x1.7 cnt T2 0.57335645 0.37276310 0.79799301 0.45947319
17 x1.8 cnt T1 0.23958913 0.95465365 0.48610035 0.26417767
18 x1.8 cnt T2 0.44750805 0.96130241 0.27431890 0.46743405
19 x1.9 tr T1 0.77077153 0.68544451 0.10290017 0.39879073
20 x1.9 tr T2 0.08380201 0.25734157 0.16660910 0.39983256
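Since tidyr 1.0.0 there is also pivot_longer(), which can do this reshape in one call through the special ".value" name. The sketch below assumes the wide columns are named like work.T1 and play.T2 (measure, dot, time); the `DF` here is a small hypothetical stand-in for the question's data.

```r
library(tidyr)

# Hypothetical wide data with <measure>.<time> column names.
DF <- data.frame(
  id  = c("x1.1", "x1.2"),
  trt = c("tr", "cnt"),
  work.T1 = c(0.65, 0.57), play.T1 = c(0.86, 0.62),
  work.T2 = c(0.35, 0.94), play.T2 = c(0.03, 0.11)
)

# ".value" tells pivot_longer() that the first part of each name
# is the output column, and the second part goes into "time".
long <- pivot_longer(
  DF,
  cols      = -c(id, trt),
  names_to  = c(".value", "time"),
  names_sep = "\\."
)

print(long)
```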
Data manipulation using dcast R
reshape2::dcast(dat, Data ~ Flag, value.var = "Answer")
# Data 1 2
# 1 X Yes Yes
# 2 Y Yes No
# 3 Z Yes Yes
Data
dat <- structure(list(Data = c("X", "X", "Y", "Y", "Z", "Z"), Flag = c(1L, 2L, 1L, 2L, 1L, 2L), Answer = c("Yes", "Yes", "Yes", "No", "Yes", "Yes")), class = "data.frame", row.names = c(NA, -6L))
Difference between gather, reshape, cast, etc
Please use the search function prior to posting. This has been asked a lot here on SO!
In the tidyverse you can do:
data %>%
  group_by(id) %>%
  mutate(n = 1:n()) %>%
  ungroup() %>%
  spread(id, val) %>%
  select(-n)
## A tibble: 10 x 3
# A B C
# <int> <int> <int>
# 1 1 11 21
# 2 2 12 22
# 3 3 13 23
# 4 4 14 24
# 5 5 15 25
# 6 6 16 26
# 7 7 17 27
# 8 8 18 28
# 9 9 19 29
#10 10 20 30
Comment: I suggest executing the above line by line to see what each command does. Also note that
data %>%
spread(id, val)
will produce an error (see @neilfws' explanation in the comment).
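The reason for the error is that spread() needs every output cell to be identified uniquely; without the per-group counter n, all values of the same id compete for a single row. A minimal sketch on a hypothetical, smaller `data`:

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: three values per id and no other identifying column.
data <- data.frame(id = rep(c("A", "B"), each = 3), val = 1:6)

# Spreading directly fails because the rows are not uniquely identified.
direct <- tryCatch(spread(data, id, val), error = function(e) "error")

# Adding a within-group row counter gives each value its own output row.
wide <- data %>%
  group_by(id) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  spread(id, val) %>%
  select(-n)

print(direct)
print(wide)
```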