reshape vs. reshape2 in R

reshape2 let Hadley Wickham ship a rewritten, much faster reshape under a new name, without breaking existing code and habits built on the original package. The release announcement explains:

https://stat.ethz.ch/pipermail/r-packages/2010/001169.html

Reshape2 is a reboot of the reshape package. It's been over five years
since the first release of the package, and in that time I've learned
a tremendous amount about R programming, and how to work with data in
R. Reshape2 uses that knowledge to make a new package for reshaping
data that is much more focussed and much much faster.

This version improves speed at the cost of functionality, so I have
renamed it to reshape2 to avoid causing problems for existing users.
Based on user feedback I may reintroduce some of these features.

What's new in reshape2:

  • considerably faster and more memory efficient thanks to a much
    better underlying algorithm that uses the power and speed of
    subsetting to the fullest extent, in most cases only making a
    single copy of the data.

  • cast is replaced by two functions depending on the output type:
    dcast produces data frames, and acast produces matrices/arrays.

  • multidimensional margins are now possible: grand_row and
    grand_col have been dropped: now the name of the margin refers to
    the variable that has its value set to (all).

  • some features have been removed such as the | cast operator, and
    the ability to return multiple values from an aggregation function.
    I'm reasonably sure both these operations are better performed by
    plyr.

  • a new cast syntax which allows you to reshape based on functions
    of variables (based on the same underlying syntax as plyr):

  • better development practices like namespaces and tests.
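
To illustrate the dcast/acast split mentioned in the announcement, here is a minimal sketch. The toy data frame `long` is hypothetical and only serves to show the two output types; it assumes reshape2 is installed.

```r
library(reshape2)

# Hypothetical long-format data: one row per id/variable pair
long <- data.frame(
  id       = rep(1:3, each = 2),
  variable = rep(c("x", "y"), times = 3),
  value    = 1:6
)

# dcast() returns a data frame with id kept as a regular column
wide_df <- dcast(long, id ~ variable, value.var = "value")

# acast() returns a matrix with id moved into the row names
wide_mat <- acast(long, id ~ variable, value.var = "value")

class(wide_df)
class(wide_mat)
```

The formula and `value.var` are the same in both calls; only the container of the result differs.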

reshape a dataframe with tidyr or reshape2

Using tidyr:

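library(tidyr)
library(dplyr) # for arrange()

input %>%
  gather(var, val, v1:c3) %>%               # wide to long: one row per ID/variable
  separate(var, c("var", "T"), sep = 1) %>% # split e.g. "v1" into var = "v" and T = "1"
  spread(var, val) %>%                      # back to wide: one column each for c and v
  arrange(T)
#    ID T   c   v
# 1   1 1  -6  -3
# 2   2 1 -11 -10
# 3   3 1   5   4
# 4   4 1   5  -6
# 5   5 1 -12  -7
# 6   1 2  -1 -11
# 7   2 2   4  -4
# 8   3 2   1  -4
# 9   4 2  -1   0
# 10  5 2 -11  12
# 11  1 3  -1  -2
# 12  2 3   6 -12
# 13  3 3  -3  15
# 14  4 3   8  -6
# 15  5 3  11   6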

tidyr vs. dplyr + reshape2

Tidyr follows the tidyverse conventions, like dplyr:

  • functions designed to work well with pipes %>%

  • non-standard evaluation (NSE), which means you use unquoted column names rather than strings

  • rlang tidy dots semantics, as in other tidyverse packages, so you can use !! and !!!, which are very powerful once you know how to use them. You can of course do the same without special syntax if you avoid functions with NSE, but if you already use dplyr you are already using NSE everywhere.
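
As a small illustration of the last bullet, here is a sketch of unquoting with !!. It assumes dplyr and rlang are installed and uses the built-in mtcars data; the variable names `group_name` and `group_col` are made up for the example.

```r
library(dplyr)

# Column name held in a string, e.g. coming from a function argument
group_name <- "cyl"
group_col  <- rlang::sym(group_name)  # turn the string into a symbol

by_cyl <- mtcars %>%
  group_by(!!group_col) %>%           # unquote: same as group_by(cyl)
  summarise(mean_mpg = mean(mpg))

by_cyl
```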

If you already use dplyr, your code may look more consistent if you also use tidyr for data reshaping.

Besides, reshape2 focuses on reshaping data (melt/cast), while tidyr does this (gather/spread) and more: manipulating columns (unite/separate/extract), creating and working with list-columns and nested data frames (nest/unnest), and dealing with missing values (complete/expand/fill).

I should also say that dplyr and tidyr are complementary, so I would challenge the framing of (tidyr) vs. (dplyr + reshape2). dplyr is indispensable whether you work with tidyr or reshape2.

Ultimately, melt/dcast is equivalent to gather/spread, so the choice is a matter of personal preference until you need the other tidyr features or want to follow the "tidyverse trend".
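
To make the melt/dcast and gather/spread correspondence concrete, here is a minimal sketch on a hypothetical two-row data frame. It assumes both reshape2 and tidyr are installed; the column names `key` and `val` are arbitrary.

```r
library(reshape2)
library(tidyr)

# Hypothetical wide data
wide <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))

# Wide to long
long_rs <- melt(wide, id.vars = "id", variable.name = "key", value.name = "val")
long_td <- gather(wide, key, val, a, b)

# Long back to wide
back_rs <- dcast(long_rs, id ~ key, value.var = "val")
back_td <- spread(long_td, key, val)
```

One visible difference: melt() stores the key column as a factor, while gather() keeps it as character by default.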

R using Reshape2 to do what reshape (stats package function) was designed for

This is just one of those times when reshape() is more straightforward to use.
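
For comparison, a sketch of what the base reshape() call could look like. It uses a small hypothetical data frame with the same column naming scheme as the sample data at the end of this answer ("A.2012-10" and so on), built as a plain data.frame rather than a data.table.

```r
# Hypothetical wide data with "<stub>.<time>" column names
wide <- data.frame(
  id = 1:2,
  "A.2012-10" = c(0.1, 0.2),
  "A.2012-11" = c(0.3, 0.4),
  "B.2012-10" = c(0.5, 0.6),
  "B.2012-11" = c(0.7, 0.8),
  check.names = FALSE  # keep the dashes in the column names
)

# reshape() splits each name at sep into the stub and the time value
long <- reshape(wide,
                direction = "long",
                idvar     = "id",
                varying   = names(wide)[-1],
                sep       = ".")
long
```

The result has one row per id and time, with the time stored in a column called `time` and one column per stub (A and B).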

The most direct approach using a combination of melt and dcast.data.table that I can think of is as follows:

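library(data.table)
library(reshape2)

# Wide to long: one row per id/variable pair
longtable <- melt(widetable, id.vars = "id")
# Split names such as "A.2012-10" into the stub ("A") and the time ("2012-10")
vars <- do.call(rbind, strsplit(as.character(longtable$variable), ".", TRUE))
# Add both parts as columns V1 and V2, then cast the stubs back into columns
dcast.data.table(longtable[, c("V1", "V2") := lapply(1:2, function(x) vars[, x])],
                 id + V2 ~ V1, value.var = "value")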

An alternative is to use merged.stack from my "splitstackshape" package, specifically the development version.

# library(devtools)
# install_github("splitstackshape", "mrdwab", ref = "devel")
library(splitstackshape)

merged.stack(widetable, id.vars = "id", var.stubs = c("A", "B"), sep = "\\.")
#     id .time_1          A         B
#  1:  1 2012-10 0.26550866 0.2059746
#  2:  1 2012-11 0.89838968 0.4976992
#  3:  2 2012-10 0.37212390 0.1765568
#  4:  2 2012-11 0.94467527 0.7176185
#  5:  3 2012-10 0.57285336 0.6870228
#  6:  3 2012-11 0.66079779 0.9919061
#  7:  4 2012-10 0.90820779 0.3841037
#  8:  4 2012-11 0.62911404 0.3800352
#  9:  5 2012-10 0.20168193 0.7698414
# 10:  5 2012-11 0.06178627 0.7774452

The merged.stack function works differently from a simple melt because it starts by "stacking" different groups of columns in a list and then merging them together. This allows the function to:

  1. Work with column groups where each column group might be of a different type (character, numeric, and so on).
  2. Work with "unbalanced" column groups (where one group might have two measure columns and another might have three).

This answer is based on the following sample data:

set.seed(1) # Please use `set.seed()` when sharing an example with random numbers
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
"B.2012-10"=runif(5),"B.2012-11"=runif(5))

See also: What reshaping problems can melt/cast not solve in a single step?

Reshape DF from long to wide in R using Reshape2 without an aggregation function

We can use dcast from data.table, which accepts multiple value.var columns. Convert the 'data.frame' to a 'data.table' with setDT(df), then call dcast with the formula and value.var specified.

library(data.table)
dcast(setDT(df), id~gid, value.var=names(df)[2:6])

NOTE: The data.table method should be faster than the reshape2 one.
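
A self-contained sketch of the same idea on hypothetical data (the question's `df` is not reproduced here, so the columns `a` and `b` are made up). It assumes data.table is installed.

```r
library(data.table)

# Hypothetical long data: two value columns per id/gid pair
df <- data.frame(
  id  = c(1, 1, 2, 2),
  gid = c(1, 2, 1, 2),
  a   = 1:4,
  b   = 5:8
)

# Each id/gid pair occurs once, so no aggregation function is needed
wide <- dcast(setDT(df), id ~ gid, value.var = c("a", "b"))
names(wide)
```

With several value.var columns, data.table names the outputs by combining the value column and the gid level, here a_1, a_2, b_1 and b_2.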

Base R reshape() versus tidyverse

I don't think there is a tidyverse solution with a single function call, but a good solution is not that complicated either. We need to gather first, then separate the time and keys, and then spread it back again.

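library(tidyr)

DF %>%
  gather(key, val, -id, -trt) %>%      # stack every measure column into key/val pairs
  separate(key, c('key', 'time')) %>%  # split at the non-alphanumeric separator (the default)
  spread(key, val)                     # one column per variable again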
      id trt time       play       talk      total       work
1   x1.1  tr   T1 0.86472123 0.53559704 0.27548386 0.65165567
2   x1.1  tr   T2 0.03188816 0.07557029 0.86138244 0.35432806
3  x1.10 cnt   T1 0.35589774 0.50050323 0.80154700 0.83613414
4  x1.10 cnt   T2 0.21913855 0.20795168 0.17015172 0.50528560
5   x1.2 cnt   T1 0.61535242 0.09308813 0.22890394 0.56773775
6   x1.2 cnt   T2 0.11446759 0.53442678 0.46439198 0.93643254
7   x1.3 cnt   T1 0.77510990 0.16980304 0.01443391 0.11350898
8   x1.3 cnt   T2 0.46893548 0.64135658 0.22286743 0.24586639
9   x1.4  tr   T1 0.35556869 0.89983245 0.72896456 0.59592531
10  x1.4  tr   T2 0.39698674 0.52573932 0.62354960 0.47314146
11  x1.5 cnt   T1 0.40584997 0.42263761 0.24988047 0.35804998
12  x1.5 cnt   T2 0.83361919 0.03928139 0.20364770 0.19156087
13  x1.6 cnt   T1 0.70664691 0.74774647 0.16118328 0.42880942
14  x1.6 cnt   T2 0.76112174 0.54585984 0.01967341 0.58322197
15  x1.7 cnt   T1 0.83828767 0.82265258 0.01704265 0.05190332
16  x1.7 cnt   T2 0.57335645 0.37276310 0.79799301 0.45947319
17  x1.8 cnt   T1 0.23958913 0.95465365 0.48610035 0.26417767
18  x1.8 cnt   T2 0.44750805 0.96130241 0.27431890 0.46743405
19  x1.9  tr   T1 0.77077153 0.68544451 0.10290017 0.39879073
20  x1.9  tr   T2 0.08380201 0.25734157 0.16660910 0.39983256
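
Newer tidyr releases (1.0.0 and later) add pivot_longer(), which can probably do this in one call. The sketch below assumes column names of the form "work.T1"; the question's actual names are not shown here, so the data frame is hypothetical.

```r
library(tidyr)

# Hypothetical data with "<variable>.<time>" column names
DF <- data.frame(
  id      = c("x1.1", "x1.2"),
  trt     = c("tr", "cnt"),
  work.T1 = c(0.65, 0.57),
  play.T1 = c(0.86, 0.62),
  work.T2 = c(0.35, 0.94),
  play.T2 = c(0.03, 0.11)
)

# ".value" keeps the first name part as output columns;
# the second part goes into the new "time" column
long <- pivot_longer(DF,
                     cols      = -c(id, trt),
                     names_to  = c(".value", "time"),
                     names_sep = "\\.")
long
```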

Data manipulation using dcast R

reshape2::dcast(dat, Data ~ Flag, value.var = "Answer")
#   Data   1   2
# 1    X Yes Yes
# 2    Y Yes  No
# 3    Z Yes Yes

Data

dat <- structure(list(Data = c("X", "X", "Y", "Y", "Z", "Z"), Flag = c(1L, 2L, 1L, 2L, 1L, 2L), Answer = c("Yes", "Yes", "Yes", "No", "Yes", "Yes")), class = "data.frame", row.names = c(NA, -6L))

Difference between gather, reshape, cast, etc

Please use the search function prior to posting. This has been asked a lot here on SO!

In the tidyverse you can do:

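library(dplyr)
library(tidyr)

data %>%
  group_by(id) %>%
  mutate(n = 1:n()) %>%   # row index within each id, so every row is uniquely identified
  ungroup() %>%
  spread(id, val) %>%     # one column per id
  select(-n)              # drop the helper index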
# A tibble: 10 x 3
#        A     B     C
#    <int> <int> <int>
#  1     1    11    21
#  2     2    12    22
#  3     3    13    23
#  4     4    14    24
#  5     5    15    25
#  6     6    16    26
#  7     7    17    27
#  8     8    18    28
#  9     9    19    29
# 10    10    20    30

Comment: I suggest executing the above line by line to see what each command does. Also note that

data %>%
spread(id, val)

will produce an error, because without the helper index n the rows that share an id are not uniquely identified (see @neilfws' explanation in the comment).
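
The following sketch reproduces both behaviours on hypothetical data shaped like the output above (three ids with ten values each). It assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: ids A, B, C with ten values each
data <- data.frame(id = rep(c("A", "B", "C"), each = 10), val = 1:30)

# Without a row index, spread() cannot tell the ten "A" rows apart
spread_failed <- tryCatch({
  spread(data, id, val)
  FALSE
}, error = function(e) TRUE)

# With a per-id row index, every output cell is uniquely identified
fixed <- data %>%
  group_by(id) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  spread(id, val) %>%
  select(-n)
```

Here row_number() plays the same role as 1:n() in the answer's code.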


