Using Reshape from Wide to Long in R

Reshaping data.frame from wide to long format

reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape(), assuming your data frame is called d:

reshape(d,
        direction = "long",
        varying = list(names(d)[3:7]),
        v.names = "Value",
        idvar = c("Code", "Country"),
        timevar = "Year",
        times = 1950:1954)
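
To make the call above concrete, here is a minimal, self-contained sketch. The data frame is hypothetical (the codes, countries, column names and values are invented for illustration). It only assumes, as the call does, that columns 3 to 7 hold the yearly values for 1950 to 1954.

```r
# Hypothetical wide data: two id columns followed by one column per year.
d <- data.frame(
  Code    = c("AFG", "ALB"),
  Country = c("Afghanistan", "Albania"),
  X1950   = c(10, 20),
  X1951   = c(11, 21),
  X1952   = c(12, 22),
  X1953   = c(13, 23),
  X1954   = c(14, 24)
)

# Stack columns 3:7 into a single "Value" column and record the year
# each value came from in "Year".
long <- reshape(d,
                direction = "long",
                varying = list(names(d)[3:7]),
                v.names = "Value",
                idvar = c("Code", "Country"),
                timevar = "Year",
                times = 1950:1954)

# One row per Code/Country/Year combination, sorted for readability.
long[order(long$Code, long$Year), ]
```

The result has the columns Code, Country, Year and Value, with one row for each of the 2 x 5 country-year pairs.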

Using Reshape from wide to long in R

Here are three examples (along with some sample data that I think is representative of what you described).

Here's the sample data:

set.seed(1)
mydf <- data.frame(
  company = LETTERS[1:4],
  earnings_2012 = runif(4),
  earnings_2011 = runif(4),
  earnings_2010 = runif(4),
  assets_2012 = runif(4),
  assets_2011 = runif(4),
  assets_2010 = runif(4)
)

mydf
# company earnings_2012 earnings_2011 earnings_2010 assets_2012 assets_2011 assets_2010
# 1 A 0.2655087 0.2016819 0.62911404 0.6870228 0.7176185 0.9347052
# 2 B 0.3721239 0.8983897 0.06178627 0.3841037 0.9919061 0.2121425
# 3 C 0.5728534 0.9446753 0.20597457 0.7698414 0.3800352 0.6516738
# 4 D 0.9082078 0.6607978 0.17655675 0.4976992 0.7774452 0.1255551

Option 1: reshape

One limitation of base R's reshape() is that it won't handle "unbalanced" datasets out of the box (for example, if "assets_2010" were missing from your data, the call below would fail).

reshape(mydf, direction = "long", idvar="company",
        varying = 2:ncol(mydf), sep = "_")
# company time earnings assets
# A.2012 A 2012 0.26550866 0.6870228
# B.2012 B 2012 0.37212390 0.3841037
# C.2012 C 2012 0.57285336 0.7698414
# D.2012 D 2012 0.90820779 0.4976992
# A.2011 A 2011 0.20168193 0.7176185
# B.2011 B 2011 0.89838968 0.9919061
# C.2011 C 2011 0.94467527 0.3800352
# D.2011 D 2011 0.66079779 0.7774452
# A.2010 A 2010 0.62911404 0.9347052
# B.2010 B 2010 0.06178627 0.2121425
# C.2010 C 2010 0.20597457 0.6516738
# D.2010 D 2010 0.17655675 0.1255551
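
To illustrate the "unbalanced" limitation, here is a small sketch that drops assets_2010 from the sample data, shows that the same call then fails, and works around it by padding the missing column with NA. The padding step is my own workaround for illustration, not something reshape() does for you.

```r
set.seed(1)
mydf <- data.frame(
  company = LETTERS[1:4],
  earnings_2012 = runif(4),
  earnings_2011 = runif(4),
  earnings_2010 = runif(4),
  assets_2012 = runif(4),
  assets_2011 = runif(4)   # note: no assets_2010, so the data is unbalanced
)

# The same call as above now stops with an error, because the
# "earnings" and "assets" stubs have different numbers of columns.
attempt <- tryCatch(
  reshape(mydf, direction = "long", idvar = "company",
          varying = 2:ncol(mydf), sep = "_"),
  error = function(e) e
)
inherits(attempt, "error")

# Workaround (an assumption of this sketch): add the missing column as NA
# and sort the columns so every stub lists its years in the same order,
# since reshape() matches stubs and times by position.
padded <- mydf
padded$assets_2010 <- NA_real_
padded <- padded[, c("company", sort(names(padded)[-1]))]

long <- reshape(padded, direction = "long", idvar = "company",
                varying = 2:ncol(padded), sep = "_")
long
```

The 2010 rows then carry NA in the assets column, while the earnings values are unchanged.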

Option 2: The "reshape2" package

reshape2 is quite popular for its syntax. It needs a little bit of processing here, since the column names have to be split before we can get this "double-wide" type of data. It is able to handle unbalanced data, but it won't be the best choice if your varying columns are of different types (numeric, character, factor), because melt() stacks them all into a single value column.

library(reshape2)
dfL <- melt(mydf, id.vars="company")
dfL <- cbind(dfL, colsplit(dfL$variable, "_", c("var", "year")))
dcast(dfL, company + year ~ var, value.var="value")
# company year assets earnings
# 1 A 2010 0.9347052 0.62911404
# 2 A 2011 0.7176185 0.20168193
# 3 A 2012 0.6870228 0.26550866
# 4 B 2010 0.2121425 0.06178627
# 5 B 2011 0.9919061 0.89838968
# 6 B 2012 0.3841037 0.37212390
# 7 C 2010 0.6516738 0.20597457
# 8 C 2011 0.3800352 0.94467527
# 9 C 2012 0.7698414 0.57285336
# 10 D 2010 0.1255551 0.17655675
# 11 D 2011 0.7774452 0.66079779
# 12 D 2012 0.4976992 0.90820779

Option 3: merged.stack from "splitstackshape"

merged.stack from my "splitstackshape" package has fairly straightforward syntax and should be quite fast if you need to end up with this "double-wide" type of structure. It was created to handle unbalanced data and, since it treats each set of columns separately, it won't have problems with column-type conversion.

library(splitstackshape)
merged.stack(mydf, id.vars="company",
             var.stubs=c("earnings", "assets"), sep = "_")
# company .time_1 earnings assets
# 1: A 2010 0.62911404 0.9347052
# 2: A 2011 0.20168193 0.7176185
# 3: A 2012 0.26550866 0.6870228
# 4: B 2010 0.06178627 0.2121425
# 5: B 2011 0.89838968 0.9919061
# 6: B 2012 0.37212390 0.3841037
# 7: C 2010 0.20597457 0.6516738
# 8: C 2011 0.94467527 0.3800352
# 9: C 2012 0.57285336 0.7698414
# 10: D 2010 0.17655675 0.1255551
# 11: D 2011 0.66079779 0.7774452
# 12: D 2012 0.90820779 0.4976992

How to reshape data from long to wide format

Using the reshape() function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
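
Here is a minimal runnable sketch of that call. The contents of dat1 are hypothetical (the original data is not shown here); the sketch only assumes a long table with a name column, a numbers column, and one value column.

```r
# Hypothetical long data: one row per name/numbers combination.
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, times = 2),
  value   = c(11, 12, 13, 14, 21, 22, 23, 24)
)

# Spread "value" into one column per level of "numbers",
# keeping one row per "name".
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
wide
```

Every column that is neither idvar nor timevar is treated as a measurement, so the new columns are named value.1, value.2, value.3 and value.4.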

Problem when reshaping data from long to wide format in R

Reshaping data with stats::reshape() can be tedious. Hadley Wickham and
his team have spent quite some time on creating a comprehensive solution.
First there was the reshape2 package, then tidyr had spread() and gather(),
and those are now complemented by pivot_wider() and pivot_longer().

This is how you can use tidyr::pivot_wider() to achieve the result you seem to
be going for.

library(tidyr)
pivot_wider(
  my_df,
  id_cols = c(transcript, response),
  names_from = hours,
  values_from = exp.change,
  names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>

I think having dedicated commands with dedicated documentation for the two transformations (wide/long) makes the tidyr commands much easier to use, compared to stats::reshape().

EDIT:
stats::reshape() is giving weird results because it seems to have trouble with my_df being a tibble. Other than that, your command was just fine. Just add an as.data.frame() and you are good to go.

reshape(
  as.data.frame(my_df),
  idvar = c("transcript", "response"),
  timevar = "hours",
  v.names = "exp.change",
  direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01

But since it seems that you are already using the tidyverse, tidyr::pivot_wider() seems like the best fit.

reshape dataframe from wide to long in R

Using data.table:

library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'),
               measure.vars = patterns(fixed='fixed_', current='current_'),
               variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
## id name year fixed current
## 1: 1 A 2020 2300 3000
## 2: 2 A 2019 2100 3100
## 3: 3 A 2018 2600 3200
## 4: 4 A 2017 2600 3300
## 5: 5 A 2016 1900 3400

This should be very fast, although your dataset is not very big, to be honest.

Note that this assumes the fixed_* and current_* columns are in the same order and associated with the same year(s). So if fixed_2020 is the first fixed_* column, current_2020 must also be the first current_* column, and so on. Otherwise, the year column will be correctly associated with fixed but not with current.
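
If the column order cannot be guaranteed, one way around that caveat is to build the fixed/current pairs from the year suffixes rather than relying on position. The following is a base R sketch using reshape(), not the data.table approach above. The sample values and the deliberately shuffled column order are assumptions for illustration.

```r
# Hypothetical data in which the current_* columns are NOT in the same
# order as the fixed_* columns.
mydata <- data.frame(
  id = 1L, name = "A",
  fixed_2020 = 2300, fixed_2019 = 2100,
  current_2019 = 3100, current_2020 = 3000
)

# Derive the years from the fixed_* names, then construct both column
# sets from those years so each pair is matched by year, not by position.
years <- sort(unique(sub("^fixed_", "",
                         grep("^fixed_", names(mydata), value = TRUE))))

long <- reshape(mydata, direction = "long",
                varying = list(paste0("fixed_", years),
                               paste0("current_", years)),
                v.names = c("fixed", "current"),
                timevar = "year", times = as.numeric(years),
                idvar = c("id", "name"))
long
```

Because both column sets are generated from the same years vector, each row's fixed and current values belong to the year shown in the year column.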

Converting data from wide to long (using multiple columns)

You can use the base reshape() function to (roughly) simultaneously melt over multiple sets of variables, by using the varying parameter and setting direction to "long".

For example here, you are supplying a list of three "sets" (vectors) of variable names to the varying argument:

dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header=TRUE)

reshape(dat, direction="long",
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
        v.names=c("f","op","ed"))

You'll end up with this:

    cid dyad  junk time f op ed id
1.1   1    2 0.876    1 0  2  5  1
2.1   1    5 0.765    1 0  2  4  2
1.2   1    2 0.876    2 0  4  7  1
2.2   1    5 0.765    2 1  4  3  2

Notice that two variables get created, in addition to the three sets getting collapsed: an $id variable -- which tracks the row number in the original table (dat), and a $time variable -- which corresponds to the order of the original variables that were collapsed. There are also now nested row numbers -- 1.1, 2.1, 1.2, 2.2, which here are just the values of $id and $time at that row, respectively.

Without knowing exactly what you're trying to track, it's hard to say whether $id or $time is what you want to use as the row identifier, but they're both there.

It might also be useful to play with the parameters timevar and idvar (you can set timevar to NULL, for example).

reshape(dat, direction="long",
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
        v.names=c("f","op","ed"),
        timevar="id1", idvar="id2")
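
As a quick check of what those two parameters change, here is a self-contained sketch that reruns the call on the sample data and inspects the result. The variable name long is just a label chosen for this sketch.

```r
dat <- read.table(text = "
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header = TRUE)

long <- reshape(dat, direction = "long",
                varying = list(c("f1", "f2"), c("op1", "op2"), c("ed1", "ed2")),
                v.names = c("f", "op", "ed"),
                timevar = "id1", idvar = "id2")

# The generated columns are now called id1 (time index) and id2 (row id).
names(long)

# The row names are the row id and the time index joined by a dot.
identical(rownames(long), paste(long$id2, long$id1, sep = "."))
```

Only the names of the two generated columns change; the stacked f, op and ed values are the same as in the earlier output.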

Reshape data set from wide to long format grouped by variable suffix

Using reshape(), we can set the cut points with sep="", which splits each column name just before the first digit that follows a letter.

reshape(d, idvar="ID", varying=2:5, timevar="YEAR", sep="", direction="long")
# ID YEAR MI FRAC
# 1.1995 1 1995 2 3
# 7.1995 7 1995 3 10
# 10.1995 10 1995 1 2
# 1.1996 1 1996 2 4
# 7.1996 7 1996 12 1
# 10.1996 10 1996 1 1

Data

d <- structure(list(ID = c(1L, 7L, 10L), MI1995 = c(2L, 3L, 1L),
    FRAC1995 = c(3L, 10L, 2L), MI1996 = c(2L, 12L, 1L),
    FRAC1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
    class = "data.frame")
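
For convenience, here is the whole thing as one runnable sketch, with the data defined before the call. It assumes the wide columns are named like MI1995 and FRAC1995, with no separator between the stub and the year, which is the case sep="" is meant for.

```r
# Wide data: stub and year are glued together without a separator.
d <- data.frame(
  ID       = c(1L, 7L, 10L),
  MI1995   = c(2L, 3L, 1L),
  FRAC1995 = c(3L, 10L, 2L),
  MI1996   = c(2L, 12L, 1L),
  FRAC1996 = c(4L, 1L, 1L)
)

# sep = "" makes reshape() split each name at the letter/digit boundary,
# so MI1995 is read as stub "MI" and time 1995.
long <- reshape(d, idvar = "ID", varying = 2:5, timevar = "YEAR",
                sep = "", direction = "long")
long
```

If the names contained an underscore instead (for example MI_1995), sep = "_" would be the setting to use.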

