Reshape Data from Wide to Long

Reshaping data.frame from wide to long format

reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape(), assuming your data frame is called d and the values to stack are in columns 3 to 7:

reshape(d, 
        direction = "long",
        varying = list(names(d)[3:7]),
        v.names = "Value",
        idvar = c("Code", "Country"),
        timevar = "Year",
        times = 1950:1954)
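Since d is not shown, here is a minimal sketch of the same call on a made-up data frame. The codes, country names, and values are hypothetical; only the layout (two identifier columns followed by five year columns) is assumed to match the original question.

```r
# Hypothetical wide data: two id columns, then one column per year
d <- data.frame(
  Code    = c("AAA", "BBB"),
  Country = c("Country A", "Country B"),
  X1950 = c(10, 20), X1951 = c(11, 21), X1952 = c(12, 22),
  X1953 = c(13, 23), X1954 = c(14, 24)
)

long <- reshape(d,
                direction = "long",
                varying = list(names(d)[3:7]),  # columns to stack
                v.names = "Value",              # name of the stacked column
                idvar = c("Code", "Country"),
                timevar = "Year",
                times = 1950:1954)              # one label per stacked column

rownames(long) <- NULL  # drop the composite row names reshape() builds
long
```

Each original row becomes five rows, one per year, with the year taken from times rather than parsed from the column names.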

Reshape data table from wide to long with transpose

A better approach is to use the newer pivot_longer and pivot_wider functions from the tidyr package.

They follow an easier convention and have convenient text manipulation options built in. In this case, names_prefix removes the "X." that read.table added to the column names.

df <- read.table(header=TRUE, text="Mill   Acid `1_day`  `3_days` `1_week` `2_weeks` `4_weeks` `2_months` `3_months` `6-7_months`
Gävle  0    10.5      12.0     10.9      10.7      10.6       10.1       10    9.81        
Gävle  0.5  8.79    10        9.29      9.08      9.39       9.13       9.14 8.86        
Gävle  0.75 8.05     8.95     8.33      8.26      8.24       8.22       8.25 7.44        
Gävle  1    6.7       7.82     7.77      8.02      8.19       7.79       7.97 6.99        
Gävle  1.25 6.52     7.43     7.33      7.11      7.72       7.88       7.91 6.96        
Gävle  1.5  6.41     7.25     7.28      6.92      7.63       7.01       7.64 6.7   
Obbola  0    10.5    12.0     10.9      10.7      10.6       10.1       10    9.81        
Obbola  0.5  8.79    10        9.29      9.08      9.39       9.13       9.14 8.86        
Obbola  0.75 8.05     8.95     8.33      8.26      8.24       8.22       8.25 7.44        
Obbola  1    6.7     7.82     7.77      8.02      8.19       7.79       7.97 6.99        
Obbola  1.25 6.52   7.43     7.33      7.11      7.72       7.88       7.91 6.96        
Obbola  1.5  6.41   7.25     7.28      6.92      7.63       7.01       7.64 6.7   ")

library(tidyr)

longdf <- df %>% pivot_longer(-c("Mill", "Acid"), names_to = "Time", values_to = "value", names_prefix = "X.")

answer <- longdf %>% pivot_wider(id_cols = c("Time", "Acid"), names_from = "Mill")
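One detail worth checking: because the header wraps names in backticks, read.table should turn a name such as `1_day` into X.1_day. (note the trailing dot), so stripping the "X." prefix alone leaves that dot behind. Below is a minimal sketch of cleaning both ends. The two-row data frame wide is a hypothetical excerpt, with the mill name written in ASCII and column names typed in the form read.table is expected to produce.

```r
library(tidyr)

# Hypothetical excerpt; names mimic what read.table() makes of `1_day`, `3_days`
wide <- data.frame(
  Mill = c("Gavle", "Obbola"),
  Acid = c(0, 0),
  X.1_day.  = c(10.5, 10.5),
  X.3_days. = c(12.0, 12.0)
)

long <- pivot_longer(wide,
                     cols = -c(Mill, Acid),
                     names_to = "Time",
                     values_to = "value",
                     names_prefix = "^X\\.")  # anchored regex: literal "X." at the start

# Remove the trailing dot left over from the closing backtick
long$Time <- sub("\\.$", "", long$Time)
long
```

Anchoring the prefix with ^ and escaping the dot avoids matching an "X" followed by any character elsewhere in a name.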

reshape dataframe from wide to long in R

Using data.table:

library(data.table)
setDT(mydata)
result <- melt(mydata, id.vars = c('id', 'name'), 
                 measure.vars = patterns(fixed='fixed_', current='current_'), 
                 variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
##    id name year fixed current
## 1:  1    A 2020  2300    3000
## 2:  2    A 2019  2100    3100
## 3:  3    A 2018  2600    3200
## 4:  4    A 2017  2600    3300
## 5:  5    A 2016  1900    3400

This should be very fast, although your data set is not big enough for speed to matter much.

Note that this assumes the fixed and current columns are in the same order and cover the same years. So if fixed_2020 is the first fixed_* column, current_2020 must be the first current_* column, and so on. Otherwise, the year column will match the fixed values but not the current values.
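The ordering assumption can be checked before melting. The sketch below is one way to do that; the one-row mydata is hypothetical, the helper names (fixed_cols, current_cols, and so on) are illustrative, and the column groups are passed as an explicit list with value.name instead of patterns().

```r
library(data.table)

# Hypothetical one-row input with matching fixed_* / current_* columns
mydata <- data.table(id = 1L, name = "A",
                     fixed_2020 = 2300, fixed_2019 = 2100,
                     current_2020 = 3000, current_2019 = 3100)

fixed_cols   <- grep("^fixed_",   names(mydata), value = TRUE)
current_cols <- grep("^current_", names(mydata), value = TRUE)

fixed_years   <- as.numeric(sub("^fixed_",   "", fixed_cols))
current_years <- as.numeric(sub("^current_", "", current_cols))

# Stop early if the two groups are not aligned year by year
stopifnot(identical(fixed_years, current_years))

result <- melt(mydata, id.vars = c("id", "name"),
               measure.vars = list(fixed_cols, current_cols),
               value.name = c("fixed", "current"),
               variable.name = "year")

# 'year' comes back as a factor 1, 2, ...; map it to the real years
result[, year := fixed_years[year]]
result
```

If the check fails, reordering one of the column vectors (for example with order() on the extracted years) before calling melt should restore the alignment.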

Converting data from wide to long (using multiple columns)

You can use the base reshape() function to (roughly) simultaneously melt over multiple sets of variables, by using the varying parameter and setting direction to "long".

For example here, you are supplying a list of three "sets" (vectors) of variable names to the varying argument:

dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"))

You'll end up with this:

    cid dyad  junk time f op ed id
1.1   1    2 0.876    1 0  2  5  1
2.1   1    5 0.765    1 0  2  4  2
1.2   1    2 0.876    2 0  4  7  1
2.2   1    5 0.765    2 1  4  3  2

Notice that two variables get created, in addition to the three sets getting collapsed: an $id variable -- which tracks the row number in the original table (dat), and a $time variable -- which corresponds to the order of the original variables that were collapsed. There are also now nested row numbers -- 1.1, 2.1, 1.2, 2.2, which here are just the values of $id and $time at that row, respectively.

Without knowing exactly what you're trying to track, it's hard to say whether $id or $time is what you want to use as the row identifier, but they're both there.

It might also be useful to play with the parameters timevar and idvar (you can set timevar to NULL, for example).

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"), 
        timevar="id1", idvar="id2")
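As a quick check of what the renamed parameters do, here is a self-contained sketch that reruns the call on the same dat and inspects the result. The object name long is just for illustration.

```r
dat <- read.table(text = "
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header = TRUE)

long <- reshape(dat, direction = "long",
                varying = list(c("f1", "f2"), c("op1", "op2"), c("ed1", "ed2")),
                v.names = c("f", "op", "ed"),
                timevar = "id1",   # replaces the default 'time' column
                idvar   = "id2")   # replaces the default 'id' column

names(long)

# Pick one cell by its identifiers: original row 2, second variable in each set
long[long$id2 == 2 & long$id1 == 2, c("f", "op", "ed")]
```

The columns id1 and id2 carry the same information as time and id in the earlier output, only under the names you chose.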

Reshape data set from wide to long format grouped by variable suffix

Using reshape, we can tell it where to split the column names with sep="_".

reshape(d, idvar="ID", varying=2:5, timevar="YEAR", sep="_", direction="long")
#         ID YEAR MI FRAC
# 1.1995   1 1995  2    3
# 7.1995   7 1995  3   10
# 10.1995 10 1995  1    2
# 1.1996   1 1996  2    4
# 7.1996   7 1996 12    1
# 10.1996 10 1996  1    1

Data

d <- structure(list(ID = c(1L, 7L, 10L), MI_1995 = c(2L, 3L, 1L),
                    FRAC_1995 = c(3L, 10L, 2L), MI_1996 = c(2L, 12L, 1L),
                    FRAC_1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
               class = "data.frame")
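When the column names cannot be split automatically, the groups can be spelled out instead. The following sketch, using the same d, lists each set of columns explicitly and supplies the years through times. It is an alternative formulation, not the original answer's call.

```r
d <- structure(list(ID = c(1L, 7L, 10L), MI_1995 = c(2L, 3L, 1L),
                    FRAC_1995 = c(3L, 10L, 2L), MI_1996 = c(2L, 12L, 1L),
                    FRAC_1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
               class = "data.frame")

long <- reshape(d, idvar = "ID", direction = "long",
                varying = list(c("MI_1995", "MI_1996"),      # first set -> MI
                               c("FRAC_1995", "FRAC_1996")), # second set -> FRAC
                v.names = c("MI", "FRAC"),
                timevar = "YEAR",
                times = c(1995, 1996))  # must follow the order used in each set
long
```

This is more verbose, but nothing depends on how reshape guesses stubs and suffixes from the names.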

Using Reshape from wide to long in R

Here are three examples (along with some sample data that I think is representative of what you described).

Here's the sample data:

set.seed(1)
mydf <- data.frame(
  company = LETTERS[1:4],
  earnings_2012 = runif(4),
  earnings_2011 = runif(4),
  earnings_2010 = runif(4),
  assets_2012 = runif(4),
  assets_2011 = runif(4),
  assets_2010 = runif(4)
)

mydf
#   company earnings_2012 earnings_2011 earnings_2010 assets_2012 assets_2011 assets_2010
# 1       A     0.2655087     0.2016819    0.62911404   0.6870228   0.7176185   0.9347052
# 2       B     0.3721239     0.8983897    0.06178627   0.3841037   0.9919061   0.2121425
# 3       C     0.5728534     0.9446753    0.20597457   0.7698414   0.3800352   0.6516738
# 4       D     0.9082078     0.6607978    0.17655675   0.4976992   0.7774452   0.1255551

Option 1: `reshape`

One limitation is that it won't handle "unbalanced" datasets (for example, if you didn't have "assets_2010" as part of your data, this wouldn't work).

reshape(mydf, direction = "long", idvar="company", 
        varying = 2:ncol(mydf), sep = "_")
#        company time   earnings    assets
# A.2012       A 2012 0.26550866 0.6870228
# B.2012       B 2012 0.37212390 0.3841037
# C.2012       C 2012 0.57285336 0.7698414
# D.2012       D 2012 0.90820779 0.4976992
# A.2011       A 2011 0.20168193 0.7176185
# B.2011       B 2011 0.89838968 0.9919061
# C.2011       C 2011 0.94467527 0.3800352
# D.2011       D 2011 0.66079779 0.7774452
# A.2010       A 2010 0.62911404 0.9347052
# B.2010       B 2010 0.06178627 0.2121425
# C.2010       C 2010 0.20597457 0.6516738
# D.2010       D 2010 0.17655675 0.1255551
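If the data is unbalanced, one possible workaround is to pad the missing columns with NA before calling reshape. The sketch below assumes you know the full set of stubs and years up front; the names stubs, years, expected, and padded are illustrative. The columns are also put into a fixed order, because reshape pairs columns with times by position within each stub.

```r
set.seed(1)
mydf <- data.frame(
  company = LETTERS[1:4],
  earnings_2012 = runif(4), earnings_2011 = runif(4), earnings_2010 = runif(4),
  assets_2012 = runif(4), assets_2011 = runif(4), assets_2010 = runif(4)
)

# Simulate an unbalanced data set by dropping one column
unbalanced <- mydf[, names(mydf) != "assets_2010"]

stubs <- c("earnings", "assets")
years <- 2012:2010
expected <- as.vector(outer(stubs, years, paste, sep = "_"))

# Add whatever is missing as NA, then fix the column order
missing_cols <- setdiff(expected, names(unbalanced))
unbalanced[missing_cols] <- NA_real_
padded <- unbalanced[c("company", expected)]

long <- reshape(padded, direction = "long", idvar = "company",
                varying = expected, sep = "_")
long
```

The padded cells show up as NA in the long result, so they can be dropped afterwards with something like long[!is.na(long$assets), ] if needed.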

Option 2: The "reshape2" package

reshape2 is quite popular for its syntax. It needs a little bit of processing before it can work, since the column names have to be split for us to get this "double-wide" type of data. It can handle unbalanced data, but it is not the best choice if your varying columns are of different types (numeric, character, factor).

library(reshape2)
dfL <- melt(mydf, id.vars="company")
dfL <- cbind(dfL, colsplit(dfL$variable, "_", c("var", "year")))
dcast(dfL, company + year ~ var, value.var="value")
#    company year    assets   earnings
# 1        A 2010 0.9347052 0.62911404
# 2        A 2011 0.7176185 0.20168193
# 3        A 2012 0.6870228 0.26550866
# 4        B 2010 0.2121425 0.06178627
# 5        B 2011 0.9919061 0.89838968
# 6        B 2012 0.3841037 0.37212390
# 7        C 2010 0.6516738 0.20597457
# 8        C 2011 0.3800352 0.94467527
# 9        C 2012 0.7698414 0.57285336
# 10       D 2010 0.1255551 0.17655675
# 11       D 2011 0.7774452 0.66079779
# 12       D 2012 0.4976992 0.90820779

Option 3: `merged.stack` from "splitstackshape"

merged.stack from my "splitstackshape" package has pretty straightforward syntax and should be pretty fast if you need to end up with this "double-wide" type of structure. It was created to handle unbalanced data and, since it treats columns separately, it won't have problems with converting column types.

library(splitstackshape)
merged.stack(mydf, id.vars="company", 
             var.stubs=c("earnings", "assets"), sep = "_")
#     company .time_1   earnings    assets
#  1:       A    2010 0.62911404 0.9347052
#  2:       A    2011 0.20168193 0.7176185
#  3:       A    2012 0.26550866 0.6870228
#  4:       B    2010 0.06178627 0.2121425
#  5:       B    2011 0.89838968 0.9919061
#  6:       B    2012 0.37212390 0.3841037
#  7:       C    2010 0.20597457 0.6516738
#  8:       C    2011 0.94467527 0.3800352
#  9:       C    2012 0.57285336 0.7698414
# 10:       D    2010 0.17655675 0.1255551
# 11:       D    2011 0.66079779 0.7774452
# 12:       D    2012 0.90820779 0.4976992
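For comparison, the same "double-wide" result can probably also be reached with tidyr's pivot_longer, which was mentioned in an earlier answer. This is a sketch, not one of the three options above: the special ".value" entry in names_to tells tidyr that the part of each name before the separator names the output column.

```r
library(tidyr)

set.seed(1)
mydf <- data.frame(
  company = LETTERS[1:4],
  earnings_2012 = runif(4), earnings_2011 = runif(4), earnings_2010 = runif(4),
  assets_2012 = runif(4), assets_2011 = runif(4), assets_2010 = runif(4)
)

long <- pivot_longer(mydf,
                     cols = -company,
                     names_to = c(".value", "year"),  # stub -> column, suffix -> year
                     names_sep = "_")
long
```

Here year comes back as character; as.integer(long$year) converts it if a numeric year is needed.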
