Reshaping Multiple Sets of Measurement Columns (Wide Format) into Single Columns (Long Format)

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() than spread() (see answer by @AndrewMacDonald), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similar to gather().
However, it offers additional functionality such as multiple value columns.
With only one value column, all colnames of the wide data set would go into one long column with the name given in names_to.
For multiple value columns, names_to may receive multiple new names.

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc.
Therefore, I renamed the columns in the first step.

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#>  [1] "ID"               "DateRangeStart_1" "DateRangeEnd_1"  
#>  [4] "Value_1"          "DateRangeStart_2" "DateRangeEnd_2"  
#>  [7] "Value_2"          "DateRangeStart_3" "DateRangeEnd_3"  
#> [10] "Value_3"

pivot_longer(a, 
             cols = -ID, 
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#>      ID group DateRangeEnd DateRangeStart Value
#>   <int> <chr> <date>       <date>         <dbl>
#> 1     1 1     1990-01-03   1990-01-01       4.4
#> 2     1 2     1991-07-06   1991-05-04       6.2
#> 3     1 3     1996-06-06   1995-05-05       3.3

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
    build_longer_spec(cols = -ID) %>%
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = stringr::str_extract(name, "Start|End|Value"))

pivot_longer(a, spec = spec)

^{Created on 2019-03-26 by the reprex package (v0.2.1)}

See also: https://tidyr.tidyverse.org/articles/pivot.html

Reshaping wide to long with multiple values columns

reshape does this with the appropriate arguments.

varying lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names is the long format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Given these varying and v.names arguments, reshape is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.

Given the values of times and v.names, reshape splits the varying columns on a . character (the default sep argument) to create the columns in the output.

times specifies values that are to be used in the created var column, and v.names are pasted onto these values to get column names in the wide format for mapping to the result.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long', 
        varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'), 
        timevar='var',
        times=c('f1', 'f2'),
        v.names=c('avg', 'sd'),
        idvar='sbj')

##      sbj blabla var avg sd
## A.f1   A     bA  f1  10  6
## B.f1   B     bB  f1  12  5
## C.f1   C     bC  f1  20  7
## D.f1   D     bD  f1  22  8
## A.f2   A     bA  f2  50 10
## B.f2   B     bB  f2  70 11
## C.f2   C     bC  f2  20  8
## D.f2   D     bD  f2  22  9

Reshaping from long to wide with multiple columns

pivot_wider may be easier

library(dplyr)
library(stringr)
library(tidyr)
df %>% 
   mutate(time = str_c('t', time)) %>%
   pivot_wider(names_from = time, values_from = c(age, height))

-output

# A tibble: 2 × 5
    PIN age_t1 age_t2 height_t1 height_t2
  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
1  1001     84     86        58        58
2  1002     22     24        60        62

With reshape from base R, it may need a sequence column

out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
   FUN = seq_along)), idvar = "PIN", 
    direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
   PIN age_1 height_1 age_2 height_2
1 1001    84       58    86       58
3 1002    22       60    24       62

Wide to long with multiple columns

Use the names_sep (with regex lookaround - names_sep = "(?<=\\D)(?=\\d)") or capture as groups in names_pattern (names_pattern = "(\\D+)(\\d+)" - here we are capturing one or more non-digits (\\D+) as a group ((...)) followed by one or more digits (\\d+) separately that corresponds to the vector passed in names_to i.e. ".value" will the value of the columns that for 'arrest', 'lien' and the "grp" will create the new column with the suffix digits from the column names)

library(tidyr)
pivot_longer(df_wide, cols = -rowid, names_to = c(".value", "grp"), 
     names_pattern = "(\\D+)(\\d+)")

-output

# A tibble: 50 × 4
   rowid grp   arrest lien  
   <int> <chr> <date> <date>
 1  9317 1     NA     NA    
 2  9317 2     NA     NA    
 3  9317 3     NA     NA    
 4  9317 4     NA     NA    
 5  9317 5     NA     NA    
 6  9317 6     NA     NA    
 7  9317 7     NA     NA    
 8  9317 8     NA     NA    
 9  9317 9     NA     NA    
10  9317 10    NA     NA    
# … with 40 more rows

Wide to Long preserving a group

You could use pivot_longer to extract the group followed by pivot_wider to restore the readout columns:

library(dplyr)
library(tidyr)

dat %>% 
  pivot_longer(-ID, names_pattern = "(.*)_(.*)", names_to = c("name", "grp")) %>% 
  pivot_wider()

This returns

# A tibble: 30 x 4
   ID    grp   readout1 readout2
   <chr> <chr>    <dbl>    <dbl>
 1 id1   g1      0.732    0.499 
 2 id1   g2     -0.301   -0.0941
 3 id1   g3      0.198    1.62  
 4 id2   g1      0.653    0.732 
 5 id2   g2     -0.325    1.49  
 6 id2   g3     -0.0475  -1.73  
 7 id3   g1     -1.59     1.25  
 8 id3   g2     -1.59     0.688 
 9 id3   g3     -0.738    0.341 
10 id4   g1      1.12     1.11  
# ... with 20 more rows

Wide to long data transformation multiple columns

Using pivot_longer

tidyr::pivot_longer(df_wide, 
                   cols = -c(Company, Industry), 
                   names_to = c(".value", "Year"),
                   names_sep = "\\.") %>% type.convert()

#  Company  Industry       Year Sales EBITDA
#  <fct>    <fct>         <int> <int>  <int>
#1 CompanyA Manufacturing  2015   100     10
#2 CompanyA Manufacturing  2016   110     11
#3 CompanyA Manufacturing  2017   120     12
#4 CompanyB Telecom        2015   500     50
#5 CompanyB Telecom        2016   550     55
#6 CompanyB Telecom        2017   600     60
#7 CompanyC Services       2015  1000    100
#8 CompanyC Services       2016  1100    110
#9 CompanyC Services       2017  1200    120

Reshaping Multiple Sets of Measurement Columns (Wide Format) into Single Columns (Long Format)