Reshape a Dataframe to Long Format with Multiple Sets of Measure Columns

If the 'year' and 'pop' columns alternate, we can subset with c(TRUE, FALSE) to get columns 1, 3, 5, etc. and with c(FALSE, TRUE) to get columns 2, 4, 6, etc., because the logical index is recycled. Then we unlist the columns and create a new 'data.frame'.

 df2 <- data.frame(year=unlist(df1[c(TRUE, FALSE)]), 
                  pop=unlist(df1[c(FALSE, TRUE)]))
 row.names(df2) <- NULL
 head(df2)
 #   year    pop
 #1            
 #2 16XX 4675,0
 #3 17XX 4739,3
 #4 17XX 4834,0
 #5 180X 4930,0
 #6 180X 5029,0
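
To make the recycling step easier to follow, here is a minimal sketch on a small made-up data frame (`wide` and its values are hypothetical, not the `df1` from the answer). It only shows which columns each logical index selects and how `unlist()` stacks them.

```r
# Hypothetical toy data with alternating year/pop columns
wide <- data.frame(year1 = c(1900, 1901), pop1 = c(10, 11),
                   year2 = c(1930, 1931), pop2 = c(20, 21))

# The length-2 logical index is recycled across all four columns
year_cols <- names(wide)[c(TRUE, FALSE)]   # columns 1 and 3
pop_cols  <- names(wide)[c(FALSE, TRUE)]   # columns 2 and 4

# unlist() stacks the selected columns one after another
long <- data.frame(year = unlist(wide[c(TRUE, FALSE)], use.names = FALSE),
                   pop  = unlist(wide[c(FALSE, TRUE)], use.names = FALSE))
```

Note that `unlist()` coerces columns of different types to a common type, which is why the mixed character and integer year columns in `df1` come out as character.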

Or another option is

library(splitstackshape)
merged.stack(transform(df1, id=1:nrow(df1)), var.stubs=c('year', 'pop'), 
        sep='var.stubs')[order(.time_1), 3:4, with=FALSE]

data

df1 <- structure(list(year1 = c("", "16XX", "17XX", "17XX", "180X", 
"180X", "181X", "181X", "182X", "182X"), pop1 = c("", "4675,0", 
"4739,3", "4834,0", "4930,0", "5029,0", "5129,0", "5231,9", "5297,0", 
"5362,0"), year2 = c(NA, 1900L, 1901L, 1902L, 1903L, 1904L, 1905L, 
1906L, 1907L, 1908L), pop2 = c("", "6453,0", "6553,5", "6684,0", 
"6818,0", "6955,0", "7094,0", "7234,7", "7329,0", "7422,0"), 
year3 = c(NA, 1930L, 1931L, 1932L, 1933L, 1934L, 1935L, 1936L, 
1937L, 1938L), pop3 = c("", "9981,2", "", "", "", "", "", 
"", "", "")), .Names = c("year1", "pop1", "year2", "pop2", 
"year3", "pop3"), class = "data.frame", row.names = c(NA, -10L))

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() followed by spread() (see answer by @AndrewMacDonald), because the column attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similarly to gather().
However, it offers additional functionality such as multiple value columns.
With only one value column, all colnames of the wide data set would go into one long column with the name given in names_to.
For multiple value columns, names_to may receive multiple new names.

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc.
Therefore, I renamed the columns in the first step.

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#>  [1] "ID"               "DateRangeStart_1" "DateRangeEnd_1"  
#>  [4] "Value_1"          "DateRangeStart_2" "DateRangeEnd_2"  
#>  [7] "Value_2"          "DateRangeStart_3" "DateRangeEnd_3"  
#> [10] "Value_3"

pivot_longer(a, 
             cols = -ID, 
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#>      ID group DateRangeEnd DateRangeStart Value
#>   <int> <chr> <date>       <date>         <dbl>
#> 1     1 1     1990-01-03   1990-01-01       4.4
#> 2     1 2     1991-07-06   1991-05-04       6.2
#> 3     1 3     1996-06-06   1995-05-05       3.3

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
    build_longer_spec(cols = -ID) %>%
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = stringr::str_extract(name, "Start|End|Value"))

pivot_longer(a, spec = spec)

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html

Reshaping wide to long with multiple values columns

reshape does this with the appropriate arguments.

varying lists the columns that exist in the wide format but are split into multiple rows in the long format. v.names gives their long-format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Given these varying and v.names arguments, reshape is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.

Given the values of times and v.names, reshape splits the varying columns on a . character (the default sep argument) to create the columns in the output.

times specifies values that are to be used in the created var column, and v.names are pasted onto these values to get column names in the wide format for mapping to the result.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long', 
        varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'), 
        timevar='var',
        times=c('f1', 'f2'),
        v.names=c('avg', 'sd'),
        idvar='sbj')

##      sbj blabla var avg sd
## A.f1   A     bA  f1  10  6
## B.f1   B     bB  f1  12  5
## C.f1   C     bC  f1  20  7
## D.f1   D     bD  f1  22  8
## A.f2   A     bA  f2  50 10
## B.f2   B     bB  f2  70 11
## C.f2   C     bC  f2  20  8
## D.f2   D     bD  f2  22  9
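
The `dw` object is not shown in this answer. The sketch below rebuilds it from the printed result, so the exact construction (and the column order, taken from the remark that `varying=2:5` works) is an assumption. It then runs the same call, plus the positional variant mentioned in the text.

```r
# Reconstruction of `dw` from the printed output (assumed, not the original object)
dw <- data.frame(
  sbj    = c("A", "B", "C", "D"),
  f1.avg = c(10, 12, 20, 22),
  f1.sd  = c(6, 5, 7, 8),
  f2.avg = c(50, 70, 20, 22),
  f2.sd  = c(10, 11, 8, 9),
  blabla = c("bA", "bB", "bC", "bD"),
  stringsAsFactors = FALSE
)

# Same call as above: columns named explicitly
long <- reshape(dw, direction = "long",
                varying = c("f1.avg", "f1.sd", "f2.avg", "f2.sd"),
                timevar = "var",
                times   = c("f1", "f2"),
                v.names = c("avg", "sd"),
                idvar   = "sbj")

# Positional shortcut: only valid while columns 2 to 5 keep this exact order
long_by_pos <- reshape(dw, direction = "long",
                       varying = 2:5,
                       timevar = "var",
                       times   = c("f1", "f2"),
                       v.names = c("avg", "sd"),
                       idvar   = "sbj")
```

Naming the columns is more verbose, but it does not silently produce a wrong mapping if the data frame is later reordered.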

Data frame from wide to long with multiple variables and ids R

Answer already exists here: https://stackoverflow.com/a/12466668/2371031

e.g.,

set.seed(123)
wide_df = data.frame('participant_id' = LETTERS[1:12]
                     , 'judgment_1' = round(rnorm(12)*100)
                     , 'correct_1' = round(rnorm(12)*100)
                     , 'text_id_1' = sample(1:12, 12, replace = FALSE)
                     , 'judgment_2' = round(rnorm(12)*100)
                     , 'correct_2' = round(rnorm(12)*100)
                     , 'text_id_2' = sample(13:24, 12, replace = FALSE)
)

dl <- reshape(data = wide_df, 
              idvar = "participant_id", 
              varying = list(judgment=c(2,5),correct=c(3,6),text_id=c(4,7)), 
              direction="long", 
              v.names = c("judgment","correct","text_id"),
              sep="_")

Result:

    participant_id time judgment correct text_id
A.1              A    1      -56      40       4
B.1              B    1      -23      11      10
C.1              C    1      156     -56       1
D.1              D    1        7     179      12
E.1              E    1       13      50       7
F.1              F    1      172    -197      11
G.1              G    1       46      70       9
H.1              H    1     -127     -47       2
I.1              I    1      -69    -107       8
J.1              J    1      -45     -22       3
K.1              K    1      122    -103       5
L.1              L    1       36     -73       6
A.2              A    2       43    -127      17
B.2              B    2      -30     217      14
C.2              C    2       90     121      22
D.2              D    2       88    -112      15
E.2              E    2       82     -40      13
F.2              F    2       69     -47      19
G.2              G    2       55      78      24
H.2              H    2       -6      -8      20
I.2              I    2      -31      25      21
J.2              J    2      -38      -3      16
K.2              K    2      -69      -4      23
L.2              L    2      -21     137      18

Reshape data from long to wide with multiple measure columns using spread() or other reshape functions

A tidyr solution below. You need to gather the region into a single column to be able to spread it.

library(tidyr)
data %>% gather(region,val,-age) %>% spread(age,val)  

#   region age 0 age 1 age 10 age 11 age 12 age 2 age 3 age 4 age 5 age 6 age 7 age 8 age 9
# 1     X1     2     2      6      3      3     2     4     7    12    19    22    18    11
# 2     X2     2     2      7      4      3     3     4     8    14    21    24    20    12
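
In current tidyr releases gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a hedged sketch of the same two-step idea with those functions. The `data` object here is a small hypothetical stand-in, since the original data frame is not shown.

```r
library(tidyr)

# Hypothetical stand-in for the `data` object used above
data <- data.frame(age = c("age 0", "age 1", "age 2"),
                   X1  = c(2, 2, 2),
                   X2  = c(2, 2, 3))

# Step 1: gather the region columns into a single key/value pair
long <- pivot_longer(data, cols = -age, names_to = "region", values_to = "val")

# Step 2: spread the ages out into one column each
wide <- pivot_wider(long, names_from = age, values_from = val)
```

By default pivot_wider() orders the new columns by first appearance, whereas the spread() output above is sorted alphabetically (hence "age 10" before "age 2").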

Wide to long data transformation multiple columns

Using pivot_longer

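library(tidyr)  # also provides the %>% pipe

pivot_longer(df_wide, 
             cols = -c(Company, Industry), 
             names_to = c(".value", "Year"),
             names_sep = "\\.") %>% 
  type.convert(as.is = FALSE)  # as.is = FALSE converts strings to factors, as in the output below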

#  Company  Industry       Year Sales EBITDA
#  <fct>    <fct>         <int> <int>  <int>
#1 CompanyA Manufacturing  2015   100     10
#2 CompanyA Manufacturing  2016   110     11
#3 CompanyA Manufacturing  2017   120     12
#4 CompanyB Telecom        2015   500     50
#5 CompanyB Telecom        2016   550     55
#6 CompanyB Telecom        2017   600     60
#7 CompanyC Services       2015  1000    100
#8 CompanyC Services       2016  1100    110
#9 CompanyC Services       2017  1200    120

Reshaping data to long format with multiple variables as measure.vars

The reshape function works well here.

reshape(df, varying=list(c(3,6), c(4,7), c(5,8)), 
            times=c("A","B"), v.names=paste0("Col_",1:3), direction="long")

data

df <- 
structure(list(Person = structure(1:3, .Label = c("Andrew", "John", 
"Mike"), class = "factor"), Age = c(25, 34, 21), ColA_1 = c(1, 
5, 7), ColA_2 = c(5, 0, 9), ColA_3 = c(4, 4, 1), ColB_1 = c(16, 
55, 37), ColB_2 = c(25, 14, 39), ColB_3 = c(43, 64, 31)), .Names = c("Person", 
"Age", "ColA_1", "ColA_2", "ColA_3", "ColB_1", "ColB_2", "ColB_3"
), row.names = c(NA, -3L), class = "data.frame")

from wide format to long format with results in multiple columns

We could use melt from data.table (v1.9.5 or later, the devel version when this was written), which can take multiple patterns for the measure columns.

We convert the 'data.frame' to 'data.table' (setDT(df)), melt, and specify the regex in the patterns of the measure argument. Then we remove the rows that are NA for both the 'names' and 'address' columns.

library(data.table)#v1.9.5+
dM <- melt(setDT(df), measure=patterns(c('^name', '^adress')),
          value.name=c('names', 'address') )
dM[!(is.na(names) & is.na(address))]
# id variable  names  address
#1:  1        1   John street a
#2:  2        1   Jack street b
#3:  3        1   Joey       NA
#4:  1        2   Burt street d
#5:  2        2    Ben street e
#6:  3        2    Bob street f
#7:  1        3  chris street 1
#8:  2        3 connor street 2

Or we can use reshape from base R.

 dM2 <- reshape(df, idvar='id', varying=list(grep('name', names(df)), 
             grep('adress', names(df))), direction='long')

The NA rows can be removed as in the data.table solution by using standard 'data.frame' indexing after we create the logical index with is.na.
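
As a rough illustration of that last step, the sketch below rebuilds `df` from the printed data.table output (the column names `name1` ... `adress3` are assumed from the regex used above, the original object is not shown) and filters the reshaped result. Passing `v.names` is an addition here, made only so the columns match the data.table result.

```r
# Assumed reconstruction of `df`; column names are guesses based on the patterns
df <- data.frame(
  id      = 1:3,
  name1   = c("John", "Jack", "Joey"),
  adress1 = c("street a", "street b", NA),
  name2   = c("Burt", "Ben", "Bob"),
  adress2 = c("street d", "street e", "street f"),
  name3   = c("chris", "connor", NA),
  adress3 = c("street 1", "street 2", NA),
  stringsAsFactors = FALSE
)

dM2 <- reshape(df, idvar = "id",
               varying = list(grep("name", names(df)),
                              grep("adress", names(df))),
               v.names = c("names", "address"),
               direction = "long")

# Logical index: TRUE where both measure columns are NA
both_na <- is.na(dM2$names) & is.na(dM2$address)

# Keep every row that has at least one non-missing value
kept <- dM2[!both_na, ]
```

Rows where only one of the two values is missing (such as Joey's address) are kept, matching the data.table filter.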
