Reshaping Multiple Sets of Measurement Columns (Wide Format) into Single Columns (Long Format)

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() than spread() (see answer by @AndrewMacDonald), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L,
DateRange1Start = structure(7305, class = "Date"),
DateRange1End = structure(7307, class = "Date"),
Value1 = 4.4,
DateRange2Start = structure(7793, class = "Date"),
DateRange2End = structure(7856, class = "Date"),
Value2 = 6.2,
DateRange3Start = structure(9255, class = "Date"),
DateRange3End = structure(9653, class = "Date"),
Value3 = 3.3),
row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similar to gather().
However, it offers additional functionality such as multiple value columns.
With only one value column, all colnames of the wide data set would go into one long column with the name given in names_to.
For multiple value columns, names_to may receive multiple new names.

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc.
Therefore, I renamed the columns in the first step.

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#> [1] "ID" "DateRangeStart_1" "DateRangeEnd_1"
#> [4] "Value_1" "DateRangeStart_2" "DateRangeEnd_2"
#> [7] "Value_2" "DateRangeStart_3" "DateRangeEnd_3"
#> [10] "Value_3"

pivot_longer(a,
cols = -ID,
names_to = c(".value", "group"),
# names_prefix = "DateRange",
names_sep = "_")
#> # A tibble: 3 x 5
#> ID group DateRangeEnd DateRangeStart Value
#> <int> <chr> <date> <date> <dbl>
#> 1 1 1 1990-01-03 1990-01-01 4.4
#> 2 1 2 1991-07-06 1991-05-04 6.2
#> 3 1 3 1996-06-06 1995-05-05 3.3

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
build_longer_spec(cols = -ID) %>%
dplyr::transmute(.name = .name,
group = readr::parse_number(name),
.value = stringr::str_extract(name, "Start|End|Value"))

pivot_longer(a, spec = spec)

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html

Reshaping wide to long with multiple values columns

reshape does this with the appropriate arguments.

varying lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names is the long format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Given these varying and v.names arguments, reshape is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.

Given the values of times and v.names, reshape splits the varying columns on a . character (the default sep argument) to create the columns in the output.

times specifies values that are to be used in the created var column, and v.names are pasted onto these values to get column names in the wide format for mapping to the result.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long', 
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')

## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9

Reshaping from long to wide with multiple columns

pivot_wider may be easier

library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(time = str_c('t', time)) %>%
pivot_wider(names_from = time, values_from = c(age, height))

-output

# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62

With reshape from base R, it may need a sequence column

out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
FUN = seq_along)), idvar = "PIN",
direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62

Wide to long with multiple columns

Use the names_sep (with regex lookaround - names_sep = "(?<=\\D)(?=\\d)") or capture as groups in names_pattern (names_pattern = "(\\D+)(\\d+)" - here we are capturing one or more non-digits (\\D+) as a group ((...)) followed by one or more digits (\\d+) separately that corresponds to the vector passed in names_to i.e. ".value" will the value of the columns that for 'arrest', 'lien' and the "grp" will create the new column with the suffix digits from the column names)

library(tidyr)
pivot_longer(df_wide, cols = -rowid, names_to = c(".value", "grp"),
names_pattern = "(\\D+)(\\d+)")

-output

# A tibble: 50 × 4
rowid grp arrest lien
<int> <chr> <date> <date>
1 9317 1 NA NA
2 9317 2 NA NA
3 9317 3 NA NA
4 9317 4 NA NA
5 9317 5 NA NA
6 9317 6 NA NA
7 9317 7 NA NA
8 9317 8 NA NA
9 9317 9 NA NA
10 9317 10 NA NA
# … with 40 more rows

Wide to Long preserving a group

You could use pivot_longer to extract the group followed by pivot_wider to restore the readout columns:

library(dplyr)
library(tidyr)

dat %>%
pivot_longer(-ID, names_pattern = "(.*)_(.*)", names_to = c("name", "grp")) %>%
pivot_wider()

This returns

# A tibble: 30 x 4
ID grp readout1 readout2
<chr> <chr> <dbl> <dbl>
1 id1 g1 0.732 0.499
2 id1 g2 -0.301 -0.0941
3 id1 g3 0.198 1.62
4 id2 g1 0.653 0.732
5 id2 g2 -0.325 1.49
6 id2 g3 -0.0475 -1.73
7 id3 g1 -1.59 1.25
8 id3 g2 -1.59 0.688
9 id3 g3 -0.738 0.341
10 id4 g1 1.12 1.11
# ... with 20 more rows

Wide to long data transformation multiple columns

Using pivot_longer

tidyr::pivot_longer(df_wide, 
cols = -c(Company, Industry),
names_to = c(".value", "Year"),
names_sep = "\\.") %>% type.convert()

# Company Industry Year Sales EBITDA
# <fct> <fct> <int> <int> <int>
#1 CompanyA Manufacturing 2015 100 10
#2 CompanyA Manufacturing 2016 110 11
#3 CompanyA Manufacturing 2017 120 12
#4 CompanyB Telecom 2015 500 50
#5 CompanyB Telecom 2016 550 55
#6 CompanyB Telecom 2017 600 60
#7 CompanyC Services 2015 1000 100
#8 CompanyC Services 2016 1100 110
#9 CompanyC Services 2017 1200 120


Related Topics



Leave a reply



Submit