Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer()
of the tidyr package since version 1.0.0.
This is superior to the previous tidyr strategy of gather()
than spread()
(see answer by @AndrewMacDonald), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).
library("tidyr")
library("magrittr")
a <- structure(list(ID = 1L,
DateRange1Start = structure(7305, class = "Date"),
DateRange1End = structure(7307, class = "Date"),
Value1 = 4.4,
DateRange2Start = structure(7793, class = "Date"),
DateRange2End = structure(7856, class = "Date"),
Value2 = 6.2,
DateRange3Start = structure(9255, class = "Date"),
DateRange3End = structure(9653, class = "Date"),
Value3 = 3.3),
row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
pivot_longer()
(counterpart: pivot_wider()
) works similar to gather()
.
However, it offers additional functionality such as multiple value columns.
With only one value column, all colnames of the wide data set would go into one long column with the name given in names_to
.
For multiple value columns, names_to
may receive multiple new names.
This is easiest if all column names follow a specific pattern like Start_1
, End_1
, Start_2
, etc.
Therefore, I renamed the columns in the first step.
(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#> [1] "ID" "DateRangeStart_1" "DateRangeEnd_1"
#> [4] "Value_1" "DateRangeStart_2" "DateRangeEnd_2"
#> [7] "Value_2" "DateRangeStart_3" "DateRangeEnd_3"
#> [10] "Value_3"
pivot_longer(a,
cols = -ID,
names_to = c(".value", "group"),
# names_prefix = "DateRange",
names_sep = "_")
#> # A tibble: 3 x 5
#> ID group DateRangeEnd DateRangeStart Value
#> <int> <chr> <date> <date> <dbl>
#> 1 1 1 1990-01-03 1990-01-01 4.4
#> 2 1 2 1991-07-06 1991-05-04 6.2
#> 3 1 3 1996-06-06 1995-05-05 3.3
Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):
spec <- a %>%
build_longer_spec(cols = -ID) %>%
dplyr::transmute(.name = .name,
group = readr::parse_number(name),
.value = stringr::str_extract(name, "Start|End|Value"))
pivot_longer(a, spec = spec)
Created on 2019-03-26 by the reprex package (v0.2.1)
See also: https://tidyr.tidyverse.org/articles/pivot.html
Reshaping wide to long with multiple values columns
reshape
does this with the appropriate arguments.
varying
lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names
is the long format equivalents. Between the two, a mapping is created.
From ?reshape
:
Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.
Given these varying
and v.names
arguments, reshape
is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5
for this example data, but that is not safe in general.
Given the values of times
and v.names
, reshape
splits the varying
columns on a .
character (the default sep
argument) to create the columns in the output.
times
specifies values that are to be used in the created var
column, and v.names
are pasted onto these values to get column names in the wide format for mapping to the result.
Finally, idvar
is specified to be the sbj
column, which identifies individual records in the wide format (thanks @thelatemail).
reshape(dw, direction='long',
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')
## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9
Reshaping from long to wide with multiple columns
pivot_wider
may be easier
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(time = str_c('t', time)) %>%
pivot_wider(names_from = time, values_from = c(age, height))
-output
# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62
With reshape
from base R
, it may need a sequence column
out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
FUN = seq_along)), idvar = "PIN",
direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62
Wide to long with multiple columns
Use the names_sep
(with regex lookaround - names_sep = "(?<=\\D)(?=\\d)"
) or capture as groups in names_pattern
(names_pattern = "(\\D+)(\\d+)"
- here we are capturing one or more non-digits (\\D+
) as a group ((...)
) followed by one or more digits (\\d+
) separately that corresponds to the vector passed in names_to
i.e. ".value" will the value of the columns that for 'arrest', 'lien' and the "grp" will create the new column with the suffix digits from the column names)
library(tidyr)
pivot_longer(df_wide, cols = -rowid, names_to = c(".value", "grp"),
names_pattern = "(\\D+)(\\d+)")
-output
# A tibble: 50 × 4
rowid grp arrest lien
<int> <chr> <date> <date>
1 9317 1 NA NA
2 9317 2 NA NA
3 9317 3 NA NA
4 9317 4 NA NA
5 9317 5 NA NA
6 9317 6 NA NA
7 9317 7 NA NA
8 9317 8 NA NA
9 9317 9 NA NA
10 9317 10 NA NA
# … with 40 more rows
Wide to Long preserving a group
You could use pivot_longer
to extract the group followed by pivot_wider
to restore the readout
columns:
library(dplyr)
library(tidyr)
dat %>%
pivot_longer(-ID, names_pattern = "(.*)_(.*)", names_to = c("name", "grp")) %>%
pivot_wider()
This returns
# A tibble: 30 x 4
ID grp readout1 readout2
<chr> <chr> <dbl> <dbl>
1 id1 g1 0.732 0.499
2 id1 g2 -0.301 -0.0941
3 id1 g3 0.198 1.62
4 id2 g1 0.653 0.732
5 id2 g2 -0.325 1.49
6 id2 g3 -0.0475 -1.73
7 id3 g1 -1.59 1.25
8 id3 g2 -1.59 0.688
9 id3 g3 -0.738 0.341
10 id4 g1 1.12 1.11
# ... with 20 more rows
Wide to long data transformation multiple columns
Using pivot_longer
tidyr::pivot_longer(df_wide,
cols = -c(Company, Industry),
names_to = c(".value", "Year"),
names_sep = "\\.") %>% type.convert()
# Company Industry Year Sales EBITDA
# <fct> <fct> <int> <int> <int>
#1 CompanyA Manufacturing 2015 100 10
#2 CompanyA Manufacturing 2016 110 11
#3 CompanyA Manufacturing 2017 120 12
#4 CompanyB Telecom 2015 500 50
#5 CompanyB Telecom 2016 550 55
#6 CompanyB Telecom 2017 600 60
#7 CompanyC Services 2015 1000 100
#8 CompanyC Services 2016 1100 110
#9 CompanyC Services 2017 1200 120
Related Topics
How to Give Subtitles for Subplot in Plot_Ly Using R
How to Change Y Axis Limits in Decimal Points in R
Converting Year and Month ("Yyyy-Mm" Format) to a Date
How to Get Summary Statistics by Group
How to Assign Colors to Categorical Variables in Ggplot2 That Have Stable Mapping
Cbind a Dataframe With an Empty Dataframe - Cbind.Fill
Filtering a Data Frame by Values in a Column
Subtract Value from Previous Row by Group
Split Character Column into Several Binary (0/1) Columns
Easier Way to Use Grepl and Ifelse Across Multiple Columns
Change Rows into Columns in R With Values Yes/No (1/0)
Filter Data.Frame Rows by a Logical Condition
How to Replace Na Values With Zeros in an R Dataframe
How to Remove All Duplicates So That None Are Left in a Data Frame