Reshape Wide Format, to Multi-Column Long Format

Reshaping from long to wide with multiple columns

pivot_wider may be easier

library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(time = str_c('t', time)) %>%
pivot_wider(names_from = time, values_from = c(age, height))

-output

# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62

With reshape from base R, it may need a sequence column

out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
FUN = seq_along)), idvar = "PIN",
direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62

Reshape wide format, to multi-column long format

Using reshape2:

# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")

Update: Using data.table melt/cast from versions >= 1.9.0:

data.table from versions 1.9.0 imports reshape2 package and implements fast melt and dcast methods in C for data.tables. A comparison of speed on bigger data is shown below.

For more info regarding NEWS, go here.

require(data.table) ## ver. >=1.9.0
require(reshape2)

dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")

At the moment, you'll have to write dcast.data.table explicitly as it's not a S3 generic in reshape2 yet.


Benchmarking on bigger data:

# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE),
Test = sample(1e3, 1e7, TRUE),
Year = sample(2008:2014, 1e7,TRUE),
Fall = sample(50, 1e7, TRUE),
Spring = sample(50, 1e7,TRUE),
Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)

reshape2 timings:

reshape2_melt <- function(df) {
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
# user system elapsed
# 43.319 4.909 48.932

df.m <- transform(df.m, Test = paste0("Test", Test))

reshape2_cast <- function(df) {
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
# user system elapsed
# 57.728 9.712 69.573

data.table timings:

DT_melt <- function(dt) {
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- reshape2_melt(DT))
# user system elapsed
# 0.276 0.001 0.279

dt.m[, Test := paste0("Test", Test)]

DT_cast <- function(dt) {
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
# user system elapsed
# 12.732 0.825 14.006

melt.data.table is ~175x faster than reshape2:::melt and dcast.data.table is ~5x than reshape2:::dcast.

Reshaping wide to long with multiple values columns

reshape does this with the appropriate arguments.

varying lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names is the long format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Given these varying and v.names arguments, reshape is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.

Given the values of times and v.names, reshape splits the varying columns on a . character (the default sep argument) to create the columns in the output.

times specifies values that are to be used in the created var column, and v.names are pasted onto these values to get column names in the wide format for mapping to the result.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long', 
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')

## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9

from wide format to long format with results in multiple columns

We could use melt from the devel version of data.table which can take multiple patterns for the measure columns. Instructions to install the devel version of 'data.table' is here

We convert the 'data.frame' to 'data.table' (setDT(df)), melt, and specify the regex in the patterns of measure argument. Remove the rows that are NA for the 'names' and 'address' column.

library(data.table)#v1.9.5+
dM <- melt(setDT(df), measure=patterns(c('^name', '^adress')),
value.name=c('names', 'address') )
dM[!(is.na(names) & is.na(address))]
# id variable names address
#1: 1 1 John street a
#2: 2 1 Jack street b
#3: 3 1 Joey NA
#4: 1 2 Burt street d
#5: 2 2 Ben street e
#6: 3 2 Bob street f
#7: 1 3 chris street 1
#8: 2 3 connor street 2

Or we can use reshape from base R.

 dM2 <- reshape(df, idvar='id', varying=list(grep('name', names(df)), 
grep('adress', names(df))), direction='long')

The NA rows can be removed as in the data.table solution by using standard 'data.frame' indexing after we create the logical index with is.na.

How to reshape data to long format by combining multiple columns and repeating rows?

You could achieve your desired result by first renaming your columns in a consistent manner and two pivot_longers like so:

set.seed(123)

library(tidyr)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)

dat_long <- dat |>
rename_with(.fn = ~ paste0(.x, "_value"), matches("^prac\\d+$")) |>
rename_with(.fn = ~ gsub("_vs", "", .x)) |>
pivot_longer(-subject, names_to = c("prac", ".value"), names_sep = "_") |>
pivot_longer(c(crit1, crit2), names_to = "crit", values_to = "difference") |>
unite(comparison, prac, crit, sep = "_vs_") |>
rename(prac_value = value)

head(dat_long)
#> # A tibble: 6 × 4
#> subject comparison prac_value difference
#> <chr> <chr> <dbl> <dbl>
#> 1 subject 1 prac1_vs_crit1 0.233 0.011
#> 2 subject 1 prac1_vs_crit2 0.233 0.006
#> 3 subject 1 prac2_vs_crit1 0.229 0.015
#> 4 subject 1 prac2_vs_crit2 0.229 0.002
#> 5 subject 1 prac3_vs_crit1 0.25 0.021
#> 6 subject 1 prac3_vs_crit2 0.25 0.011

count(dat_long, subject)
#> # A tibble: 10 × 2
#> subject n
#> <chr> <int>
#> 1 subject 1 24
#> 2 subject 10 24
#> 3 subject 2 24
#> 4 subject 3 24
#> 5 subject 4 24
#> 6 subject 5 24
#> 7 subject 6 24
#> 8 subject 7 24
#> 9 subject 8 24
#> 10 subject 9 24

R: Reshaping Multiple Columns from Long to Wide

An option would be to replace the duplicated elements by 'Letter' to NA and then in the reshaped data, remove the columns that are all NA

library(data.table)
out <- dcast(setDT(sample_df)[, lapply(.SD, function(x)
replace(x, duplicated(x), NA)), Letter], Letter ~ rowid(Letter),
value.var = c("Number", "Fruit"))
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
out[, (nm1) := NULL][]
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1: a 1 2 Apple Plum Peach
#2: b 3 4 Pear Peach <NA>

If we want to use the tidyverse approach, a similar option can be used. Note that pivot_wider is from the dev version of tidyr (tidyr_0.8.3.9000)

library(tidyverse)
sample_df %>%
group_by(Letter) %>%
mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
mutate(rn = row_number()) %>%
pivot_wider(
names_from = rn,
values_from = c("Number", "Fruit")) %>%
select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
# <fct> <dbl> <dbl> <fct> <fct> <fct>
#1 a 1 2 Apple Plum Peach
#2 b 3 4 Pear Peach <NA>


Related Topics



Leave a reply



Submit