Reshaping from long to wide with multiple columns
pivot_wider
may be easier
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(time = str_c('t', time)) %>%
pivot_wider(names_from = time, values_from = c(age, height))
-output
# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62
With reshape
from base R
, it may need a sequence column
out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
FUN = seq_along)), idvar = "PIN",
direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62
Reshape wide format, to multi-column long format
Using reshape2
:
# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
Update: Using data.table melt/cast from versions >= 1.9.0:
data.table
from versions 1.9.0 imports reshape2
package and implements fast melt
and dcast
methods in C for data.tables. A comparison of speed on bigger data is shown below.
For more info regarding NEWS, go here.
require(data.table) ## ver. >=1.9.0
require(reshape2)
dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
At the moment, you'll have to write dcast.data.table
explicitly as it's not a S3 generic in reshape2
yet.
Benchmarking on bigger data:
# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE),
Test = sample(1e3, 1e7, TRUE),
Year = sample(2008:2014, 1e7,TRUE),
Fall = sample(50, 1e7, TRUE),
Spring = sample(50, 1e7,TRUE),
Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)
reshape2 timings:
reshape2_melt <- function(df) {
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
# user system elapsed
# 43.319 4.909 48.932
df.m <- transform(df.m, Test = paste0("Test", Test))
reshape2_cast <- function(df) {
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
# user system elapsed
# 57.728 9.712 69.573
data.table timings:
DT_melt <- function(dt) {
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- reshape2_melt(DT))
# user system elapsed
# 0.276 0.001 0.279
dt.m[, Test := paste0("Test", Test)]
DT_cast <- function(dt) {
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
# user system elapsed
# 12.732 0.825 14.006
melt.data.table
is ~175x faster than reshape2:::melt
and dcast.data.table
is ~5x than reshape2:::dcast
.
Reshaping wide to long with multiple values columns
reshape
does this with the appropriate arguments.
varying
lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names
is the long format equivalents. Between the two, a mapping is created.
From ?reshape
:
Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.
Given these varying
and v.names
arguments, reshape
is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5
for this example data, but that is not safe in general.
Given the values of times
and v.names
, reshape
splits the varying
columns on a .
character (the default sep
argument) to create the columns in the output.
times
specifies values that are to be used in the created var
column, and v.names
are pasted onto these values to get column names in the wide format for mapping to the result.
Finally, idvar
is specified to be the sbj
column, which identifies individual records in the wide format (thanks @thelatemail).
reshape(dw, direction='long',
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')
## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9
from wide format to long format with results in multiple columns
We could use melt
from the devel
version of data.table which can take multiple patterns
for the measure
columns. Instructions to install the devel version of 'data.table' is here
We convert the 'data.frame' to 'data.table' (setDT(df)
), melt
, and specify the regex
in the patterns
of measure
argument. Remove the rows that are NA
for the 'names' and 'address' column.
library(data.table)#v1.9.5+
dM <- melt(setDT(df), measure=patterns(c('^name', '^adress')),
value.name=c('names', 'address') )
dM[!(is.na(names) & is.na(address))]
# id variable names address
#1: 1 1 John street a
#2: 2 1 Jack street b
#3: 3 1 Joey NA
#4: 1 2 Burt street d
#5: 2 2 Ben street e
#6: 3 2 Bob street f
#7: 1 3 chris street 1
#8: 2 3 connor street 2
Or we can use reshape
from base R
.
dM2 <- reshape(df, idvar='id', varying=list(grep('name', names(df)),
grep('adress', names(df))), direction='long')
The NA
rows can be removed as in the data.table
solution by using standard 'data.frame' indexing after we create the logical index with is.na
.
How to reshape data to long format by combining multiple columns and repeating rows?
You could achieve your desired result by first renaming your columns in a consistent manner and two pivot_longer
s like so:
set.seed(123)
library(tidyr)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
dat_long <- dat |>
rename_with(.fn = ~ paste0(.x, "_value"), matches("^prac\\d+$")) |>
rename_with(.fn = ~ gsub("_vs", "", .x)) |>
pivot_longer(-subject, names_to = c("prac", ".value"), names_sep = "_") |>
pivot_longer(c(crit1, crit2), names_to = "crit", values_to = "difference") |>
unite(comparison, prac, crit, sep = "_vs_") |>
rename(prac_value = value)
head(dat_long)
#> # A tibble: 6 × 4
#> subject comparison prac_value difference
#> <chr> <chr> <dbl> <dbl>
#> 1 subject 1 prac1_vs_crit1 0.233 0.011
#> 2 subject 1 prac1_vs_crit2 0.233 0.006
#> 3 subject 1 prac2_vs_crit1 0.229 0.015
#> 4 subject 1 prac2_vs_crit2 0.229 0.002
#> 5 subject 1 prac3_vs_crit1 0.25 0.021
#> 6 subject 1 prac3_vs_crit2 0.25 0.011
count(dat_long, subject)
#> # A tibble: 10 × 2
#> subject n
#> <chr> <int>
#> 1 subject 1 24
#> 2 subject 10 24
#> 3 subject 2 24
#> 4 subject 3 24
#> 5 subject 4 24
#> 6 subject 5 24
#> 7 subject 6 24
#> 8 subject 7 24
#> 9 subject 8 24
#> 10 subject 9 24
R: Reshaping Multiple Columns from Long to Wide
An option would be to replace the duplicated
elements by 'Letter' to NA
and then in the reshaped data, remove the columns that are all NA
library(data.table)
out <- dcast(setDT(sample_df)[, lapply(.SD, function(x)
replace(x, duplicated(x), NA)), Letter], Letter ~ rowid(Letter),
value.var = c("Number", "Fruit"))
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
out[, (nm1) := NULL][]
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1: a 1 2 Apple Plum Peach
#2: b 3 4 Pear Peach <NA>
If we want to use the tidyverse
approach, a similar option can be used. Note that pivot_wider
is from the dev version of tidyr
(tidyr_0.8.3.9000
)
library(tidyverse)
sample_df %>%
group_by(Letter) %>%
mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
mutate(rn = row_number()) %>%
pivot_wider(
names_from = rn,
values_from = c("Number", "Fruit")) %>%
select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
# <fct> <dbl> <dbl> <fct> <fct> <fct>
#1 a 1 2 Apple Plum Peach
#2 b 3 4 Pear Peach <NA>
Related Topics
Ggally::Ggpairs Plot Without Gridlines When Plotting Correlation Coefficient
Download Attachment from an Outlook Email Using R
Change the Color of Action Button in Shiny
Label Minimum and Maximum of Scale Fill Gradient Legend with Text: Ggplot2
How to Append a Plot to an Existing PDF File
How to Prevent Exposure of My Password When Using Rgoogledocs
R: Legend with Points and Lines Being Different Colors (For the Same Legend Item)
Remove Strip Background Keep Panel Border
Save All Plots Already Present in the Panel of Rstudio
Any Way to Pause at Specific Frames/Time Points with Transition_Reveal in Gganimate
How to Sum and Combine Two Data Frames
Keep Before and After Date of an External List
R Formatting a Date from a Character Mmm Dd, Yyyy to Class Date