Reshape Data from Long to Wide, with Time in New Wide Variable Name

This is trivial with the reshape package:

library(reshape)
cast(tmpdata, ... ~ varname + time)
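
As a rough illustration, assume tmpdata is a long data frame with an id column plus varname, time and value columns (the toy data below is made up, not taken from the question). The formula then keeps every remaining column as a row identifier and spreads each varname/time combination into its own column:

```r
library(reshape)

# Hypothetical long data: one row per id, variable and time point
tmpdata <- data.frame(
  id      = rep(1:2, each = 4),
  varname = rep(rep(c("a", "b"), each = 2), times = 2),
  time    = rep(1:2, times = 4),
  value   = 1:8
)

# "..." stands for all columns not named in the formula (here: id)
wide <- cast(tmpdata, ... ~ varname + time)
wide
```

The result should have one row per id and one column per varname/time pair.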

Reshaping from long to wide format in R, problem with variables re-naming

With pivot_wider(), you can supply a glue specification that uses the names_from columns (and special .value) to create custom column names.

library(tidyr)
library(stringr)

df %>%
  pivot_wider(
    names_from = time,
    names_glue = "{str_replace(.value, '(?=_)', str_c('_r', time))}",
    values_from = WSAS_01)

# # A tibble: 2 × 3
# ID WSAS_r1_01 WSAS_r2_01
# <int> <int> <int>
# 1 1 4 3
# 2 2 6 8
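
To see what the glue expression does to a single name, the string manipulation can be tried in isolation. This is only a sketch with the name and the time values typed in by hand; inside pivot_wider() they come from .value and the time column:

```r
library(stringr)

# The lookahead '(?=_)' matches the empty position just before the first "_",
# so the replacement text is inserted there instead of overwriting anything.
new_names <- str_replace(c("WSAS_01", "WSAS_01"), "(?=_)", str_c("_r", 1:2))
new_names
# [1] "WSAS_r1_01" "WSAS_r2_01"
```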

In the extended case where values_from contains multiple columns, this method also works:

df <- data.frame(
ID = rep(1:2, each = 2),
time = rep(1:2, 2),
WSAS_01 = c(4, 3, 6, 8),
WSAS_02 = c(1, 3, 5, 7)
)

df %>%
  pivot_wider(
    names_from = time,
    names_glue = "{str_replace(.value, '(?=_)', str_c('_r', time))}",
    values_from = starts_with("WSAS"))

# # A tibble: 2 × 5
# ID WSAS_r1_01 WSAS_r2_01 WSAS_r1_02 WSAS_r2_02
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 4 3 1 3
# 2 2 6 8 5 7

Long to wide format using variable names

We can use pivot_longer %>% pivot_wider. separate is not needed if we pass the appropriate arguments to pivot_longer.

library(tidyr)

dataset %>%
  pivot_longer(cols = matches('time\\d+$'),
               names_to = c('sport', 'time'),
               names_pattern = '(.*)\\.(.*)') %>%
  pivot_wider(names_from = sport, values_from = value)

# A tibble: 15 × 5
id time basketball volleyball vollyeball
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 time1 2 2 NA
2 1 time2 3 3 NA
3 1 time3 1 NA 1
4 2 time1 5 3 NA
5 2 time2 4 4 NA
6 2 time3 8 NA 8
7 3 time1 4 4 NA
8 3 time2 5 3 NA
9 3 time3 4 NA 12
10 4 time1 3 0 NA
11 4 time2 3 1 NA
12 4 time3 3 NA 2
13 5 time1 3 1 NA
14 5 time2 2 3 NA
15 5 time3 1 NA 3
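
The question's dataset is not shown here, so the following is a minimal sketch on made-up data with column names of the form sport.timeN (an assumption based on the pattern above). It shows how names_pattern splits each column name into a sport part and a time part:

```r
library(tidyr)

# Hypothetical wide data; column names follow "<sport>.<time>"
dataset <- data.frame(
  id               = 1:2,
  basketball.time1 = c(2, 5),
  basketball.time2 = c(3, 4),
  volleyball.time1 = c(2, 3),
  volleyball.time2 = c(3, 4)
)

# Each capture group in names_pattern fills one entry of names_to
long <- pivot_longer(dataset,
                     cols = matches("time\\d+$"),
                     names_to = c("sport", "time"),
                     names_pattern = "(.*)\\.(.*)")

# One column per sport, one row per id and time
wide <- pivot_wider(long, names_from = sport, values_from = value)
wide
```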

How to reshape data from long to wide format

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
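
A small sketch of what this call produces, assuming dat1 has a name column, a numbers column and a single measurement column (the values below are invented for illustration):

```r
# Hypothetical long data: two rows per name
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 2),
  numbers = rep(1:2, times = 2),
  value   = c(0.34, -0.70, -0.89, -0.33)
)

# idvar identifies a row of the wide result; timevar supplies the column suffixes
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
names(wide)
# [1] "name"    "value.1" "value.2"
```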

Reshaping Data Wide To Long: New variables based on Column Names

You can convert to long format and then split the name column into the parts you want. One way to do this with tidyverse functions:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(everything()) %>%
  separate(name, into = c('ModelNumber', 'Emotion', 'Gender'), sep = '_')
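
The question's df is not reproduced here; as a hedged sketch, assume its column names have the form ModelNumber_Emotion_Gender (the names and values below are hypothetical). Only tidyr is loaded, since it re-exports the pipe and everything():

```r
library(tidyr)

# Hypothetical wide data whose column names encode three attributes
df <- data.frame(M1_Happy_F = c(1, 2), M1_Sad_M = c(3, 4))

long <- df %>%
  pivot_longer(everything()) %>%   # column names go to `name`, cells to `value`
  separate(name, into = c("ModelNumber", "Emotion", "Gender"), sep = "_")
long
```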

Problem when reshaping data from long to wide format in R

Reshaping data with stats::reshape can be tedious. Hadley Wickham and
his team have spent quite some time on creating a comprehensive solution.
First there was the reshape2 package, then tidyr had spread() and gather(),
and those have now been superseded by pivot_wider() and pivot_longer().

This is how you can use tidyr::pivot_wider() to achieve the result you seem to
be going for.

library(tidyr)
pivot_wider(
  my_df,
  id_cols = c(transcript, response),
  names_from = hours,
  values_from = exp.change,
  names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>

I think having dedicated commands with dedicated documentation for the two transformations (wide/long) makes the tidyr commands much easier to use, compared to stats::reshape().

EDIT:
stats::reshape() is giving weird results because it seems to have trouble with my_df being a tibble. Other than that, your command was just fine. Just wrap my_df in as.data.frame() and you are good to go.

reshape(
as.data.frame(my_df),
idvar = c("transcript", "response"),
timevar = "hours",
v.names = "exp.change",
direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01

But since it seems that you are already using the tidyverse, tidyr::pivot_wider() seems like the best fit.

Reshape data from long to wide format - more than one variable

The dcast() statement given by the OP works almost perfectly with recent versions of the data.table package, as these allow multiple measure variables to be used with dcast() and melt():

library(data.table)   # CRAN version 1.10.4
setDT(world) # coerce to data.table
data_wide <- dcast(world, Country ~ Year,
value.var = c("Growth", "Unemployment", "Population"))

data_wide
# Country Growth_2015 Growth_2016 Growth_2017 Unemployment_2015 Unemployment_2016 Unemployment_2017 Population_2015
#1: A 2.0 4.0 4.5 8.3 8.1 8.1 40
#2: B 3.0 3.5 4.4 9.2 9.0 8.4 32
#3: C 2.5 3.7 4.3 9.1 9.0 8.5 30
#4: D 1.5 3.1 4.2 6.1 5.3 5.2 27
# Population_2016 Population_2017
#1: 42.0 42.5
#2: 32.5 33.0
#3: 31.0 30.0
#4: 29.0 30.0

This is the same result as the tidyr solution.


However, the OP has requested a specific column order for his ideal solution where the different measure variables of each year are grouped together.

If the proper order of columns is important, there are two ways to achieve this. The first approach is to reorder the columns appropriately using setcolorder():

new_ord <- CJ(world$Year, c("Growth","Unemployment","Population"), 
sorted = FALSE, unique = TRUE)[, paste(V2, V1, sep = "_")]
setcolorder(data_wide, c("Country", new_ord))

data_wide
# Country Growth_2015 Unemployment_2015 Population_2015 Growth_2016 Unemployment_2016 Population_2016 Growth_2017
#1: A 2.0 8.3 40 4.0 8.1 42.0 4.5
#2: B 3.0 9.2 32 3.5 9.0 32.5 4.4
#3: C 2.5 9.1 30 3.7 9.0 31.0 4.3
#4: D 1.5 6.1 27 3.1 5.3 29.0 4.2
# Unemployment_2017 Population_2017
#1: 8.1 42.5
#2: 8.4 33.0
#3: 8.5 30.0
#4: 5.2 30.0

Note that the cross join function CJ() is used to create the cross product of the vectors.
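
To make the role of CJ() more concrete, here is a minimal sketch on two short vectors. The argument names year and measure are chosen here for readability and are not part of the original answer, which relies on the default V1 and V2 column names:

```r
library(data.table)

years    <- c(2015, 2016)
measures <- c("Growth", "Unemployment")

# sorted = FALSE keeps the input order; the last argument varies fastest
combos <- CJ(year = years, measure = measures, sorted = FALSE)

# Paste each pair into the "<measure>_<year>" names that dcast() creates
new_ord <- combos[, paste(measure, year, sep = "_")]
new_ord
# [1] "Growth_2015"       "Unemployment_2015" "Growth_2016"       "Unemployment_2016"
```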


The other approach to achieve the desired column order is to melt and recast:

molten <- melt(world, id.vars = c("Country", "Year"))
dcast(molten, Country ~ Year + variable)
# Country 2015_Growth 2015_Unemployment 2015_Population 2016_Growth 2016_Unemployment 2016_Population 2017_Growth
#1: A 2.0 8.3 40 4.0 8.1 42.0 4.5
#2: B 3.0 9.2 32 3.5 9.0 32.5 4.4
#3: C 2.5 9.1 30 3.7 9.0 31.0 4.3
#4: D 1.5 6.1 27 3.1 5.3 29.0 4.2
# 2017_Unemployment 2017_Population
#1: 8.1 42.5
#2: 8.4 33.0
#3: 8.5 30.0
#4: 5.2 30.0

Reshape wide data to long when variables have different naming pattern in R

reshape, pivot_longer, and pivot_wider are variations of the same idea. For any of them, you need a column that uniquely identifies each row. For example, if you wanted to pivot all of the data into a longer format, you could add a column of row numbers and use it as the identifier column. In the pivots in this answer, I left the first column as the static field and pivoted everything else.
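
The row-number idea can be sketched like this on a made-up two-column frame (row_id is a hypothetical name, and the values are invented):

```r
library(tidyr)

# Hypothetical data with no identifier column
df <- data.frame(r1weight = c(56, 60), r2weight = c(57, 61))

# Add a row number so each original row stays identifiable after pivoting
df$row_id <- seq_len(nrow(df))

# Pivot every column except the identifier
long <- pivot_longer(df, cols = -row_id,
                     names_to = "fields", values_to = "values")
long
```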

For the data you provided:

library(tidyverse)

df1 <- pivot_longer(df, cols = r1weight:bmi2010, # inclusively all columns between
names_to = "fields", values_to = "values")
head(df1)
# # A tibble: 6 × 3
# id fields values
# <fct> <chr> <dbl>
# 1 00000001 r1weight 56
# 2 00000001 r2weight 57
# 3 00000001 r3weight 56
# 4 00000001 r4weight 56
# 5 00000001 r5weight 55
# 6 00000001 r1height 151

# frame is now 60 observations with three columns

I also created a data structure with all of the column names you provided. (df4 is a vector of the column names you provided in your question.)

df5 <- matrix(ncol = length(df4), nrow = 100, dimnames = list(1:100, df4))
colnames(df5)[c(1, 2, 319)]
# [1] "hhid" "rahhidpn.x" "hhidpn"

df5 <- as.data.frame(df5)

df6 <- pivot_longer(df5, cols = rahhidpn.x:hhidpn, # inclusively all columns between
names_to = "fields", values_to = "values")

nrow(df6)
# [1] 31800

ncol(df6)
# [1] 3

Reshape from Long to Wide Format by Multiple Factors

base R:

One way can be:

reshape(cbind(dat1[1:2], stack(dat1, 3:4)), timevar = 'timeperiod',
dir = 'wide', idvar = c('name', 'ind'))

name ind values.Q1 values.Q2 values.Q3 values.Q4
1 firstName height 2 9 1 2
5 secondName height 11 15 16 10
9 firstName weight 1 4 2 8
13 secondName weight 2 9 1 2

If using other packages, consider the recast function from the reshape2 package:

reshape2::recast(dat1, name+variable~timeperiod, id.var = c('name', 'timeperiod'))
name variable Q1 Q2 Q3 Q4
1 firstName height 2 9 1 2
2 firstName weight 1 4 2 8
3 secondName height 11 15 16 10
4 secondName weight 2 9 1 2

