How to reshape data from long to wide format
Using the reshape function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
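As a minimal, self-contained sketch of that call: the contents of dat1 below are made up for illustration. Only the column names name and numbers come from the call above; the value column and all of the data are assumptions.

```r
# Hypothetical long-format data: one row per (name, numbers) pair.
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 3),
  numbers = rep(1:3, times = 2),
  value   = c(0.34, -0.70, -0.06, -0.40, -0.34, -0.37)
)

# One row per name, one value.<numbers> column per level of numbers.
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
names(wide)
```

With this toy data the result has two rows and the columns name, value.1, value.2 and value.3.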
Problem when reshaping data from long to wide format in R
Reshaping data with stats::reshape() can be tedious. Hadley Wickham and his team have spent quite some time on creating a comprehensive solution. First there was the reshape2 package, then tidyr had spread() and gather(), which are now complemented by pivot_wider() and pivot_longer().
This is how you can use tidyr::pivot_wider() to achieve the result you seem to be going for.
library(tidyr)
pivot_wider(
  my_df,
  id_cols = c(transcript, response),
  names_from = hours,
  values_from = exp.change,
  names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>
I think having dedicated commands with dedicated documentation for the two transformations (wide/long) makes the tidyr commands much easier to use compared to stats::reshape().
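To illustrate the point about having one command per direction, here is a small round trip. The data frame and the h_ prefix are invented for this sketch and are not part of the question.

```r
library(tidyr)

# Hypothetical long data: two ids measured at two time points.
long <- data.frame(
  id    = rep(c("a", "b"), each = 2),
  hours = rep(c(0, 2), times = 2),
  value = c(1.5, 2.5, 3.5, 4.5)
)

# Long -> wide: one column per value of hours.
wide <- pivot_wider(long, names_from = hours, values_from = value,
                    names_prefix = "h_")

# Wide -> long: undo it, stripping the prefix again.
# Note that hours comes back as character ("0", "2"), not numeric.
back <- pivot_longer(wide, cols = starts_with("h_"),
                     names_to = "hours", names_prefix = "h_",
                     values_to = "value")
```

Each direction has its own function and its own argument names (names_from/values_from versus names_to/values_to), which is what makes the calls easy to read.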
EDIT: stats::reshape() is giving weird results because it seems to have issues dealing with my_df being a tibble. Other than that, your command was just fine. Just add an as.data.frame() call and you are good to go.
reshape(
  as.data.frame(my_df),
  idvar = c("transcript", "response"),
  timevar = "hours",
  v.names = "exp.change",
  direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01
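One likely reason for the tibble trouble (this is my assumption, I have not traced it through the reshape() source) is that single-column subsetting with `[` behaves differently for tibbles and plain data frames. A small sketch with made-up data:

```r
library(tibble)

# Hypothetical data, loosely modelled on the columns used above.
df <- data.frame(hours = c(0, 2, 8), exp.change = c(NA, -43.2, -61.3))
tb <- as_tibble(df)

# data.frame: selecting one column with `[` drops to a plain vector.
is.numeric(df[, "hours"])                  # TRUE

# tibble: selecting one column with `[` stays a one-column tibble.
is.data.frame(tb[, "hours"])               # TRUE

# Converting back restores the plain data.frame behaviour.
is.numeric(as.data.frame(tb)[, "hours"])   # TRUE
```

Code written against the data.frame behaviour can get a one-column tibble where it expects a vector, which would explain why as.data.frame() fixes it.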
But since it seems that you are already using the tidyverse, tidyr::pivot_wider() seems like the best fit.
reshape R data frame long to wide
You can use pivot_wider. Also, I added a more compact form of your toy data set using expand.grid.
library(tidyr)
# y is the numeric vector from the question (12 values, one per t/g/x combination)
df <- data.frame(y=y, expand.grid(t=c(1,2,3), g=c("g1", "g2"), x=c("A","B")))
pivot_wider(df, values_from = y, names_from = c(x,t), names_sep = ".")
g A.1 A.2 A.3 B.1 B.2 B.3
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 g1 -0.626 0.184 -0.836 0.487 0.738 0.576
2 g2 1.60 0.330 -0.820 -0.305 1.51 0.390
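The snippet relies on a vector y from the question. Here is a self-contained variant, assuming y holds 12 random normal draws. The seed and the use of rnorm() are assumptions made for reproducibility, not something stated in the answer.

```r
library(tidyr)

set.seed(1)       # assumed seed, only so the example is reproducible
y <- rnorm(12)    # one value per (t, g, x) combination

df <- data.frame(
  y = y,
  expand.grid(t = c(1, 2, 3), g = c("g1", "g2"), x = c("A", "B"))
)

# Column names are built from x and t, joined with ".".
wide <- pivot_wider(df, values_from = y, names_from = c(x, t), names_sep = ".")
names(wide)
```

Passing two columns to names_from gives one output column per x/t combination, so g is the only column left to identify the rows.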
Reshaping from long to wide with multiple columns
pivot_wider may be easier:
library(dplyr)
library(stringr)
library(tidyr)
df %>%
  mutate(time = str_c('t', time)) %>%
  pivot_wider(names_from = time, values_from = c(age, height))
Output:
# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62
With reshape from base R, it may need a sequence column:
out <- reshape(
  transform(df, rn = ave(seq_along(PIN), PIN, FUN = seq_along)),
  idvar = "PIN", direction = "wide", timevar = "time", sep = "_"
)
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62
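For anyone who wants to run the tidyr version without the original data, here is a sketch with df reconstructed from the output shown above. The exact shape of the question's df is an assumption, and paste0() stands in for str_c() only to avoid the extra package.

```r
library(tidyr)

# Reconstructed from the printed output; the real df may have more columns.
df <- data.frame(
  PIN    = c(1001, 1001, 1002, 1002),
  time   = c(1, 2, 1, 2),
  age    = c(84, 86, 22, 24),
  height = c(58, 58, 60, 62)
)

# Prefix the time values so the new column names read age_t1, age_t2, ...
df$time <- paste0("t", df$time)

# Several value columns at once: each is spread across the time values.
wide <- pivot_wider(df, names_from = time, values_from = c(age, height))
names(wide)
```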
R Reshape data frame from long to wide format?
A possible solution is this:
library(tidyverse)
df = read.table(text = "
year prod value
2015 PRODA test1
2015 PRODA blue
2015 PRODA 50
2015 PRODA 66
2015 PRODA 66
2018 PRODB test2
2018 PRODB yellow
2018 PRODB 70
2018 PRODB 88.8
2018 PRODB 88.8
2018 PRODA test3
2018 PRODA red
2018 PRODA 55
2018 PRODA 88
2018 PRODA 90
", header = TRUE, stringsAsFactors = FALSE)
df %>%
  group_by(year, prod) %>%                            # for each year and prod combination
  mutate(id = paste0("new_col_", row_number())) %>%   # enumerate rows (this will be used as column names in the reshaped version)
  ungroup() %>%                                       # forget the grouping
  spread(id, value)                                   # reshape
# # A tibble: 3 x 7
# year prod new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 2015 PRODA test1 blue 50 66 66
# 2 2018 PRODA test3 red 55 88 90
# 3 2018 PRODB test2 yellow 70 88.8 88.8
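Since spread() is superseded in current tidyr, the same idea can be sketched with pivot_wider(). The data below is a shortened, partly made-up version of the example (fewer rows per group), just to keep it brief.

```r
library(dplyr)
library(tidyr)

# Shortened toy data: groups of unequal size on purpose.
df <- data.frame(
  year  = c(2015, 2015, 2018, 2018, 2018),
  prod  = c("PRODA", "PRODA", "PRODB", "PRODB", "PRODA"),
  value = c("test1", "blue", "test2", "yellow", "test3")
)

wide <- df %>%
  group_by(year, prod) %>%                            # per year/prod combination
  mutate(id = paste0("new_col_", row_number())) %>%   # position within the group
  ungroup() %>%
  pivot_wider(names_from = id, values_from = value)   # positions become columns

names(wide)
```

Groups with fewer rows than the largest group get NA in the trailing columns, as happens here for 2018/PRODA in new_col_2.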
Reshape from Long to Wide Format by Multiple Factors
In base R, one way can be:
reshape(cbind(dat1[1:2], stack(dat1, 3:4)), timevar = 'timeperiod',
        direction = 'wide', idvar = c('name', 'ind'))
name ind values.Q1 values.Q2 values.Q3 values.Q4
1 firstName height 2 9 1 2
5 secondName height 11 15 16 10
9 firstName weight 1 4 2 8
13 secondName weight 2 9 1 2
If using other packages, consider the recast function from the reshape2 package:
reshape2::recast(dat1, name+variable~timeperiod, id.var = c('name', 'timeperiod'))
name variable Q1 Q2 Q3 Q4
1 firstName height 2 9 1 2
2 firstName weight 1 4 2 8
3 secondName height 11 15 16 10
4 secondName weight 2 9 1 2
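The same melt-then-cast idea can also be sketched with tidyr. Here dat1 is reconstructed from the outputs shown above, so its exact original form is an assumption.

```r
library(tidyr)

# Reconstructed from the printed results; one row per name and quarter.
dat1 <- data.frame(
  name       = rep(c("firstName", "secondName"), each = 4),
  timeperiod = rep(c("Q1", "Q2", "Q3", "Q4"), times = 2),
  height     = c(2, 9, 1, 2, 11, 15, 16, 10),
  weight     = c(1, 4, 2, 8, 2, 9, 1, 2)
)

# Step 1: stack height and weight into a single value column.
long <- pivot_longer(dat1, cols = c(height, weight),
                     names_to = "variable", values_to = "value")

# Step 2: spread the quarters out into columns.
wide <- pivot_wider(long, names_from = timeperiod, values_from = value)
names(wide)
```

This mirrors what stack() plus reshape() and recast() do above: first go fully long, then widen by the factor that should become columns.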
Reshaping data to wide format in R
Create a row number column for each id and reshape the data to wide format.
library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  mutate(col = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = x:stop)
# A tibble: 10 x 41
# id x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 A B C D E F G H I J
# 2 2 A B C D E F G H I J
# 3 3 A B C D E F G H I J
# 4 4 A B C D E F G H I J
# 5 5 A B C D E F G H I J
# 6 6 A B C D E F G H I J
# 7 7 A B C D E F G H I J
# 8 8 A B C D E F G H I J
# 9 9 A B C D E F G H I J
#10 10 A B C D E F G H I J
# … with 30 more variables: y_1 <chr>, y_2 <chr>, y_3 <chr>,
# y_4 <chr>, y_5 <chr>, y_6 <chr>, y_7 <chr>, y_8 <chr>, y_9 <chr>,
# y_10 <chr>, start_1 <date>, start_2 <date>, start_3 <date>,
# start_4 <date>, start_5 <date>, start_6 <date>, start_7 <date>,
# start_8 <date>, start_9 <date>, start_10 <date>, stop_1 <date>,
# stop_2 <date>, stop_3 <date>, stop_4 <date>, stop_5 <date>,
# stop_6 <date>, stop_7 <date>, stop_8 <date>, stop_9 <date>,
# stop_10 <date>
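A scaled-down sketch of the same pattern, with an invented two-column data set, shows how a column range in values_from expands into one set of numbered columns per value column.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two rows per id, two value columns.
df <- data.frame(
  id = c(1, 1, 2, 2),
  x  = c("A", "B", "A", "B"),
  y  = c("a", "b", "a", "b")
)

wide <- df %>%
  group_by(id) %>%
  mutate(col = row_number()) %>%   # 1, 2 within each id
  ungroup() %>%
  pivot_wider(names_from = col, values_from = x:y)   # x:y selects x and y

names(wide)
```

The result has one row per id and the columns x_1, x_2, y_1 and y_2. In the larger example above, four value columns times ten rows per id plus id itself gives the 41 columns reported.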