How to reshape data from long to wide format
Using the reshape function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
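For readers without the original dat1 at hand, here is a minimal sketch of the same call on made-up data. The data frame and its value column are assumptions for illustration only; they are not the data from the question.

```r
# Hypothetical long-format data: one row per (name, numbers) pair.
dat1 <- data.frame(
  name    = c("a", "a", "b", "b"),
  numbers = c(1, 2, 1, 2),
  value   = c(10, 20, 30, 40)
)

# idvar identifies the rows of the wide result,
# timevar supplies the suffixes of the new columns.
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# One row per name; the measure is spread into value.1 and value.2.
names(wide)
```

Every column not named in idvar or timevar is treated as a measure and gets one wide column per value of timevar.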
Problem when reshaping data from long to wide format in R
Reshaping data with stats::reshape can be tedious. Hadley Wickham and his team have spent quite some time on creating a comprehensive solution. First there was the reshape2 package, then tidyr had spread() and gather(), which are now complemented by pivot_wider() and pivot_longer().
This is how you can use tidyr::pivot_wider() to achieve the result you seem to be going for.
library(tidyr)
pivot_wider(
my_df,
id_cols = c(transcript, response),
names_from = hours,
values_from = exp.change,
names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>
I think having dedicated commands with dedicated documentation for each of the two transformations (wide/long) makes the tidyr commands much easier to use than stats::reshape().
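To illustrate the two dedicated commands side by side, here is a small round-trip sketch. The data frame, its column names, and the "h_" prefix are hypothetical and chosen only for this example.

```r
library(tidyr)

# Hypothetical long data: two measurements per id.
long <- data.frame(
  id    = c("a", "a", "b", "b"),
  hours = c(0, 2, 0, 2),
  value = c(1.5, 2.5, 3.5, 4.5)
)

# Long to wide: one column per distinct value of hours.
wide <- pivot_wider(
  long,
  id_cols      = id,
  names_from   = hours,
  values_from  = value,
  names_prefix = "h_"
)

# Wide back to long: strip the prefix again while gathering.
back <- pivot_longer(
  wide,
  cols         = starts_with("h_"),
  names_to     = "hours",
  names_prefix = "h_",
  values_to    = "value"
)
```

Note that hours comes back as a character column after the round trip, because it has passed through the column names.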
EDIT: stats::reshape() gives odd results here because it seems to have trouble with my_df being a tibble. Other than that, your command was just fine. Just wrap the data in as.data.frame() and you are good to go.
reshape(
as.data.frame(my_df),
idvar = c("transcript", "response"),
timevar = "hours",
v.names = "exp.change",
direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01
But since it seems that you are already using the tidyverse, tidyr::pivot_wider() looks like the best fit.
Reshape Data Long to Wide - understanding reshape parameters
You can use the function dcast from the reshape2 package. It's easier to understand. The left side of the formula holds the variables that stay long, while the right side holds the variables that go wide.
fun.aggregate is the function applied when there is more than one value per case. If you're sure you don't have repeated cases, you can use either mean or sum, since each cell then holds a single value:
dcast(data, formula= dogid + home + school ~ month + year + trainingtype,
value.var = 'timeincomp',
fun.aggregate = sum)
I hope it works:
dogid home school 1_2014_1 2_2014_1 12_2015_2
1 12345 1 1 340 360 0
2 31323 7 3 500 520 440
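Whether fun.aggregate actually combines anything depends on whether the formula variables identify each row uniquely. A rough base R sketch of that check follows; the data frame is invented for illustration and merely borrows the column names used above.

```r
# Hypothetical data; rows 2 and 3 share the same key.
data <- data.frame(
  dogid        = c(12345, 12345, 12345),
  home         = c(1, 1, 1),
  school       = c(1, 1, 1),
  month        = c(1, 2, 2),
  year         = c(2014, 2014, 2014),
  trainingtype = c(1, 1, 1),
  timeincomp   = c(340, 300, 60)
)

# All variables that appear on either side of the dcast formula.
key_cols <- c("dogid", "home", "school", "month", "year", "trainingtype")

# Number of rows whose key combination has already been seen.
n_dup <- sum(duplicated(data[key_cols]))

# What summing per key does to the repeated rows (base R, not dcast itself).
agg <- aggregate(timeincomp ~ ., data = data, FUN = sum)
```

If n_dup is zero, mean and sum give identical results; otherwise the choice of aggregation function changes the numbers in the wide table.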
Reshape data from long to wide format - more than one variable
The dcast() statement given by the OP works almost perfectly with recent versions of the data.table package, as these allow multiple measure variables to be used with dcast() and melt():
library(data.table) # CRAN version 1.10.4
setDT(world) # coerce to data.table
data_wide <- dcast(world, Country ~ Year,
value.var = c("Growth", "Unemployment", "Population"))
data_wide
# Country Growth_2015 Growth_2016 Growth_2017 Unemployment_2015 Unemployment_2016 Unemployment_2017 Population_2015
#1: A 2.0 4.0 4.5 8.3 8.1 8.1 40
#2: B 3.0 3.5 4.4 9.2 9.0 8.4 32
#3: C 2.5 3.7 4.3 9.1 9.0 8.5 30
#4: D 1.5 3.1 4.2 6.1 5.3 5.2 27
# Population_2016 Population_2017
#1: 42.0 42.5
#2: 32.5 33.0
#3: 31.0 30.0
#4: 29.0 30.0
This is the same result as the tidyr solution.
However, the OP has requested a specific column order for the ideal solution, where the different measure variables of each year are grouped together.
If the order of the columns is important, there are two ways to achieve this. The first approach is to reorder the columns using setcolorder():
new_ord <- CJ(world$Year, c("Growth","Unemployment","Population"),
sorted = FALSE, unique = TRUE)[, paste(V2, V1, sep = "_")]
setcolorder(data_wide, c("Country", new_ord))
data_wide
# Country Growth_2015 Unemployment_2015 Population_2015 Growth_2016 Unemployment_2016 Population_2016 Growth_2017
#1: A 2.0 8.3 40 4.0 8.1 42.0 4.5
#2: B 3.0 9.2 32 3.5 9.0 32.5 4.4
#3: C 2.5 9.1 30 3.7 9.0 31.0 4.3
#4: D 1.5 6.1 27 3.1 5.3 29.0 4.2
# Unemployment_2017 Population_2017
#1: 8.1 42.5
#2: 8.4 33.0
#3: 8.5 30.0
#4: 5.2 30.0
Note that the cross join function CJ() is used to create the cross product of the vectors.
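To see what CJ() contributes on its own, here is a small sketch with made-up vectors. The names years and measures and the column names year and measure are assumptions for this example; naming the arguments explicitly avoids relying on the default V1/V2 names.

```r
library(data.table)

years    <- c(2015, 2016)                # hypothetical values
measures <- c("Growth", "Unemployment")  # hypothetical values

# sorted = FALSE keeps the input order; the first argument varies slowest,
# so all measures of one year come before the next year starts.
grid <- CJ(year = years, measure = measures, sorted = FALSE)

new_names <- grid[, paste(measure, year, sep = "_")]
new_names
# "Growth_2015" "Unemployment_2015" "Growth_2016" "Unemployment_2016"
```

This year-by-year ordering is what lets setcolorder() group the measures of each year together.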
The other approach to achieve the desired column order is to melt and recast:
molten <- melt(world, id.vars = c("Country", "Year"))
dcast(molten, Country ~ Year + variable)
# Country 2015_Growth 2015_Unemployment 2015_Population 2016_Growth 2016_Unemployment 2016_Population 2017_Growth
#1: A 2.0 8.3 40 4.0 8.1 42.0 4.5
#2: B 3.0 9.2 32 3.5 9.0 32.5 4.4
#3: C 2.5 9.1 30 3.7 9.0 31.0 4.3
#4: D 1.5 6.1 27 3.1 5.3 29.0 4.2
# 2017_Unemployment 2017_Population
#1: 8.1 42.5
#2: 8.4 33.0
#3: 8.5 30.0
#4: 5.2 30.0
Reshape long to wide with two columns to expand in R data.table
You may use dcast:
library(data.table)
setDT(data_sample)
dcast(data_sample, code~rowid(code), value.var = c('name', 'numberdata'))
# code name_1 name_2 numberdata_1 numberdata_2
#1: 1 bill bob 100 400
#2: 2 rob john 300 -500
#3: 3 max joe -200 -400
#4: 4 mitch bart 300 100
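The right-hand side of the formula relies on rowid(code), which numbers the rows within each value of code and thereby supplies the _1 and _2 suffixes. A tiny sketch with a made-up vector (not the original data_sample):

```r
library(data.table)

# Hypothetical code column with two rows per code.
code <- c(1, 1, 2, 2, 3, 3)

# Running counter that restarts at 1 for every distinct value of code.
idx <- rowid(code)
idx
# 1 2 1 2 1 2
```

A code with three rows would get the counters 1, 2, 3 and therefore produce a third set of wide columns.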
Reshape from Long to Wide Format by Multiple Factors
In base R, one way is:
reshape(cbind(dat1[1:2], stack(dat1, 3:4)), timevar = 'timeperiod',
dir = 'wide', idvar = c('name', 'ind'))
name ind values.Q1 values.Q2 values.Q3 values.Q4
1 firstName height 2 9 1 2
5 secondName height 11 15 16 10
9 firstName weight 1 4 2 8
13 secondName weight 2 9 1 2
If you are open to other packages, consider the recast function from the reshape2 package:
reshape2::recast(dat1, name+variable~timeperiod, id.var = c('name', 'timeperiod'))
name variable Q1 Q2 Q3 Q4
1 firstName height 2 9 1 2
2 firstName weight 1 4 2 8
3 secondName height 11 15 16 10
4 secondName weight 2 9 1 2