Converting a Long-Formatted Dataframe to Wide Format with Tidyverse

How to reshape data from long to wide format

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
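As a minimal sketch of what that call does, here is a small made-up dat1 (the column names name, numbers and value and all the values are assumptions for illustration, not the asker's data). With direction = "wide", every column that is not idvar or timevar is spread into one column per level of timevar.

```r
# Hypothetical long-format data: one row per (name, numbers) pair
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, times = 2),
  value   = c(0.34, -0.70, -0.06, -0.46, -0.88, 1.15, 0.57, 2.08)
)

# idvar identifies a row of the wide result; timevar supplies the column suffixes
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

names(wide)  # "name" "value.1" "value.2" "value.3" "value.4"
```

The suffix separator defaults to "."; it can be changed with the sep argument of reshape().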

Converting a long-formatted dataframe to wide format with tidyverse

Try this. The arguments of pivot_wider() were not being used as intended. The column whose values should become the new column names goes in names_from, and the column whose values should fill the cells goes in values_from. Here is the code:

library(tidyverse)
#Code
Widedata <- dat %>%
  pivot_wider(names_from=year, values_from=grad_rate)

Output:

# A tibble: 51 x 9
   State                Abbr  SY2010_11 SY2011_12 SY2012_13 SY2013_14 SY2014_15 SY2015_16 SY2016_17
   <fct>                <fct>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
 1 Alabama              AL           72        75      80        86.3      89.3      87.1      89.3
 2 Alaska               AK           68        70      71.8      71.1      75.6      76.1      78.2
 3 Arizona              AZ           78        76      75.1      75.7      77.4      79.5      78  
 4 Arkansas             AR           81        84      84.9      86.9      84.9      87        88  
 5 California           CA           76        79      80.4      81        82        83        82.7
 6 Colorado             CO           74        75      76.9      77.3      77.3      78.9      79.1
 7 Connecticut          CT           83        85      85.5      87        87.2      87.4      87.9
 8 Delaware             DE           78        80      80.4      87        85.6      85.5      86.9
 9 District of Columbia DC           59        59      62.3      61.4      68.5      69.2      73.2
10 Florida              FL           71        75      75.6      76.1      77.9      80.7      82.3
# ... with 41 more rows
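For readers without the original data, the same call can be tried on a tiny stand-in. The data frame below is an assumption built from the first two rows and first two year columns of the output above; the real dat has more states and years.

```r
library(tidyr)

# Hypothetical long-format stand-in for dat: one row per state and school year
dat <- data.frame(
  State     = rep(c("Alabama", "Alaska"), each = 2),
  Abbr      = rep(c("AL", "AK"), each = 2),
  year      = rep(c("SY2010_11", "SY2011_12"), times = 2),
  grad_rate = c(72, 75, 68, 70)
)

# names_from: column that supplies the new column names
# values_from: column that supplies the cell values
wide <- pivot_wider(dat, names_from = year, values_from = grad_rate)
wide
```

All columns not named in names_from or values_from (here State and Abbr) are kept as identifiers of each output row.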

Problem when reshaping data from long to wide format in R

Reshaping data with stats::reshape can be tedious. Hadley Wickham and
his team have spent quite some time on creating a comprehensive solution.
First there was the reshape2 package, then tidyr had spread() and gather(),
those are now complemented (rather than replaced) by pivot_wider() and pivot_longer().

This is how you can use tidyr::pivot_wider() to achieve the result you seem to
be going for.

library(tidyr)
pivot_wider(
  my_df,
  id_cols = c(transcript, response),
  names_from = hours,
  values_from = exp.change,
  names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#>   transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#>   <chr>      <chr>           <dbl>        <dbl>        <dbl>         <dbl>
#> 1 TR100743-… Primary            NA        -43.2       -61.3          965. 
#> 2 TR100987-… Primary            NA        -46.3         3.29        -100. 
#> 3 TR101301-… Primary            NA        -29.6       522.            40.5
#> 4 TR102190-… Tertiary           NA        -18.8         5.49          55.1
#> 5 TR102346-… Primary            NA       -100.  789697313.            18.6
#> 6 TR102352-… Primary            NA        -31.3         9.65          28.5
#> # … with 1 more variable: exp.change_48 <dbl>

I think having dedicated commands with dedicated documentation for the two transformations (wide/long) makes the tidyr commands much easier to use, compared to stats::reshape().

EDIT:
stats::reshape() gives odd results here, apparently because it has trouble with my_df being a tibble. Other than that, your command was fine. Just wrap my_df in as.data.frame() and you are good to go.

reshape(
  as.data.frame(my_df),
  idvar = c("transcript", "response"),
  timevar   = "hours",
  v.names = "exp.change",
  direction = "wide"
)
#>            transcript response exp.change.0 exp.change.2  exp.change.8
#> 1   TR100743-c0_g1_i3  Primary           NA    -43.19583 -6.130140e+01
#> 6   TR100987-c0_g1_i2  Primary           NA    -46.25638  3.293969e+00
#> 11 TR101301-c4_g1_i16  Primary           NA    -29.63413  5.222249e+02
#> 16  TR102190-c1_g1_i1 Tertiary           NA    -18.76708  5.494728e+00
#> 21  TR102346-c0_g2_i1  Primary           NA    -99.99996  7.896973e+08
#> 26  TR102352-c4_g2_i5  Primary           NA    -31.33341  9.647458e+00
#>    exp.change.24 exp.change.48
#> 1      964.92512 -5.270607e+01
#> 6      -99.99947  1.067105e+08
#> 11      40.47377 -1.343882e+00
#> 16      55.10727  3.358246e+01
#> 21      18.63375  5.244430e+01
#> 26      28.48553  7.058088e+01

But since you seem to be using the tidyverse already, tidyr::pivot_wider() looks like the best fit.

How to transform long to wide reshape with tidyverse

You can do:

library(dplyr)
library(tidyr)

df %>%
 gather(var, val, -c(respid, member_id)) %>%
 mutate(var = paste(var, member_id, sep = "_")) %>%
 select(-member_id) %>%
 spread(var, val)

  respid dob_1 dob_2 dob_3 edu_1 edu_2 edu_3 gender_1 gender_2 gender_3
   <int> <int> <int> <int> <int> <int> <int>    <int>    <int>    <int>
1    100  1978  1980    NA     3     3    NA        1        1       NA
2    200  1974  1955    NA     4     5    NA        1        2       NA
3    300  1998  1999  2001     3     4     3        2        1        2

First, it transforms the data from wide to long format. Second, it creates the new variable names by pasting member_id onto each variable name. Finally, it spreads the data back to wide format.

Or using reshape2:

library(reshape2)

dcast(melt(df, id.vars = c("respid", "member_id")), respid~variable+member_id, value.var = "value")

  respid gender_1 gender_2 gender_3 edu_1 edu_2 edu_3 dob_1 dob_2 dob_3
1    100        1        1       NA     3     3    NA  1978  1980    NA
2    200        1        2       NA     4     5    NA  1974  1955    NA
3    300        2        1        2     3     4     3  1998  1999  2001
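With tidyr 1.0.0 or later, the same result can probably be reached in one step, because pivot_wider() accepts several columns in values_from. The sketch below is not part of the original answers; the input df is reconstructed from the output shown above and is an assumption about what the asker's data looked like.

```r
library(tidyr)

# Hypothetical input, rebuilt from the wide output above: one row per household member
df <- data.frame(
  respid    = c(100, 100, 200, 200, 300, 300, 300),
  member_id = c(1, 2, 1, 2, 1, 2, 3),
  gender    = c(1, 1, 1, 2, 2, 1, 2),
  edu       = c(3, 3, 4, 5, 3, 4, 3),
  dob       = c(1978, 1980, 1974, 1955, 1998, 1999, 2001)
)

# Each value column is combined with each member_id, giving names like gender_1, edu_2
wide <- pivot_wider(
  df,
  id_cols     = respid,
  names_from  = member_id,
  values_from = c(gender, edu, dob)
)
wide
```

Households with fewer than three members get NA in the *_3 columns, as in the gather/spread version.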

long to wide format aggregate R tidyverse

Not really sure how you get the 3 count for GENEa and READSb, but assuming you want the count, you can try the following:


library(tidyverse)

df <- tibble(
  READS = rep(c("READa", "READb", "READc"), each = 3), 
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3), 
  COMMENT = rep(c("CommentA", "CommentA", "CommentA"), each = 3)
)
df
#> # A tibble: 9 x 3
#>   READS GENE  COMMENT 
#>   <chr> <chr> <chr>   
#> 1 READa GENEa CommentA
#> 2 READa GENEa CommentA
#> 3 READa GENEa CommentA
#> 4 READb GENEb CommentA
#> 5 READb GENEb CommentA
#> 6 READb GENEb CommentA
#> 7 READc GENEc CommentA
#> 8 READc GENEc CommentA
#> 9 READc GENEc CommentA

df %>%
  count(READS, GENE) %>%
  pivot_wider(
    names_from = GENE, values_from = n,
    values_fill = list(n = 0)
  )
#> # A tibble: 3 x 4
#>   READS GENEa GENEb GENEc
#>   <chr> <int> <int> <int>
#> 1 READa     3     0     0
#> 2 READb     0     3     0
#> 3 READc     0     0     3

Created on 2019-12-13 by the reprex package (v0.3.0)

Wide format dataframe to long format dataframe using R

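That error is caused by duplicates: without another identifying column, several rows with the same Y map to the same output cell. We need a unique sequence id within each Y group.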

library(dplyr)
library(tidyr)
library(data.table)
df %>% 
     mutate(rn = rowid(Y)) %>% 
     spread(Y, Z) %>%
     select(-rn)

-output

 A    B   C    D
1  ABC  A12 A45  X66
2  BCD  B12 B45  Y66
3  CDE  C12 C45  Z66
4  DEF <NA> D45 <NA>
5  EFG <NA> E45 <NA>
6  FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>

rowid is from data.table and is a compact way to create a sequence id. If we want to use only dplyr, use row_number() after group_by. Also, spread is superseded in favor of pivot_wider.

df %>%
    group_by(Y) %>%
    mutate(rn = row_number()) %>%
    ungroup() %>%
    pivot_wider(names_from = Y, values_from = Z) %>%
    select(-rn)

-output

# A tibble: 8 x 4
  A     B     C     D    
  <chr> <chr> <chr> <chr>
1 ABC   A12   A45   X66  
2 BCD   B12   B45   Y66  
3 CDE   C12   C45   Z66  
4 DEF   <NA>  D45   <NA> 
5 EFG   <NA>  E45   <NA> 
6 FGH   <NA>  F45   <NA> 
7 <NA>  <NA>  G45   <NA> 
8 <NA>  <NA>  H45   <NA>
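To see why the sequence id matters, here is a minimal sketch on a made-up two-column data frame (the values are an assumption, chosen to resemble the first rows above). The rn column numbers the rows within each Y group, so each (rn, Y) pair points to exactly one Z.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: Y repeats, and no other column identifies the rows
df <- data.frame(
  Y = c("A", "A", "A", "B", "B"),
  Z = c("ABC", "BCD", "CDE", "A12", "B12")
)

wide <- df %>%
  group_by(Y) %>%
  mutate(rn = row_number()) %>%  # 1, 2, 3 within A; 1, 2 within B
  ungroup() %>%
  pivot_wider(names_from = Y, values_from = Z) %>%
  select(-rn)
wide
```

Group B has only two rows, so the third row of column B is filled with NA.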

Convert data from long format to wide format with multiple measure columns

In order to handle multiple variables like you want, you need to melt the data you have before casting it.

library("reshape2")

dcast(melt(my.df, id.vars=c("ID", "TIME")), ID~variable+TIME)

which gives

  ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1  A   1   4   7  10  13  16  19  22  25  28
2  B   2   5   8  11  14  17  20  23  26  29
3  C   3   6   9  12  15  18  21  24  27  30

EDIT based on comment:

The data frame

num.id = 10 
num.time=10 
my.df <- data.frame(ID=rep(LETTERS[1:num.id], num.time), 
                    TIME=rep(1:num.time, each=num.id), 
                    X=1:(num.id*num.time), 
                    Y=(num.id*num.time)+1:(2*length(1:(num.id*num.time))))

gives a different result (all entries are 2) because the ID/TIME combination does not identify a unique row. In fact, there are two rows for each ID/TIME combination. reshape2 assumes a single value for each possible combination of the variables and will apply a summary function to create a single value if there are multiple entries. That is why there is the warning

Aggregation function missing: defaulting to length

You can get something that works if you add another variable which breaks that redundancy.

my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)

This works because cycle/ID/TIME now uniquely defines a row in my.df.
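Before casting, it can help to check whether the chosen id variables really identify each row. The following base R sketch uses a small made-up data frame (not the one above) in which every ID/TIME pair occurs twice; anyDuplicated() returns 0 only when there are no repeated combinations.

```r
# Hypothetical data: each ID/TIME combination appears twice
my.df <- data.frame(
  ID   = rep(c("A", "B"), times = 4),
  TIME = rep(rep(1:2, each = 2), times = 2),
  X    = 1:8
)

# Index of the first duplicated ID/TIME row, or 0 if all combinations are unique
dup_before <- anyDuplicated(my.df[c("ID", "TIME")])

# Add a variable that separates the first and second pass through the combinations
my.df$cycle <- rep(1:2, each = 4)
dup_after <- anyDuplicated(my.df[c("cycle", "ID", "TIME")])

c(before = dup_before, after = dup_after)
```

A non-zero value before and 0 after indicates that adding cycle is enough to make the id variables unique, which is the condition dcast() needs to avoid falling back to an aggregation function.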

Long to wide format using variable names

We can use pivot_longer %>% pivot_wider. separate is not needed if we set the appropriate parameters in pivot_longer.

library(tidyr)

dataset %>%
        pivot_longer(cols = matches('time\\d+$'), names_to = c('sport', 'time'), names_pattern = '(.*)\\.(.*)') %>%
        pivot_wider(names_from = sport, values_from = value)

# A tibble: 15 × 5
      id time  basketball volleyball vollyeball
   <dbl> <chr>      <dbl>      <dbl>      <dbl>
 1     1 time1          2          2         NA
 2     1 time2          3          3         NA
 3     1 time3          1         NA          1
 4     2 time1          5          3         NA
 5     2 time2          4          4         NA
 6     2 time3          8         NA          8
 7     3 time1          4          4         NA
 8     3 time2          5          3         NA
 9     3 time3          4         NA         12
10     4 time1          3          0         NA
11     4 time2          3          1         NA
12     4 time3          3         NA          2
13     5 time1          3          1         NA
14     5 time2          2          3         NA
15     5 time3          1         NA          3
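The following sketch shows how names_pattern splits the column names. The dataset is a made-up stand-in with columns named like basketball.time1 (an assumption about the original layout, inferred from the regular expressions in the answer). Each capture group in the pattern fills one entry of names_to.

```r
library(tidyr)

# Hypothetical wide data: column names have the form <sport>.<time>
dataset <- data.frame(
  id               = 1:2,
  basketball.time1 = c(2, 5),
  basketball.time2 = c(3, 4),
  volleyball.time1 = c(2, 3),
  volleyball.time2 = c(3, 4)
)

# First group -> sport, second group -> time
long <- pivot_longer(
  dataset,
  cols          = matches("time\\d+$"),
  names_to      = c("sport", "time"),
  names_pattern = "(.*)\\.(.*)"
)

# One column per sport, one row per id and time
wide <- pivot_wider(long, names_from = sport, values_from = value)
wide
```

Because the sport name comes straight from the column names, a misspelled column such as vollyeball.time3 ends up as its own sport column, which is what the output above shows.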