Converting a Long-Formatted Dataframe to Wide Format Tidyverse

How to reshape data from long to wide format

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
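As a minimal sketch of what this call does, the example below builds a small long-format data frame and reshapes it. The contents of dat1 are not shown in the original, so the values and the value column here are assumptions made up for illustration; only the column names name and numbers come from the call above.

```r
# Hypothetical long-format data: one row per (name, numbers) pair.
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, times = 2),
  value   = c(0.34, -0.70, 1.58, -0.49, -0.89, -0.33, 1.15, -0.12)
)

# idvar identifies the rows of the wide result,
# timevar supplies the suffixes of the new columns.
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# One row per name, with columns value.1 to value.4.
print(wide)
```

Every column that is neither idvar nor timevar is treated as a measurement and spread into one column per distinct value of timevar.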

Converting a long-formatted dataframe to wide format tidyverse

Try this. Some of the arguments to pivot_wider() were being used incorrectly. The column whose values become the new column names goes in names_from, and the column holding the cell values goes in values_from. Here is the code:

library(tidyverse)
#Code
Widedata <- dat %>%
pivot_wider(names_from=year, values_from=grad_rate)

Output:

# A tibble: 51 x 9
State Abbr SY2010_11 SY2011_12 SY2012_13 SY2013_14 SY2014_15 SY2015_16 SY2016_17
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Alabama AL 72 75 80 86.3 89.3 87.1 89.3
2 Alaska AK 68 70 71.8 71.1 75.6 76.1 78.2
3 Arizona AZ 78 76 75.1 75.7 77.4 79.5 78
4 Arkansas AR 81 84 84.9 86.9 84.9 87 88
5 California CA 76 79 80.4 81 82 83 82.7
6 Colorado CO 74 75 76.9 77.3 77.3 78.9 79.1
7 Connecticut CT 83 85 85.5 87 87.2 87.4 87.9
8 Delaware DE 78 80 80.4 87 85.6 85.5 86.9
9 District of Columbia DC 59 59 62.3 61.4 68.5 69.2 73.2
10 Florida FL 71 75 75.6 76.1 77.9 80.7 82.3
# ... with 41 more rows
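To make the role of names_from and values_from concrete, here is a small self-contained sketch. The original long-format dat is not shown, so this data frame is an assumption: it reuses the column names from the call above and the first two years of the Alabama and Alaska rows from the output.

```r
library(tidyr)

# Assumed long-format input: one row per state and school year.
dat <- data.frame(
  State     = rep(c("Alabama", "Alaska"), each = 2),
  Abbr      = rep(c("AL", "AK"), each = 2),
  year      = rep(c("SY2010_11", "SY2011_12"), times = 2),
  grad_rate = c(72, 75, 68, 70)
)

# year supplies the new column names, grad_rate supplies the cell values.
# State and Abbr are left over and therefore identify each output row.
wide_dat <- pivot_wider(dat, names_from = year, values_from = grad_rate)
print(wide_dat)
```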

Problem when reshaping data from long to wide format in R

Reshaping data with stats::reshape() can be tedious. Hadley Wickham and
his team have spent quite some time on creating a comprehensive solution.
First there was the reshape2 package, then tidyr had spread() and gather(),
and those have now been superseded by pivot_wider() and pivot_longer().

This is how you can use tidyr::pivot_wider() to achieve the result you seem to
be going for.

library(tidyr)
pivot_wider(
my_df,
id_cols = c(transcript, response),
names_from = hours,
values_from = exp.change,
names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>

I think having dedicated commands with dedicated documentation for the two transformations (wide/long) makes the tidyr commands much easier to use, compared to stats::reshape().

EDIT:
stats::reshape() is giving weird results because it seems to have trouble with my_df being a tibble. Other than that, your command was just fine. Just wrap my_df in as.data.frame() and you are good to go.

reshape(
as.data.frame(my_df),
idvar = c("transcript", "response"),
timevar = "hours",
v.names = "exp.change",
direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01

But since it seems that you are already using the tidyverse, tidyr::pivot_wider() looks like the best fit.

How to transform long to wide reshape with tidyverse

You can do:

df %>%
gather(var, val, -c(respid, member_id)) %>%
mutate(var = paste(var, member_id, sep = "_")) %>%
select(-member_id) %>%
spread(var, val)

respid dob_1 dob_2 dob_3 edu_1 edu_2 edu_3 gender_1 gender_2 gender_3
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 100 1978 1980 NA 3 3 NA 1 1 NA
2 200 1974 1955 NA 4 5 NA 1 2 NA
3 300 1998 1999 2001 3 4 3 2 1 2

This first reshapes the data into a fully long format, with one row per respid, member_id and variable. It then builds the new variable names by pasting member_id onto each variable name, and finally spreads the result back to wide format.

Or using reshape2:

dcast(melt(df, id.vars = c("respid", "member_id")), respid~variable+member_id, value.var = "value")

respid gender_1 gender_2 gender_3 edu_1 edu_2 edu_3 dob_1 dob_2 dob_3
1 100 1 1 NA 3 3 NA 1978 1980 NA
2 200 1 2 NA 4 5 NA 1974 1955 NA
3 300 2 1 2 3 4 3 1998 1999 2001
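With tidyr 1.0.0 or later, the same result can probably be obtained in a single step, because pivot_wider() accepts several columns in values_from. The sketch below is an assumption about the input: the long-format df is rebuilt from the values visible in the outputs above, since the original data frame is not shown.

```r
library(tidyr)

# Long-format input reconstructed from the outputs above (assumed layout).
df <- data.frame(
  respid    = c(100, 100, 200, 200, 300, 300, 300),
  member_id = c(1, 2, 1, 2, 1, 2, 3),
  gender    = c(1, 1, 1, 2, 2, 1, 2),
  edu       = c(3, 3, 4, 5, 3, 4, 3),
  dob       = c(1978, 1980, 1974, 1955, 1998, 1999, 2001)
)

# Each value column is combined with each member_id, giving gender_1, gender_2, ...
wide <- pivot_wider(
  df,
  id_cols     = respid,
  names_from  = member_id,
  values_from = c(gender, edu, dob)
)
print(wide)
```

Households with fewer than three members get NA in the columns ending in _3, as in the outputs above.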

long to wide format aggregate R tidyverse

Not really sure how you get the count of 3 for GENEa and READSb, but assuming you want counts, you can try the following:


library(tidyverse)

df <- tibble(
READS = rep(c("READa", "READb", "READc"), each = 3),
GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
COMMENT = rep(c("CommentA", "CommentA", "CommentA"), each = 3)
)
df
#> # A tibble: 9 x 3
#> READS GENE COMMENT
#> <chr> <chr> <chr>
#> 1 READa GENEa CommentA
#> 2 READa GENEa CommentA
#> 3 READa GENEa CommentA
#> 4 READb GENEb CommentA
#> 5 READb GENEb CommentA
#> 6 READb GENEb CommentA
#> 7 READc GENEc CommentA
#> 8 READc GENEc CommentA
#> 9 READc GENEc CommentA

df %>%
count(READS, GENE) %>%
pivot_wider(
names_from = GENE, values_from = n,
values_fill = list(n = 0)
)
#> # A tibble: 3 x 4
#> READS GENEa GENEb GENEc
#> <chr> <int> <int> <int>
#> 1 READa 3 0 0
#> 2 READb 0 3 0
#> 3 READc 0 0 3

Created on 2019-12-13 by the reprex package (v0.3.0)
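If only the counts are needed, a base R cross-tabulation is a possible alternative that needs no packages. This is a sketch using the same example data as above; the names counts and counts_df are introduced here for illustration.

```r
# Same example data as above, built with base R only.
df <- data.frame(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE  = rep(c("GENEa", "GENEb", "GENEc"), each = 3)
)

# table() counts every READS/GENE combination and fills absent ones with 0.
counts <- table(READS = df$READS, GENE = df$GENE)

# Convert the contingency table to a wide data frame
# (READS become row names, GENE values become columns).
counts_df <- as.data.frame.matrix(counts)
print(counts_df)
```

Unlike the pivot_wider() version, READS ends up in the row names rather than in a column.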

Wide format dataframe to long format dataframe using R

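That error is caused by duplicates: several rows share the same value of Y, so spread() cannot tell which output row each value belongs to. We need a unique sequence id within each group of Y.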

library(dplyr)
library(tidyr)
library(data.table)
df %>%
mutate(rn = rowid(Y)) %>%
spread(Y, Z) %>%
select(-rn)

-output

 A    B   C    D
1 ABC A12 A45 X66
2 BCD B12 B45 Y66
3 CDE C12 C45 Z66
4 DEF <NA> D45 <NA>
5 EFG <NA> E45 <NA>
6 FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>

rowid() is from data.table and is a compact way to create a sequence id. To stay within dplyr, use row_number() after group_by(). Also, spread() has been superseded by pivot_wider().

df %>%
  group_by(Y) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = Y, values_from = Z) %>%
  select(-rn)

-output

# A tibble: 8 x 4
A B C D
<chr> <chr> <chr> <chr>
1 ABC A12 A45 X66
2 BCD B12 B45 Y66
3 CDE C12 C45 Z66
4 DEF <NA> D45 <NA>
5 EFG <NA> E45 <NA>
6 FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>
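The original df is not shown, so the following is a reduced sketch with made-up rows that only assumes the two columns Y and Z used above. It illustrates why the sequence id is needed: every Y value occurs more than once, and rn is what lets each occurrence land on its own output row.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the key column Y repeats, so Y alone cannot identify a row.
df <- data.frame(
  Y = c("A", "A", "A", "B", "B"),
  Z = c("ABC", "BCD", "CDE", "A12", "B12")
)

# Confirm the duplication that causes the problem.
has_duplicates <- any(duplicated(df$Y))

wide <- df %>%
  group_by(Y) %>%
  mutate(rn = row_number()) %>%  # 1, 2, 3 within A; 1, 2 within B
  ungroup() %>%
  pivot_wider(names_from = Y, values_from = Z) %>%
  select(-rn)

print(wide)
```

Groups with fewer entries are padded with NA, which matches the shape of the output above.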

Convert data from long format to wide format with multiple measure columns

In order to handle multiple variables like you want, you need to melt the data you have before casting it.

library("reshape2")

dcast(melt(my.df, id.vars=c("ID", "TIME")), ID~variable+TIME)

which gives

  ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1 A 1 4 7 10 13 16 19 22 25 28
2 B 2 5 8 11 14 17 20 23 26 29
3 C 3 6 9 12 15 18 21 24 27 30

EDIT based on comment:

The data frame

num.id = 10 
num.time=10
my.df <- data.frame(ID=rep(LETTERS[1:num.id], num.time),
TIME=rep(1:num.time, each=num.id),
X=1:(num.id*num.time),
Y=(num.id*num.time)+1:(2*length(1:(num.id*num.time))))

gives a different result (all entries are 2) because the ID/TIME combination does not identify a unique row. In fact, there are two rows for each ID/TIME combination. reshape2 assumes a single value for each possible combination of the variables and will apply a summary function to create a single value if there are multiple entries. That is why there is the warning

Aggregation function missing: defaulting to length

You can get something that works if you add another variable which breaks that redundancy.

my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)

This works because cycle/ID/TIME now uniquely defines a row in my.df.
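One way to check this before casting is to look for duplicated key combinations with base R. The sketch below reuses the data frame definition from the EDIT above; the variable names dup_before and dup_after are introduced here for illustration.

```r
num.id <- 10
num.time <- 10
# Y is twice as long as the other columns, so data.frame() recycles
# ID, TIME and X and the result has 200 rows.
my.df <- data.frame(ID = rep(LETTERS[1:num.id], num.time),
                    TIME = rep(1:num.time, each = num.id),
                    X = 1:(num.id * num.time),
                    Y = (num.id * num.time) + 1:(2 * length(1:(num.id * num.time))))

# anyDuplicated() returns the index of the first duplicated row, or 0 if none.
dup_before <- anyDuplicated(my.df[c("ID", "TIME")])

my.df$cycle <- rep(1:2, each = num.id * num.time)
dup_after <- anyDuplicated(my.df[c("cycle", "ID", "TIME")])

print(c(before = dup_before, after = dup_after))
```

A nonzero value means dcast() would have to aggregate, so the id variables need another column to separate the repeated rows.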

Long to wide format using variable names

We can use pivot_longer() %>% pivot_wider(). separate() is not needed if we pass the appropriate arguments to pivot_longer().

library(tidyr)

dataset %>%
pivot_longer(cols = matches('time\\d+$'), names_to = c('sport', 'time'), names_pattern = '(.*)\\.(.*)') %>%
pivot_wider(names_from = sport, values_from = value)

# A tibble: 15 × 5
id time basketball volleyball vollyeball
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 time1 2 2 NA
2 1 time2 3 3 NA
3 1 time3 1 NA 1
4 2 time1 5 3 NA
5 2 time2 4 4 NA
6 2 time3 8 NA 8
7 3 time1 4 4 NA
8 3 time2 5 3 NA
9 3 time3 4 NA 12
10 4 time1 3 0 NA
11 4 time2 3 1 NA
12 4 time3 3 NA 2
13 5 time1 3 1 NA
14 5 time2 2 3 NA
15 5 time3 1 NA 3
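The original dataset is not shown, so here is a reduced sketch with a made-up two-row data frame that assumes the same sport.timeN naming scheme. It shows how the two capture groups in names_pattern split each column name into a sport part and a time part.

```r
library(tidyr)

# Hypothetical input following the assumed <sport>.<time> column naming.
dataset <- data.frame(
  id               = 1:2,
  basketball.time1 = c(2, 5),
  basketball.time2 = c(3, 4),
  volleyball.time1 = c(2, 3),
  volleyball.time2 = c(3, 4)
)

# First capture group goes to 'sport', second to 'time'.
long <- pivot_longer(
  dataset,
  cols          = matches("time\\d+$"),
  names_to      = c("sport", "time"),
  names_pattern = "(.*)\\.(.*)"
)

# One column per sport, one row per id and time.
wide <- pivot_wider(long, names_from = sport, values_from = value)
print(wide)
```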

