How to reshape data from long to wide format
Using the reshape function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
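Below is a minimal, self-contained sketch of what that call does. It assumes dat1 has one row per name/numbers pair and a single measurement column. The data are hypothetical, not the asker's:

```r
# Hypothetical long data: one row per name/numbers pair
dat1 <- data.frame(
  name    = c("a", "a", "b", "b"),
  numbers = c(1, 2, 1, 2),
  value   = c(10, 20, 30, 40)
)

# idvar:   column that identifies one row of the wide result
# timevar: column whose values become suffixes of the new columns
# every other column (here: value) is spread into value.1, value.2, ...
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
print(wide)
```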
Converting a long-formatted dataframe to wide format with tidyverse
Try this. The variables were not being passed to the right arguments of the function. Placing each variable in the correct argument gives the desired output. Here is the code:
library(tidyverse)
#Code
Widedata <- dat %>%
pivot_wider(names_from=year, values_from=grad_rate)
Output:
# A tibble: 51 x 9
State Abbr SY2010_11 SY2011_12 SY2012_13 SY2013_14 SY2014_15 SY2015_16 SY2016_17
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Alabama AL 72 75 80 86.3 89.3 87.1 89.3
2 Alaska AK 68 70 71.8 71.1 75.6 76.1 78.2
3 Arizona AZ 78 76 75.1 75.7 77.4 79.5 78
4 Arkansas AR 81 84 84.9 86.9 84.9 87 88
5 California CA 76 79 80.4 81 82 83 82.7
6 Colorado CO 74 75 76.9 77.3 77.3 78.9 79.1
7 Connecticut CT 83 85 85.5 87 87.2 87.4 87.9
8 Delaware DE 78 80 80.4 87 85.6 85.5 86.9
9 District of Columbia DC 59 59 62.3 61.4 68.5 69.2 73.2
10 Florida FL 71 75 75.6 76.1 77.9 80.7 82.3
# ... with 41 more rows
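The following small sketch shows which argument does what. It uses hypothetical data in the same shape: only two states and two school years, with values taken from the output above. It assumes tidyr 1.0.0 or later, where pivot_wider() is available:

```r
library(tidyr)

# Hypothetical long data: one row per state per school year
dat <- data.frame(
  State     = c("Alabama", "Alabama", "Alaska", "Alaska"),
  Abbr      = c("AL", "AL", "AK", "AK"),
  year      = c("SY2010_11", "SY2011_12", "SY2010_11", "SY2011_12"),
  grad_rate = c(72, 75, 68, 70)
)

# names_from:  column whose values become the new column names
# values_from: column whose values fill those new columns
# the remaining columns (State, Abbr) identify the rows
wide <- pivot_wider(dat, names_from = year, values_from = grad_rate)
print(wide)
```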
Problem when reshaping data from long to wide format in R
Reshaping data with stats::reshape can be tedious. Hadley Wickham and his team have spent quite some time creating a comprehensive solution. First there was the reshape2 package, then tidyr had spread() and gather(); those are now complemented (and, for new code, superseded) by pivot_wider() and pivot_longer().
This is how you can use tidyr::pivot_wider() to achieve the result you seem to be going for:
library(tidyr)
pivot_wider(
my_df,
id_cols = c(transcript, response),
names_from = hours,
values_from = exp.change,
names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>
I think having dedicated commands with dedicated documentation for the two transformations (wide/long) makes the tidyr commands much easier to use compared to stats::reshape().
EDIT: stats::reshape() gives odd results here because it does not seem to handle my_df being a tibble. Other than that, your command was fine; just wrap the data in as.data.frame() and you are good to go.
reshape(
as.data.frame(my_df),
idvar = c("transcript", "response"),
timevar = "hours",
v.names = "exp.change",
direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01
But since you seem to be using the tidyverse already, tidyr::pivot_wider() seems like the best fit.
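For reference, here is a minimal sketch of the stats::reshape() call on a plain data.frame, which is what as.data.frame() turns a tibble into. The transcripts and values are hypothetical stand-ins for my_df:

```r
# Hypothetical long data: two id columns plus a time column
my_df <- data.frame(
  transcript = rep(c("TR1", "TR2"), each = 3),
  response   = rep(c("Primary", "Tertiary"), each = 3),
  hours      = rep(c(0, 2, 8), times = 2),
  exp.change = c(NA, -43.2, -61.3, NA, -18.8, 5.49)
)

wide <- reshape(
  my_df,
  idvar     = c("transcript", "response"),  # together identify one output row
  timevar   = "hours",                       # values become column suffixes
  v.names   = "exp.change",                  # measurement to spread
  direction = "wide"
)
print(wide)
```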
How to transform long to wide reshape with tidyverse
You can do:
df %>%
gather(var, val, -c(respid, member_id)) %>%
mutate(var = paste(var, member_id, sep = "_")) %>%
select(-member_id) %>%
spread(var, val)
respid dob_1 dob_2 dob_3 edu_1 edu_2 edu_3 gender_1 gender_2 gender_3
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 100 1978 1980 NA 3 3 NA 1 1 NA
2 200 1974 1955 NA 4 5 NA 1 2 NA
3 300 1998 1999 2001 3 4 3 2 1 2
First, it transforms the data from wide to long format. Second, it creates the new variable names by pasting each variable name and its member_id together. Finally, it spreads the data back to wide format.
Or using reshape2:
dcast(melt(df, id.vars = c("respid", "member_id")), respid~variable+member_id, value.var = "value")
respid gender_1 gender_2 gender_3 edu_1 edu_2 edu_3 dob_1 dob_2 dob_3
1 100 1 1 NA 3 3 NA 1978 1980 NA
2 200 1 2 NA 4 5 NA 1974 1955 NA
3 300 2 1 2 3 4 3 1998 1999 2001
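With tidyr 1.0.0 or later, pivot_wider() can also widen several value columns in one step, which avoids the gather/paste/spread round trip. The sketch below assumes the columns are named respid, member_id, dob, edu and gender, as in the output above. The rows are a hypothetical subset:

```r
library(tidyr)

# Hypothetical subset: one row per respondent/member pair
df <- data.frame(
  respid    = c(100, 100, 200, 200),
  member_id = c(1, 2, 1, 2),
  dob       = c(1978, 1980, 1974, 1955),
  edu       = c(3, 3, 4, 5),
  gender    = c(1, 1, 1, 2)
)

# Several values_from columns are combined with names_from into
# names like dob_1, dob_2, edu_1, ... (names_sep defaults to "_")
wide <- pivot_wider(
  df,
  id_cols     = respid,
  names_from  = member_id,
  values_from = c(dob, edu, gender)
)
print(wide)
```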
long to wide format aggregate R tidyverse
Not really sure how you get the 3 count for GENEa and READSb, but assuming you want the count, you can try the following:
library(tidyverse)
df <- tibble(
READS = rep(c("READa", "READb", "READc"), each = 3),
GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
COMMENT = rep(c("CommentA", "CommentA", "CommentA"), each = 3)
)
df
#> # A tibble: 9 x 3
#> READS GENE COMMENT
#> <chr> <chr> <chr>
#> 1 READa GENEa CommentA
#> 2 READa GENEa CommentA
#> 3 READa GENEa CommentA
#> 4 READb GENEb CommentA
#> 5 READb GENEb CommentA
#> 6 READb GENEb CommentA
#> 7 READc GENEc CommentA
#> 8 READc GENEc CommentA
#> 9 READc GENEc CommentA
df %>%
count(READS, GENE) %>%
pivot_wider(
names_from = GENE, values_from = n,
values_fill = list(n = 0)
)
#> # A tibble: 3 x 4
#> READS GENEa GENEb GENEc
#> <chr> <int> <int> <int>
#> 1 READa 3 0 0
#> 2 READb 0 3 0
#> 3 READc 0 0 3
Created on 2019-12-13 by the reprex package (v0.3.0)
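If only the counts are needed, base R's table() produces the same cross-tabulation, including the zeros, without extra packages. This is a swapped-in alternative, not part of the answer above:

```r
df <- data.frame(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE  = rep(c("GENEa", "GENEb", "GENEc"), each = 3)
)

# table() cross-tabulates READS (rows) by GENE (columns);
# combinations that never occur are counted as 0 automatically
counts <- table(df$READS, df$GENE)

# Convert the contingency table to a plain data frame, one column per gene
wide <- as.data.frame.matrix(counts)
print(wide)
```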
Wide format dataframe to long format dataframe using R
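That error is caused by duplicates: several rows share the same Y value, so there is no unique key telling which Z goes into which output row. We need a unique sequence id within each Y.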
library(dplyr)
library(tidyr)
library(data.table)
df %>%
mutate(rn = rowid(Y)) %>%
spread(Y, Z) %>%
select(-rn)
-output
A B C D
1 ABC A12 A45 X66
2 BCD B12 B45 Y66
3 CDE C12 C45 Z66
4 DEF <NA> D45 <NA>
5 EFG <NA> E45 <NA>
6 FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>
rowid is from data.table, which is a compact way to create a sequence id. If we want to use dplyr, then use row_number() after group_by. Also, spread is superseded in favor of pivot_wider.
df %>%
group_by(Y) %>%
mutate(rn = row_number()) %>%
ungroup %>%
pivot_wider(names_from = Y, values_from = Z) %>%
select(-rn)
-output
# A tibble: 8 x 4
A B C D
<chr> <chr> <chr> <chr>
1 ABC A12 A45 X66
2 BCD B12 B45 Y66
3 CDE C12 C45 Z66
4 DEF <NA> D45 <NA>
5 EFG <NA> E45 <NA>
6 FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>
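Without data.table or dplyr, base R's ave() can build the same per-group sequence id, and stats::reshape() can then widen the data. The following minimal sketch uses hypothetical Y/Z values:

```r
# Hypothetical long data: Y repeats, so Y alone cannot identify a row
df <- data.frame(
  Y = c("A", "B", "A", "B", "A"),
  Z = c("ABC", "A12", "BCD", "B12", "CDE")
)

# ave() applies seq_along within each group of Y,
# numbering the successive rows of each Y value 1, 2, 3, ...
df$rn <- ave(seq_along(df$Y), df$Y, FUN = seq_along)

# With rn as the identifier every (rn, Y) pair is unique;
# cells with no matching row are filled with NA
wide <- reshape(df, idvar = "rn", timevar = "Y", v.names = "Z", direction = "wide")
wide$rn <- NULL
print(wide)
```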
Convert data from long format to wide format with multiple measure columns
In order to handle multiple variables like you want, you need to melt the data you have before casting it.
library("reshape2")
dcast(melt(my.df, id.vars=c("ID", "TIME")), ID~variable+TIME)
which gives
ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1 A 1 4 7 10 13 16 19 22 25 28
2 B 2 5 8 11 14 17 20 23 26 29
3 C 3 6 9 12 15 18 21 24 27 30
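Base R's stats::reshape() can also handle several measure columns at once if all of them are passed to v.names, with no melting needed first. The data frame below is an assumed reconstruction that reproduces the output above (ID varying fastest, X = 1:15, Y = 16:30):

```r
# Assumed reconstruction of my.df matching the dcast() output above
my.df <- data.frame(
  ID   = rep(LETTERS[1:3], times = 5),
  TIME = rep(1:5, each = 3),
  X    = 1:15,
  Y    = 16:30
)

# Both X and Y are spread by TIME; sep = "_" gives names like X_1, Y_1
wide <- reshape(
  my.df,
  idvar     = "ID",
  timevar   = "TIME",
  v.names   = c("X", "Y"),
  direction = "wide",
  sep       = "_"
)
print(wide)
```

Unlike dcast(), reshape() interleaves the columns per time point (X_1, Y_1, X_2, Y_2, ...), so reorder them afterwards if the grouping matters.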
EDIT based on comment:
The data frame
num.id = 10
num.time=10
my.df <- data.frame(ID=rep(LETTERS[1:num.id], num.time),
TIME=rep(1:num.time, each=num.id),
X=1:(num.id*num.time),
Y=(num.id*num.time)+1:(2*length(1:(num.id*num.time))))
gives a different result (all entries are 2) because the ID/TIME combination does not identify a unique row. In fact, there are two rows for each ID/TIME combination. reshape2 assumes a single value for each possible combination of the variables and will apply a summary function to create a single value if there are multiple entries. That is why there is the warning
Aggregation function missing: defaulting to length
You can get something that works if you add another variable which breaks that redundancy.
my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)
This works because cycle/ID/TIME now uniquely defines a row in my.df.
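A quick way to catch this problem before casting is to look for duplicated key combinations. The following sketch uses the data frame from the comment:

```r
# Data frame from the comment: Y has twice as many values as X,
# so data.frame() recycles ID, TIME and X to 200 rows
num.id <- 10
num.time <- 10
my.df <- data.frame(ID = rep(LETTERS[1:num.id], num.time),
                    TIME = rep(1:num.time, each = num.id),
                    X = 1:(num.id * num.time),
                    Y = (num.id * num.time) + 1:(2 * length(1:(num.id * num.time))))

# Rows per ID/TIME combination: anything above 1 forces dcast() to aggregate
rows_per_key <- table(my.df$ID, my.df$TIME)
print(max(rows_per_key))  # 2

# TRUE means the ID/TIME key is not unique
dup_before <- any(duplicated(my.df[c("ID", "TIME")]))
print(dup_before)  # TRUE

# Adding cycle makes the key unique again
my.df$cycle <- rep(1:2, each = num.id * num.time)
dup_after <- any(duplicated(my.df[c("cycle", "ID", "TIME")]))
print(dup_after)  # FALSE
```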
Long to wide format using variable names
We can use pivot_longer %>% pivot_wider. separate is not needed if we set the appropriate parameters (names_to and names_pattern) in pivot_longer.
library(tidyr)
dataset %>%
pivot_longer(cols = matches('time\\d+$'), names_to = c('sport', 'time'), names_pattern = '(.*)\\.(.*)') %>%
pivot_wider(names_from = sport, values_from = value)
# A tibble: 15 × 5
id time basketball volleyball vollyeball
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 time1 2 2 NA
2 1 time2 3 3 NA
3 1 time3 1 NA 1
4 2 time1 5 3 NA
5 2 time2 4 4 NA
6 2 time3 8 NA 8
7 3 time1 4 4 NA
8 3 time2 5 3 NA
9 3 time3 4 NA 12
10 4 time1 3 0 NA
11 4 time2 3 1 NA
12 4 time3 3 NA 2
13 5 time1 3 1 NA
14 5 time2 2 3 NA
15 5 time3 1 NA 3
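Here is a reduced sketch of the same pipeline on hypothetical data with columns named <sport>.<time>. The separate vollyeball column in the output above appears to come from a misspelled column name (presumably vollyeball.time3) in the original dataset. Fixing that name should merge it into volleyball:

```r
library(tidyr)

# Hypothetical wide data with columns named <sport>.<time>
dataset <- data.frame(
  id = c(1, 2),
  basketball.time1 = c(2, 5),
  basketball.time2 = c(3, 4),
  volleyball.time1 = c(2, 3),
  volleyball.time2 = c(3, 4)
)

# names_pattern splits "basketball.time1" into sport = "basketball", time = "time1"
long <- pivot_longer(
  dataset,
  cols          = matches("time\\d+$"),
  names_to      = c("sport", "time"),
  names_pattern = "(.*)\\.(.*)"
)

# One column per sport; id and time identify the rows
wide <- pivot_wider(long, names_from = sport, values_from = value)
print(wide)
```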