Is it possible to use spread on multiple columns in tidyr similar to dcast?
One option would be to create a new column 'Prod_Count' by joining the 'Product' and 'Country' columns with paste, remove those columns with select, and reshape from 'long' to 'wide' using spread from tidyr.
library(dplyr)
library(tidyr)
sdt %>%
mutate(Prod_Count=paste(Product, Country, sep="_")) %>%
select(-Product, -Country)%>%
spread(Prod_Count, value)%>%
head(2)
# Year A_AI B_EI
#1 1990 0.7878674 0.2486044
#2 1991 0.2343285 -1.1694878
Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.
sdt%>%
unite(Prod_Count, Product,Country) %>%
spread(Prod_Count, value)%>%
head(2)
# Year A_AI B_EI
# 1 1990 0.7878674 0.2486044
# 2 1991 0.2343285 -1.1694878
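With tidyr 1.0.0 or later, the same reshape can probably be written without the intermediate pasted column, because pivot_wider accepts several columns in names_from and joins them with "_" by default. The sketch below is not part of the original answer, and the small sdt data frame in it is a hypothetical stand-in, since the original data is not shown.

```r
library(tidyr)

# Hypothetical stand-in for `sdt`; only the column names follow the answer above
sdt <- data.frame(
  Year    = rep(1990:1991, each = 2),
  Product = rep(c("A", "B"), times = 2),
  Country = rep(c("AI", "EI"), times = 2),
  value   = c(0.5, -1.2, 0.3, 0.8)
)

# Product and Country are combined into the new column names (A_AI, B_EI)
wide <- pivot_wider(sdt, names_from = c(Product, Country), values_from = value)
wide
```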
R spreading multiple columns with tidyr
Here's a simple and very efficient solution using data.table
library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B"))
# month Amy_A Bob_A Amy_B Bob_B
# 1: 1 9 8 6 5
# 2: 2 7 6 7 6
# 3: 3 6 9 8 7
Or a possible tidyr solution
df %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 9 6 8 5
# 2 2 7 7 6 6
# 3 3 6 8 9 7
EDIT 22/10/2019
As mentioned in the comments by @gjabel, newer tidyr versions (v1.0.0+) now have the pivot_wider and pivot_longer functions (in the maturing state at the time of this edit), so a newer approach would be
pivot_wider(data = df,
id_cols = month,
names_from = student,
values_from = c("A", "B"))
# # A tibble: 3 x 5
# month A_Amy A_Bob B_Amy B_Bob
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 9 8 6 5
# 2 2 7 6 7 6
# 3 3 6 9 8 7
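Note that pivot_wider puts the value name first (A_Amy), while dcast and the unite approach put the student first (Amy_A). If the student-first names are wanted, the names_glue argument should do it. This is a sketch rather than part of the original answer; the df below is rebuilt from the printed output above.

```r
library(tidyr)

# Data rebuilt from the printed results above
df <- data.frame(
  month   = rep(1:3, each = 2),
  student = rep(c("Amy", "Bob"), times = 3),
  A       = c(9, 8, 7, 6, 6, 9),
  B       = c(6, 5, 7, 6, 8, 7)
)

# {.value} stands for the name of the value column (A or B)
wide <- pivot_wider(
  df,
  id_cols     = month,
  names_from  = student,
  values_from = c(A, B),
  names_glue  = "{student}_{.value}"
)
wide
```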
Using spread to create two value columns with tidyr
I think what you want is another gather to break out the count and mean as separate observations: the gather(type, val, -source, -tone) step below.
gather(df, who, value) %>%
separate(who, into=c('source', 'tone')) %>%
group_by(source, tone) %>%
summarise(n=sum(value), avg=mean(value)) %>%
gather(type, val, -source, -tone) %>%
unite(stat, c(tone, type)) %>%
spread(stat, val)
Yields
Source: local data frame [2 x 5]
source Against_avg Against_n For_avg For_n
1 Activist 1.82 91 1.84 92
2 Politician 1.94 97 1.70 85
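For readers on tidyr 1.0.0+ and dplyr 1.0.0+, roughly the same pipeline can be sketched with pivot_longer and pivot_wider, which removes the second gather. The input below is hypothetical: it assumes the original df has one column per source/tone pair, named like Activist_For, which the answer implies but does not show.

```r
library(dplyr)
library(tidyr)

# Hypothetical input; column names are assumed to be "<source>_<tone>"
df <- data.frame(
  Activist_For       = c(1, 2, 3),
  Activist_Against   = c(2, 2, 2),
  Politician_For     = c(1, 1, 4),
  Politician_Against = c(3, 1, 2)
)

out <- df %>%
  # split each column name into its source and tone parts
  pivot_longer(everything(), names_to = c("source", "tone"), names_sep = "_") %>%
  group_by(source, tone) %>%
  summarise(n = sum(value), avg = mean(value), .groups = "drop") %>%
  # spread both summary columns at once, naming them tone first
  pivot_wider(names_from = tone, values_from = c(avg, n),
              names_glue = "{tone}_{.value}")
out
```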
Spread multiple columns [tidyr]
We can use na.rm=TRUE in gather, remove the 'variable' column with select, and use spread
library(dplyr)
library(tidyr)
gather(dat, variable, val, -(ID:col1), na.rm=TRUE) %>%
select(-variable) %>%
spread(col1, val)
# ID A B C D E F G H I J
#1 1 d b b c b b b a 5 value
Update
With the devel version of tidyr (tidyr_0.8.3.9000), we can use pivot_wider when there are multiple value columns to be considered
dat %>%
pivot_wider(names_from = col1, values_from = paste0("col", 2:4)) %>%
select_if(~ any(!is.na(.)))
# A tibble: 1 x 11
# ID col2_A col2_B col2_C col2_D col2_E col2_F col2_G col2_H col3_I col4_J
# <dbl> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
#1 1 a b c d e f g h 5 value
If we are using reshape2, a similar option is
library(reshape2)
dcast(melt(dat, measure = 3:5, na.rm=TRUE),
ID~col1, value.var='value')
How can I spread repeated measures of multiple variables into wide format?
Edit: I'm updating this answer since pivot_wider has been around for a while now and addresses the issue in this question and comments. You can now do
pivot_wider(
dat,
id_cols = 'Person',
names_from = 'Time',
values_from = c('Score1', 'Score2', 'Score3'),
names_glue = '{Time}.{.value}'
)
to get the desired result.
The original answer was
dat %>%
gather(temp, score, starts_with("Score")) %>%
unite(temp1, Time, temp, sep = ".") %>%
spread(temp1, score)
tidyr::spread() with multiple keys and values
Reshaping with multiple value variables can best be done with dcast from data.table or reshape from base R.
library(data.table)
out <- dcast(setDT(df), id ~ paste0("time", time), value.var = c("x", "y"), sep = "")
out
# id xtime1 xtime2 xtime3 ytime1 ytime2 ytime3
# 1: 1 0.4334921 -0.5205570 -1.44364515 0.49288757 -1.26955148 -0.83344256
# 2: 2 0.4785870 0.9261711 0.68173681 1.24639813 0.91805332 0.34346260
# 3: 3 -1.2067665 1.7309593 0.04923993 1.28184341 -0.69435556 0.01609261
# 4: 4 0.5240518 0.7481787 0.07966677 -1.36408357 1.72636849 -0.45827205
# 5: 5 0.3733316 -0.3689391 -0.11879819 -0.03276689 0.91824437 2.18084692
# 6: 6 0.2363018 -0.2358572 0.73389984 -1.10946940 -1.05379502 -0.82691626
# 7: 7 -1.4979165 0.9026397 0.84666801 1.02138768 -0.01072588 0.08925716
# 8: 8 0.3428946 -0.2235349 -1.21684977 0.40549497 0.68937085 -0.15793111
# 9: 9 -1.1304688 -0.3901419 -0.10722222 -0.54206830 0.34134397 0.48504564
#10: 10 -0.5275251 -1.1328937 -0.68059800 1.38790593 0.93199593 -1.77498807
Using reshape we could do
# setDF(df) # in case df is a data.table now
reshape(df, idvar = "id", timevar = "time", direction = "wide")
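Since tidyr 1.0.0, pivot_wider can also handle several value columns at once, so a tidyr version is possible too. The sketch below is not from the original answer; it uses a small made-up df with the same column names (id, time, x, y) and a names_glue pattern chosen to mimic the dcast column names.

```r
library(tidyr)

# Made-up data with the same layout as the question's df
set.seed(1)
df <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  x    = rnorm(9),
  y    = rnorm(9)
)

# {.value} is x or y; {time} is the value taken from the time column
wide <- pivot_wider(
  df,
  id_cols     = id,
  names_from  = time,
  values_from = c(x, y),
  names_glue  = "{.value}time{time}"
)
wide
```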
How to create multiple columns from one column, maybe using dcast or tidyverse
We could use table from base R
table(seq_len(nrow(data)), data$trimesterPeriod)
-output
first PP second third
1 1 0 0 0
2 0 0 1 0
3 0 0 0 1
4 0 1 0 0
5 0 0 0 1
6 0 0 1 0
7 0 1 0 0
8 1 0 0 0
Or using tidyverse
library(dplyr)
library(tidyr)
data %>%
mutate(ID = row_number()) %>%
pivot_wider(names_from = trimesterPeriod,
values_from = trimesterPeriod, values_fn = length,
values_fill = 0)
-output
# A tibble: 8 × 5
ID first second third PP
<int> <int> <int> <int> <int>
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0 0 1 0
4 4 0 0 0 1
5 5 0 0 1 0
6 6 0 1 0 0
7 7 0 0 0 1
8 8 1 0 0 0
data
data <- structure(list(trimesterPeriod = c("first", "second", "third",
"PP", "third", "second", "PP", "first")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
Spread data and create new columns with names derived from multiple row
The trick is to use unite to concatenate the columns Type and Metric, then use that new column as the key for spread. I usually approach a task like this (I had a few just like it at work this week) by figuring out where in my df each piece of information lives, such as where I can find "A" and where I can find "Percent", and then how I can bring them together.
library(tidyverse)
a <- tibble(
Type = c(rep("A",6),rep("B",6),c(rep("A",6),rep("B",6))),
Type2 = c(rep("x1",3),rep("X2",3),rep("x1",3),rep("X2",3),rep("x1",3),rep("X2",3),rep("x1",3),rep("X2",3)),
Color = rep(c("Red","Green","Yellow"),8),
Metric = c(rep("N",12),rep("Percent",12)),
Value = c(1:24)
)
a %>%
unite("type_metric", Type, Metric) %>%
spread(key = type_metric, value = Value)
#> # A tibble: 6 x 6
#> Type2 Color A_N A_Percent B_N B_Percent
#> <chr> <chr> <int> <int> <int> <int>
#> 1 x1 Green 2 14 8 20
#> 2 x1 Red 1 13 7 19
#> 3 x1 Yellow 3 15 9 21
#> 4 X2 Green 5 17 11 23
#> 5 X2 Red 4 16 10 22
#> 6 X2 Yellow 6 18 12 24
Created on 2018-05-10 by the reprex package (v0.2.0).
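On tidyr 1.1.0 or later, the unite step can likely be folded into pivot_wider by passing both columns to names_from. This sketch is an addition, not part of the reprex above; it rebuilds the same data with rep so that it runs on its own, and names_sort = TRUE is used to get the alphabetical column order that spread produces.

```r
library(tidyr)

# Same values as the tibble `a` above, built more compactly
a <- tibble::tibble(
  Type   = rep(rep(c("A", "B"), each = 6), times = 2),
  Type2  = rep(rep(c("x1", "X2"), each = 3), times = 4),
  Color  = rep(c("Red", "Green", "Yellow"), times = 8),
  Metric = rep(c("N", "Percent"), each = 12),
  Value  = 1:24
)

# Type and Metric are joined with "_" to form the new column names
wide <- pivot_wider(
  a,
  names_from  = c(Type, Metric),
  values_from = Value,
  names_sort  = TRUE
)
wide
```

Unlike spread, pivot_wider keeps the rows in order of first appearance instead of sorting them by Type2 and Color.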