How to Use Spread on Multiple Columns in Tidyr Similar to Dcast

Is it possible to use spread on multiple columns in tidyr similar to dcast?

One option is to create a new column, 'Prod_Count', by pasting the 'Product' and 'Country' columns together, drop the two original columns with select, and reshape from 'long' to 'wide' with spread from tidyr.

library(dplyr)
library(tidyr)
sdt %>%
  mutate(Prod_Count = paste(Product, Country, sep = "_")) %>%
  select(-Product, -Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
# Year A_AI B_EI
#1 1990 0.7878674 0.2486044
#2 1991 0.2343285 -1.1694878

Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.

sdt %>%
  unite(Prod_Count, Product, Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
# Year A_AI B_EI
# 1 1990 0.7878674 0.2486044
# 2 1991 0.2343285 -1.1694878
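
With tidyr 1.0.0 or later, the same reshape can probably be done without building the combined key first, because pivot_wider accepts several columns in names_from. The sketch below uses a small made-up data frame (sdt_demo and its values are assumptions, not the original sdt).

```r
library(tidyr)

# Hypothetical stand-in for `sdt`: one value per Year/Product/Country
sdt_demo <- data.frame(
  Year    = rep(1990:1991, each = 2),
  Product = rep(c("A", "B"), times = 2),
  Country = rep(c("AI", "EI"), times = 2),
  value   = c(0.5, 1.5, 2.5, 3.5)
)

# Both key columns go into names_from; they are joined with names_sep
wide <- pivot_wider(
  sdt_demo,
  names_from  = c(Product, Country),
  values_from = value,
  names_sep   = "_"
)
wide
```

This should give one row per Year with the columns A_AI and B_EI, the same layout as the unite and spread version.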

R spreading multiple columns with tidyr

Here's a simple and very efficient solution using data.table

library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B"))
# month A_Amy A_Bob B_Amy B_Bob
# 1: 1 9 8 6 5
# 2: 2 7 6 7 6
# 3: 3 6 9 8 7

Or a possible tidyr solution

df %>%
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)

# month Amy_A Amy_B Bob_A Bob_B
# 1 1 9 6 8 5
# 2 2 7 7 6 6
# 3 3 6 8 9 7

EDIT 22/10/2019

As mentioned in the comments by @gjabel, newer tidyr versions (v1.0.0+) now have the pivot_wider and pivot_longer functions (in the maturing state at the time of this edit), so a newer approach would be

pivot_wider(data = df,
            id_cols = month,
            names_from = student,
            values_from = c("A", "B"))
# # A tibble: 3 x 5
# month A_Amy A_Bob B_Amy B_Bob
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 9 8 6 5
# 2 2 7 6 7 6
# 3 3 6 9 8 7
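
Note that pivot_wider puts the value-column name first (A_Amy), whereas the gather/unite/spread result above is student-first (Amy_A). If the student-first names are wanted, names_glue is one way to get them. In this sketch df is reconstructed from the outputs shown above, since the original data is not given here.

```r
library(tidyr)

# `df` rebuilt from the printed results above (an assumption, not the original)
df <- data.frame(
  month   = rep(1:3, each = 2),
  student = rep(c("Amy", "Bob"), times = 3),
  A       = c(9, 8, 7, 6, 6, 9),
  B       = c(6, 5, 7, 6, 8, 7)
)

# names_glue builds each new column name; .value stands for "A" or "B"
wide <- pivot_wider(
  df,
  id_cols     = month,
  names_from  = student,
  values_from = c(A, B),
  names_glue  = "{student}_{.value}"
)
names(wide)
```

The columns still come out grouped by value column (Amy_A, Bob_A, Amy_B, Bob_B); only the naming changes.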

Using spread to create two value columns with tidyr

I think what you want is another gather to break out the count and mean as separate observations: the gather(type, val, -source, -tone) step below.

gather(df, who, value) %>%
  separate(who, into = c('source', 'tone')) %>%
  group_by(source, tone) %>%
  summarise(n = sum(value), avg = mean(value)) %>%
  gather(type, val, -source, -tone) %>%
  unite(stat, c(tone, type)) %>%
  spread(stat, val)

Yields

Source: local data frame [2 x 5]

source Against_avg Against_n For_avg For_n
1 Activist 1.82 91 1.84 92
2 Politician 1.94 97 1.70 85
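
With a recent tidyr, the second gather, the unite and the spread can likely be collapsed into one pivot_wider call that takes both summary columns in values_from. A minimal sketch, assuming dplyr 1.0.0+ (for .groups) and tidyr 1.2.0+ (for names_vary); the ratings data below is made up and already in long form, so it skips the gather and separate steps.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data; not the data from the original question
ratings <- tibble(
  source = rep(c("Activist", "Politician"), each = 4),
  tone   = rep(c("For", "Against"), times = 4),
  value  = c(1, 2, 3, 4, 2, 1, 2, 3)
)

wide <- ratings %>%
  group_by(source, tone) %>%
  summarise(n = sum(value), avg = mean(value), .groups = "drop") %>%
  pivot_wider(
    names_from  = tone,
    values_from = c(avg, n),
    names_glue  = "{tone}_{.value}",  # tone first, as in the spread result
    names_vary  = "slowest"           # keep each tone's columns together
  )
wide
```

Here names_vary = "slowest" is what keeps Against_avg and Against_n next to each other; with the default the avg columns would come first, then the n columns.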

Spread multiple columns [tidyr]

We can use na.rm = TRUE in gather to drop the missing values, remove the 'variable' column with select, and then use spread

library(dplyr)
library(tidyr)
gather(dat, variable, val, -(ID:col1), na.rm = TRUE) %>%
  select(-variable) %>%
  spread(col1, val)
# ID A B C D E F G H I J
#1 1 d b b c b b b a 5 value

Update

With the devel version of tidyr (tidyr_0.8.3.9000), we can use pivot_wider when there are multiple value columns to be considered

library(stringr)  # for str_c()
dat %>%
  pivot_wider(names_from = col1, values_from = str_c("col", 2:4)) %>%
  select_if(~ any(!is.na(.)))
# A tibble: 1 x 11
# ID col2_A col2_B col2_C col2_D col2_E col2_F col2_G col2_H col3_I col4_J
# <dbl> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
#1 1 a b c d e f g h 5 value

If we are using reshape2, a similar option is

library(reshape2)
dcast(melt(dat, measure.vars = 3:5, na.rm = TRUE),
      ID ~ col1, value.var = 'value')

How can I spread repeated measures of multiple variables into wide format?

Edit: I'm updating this answer since pivot_wider has been around for a while now and addresses the issue in this question and comments. You can now do

pivot_wider(
  dat,
  id_cols = 'Person',
  names_from = 'Time',
  values_from = c('Score1', 'Score2', 'Score3'),
  names_glue = '{Time}.{.value}'
)

to get the desired result.


The original answer was

dat %>%
  gather(temp, score, starts_with("Score")) %>%
  unite(temp1, Time, temp, sep = ".") %>%
  spread(temp1, score)
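
To see that the two versions agree, here is a small check on made-up data (dat below is hypothetical and has only two score columns, to keep it short). Both should produce the same set of Time.Score columns, though in a different column order.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two people, two time points, two scores
dat <- data.frame(
  Person = rep(c("greg", "sally"), each = 2),
  Time   = rep(1:2, times = 2),
  Score1 = c(10, 11, 12, 13),
  Score2 = c(20, 21, 22, 23)
)

new_way <- pivot_wider(
  dat,
  id_cols     = Person,
  names_from  = Time,
  values_from = c(Score1, Score2),
  names_glue  = "{Time}.{.value}"
)

old_way <- dat %>%
  gather(temp, score, starts_with("Score")) %>%
  unite(temp1, Time, temp, sep = ".") %>%
  spread(temp1, score)

# Same columns, possibly in a different order
setequal(names(new_way), names(old_way))
```

spread sorts the new columns by name, while pivot_wider keeps them grouped by value column, so compare by name rather than by position.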

tidyr::spread() with multiple keys and values

Reshaping with multiple value variables can best be done with dcast from data.table or reshape from base R.

library(data.table)
out <- dcast(setDT(df), id ~ paste0("time", time), value.var = c("x", "y"), sep = "")
out
# id xtime1 xtime2 xtime3 ytime1 ytime2 ytime3
# 1: 1 0.4334921 -0.5205570 -1.44364515 0.49288757 -1.26955148 -0.83344256
# 2: 2 0.4785870 0.9261711 0.68173681 1.24639813 0.91805332 0.34346260
# 3: 3 -1.2067665 1.7309593 0.04923993 1.28184341 -0.69435556 0.01609261
# 4: 4 0.5240518 0.7481787 0.07966677 -1.36408357 1.72636849 -0.45827205
# 5: 5 0.3733316 -0.3689391 -0.11879819 -0.03276689 0.91824437 2.18084692
# 6: 6 0.2363018 -0.2358572 0.73389984 -1.10946940 -1.05379502 -0.82691626
# 7: 7 -1.4979165 0.9026397 0.84666801 1.02138768 -0.01072588 0.08925716
# 8: 8 0.3428946 -0.2235349 -1.21684977 0.40549497 0.68937085 -0.15793111
# 9: 9 -1.1304688 -0.3901419 -0.10722222 -0.54206830 0.34134397 0.48504564
#10: 10 -0.5275251 -1.1328937 -0.68059800 1.38790593 0.93199593 -1.77498807

Using reshape we could do

# setDF(df) # in case df is a data.table now
reshape(df, idvar = "id", timevar = "time", direction = "wide")
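
Since no output is shown for the reshape call, here is a small sketch of what it returns. df_demo is a hypothetical data frame with the same column layout (id, time, x, y), not the df used above.

```r
# Hypothetical long data: 2 ids x 2 time points, value columns x and y
df_demo <- data.frame(
  id   = rep(1:2, each = 2),
  time = rep(1:2, times = 2),
  x    = c(0.1, 0.2, 0.3, 0.4),
  y    = c(1.1, 1.2, 1.3, 1.4)
)

# Every column other than idvar and timevar is treated as a value column
wide <- reshape(df_demo, idvar = "id", timevar = "time", direction = "wide")
names(wide)
```

The new columns are named value.time (x.1, y.1, x.2, y.2) and are grouped by time point rather than by variable, unlike the dcast result; the sep argument of reshape changes the separator.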

How to create multiple columns from one column, maybe using dcast or tidyverse

We could use table from base R

table(seq_len(nrow(data)), data$trimesterPeriod)

-output

    first PP second third
1 1 0 0 0
2 0 0 1 0
3 0 0 0 1
4 0 1 0 0
5 0 0 0 1
6 0 0 1 0
7 0 1 0 0
8 1 0 0 0
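
The table call returns a table object rather than a data frame. If a plain data frame of indicator columns is needed, one option is as.data.frame.matrix. The sketch below repeats the data from the end of this answer so that it runs on its own; the name indicators is just for illustration.

```r
# Same values as the `data` object given at the end of this answer
data <- data.frame(
  trimesterPeriod = c("first", "second", "third", "PP",
                      "third", "second", "PP", "first")
)

tab <- table(seq_len(nrow(data)), data$trimesterPeriod)

# Keep the wide layout: one row per observation, one 0/1 column per level
indicators <- as.data.frame.matrix(tab)
indicators
```

Calling as.data.frame on the table directly would instead give the long form (one row per cell with a Freq column), which is usually not what is wanted here.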

Or using tidyverse

library(dplyr)
library(tidyr)
data %>%
  mutate(ID = row_number()) %>%
  pivot_wider(names_from = trimesterPeriod,
              values_from = trimesterPeriod, values_fn = length,
              values_fill = 0)

-output

# A tibble: 8 × 5
ID first second third PP
<int> <int> <int> <int> <int>
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0 0 1 0
4 4 0 0 0 1
5 5 0 0 1 0
6 6 0 1 0 0
7 7 0 0 0 1
8 8 1 0 0 0

data

data <- structure(list(trimesterPeriod = c("first", "second", "third", 
"PP", "third", "second", "PP", "first")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))

Spread data and create new columns with names derived from multiple row

The trick is to use unite to concatenate the Type and Metric columns, then use that new column as the key for spread. I usually approach a task like this (I had a few just like it at work this week) by working out where in the data frame each piece of information lives, such as where to find "A" and where to find "Percent", and then how to bring them together.

library(tidyverse)

# tibble() replaces data_frame(), which is deprecated
a <- tibble(
  Type = c(rep("A", 6), rep("B", 6), rep("A", 6), rep("B", 6)),
  Type2 = rep(c(rep("x1", 3), rep("X2", 3)), 4),
  Color = rep(c("Red", "Green", "Yellow"), 8),
  Metric = c(rep("N", 12), rep("Percent", 12)),
  Value = 1:24
)

a %>%
  unite("type_metric", Type, Metric) %>%
  spread(key = type_metric, value = Value)
#> # A tibble: 6 x 6
#> Type2 Color A_N A_Percent B_N B_Percent
#> <chr> <chr> <int> <int> <int> <int>
#> 1 x1 Green 2 14 8 20
#> 2 x1 Red 1 13 7 19
#> 3 x1 Yellow 3 15 9 21
#> 4 X2 Green 5 17 11 23
#> 5 X2 Red 4 16 10 22
#> 6 X2 Yellow 6 18 12 24

Created on 2018-05-10 by the reprex package (v0.2.0).


