R Spreading Multiple Columns With Tidyr

R spreading multiple columns with tidyr

Here's a possible both simple and very efficient solution using data.table

library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B"))
# month Amy_A Bob_A Amy_B Bob_B
# 1: 1 9 8 6 5
# 2: 2 7 6 7 6
# 3: 3 6 9 8 7

Or a possible tidyr solution

df %>% 
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)

# month Amy_A Amy_B Bob_A Bob_B
# 1 1 9 6 8 5
# 2 2 7 7 6 6
# 3 3 6 8 9 7

EDIT 22/10/2019

As mentioned in comments by @gjabel, newer tidyr versions (v1.0.0+)
have now pivot_wider and pivot_longer functions (currently in maturing state), hence, a newer approach would be

pivot_wider(data = df, 
id_cols = month,
names_from = student,
values_from = c("A", "B"))
# # A tibble: 3 x 5
# month A_Amy A_Bob B_Amy B_Bob
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 9 8 6 5
# 2 2 7 6 7 6
# 3 3 6 9 8 7

Is it possible to use spread on multiple columns in tidyr similar to dcast?

One option would be to create a new 'Prod_Count' by joining the 'Product' and 'Country' columns by paste, remove those columns with the select and reshape from 'long' to 'wide' using spread from tidyr.

 library(dplyr)
library(tidyr)
sdt %>%
mutate(Prod_Count=paste(Product, Country, sep="_")) %>%
select(-Product, -Country)%>%
spread(Prod_Count, value)%>%
head(2)
# Year A_AI B_EI
#1 1990 0.7878674 0.2486044
#2 1991 0.2343285 -1.1694878

Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.

 sdt%>% 
unite(Prod_Count, Product,Country) %>%
spread(Prod_Count, value)%>%
head(2)
# Year A_AI B_EI
# 1 1990 0.7878674 0.2486044
# 2 1991 0.2343285 -1.1694878

tidyr::spread() with multiple keys and values

Reshaping with multiple value variables can best be done with dcast from data.table or reshape from base R.

library(data.table)
out <- dcast(setDT(df), id ~ paste0("time", time), value.var = c("x", "y"), sep = "")
out
# id xtime1 xtime2 xtime3 ytime1 ytime2 ytime3
# 1: 1 0.4334921 -0.5205570 -1.44364515 0.49288757 -1.26955148 -0.83344256
# 2: 2 0.4785870 0.9261711 0.68173681 1.24639813 0.91805332 0.34346260
# 3: 3 -1.2067665 1.7309593 0.04923993 1.28184341 -0.69435556 0.01609261
# 4: 4 0.5240518 0.7481787 0.07966677 -1.36408357 1.72636849 -0.45827205
# 5: 5 0.3733316 -0.3689391 -0.11879819 -0.03276689 0.91824437 2.18084692
# 6: 6 0.2363018 -0.2358572 0.73389984 -1.10946940 -1.05379502 -0.82691626
# 7: 7 -1.4979165 0.9026397 0.84666801 1.02138768 -0.01072588 0.08925716
# 8: 8 0.3428946 -0.2235349 -1.21684977 0.40549497 0.68937085 -0.15793111
# 9: 9 -1.1304688 -0.3901419 -0.10722222 -0.54206830 0.34134397 0.48504564
#10: 10 -0.5275251 -1.1328937 -0.68059800 1.38790593 0.93199593 -1.77498807

Using reshape we could do

# setDF(df) # in case df is a data.table now
reshape(df, idvar = "id", timevar = "time", direction = "wide")

tidyr spread values values from two columns (and rename columns)

library(dplyr)
library(tidyr)
example %>%
pivot_wider(names_from = category,
values_from = c(value1, value2)) %>%
unnest()

tidyr:: gather multiple columns different types

I assume your expected output is incomplete as I don't see any entries for ID = 2 and ID = 3.

You could do the following

df %>%
gather(k, v, -ID) %>%
separate(k, into = c("tmp", "X_num", "ss"), sep = "_") %>%
select(-tmp) %>%
spread(ss, v)
# ID X_num abc xyz
#1 1 1 1 1
#2 1 2 2 2
#3 1 3 2 1
#4 2 1 1 2
#5 2 2 1 0
#6 2 3 1 NA
#7 3 1 1 2
#8 3 2 1 1
#9 3 3 NA 0

R - tidyr - mutate and spread multiple columns

Rather than spread(), you can use the new pivot_wider() that was added in the recent tidyr 1.0.0 release. It has a values_from argument that allows you to specify multiple columns at once:

library(dplyr)
library(tidyr)

my_df_test %>%
group_by(V1, V2) %>%
mutate(new = V3, V3 = toString(V3)) %>%
pivot_wider(
names_from = new,
values_from = c(V6, V7)
)
#> # A tibble: 2 x 9
#> # Groups: V1, V2 [4]
#> V1 V2 V3 V4 V5 V6_S1 V6_S2 V7_S1 V7_S2
#> <dbl> <fct> <chr> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 1 A S1, S2 x y A C D F
#> 2 2 B S1 x y B <NA> E <NA>

Created on 2019-09-18 by the reprex package (v0.3.0)

Elegant solution for casting (spreading) multiple columns of character vectors

Are you looking for?

library(dplyr)
library(tidyr)

tmp %>%
gather(key, value, -municipality, na.rm = TRUE) %>%
mutate(key = gsub("\\d+", "", key)) %>%
group_by(municipality, key) %>%
mutate(row = row_number()) %>%
spread(key, value) %>%
select(-row)

# municipality name phone
# <chr> <chr> <chr>
#1 M1 n1 p1
#2 M1 n3 p3
#3 M2 n2 p2
#4 M2 n4 p4
#5 M2 n5 p5

We can use gather to bring the data in long format dropping NA values. Remove numbers from individual column names so that they share the same key, create a column group_by municipality and key to spread the data into wide format.

How to spread a single column based on multiple columns in R?

You can group_by the columns, mutate to make the new column headers and then spread (or pivot_wider):

library(dplyr)

mydata %>%
group_by(Year, Site, Quadrant, Species) %>%
mutate(Var = paste0("Val", row_number())) %>%
spread(Var, Val) %>%
ungroup()

Result:

# A tibble: 4 x 6
Year Site Quadrant Species Val1 Val2
<int> <int> <int> <chr> <int> <int>
1 2019 1 1 A 20 30
2 2019 1 1 B 20 25
3 2019 1 2 A 20 10
4 2019 1 2 B 11 22

Data:

mydata <- read.table(text = "Year Site Quadrant Species Val
2019 1 1 A 20
2019 1 1 A 30
2019 1 1 B 20
2019 1 1 B 25
2019 1 2 A 20
2019 1 2 A 10
2019 1 2 B 11
2019 1 2 B 22", header = TRUE)

Spread multiple columns in a function

We'll return to the answer provided in the question linked to, but for the moment let's start with a more naive approach.

One idea would be to spread each value column individually, and then join the results, i.e.

library(dplyr)
library(tidyr)
library(tibble)

dat_avg <- dat %>%
select(-sd) %>%
spread(key = grp,value = avg) %>%
rename(a_avg = a,
b_avg = b)

dat_sd <- dat %>%
select(-avg) %>%
spread(key = grp,value = sd) %>%
rename(a_sd = a,
b_sd = b)

> full_join(dat_avg,
dat_sd,
by = 'id')

# A tibble: 2 x 5
id a_avg b_avg a_sd b_sd
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123

(I used a full_join just in case we run into situations where not all combinations of the join columns appear in all of them.)

Let's start with a function that works like spread but allows you to pass the key and value columns as characters:

spread_chr <- function(data, key_col, value_cols, fill = NA, 
convert = FALSE,drop = TRUE,sep = NULL){
n_val <- length(value_cols)
result <- vector(mode = "list", length = n_val)
id_cols <- setdiff(names(data), c(key_col,value_cols))

for (i in seq_along(result)){
result[[i]] <- spread(data = data[,c(id_cols,key_col,value_cols[i]),drop = FALSE],
key = !!key_col,
value = !!value_cols[i],
fill = fill,
convert = convert,
drop = drop,
sep = paste0(sep,value_cols[i],sep))
}

result %>%
purrr::reduce(.f = full_join, by = id_cols)
}

> dat %>%
spread_chr(key_col = "grp",
value_cols = c("avg","sd"),
sep = "_")

# A tibble: 2 x 5
id grp_avg_a grp_avg_b grp_sd_a grp_sd_b
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123

The key ideas here are to unquote the arguments key_col and value_cols[i] using the !! operator, and using the sep argument in spread to control the resulting value column names.

If we wanted to convert this function to accept unquoted arguments for the key and value columns, we could modify it like so:

spread_nq <- function(data, key_col,..., fill = NA, 
convert = FALSE, drop = TRUE, sep = NULL){
val_quos <- rlang::quos(...)
key_quo <- rlang::enquo(key_col)
value_cols <- unname(tidyselect::vars_select(names(data),!!!val_quos))
key_col <- unname(tidyselect::vars_select(names(data),!!key_quo))

n_val <- length(value_cols)
result <- vector(mode = "list",length = n_val)
id_cols <- setdiff(names(data),c(key_col,value_cols))

for (i in seq_along(result)){
result[[i]] <- spread(data = data[,c(id_cols,key_col,value_cols[i]),drop = FALSE],
key = !!key_col,
value = !!value_cols[i],
fill = fill,
convert = convert,
drop = drop,
sep = paste0(sep,value_cols[i],sep))
}

result %>%
purrr::reduce(.f = full_join,by = id_cols)
}

> dat %>%
spread_nq(key_col = grp,avg,sd,sep = "_")

# A tibble: 2 x 5
id grp_avg_a grp_avg_b grp_sd_a grp_sd_b
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123

The change here is that we capture the unquoted arguments with rlang::quos and rlang::enquo and then simply convert them back to characters using tidyselect::vars_select.

Returning to the solution in the linked question that uses a sequence of gather, unite and spread, we can use what we've learned to make a function like this:

spread_nt <- function(data,key_col,...,fill = NA,
convert = TRUE,drop = TRUE,sep = "_"){
key_quo <- rlang::enquo(key_col)
val_quos <- rlang::quos(...)
value_cols <- unname(tidyselect::vars_select(names(data),!!!val_quos))
key_col <- unname(tidyselect::vars_select(names(data),!!key_quo))

data %>%
gather(key = ..var..,value = ..val..,!!!val_quos) %>%
unite(col = ..grp..,c(key_col,"..var.."),sep = sep) %>%
spread(key = ..grp..,value = ..val..,fill = fill,
convert = convert,drop = drop,sep = NULL)
}

> dat %>%
spread_nt(key_col = grp,avg,sd,sep = "_")

# A tibble: 2 x 5
id a_avg a_sd b_avg b_sd
* <int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 0.6569923 -0.5646982 0.7050648
2 2 0.3631284 0.4577418 0.6328626 0.7191123

This relies on the same techniques from rlang from the last example. We're using some unusual names like ..var.. for our intermediate variables in order to reduce the chances of name collisions with existing columns in our data frame.

Also, we're using the sep argument in unite to control the resulting column names, so in this case when we spread we force sep = NULL.



Related Topics



Leave a reply



Submit