R spreading multiple columns with tidyr
Here's a possible both simple and very efficient solution using data.table
library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B"))
# month Amy_A Bob_A Amy_B Bob_B
# 1: 1 9 8 6 5
# 2: 2 7 6 7 6
# 3: 3 6 9 8 7
Or a possible tidyr
solution
df %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 9 6 8 5
# 2 2 7 7 6 6
# 3 3 6 8 9 7
EDIT 22/10/2019
As mentioned in comments by @gjabel, newer tidyr versions (v1.0.0+)
have now pivot_wider
and pivot_longer
functions (currently in maturing state), hence, a newer approach would be
pivot_wider(data = df,
id_cols = month,
names_from = student,
values_from = c("A", "B"))
# # A tibble: 3 x 5
# month A_Amy A_Bob B_Amy B_Bob
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 9 8 6 5
# 2 2 7 6 7 6
# 3 3 6 9 8 7
Is it possible to use spread on multiple columns in tidyr similar to dcast?
One option would be to create a new 'Prod_Count' by joining the 'Product' and 'Country' columns by paste
, remove those columns with the select
and reshape from 'long' to 'wide' using spread
from tidyr
.
library(dplyr)
library(tidyr)
sdt %>%
mutate(Prod_Count=paste(Product, Country, sep="_")) %>%
select(-Product, -Country)%>%
spread(Prod_Count, value)%>%
head(2)
# Year A_AI B_EI
#1 1990 0.7878674 0.2486044
#2 1991 0.2343285 -1.1694878
Or we can avoid a couple of steps by using unite
from tidyr
(from @beetroot's comment) and reshape as before.
sdt%>%
unite(Prod_Count, Product,Country) %>%
spread(Prod_Count, value)%>%
head(2)
# Year A_AI B_EI
# 1 1990 0.7878674 0.2486044
# 2 1991 0.2343285 -1.1694878
tidyr::spread() with multiple keys and values
Reshaping with multiple value variables can best be done with dcast
from data.table
or reshape
from base R
.
library(data.table)
out <- dcast(setDT(df), id ~ paste0("time", time), value.var = c("x", "y"), sep = "")
out
# id xtime1 xtime2 xtime3 ytime1 ytime2 ytime3
# 1: 1 0.4334921 -0.5205570 -1.44364515 0.49288757 -1.26955148 -0.83344256
# 2: 2 0.4785870 0.9261711 0.68173681 1.24639813 0.91805332 0.34346260
# 3: 3 -1.2067665 1.7309593 0.04923993 1.28184341 -0.69435556 0.01609261
# 4: 4 0.5240518 0.7481787 0.07966677 -1.36408357 1.72636849 -0.45827205
# 5: 5 0.3733316 -0.3689391 -0.11879819 -0.03276689 0.91824437 2.18084692
# 6: 6 0.2363018 -0.2358572 0.73389984 -1.10946940 -1.05379502 -0.82691626
# 7: 7 -1.4979165 0.9026397 0.84666801 1.02138768 -0.01072588 0.08925716
# 8: 8 0.3428946 -0.2235349 -1.21684977 0.40549497 0.68937085 -0.15793111
# 9: 9 -1.1304688 -0.3901419 -0.10722222 -0.54206830 0.34134397 0.48504564
#10: 10 -0.5275251 -1.1328937 -0.68059800 1.38790593 0.93199593 -1.77498807
Using reshape
we could do
# setDF(df) # in case df is a data.table now
reshape(df, idvar = "id", timevar = "time", direction = "wide")
tidyr spread values values from two columns (and rename columns)
library(dplyr)
library(tidyr)
example %>%
pivot_wider(names_from = category,
values_from = c(value1, value2)) %>%
unnest()
tidyr:: gather multiple columns different types
I assume your expected output is incomplete as I don't see any entries for ID = 2
and ID = 3
.
You could do the following
df %>%
gather(k, v, -ID) %>%
separate(k, into = c("tmp", "X_num", "ss"), sep = "_") %>%
select(-tmp) %>%
spread(ss, v)
# ID X_num abc xyz
#1 1 1 1 1
#2 1 2 2 2
#3 1 3 2 1
#4 2 1 1 2
#5 2 2 1 0
#6 2 3 1 NA
#7 3 1 1 2
#8 3 2 1 1
#9 3 3 NA 0
R - tidyr - mutate and spread multiple columns
Rather than spread()
, you can use the new pivot_wider()
that was added in the recent tidyr 1.0.0 release. It has a values_from
argument that allows you to specify multiple columns at once:
library(dplyr)
library(tidyr)
my_df_test %>%
group_by(V1, V2) %>%
mutate(new = V3, V3 = toString(V3)) %>%
pivot_wider(
names_from = new,
values_from = c(V6, V7)
)
#> # A tibble: 2 x 9
#> # Groups: V1, V2 [4]
#> V1 V2 V3 V4 V5 V6_S1 V6_S2 V7_S1 V7_S2
#> <dbl> <fct> <chr> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 1 A S1, S2 x y A C D F
#> 2 2 B S1 x y B <NA> E <NA>
Created on 2019-09-18 by the reprex package (v0.3.0)
Elegant solution for casting (spreading) multiple columns of character vectors
Are you looking for?
library(dplyr)
library(tidyr)
tmp %>%
gather(key, value, -municipality, na.rm = TRUE) %>%
mutate(key = gsub("\\d+", "", key)) %>%
group_by(municipality, key) %>%
mutate(row = row_number()) %>%
spread(key, value) %>%
select(-row)
# municipality name phone
# <chr> <chr> <chr>
#1 M1 n1 p1
#2 M1 n3 p3
#3 M2 n2 p2
#4 M2 n4 p4
#5 M2 n5 p5
We can use gather
to bring the data in long format dropping NA
values. Remove numbers from individual column names so that they share the same key
, create a column group_by
municipality
and key
to spread
the data into wide format.
How to spread a single column based on multiple columns in R?
You can group_by
the columns, mutate
to make the new column headers and then spread
(or pivot_wider
):
library(dplyr)
mydata %>%
group_by(Year, Site, Quadrant, Species) %>%
mutate(Var = paste0("Val", row_number())) %>%
spread(Var, Val) %>%
ungroup()
Result:
# A tibble: 4 x 6
Year Site Quadrant Species Val1 Val2
<int> <int> <int> <chr> <int> <int>
1 2019 1 1 A 20 30
2 2019 1 1 B 20 25
3 2019 1 2 A 20 10
4 2019 1 2 B 11 22
Data:
mydata <- read.table(text = "Year Site Quadrant Species Val
2019 1 1 A 20
2019 1 1 A 30
2019 1 1 B 20
2019 1 1 B 25
2019 1 2 A 20
2019 1 2 A 10
2019 1 2 B 11
2019 1 2 B 22", header = TRUE)
Spread multiple columns in a function
We'll return to the answer provided in the question linked to, but for the moment let's start with a more naive approach.
One idea would be to spread
each value column individually, and then join the results, i.e.
library(dplyr)
library(tidyr)
library(tibble)
dat_avg <- dat %>%
select(-sd) %>%
spread(key = grp,value = avg) %>%
rename(a_avg = a,
b_avg = b)
dat_sd <- dat %>%
select(-avg) %>%
spread(key = grp,value = sd) %>%
rename(a_sd = a,
b_sd = b)
> full_join(dat_avg,
dat_sd,
by = 'id')
# A tibble: 2 x 5
id a_avg b_avg a_sd b_sd
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
(I used a full_join
just in case we run into situations where not all combinations of the join columns appear in all of them.)
Let's start with a function that works like spread
but allows you to pass the key
and value
columns as characters:
spread_chr <- function(data, key_col, value_cols, fill = NA,
convert = FALSE,drop = TRUE,sep = NULL){
n_val <- length(value_cols)
result <- vector(mode = "list", length = n_val)
id_cols <- setdiff(names(data), c(key_col,value_cols))
for (i in seq_along(result)){
result[[i]] <- spread(data = data[,c(id_cols,key_col,value_cols[i]),drop = FALSE],
key = !!key_col,
value = !!value_cols[i],
fill = fill,
convert = convert,
drop = drop,
sep = paste0(sep,value_cols[i],sep))
}
result %>%
purrr::reduce(.f = full_join, by = id_cols)
}
> dat %>%
spread_chr(key_col = "grp",
value_cols = c("avg","sd"),
sep = "_")
# A tibble: 2 x 5
id grp_avg_a grp_avg_b grp_sd_a grp_sd_b
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
The key ideas here are to unquote the arguments key_col
and value_cols[i]
using the !!
operator, and using the sep
argument in spread
to control the resulting value column names.
If we wanted to convert this function to accept unquoted arguments for the key and value columns, we could modify it like so:
spread_nq <- function(data, key_col,..., fill = NA,
convert = FALSE, drop = TRUE, sep = NULL){
val_quos <- rlang::quos(...)
key_quo <- rlang::enquo(key_col)
value_cols <- unname(tidyselect::vars_select(names(data),!!!val_quos))
key_col <- unname(tidyselect::vars_select(names(data),!!key_quo))
n_val <- length(value_cols)
result <- vector(mode = "list",length = n_val)
id_cols <- setdiff(names(data),c(key_col,value_cols))
for (i in seq_along(result)){
result[[i]] <- spread(data = data[,c(id_cols,key_col,value_cols[i]),drop = FALSE],
key = !!key_col,
value = !!value_cols[i],
fill = fill,
convert = convert,
drop = drop,
sep = paste0(sep,value_cols[i],sep))
}
result %>%
purrr::reduce(.f = full_join,by = id_cols)
}
> dat %>%
spread_nq(key_col = grp,avg,sd,sep = "_")
# A tibble: 2 x 5
id grp_avg_a grp_avg_b grp_sd_a grp_sd_b
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
The change here is that we capture the unquoted arguments with rlang::quos
and rlang::enquo
and then simply convert them back to characters using tidyselect::vars_select
.
Returning to the solution in the linked question that uses a sequence of gather
, unite
and spread
, we can use what we've learned to make a function like this:
spread_nt <- function(data,key_col,...,fill = NA,
convert = TRUE,drop = TRUE,sep = "_"){
key_quo <- rlang::enquo(key_col)
val_quos <- rlang::quos(...)
value_cols <- unname(tidyselect::vars_select(names(data),!!!val_quos))
key_col <- unname(tidyselect::vars_select(names(data),!!key_quo))
data %>%
gather(key = ..var..,value = ..val..,!!!val_quos) %>%
unite(col = ..grp..,c(key_col,"..var.."),sep = sep) %>%
spread(key = ..grp..,value = ..val..,fill = fill,
convert = convert,drop = drop,sep = NULL)
}
> dat %>%
spread_nt(key_col = grp,avg,sd,sep = "_")
# A tibble: 2 x 5
id a_avg a_sd b_avg b_sd
* <int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 0.6569923 -0.5646982 0.7050648
2 2 0.3631284 0.4577418 0.6328626 0.7191123
This relies on the same techniques from rlang from the last example. We're using some unusual names like ..var..
for our intermediate variables in order to reduce the chances of name collisions with existing columns in our data frame.
Also, we're using the sep
argument in unite
to control the resulting column names, so in this case when we spread
we force sep = NULL
.
Related Topics
Plotting Contours on an Irregular Grid
Select Equivalent Rows [A-B & B-A]
Unlist Data Frame Column Preserving Information from Other Column
Plot Multiple Lines in One Graph
Remove Columns With Zero Values from a Dataframe
How to Put a Transformed Scale on the Right Side of a Ggplot2
How to Listen For More Than One Event Expression Within a Shiny Eventreactive Handler
Can Dplyr Summarise Over Several Variables Without Listing Each One
Using the %≫% Pipe, and Dot (.) Notation
Table of Interactions - Case With Pets and Houses
Efficient Way to Rbind Data.Frames With Different Columns
How to Change the Default Library Path For R Packages
Find How Many Times Duplicated Rows Repeat in R Data Frame
How to Display Only Integer Values on an Axis Using Ggplot2