How to Spread or Cast Multiple Values in R

We could do this using dplyr/tidyr. We reshape 'data' from 'wide' to 'long' format with gather, specifying the columns (starts_with('value')) to be combined into a key/value column pair ('Var'/'val'); unite the 'Var' and 'y' columns to create a single 'Var1' column; and convert back to 'wide' format with spread.

library(dplyr)
library(tidyr)
data %>%
  gather(Var, val, starts_with("value")) %>%
  unite(Var1, Var, y) %>%
  spread(Var1, val)

# x value.1_a value.1_b value.1_c value.1_d value.2_a value.2_b value.2_c
#1 blue 5 6 7 8 17 18 19
#2 green 9 10 11 12 21 22 23
#3 red 1 2 3 4 13 14 15
# value.2_d
#1 20
#2 24
#3 16
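For readers on tidyr 1.0.0 or later, the same reshape can be sketched with pivot_wider, which accepts several value columns at once. The input below is an assumption: the original 'data' is not shown, so it is reconstructed from the printed result above.

```r
library(tidyr)

# Assumed input, reconstructed from the printed output (not the original data)
data <- data.frame(
  x = rep(c("red", "blue", "green"), each = 4),
  y = rep(letters[1:4], times = 3),
  value.1 = 1:12,
  value.2 = 13:24
)

# A single call stands in for the gather + unite + spread sequence:
# names_from supplies the suffix, values_from lists the columns to spread
wide <- pivot_wider(data, names_from = y, values_from = c(value.1, value.2))
wide
```

The resulting columns are named value.1_a through value.2_d, as in the spread version. Unlike spread, pivot_wider keeps the rows in order of first appearance rather than sorting them.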

Update

(After 6 months)

Reshaping multiple value columns to wide is now possible with dcast from data.table 1.9.5, without a prior melt. At the time of writing, 1.9.5 was the devel version and had to be installed separately.

library(data.table)
dcast(setDT(data), x~y, value.var=c('value.1', 'value.2'))
# x a_value.1 b_value.1 c_value.1 d_value.1 a_value.2 b_value.2 c_value.2
#1: blue 5 6 7 8 17 18 19
#2: green 9 10 11 12 21 22 23
#3: red 1 2 3 4 13 14 15
# d_value.2
#1: 20
#2: 24
#3: 16
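A self-contained version of the call above might look like the following. The input is again an assumption reconstructed from the printed result. In released data.table versions the new columns should follow the pattern value.var_key (for example value.1_a), which differs from the devel-version names shown above.

```r
library(data.table)

# Assumed input, reconstructed from the printed output (not the original data)
data <- data.frame(
  x = rep(c("red", "blue", "green"), each = 4),
  y = rep(letters[1:4], times = 3),
  value.1 = 1:12,
  value.2 = 13:24
)

# Cast both value columns in one step; no melt is needed
wide <- dcast(setDT(data), x ~ y, value.var = c("value.1", "value.2"))
wide
```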

tidyr::spread() with multiple keys and values

Reshaping with multiple value variables can best be done with dcast from data.table or reshape from base R.

library(data.table)
out <- dcast(setDT(df), id ~ paste0("time", time), value.var = c("x", "y"), sep = "")
out
# id xtime1 xtime2 xtime3 ytime1 ytime2 ytime3
# 1: 1 0.4334921 -0.5205570 -1.44364515 0.49288757 -1.26955148 -0.83344256
# 2: 2 0.4785870 0.9261711 0.68173681 1.24639813 0.91805332 0.34346260
# 3: 3 -1.2067665 1.7309593 0.04923993 1.28184341 -0.69435556 0.01609261
# 4: 4 0.5240518 0.7481787 0.07966677 -1.36408357 1.72636849 -0.45827205
# 5: 5 0.3733316 -0.3689391 -0.11879819 -0.03276689 0.91824437 2.18084692
# 6: 6 0.2363018 -0.2358572 0.73389984 -1.10946940 -1.05379502 -0.82691626
# 7: 7 -1.4979165 0.9026397 0.84666801 1.02138768 -0.01072588 0.08925716
# 8: 8 0.3428946 -0.2235349 -1.21684977 0.40549497 0.68937085 -0.15793111
# 9: 9 -1.1304688 -0.3901419 -0.10722222 -0.54206830 0.34134397 0.48504564
#10: 10 -0.5275251 -1.1328937 -0.68059800 1.38790593 0.93199593 -1.77498807

Using reshape, we could do:

# setDF(df) # in case df is a data.table now
reshape(df, idvar = "id", timevar = "time", direction = "wide")
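To see what the reshape call produces, here is a small sketch on made-up data (the 'df' below is hypothetical; the original used random values). By default reshape names the new columns as variable.time, so the last line renames them to match the dcast result.

```r
# Hypothetical long-format data: two ids, three time points, two value columns
df <- data.frame(
  id = rep(1:2, each = 3),
  time = rep(1:3, times = 2),
  x = 1:6,
  y = 11:16
)

# Every column that is not idvar or timevar is spread across the time values
wide <- reshape(df, idvar = "id", timevar = "time", direction = "wide")

# Optional: turn "x.1" into "xtime1" to mirror the dcast column names
names(wide) <- sub("\\.([0-9]+)$", "time\\1", names(wide))
wide
```

Note that reshape groups the columns by time point (xtime1, ytime1, xtime2, ...), whereas dcast groups them by value variable.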

Reshape multiple value columns to wide format

Your best option is to reshape your data to long format, using melt, and then to dcast:

library(reshape2)

meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)

The first few lines of output:

         expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1          Adjustment           442.37        0.124025031             2.00       0.0005064625
2 Bank Service Charge           200.00        0.056072985           200.00       0.0506462461
3               Cable            21.33        0.005980184            36.33       0.0091998906
4             Charity             0.00        0.000000000             0.00       0.0000000000
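A runnable sketch of the melt and dcast steps follows. The 'expensesByMonth' data frame here is a hypothetical stand-in with the same four columns, since the original data is not shown.

```r
library(reshape2)

# Hypothetical stand-in for the original data
expensesByMonth <- data.frame(
  expense_type = c("Cable", "Cable", "Charity", "Charity"),
  month = c("2012-02-01", "2012-03-01", "2012-02-01", "2012-03-01"),
  value = c(21.33, 36.33, 0, 0),
  percent = c(0.006, 0.009, 0, 0)
)

# Long format: the first two columns are ids, 'value' and 'percent' are stacked
long <- melt(expensesByMonth, id.vars = 1:2)

# Wide format: one column per month/variable combination
wide <- dcast(long, expense_type ~ month + variable,
              value.var = "value", fun.aggregate = sum)
wide
```

Naming value.var explicitly avoids the message reshape2 prints when it has to guess the value column.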

Is it possible to use spread on multiple columns in tidyr similar to dcast?

One option would be to create a new 'Prod_Count' column by joining the 'Product' and 'Country' columns with paste, remove those two columns with select, and reshape from 'long' to 'wide' using spread from tidyr.

library(dplyr)
library(tidyr)
sdt %>%
  mutate(Prod_Count = paste(Product, Country, sep = "_")) %>%
  select(-Product, -Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
# Year A_AI B_EI
#1 1990 0.7878674 0.2486044
#2 1991 0.2343285 -1.1694878

Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.

sdt %>%
  unite(Prod_Count, Product, Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
# Year A_AI B_EI
# 1 1990 0.7878674 0.2486044
# 2 1991 0.2343285 -1.1694878
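With tidyr 1.0.0 or later, the unite step can be folded into the reshape itself, because pivot_wider accepts several key columns in names_from. The 'sdt' below is a hypothetical stand-in with rounded values, not the original data.

```r
library(tidyr)

# Hypothetical stand-in for the original 'sdt'
sdt <- data.frame(
  Year = c(1990, 1990, 1991, 1991),
  Product = c("A", "B", "A", "B"),
  Country = c("AI", "EI", "AI", "EI"),
  value = c(0.79, 0.25, 0.23, -1.17)
)

# The two key columns are joined with "_" to form the new column names
wide <- pivot_wider(sdt, names_from = c(Product, Country), values_from = value)
wide
```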

How to cast multiple columns and values of a data.table?

I make the assumption that each Id maps to a unique group and get rid of that variable, but otherwise this is essentially the same as @user227710's answer.

Idg <- unique(DT[,.(Id,group)])
DT[,group:=NULL]

res <- dcast(
  melt(DT, id.vars = c("Id", "Date")),
  variable + Id ~ Date,
  value.var = "value",
  fill = 0,
  margins = "Date",  # honoured by reshape2::dcast; data.table's own dcast does not implement margins
  fun.aggregate = sum
)

# and if you want the group back...
setDT(res) # needed before data.table 1.9.5, where using dcast.data.table is another option
setkey(res,Id)
res[Idg][order(variable,Id)]

which gives

   variable  Id 1997-01-01 1997-01-02 1997-01-03 1997-01-04 (all) group
1:  Price.1   1         29         25         14         26    94     1
2:  Price.2   1          4          5          6          6    21     1
3:  Price.1  10          0         30          0          0    30     1
4:  Price.2  10          0          8          0          0     8     1
5:  Price.1 100          0         16          0         13    29     2
6:  Price.2 100          0          2          0          3     5     2
7:  Price.1 101          0          0         62         18    80     2
8:  Price.2 101          0          0          5         15    20     2

Can the value.var in dcast be a list or have multiple value variables?

From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the Efficient reshaping using data.tables vignette for more.

Here's how we could use dcast:

dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
# x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
# 1: 1 3 4 6 1 2 2
# 2: 2 10 3 9 5 3 6
# 3: 3 10 7 7 4 6 7
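The multiple-aggregation feature mentioned above can be sketched as follows. The table 'DT' is made up for illustration and contains repeated x1/x2 pairs so that aggregation actually has something to do.

```r
library(data.table)

# Hypothetical data with duplicate x1/x2 combinations
DT <- data.table(
  x1 = c(1, 1, 1, 2, 2, 2),
  x2 = c(1, 1, 2, 1, 2, 2),
  salt = c(3, 4, 6, 10, 3, 9),
  sugar = c(1, 2, 2, 5, 3, 6)
)

# Every combination of value column, function and x2 level becomes a column
res <- dcast(DT, x1 ~ x2,
             value.var = c("salt", "sugar"),
             fun.aggregate = list(sum, mean))
res
```

The result has one row per x1 and eight value columns (2 value columns x 2 functions x 2 levels of x2). As far as I can tell, the names combine value column, function and key, along the lines of salt_sum_1.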

Casting with more than one value variable in R

You could do this in base R:

reshape(data, idvar='day', timevar='site',direction='wide')
# day value.1.a value.2.a value.1.b value.2.b
#1 1 1 5 9 6
#2 2 2 4 4 9
#3 3 5 7 2 4
#4 4 7 6 8 2
#5 5 5 2 1 5
#6 6 3 4 8 6

Reshape multiple values at once

In "reshape2", you can use recast (though in my experience, this isn't a widely known function).

library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
# id transactions_expense transactions_income amount_expense amount_income
# 1 20 25 20 95 100
# 2 30 45 50 250 300

You can also use base R's reshape:

reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
# id transactions.income amount.income transactions.expense amount.expense
# 1 20 20 100 25 95
# 3 30 50 300 45 250

Or, you can melt and dcast, like this (here with "data.table"):

library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")),
                 id ~ variable + type, value.var = "value")
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300

In later versions of "data.table" (1.9.6 onwards), dcast can do this directly with several value.var columns. If I understand correctly, what @Arun implemented does the reshaping without first having to melt the data. That is unlike recast, which is essentially a wrapper for a melt + dcast sequence of operations.
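The claim that recast wraps melt + dcast can be checked with a short sketch. The 'mydf' below is an assumption, reconstructed from the outputs printed above rather than taken from the original question.

```r
library(reshape2)

# Assumed input, reconstructed from the printed outputs
mydf <- data.frame(
  id = c(20, 20, 30, 30),
  type = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount = c(100, 95, 300, 250)
)

# One step with recast
via_recast <- recast(mydf, id ~ variable + type, id.var = c("id", "type"))

# The same thing spelled out as melt followed by dcast
long <- melt(mydf, id.vars = c("id", "type"))
via_melt_dcast <- dcast(long, id ~ variable + type, value.var = "value")

# Compare the two results
all.equal(via_recast, via_melt_dcast)
```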


And, for thoroughness, here's the tidyr approach:

library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, transactions:amount) %>%
  unite(var2, type, var) %>%
  spread(var2, val)
# id expense_amount expense_transactions income_amount income_transactions
# 1 20 95 25 100 20
# 2 30 250 45 300 50

