How to spread or cast multiple values in R
We could do this using dplyr/tidyr. We reshape 'data' from 'wide' to 'long' format with gather, specifying the columns to be combined (starts_with('value')) into a key/value column pair ('Var'/'val'), unite the 'Var' and 'y' columns into a single 'Var1' column, and then reshape back to 'wide' format with spread.
library(dplyr)
library(tidyr)
data %>%
  gather(Var, val, starts_with("value")) %>%
  unite(Var1, Var, y) %>%
  spread(Var1, val)
# x value.1_a value.1_b value.1_c value.1_d value.2_a value.2_b value.2_c
#1 blue 5 6 7 8 17 18 19
#2 green 9 10 11 12 21 22 23
#3 red 1 2 3 4 13 14 15
# value.2_d
#1 20
#2 24
#3 16
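In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_wider() accepts several value columns at once, so the gather/unite/spread round trip collapses into one call. A minimal sketch, assuming a toy `data` with columns x, y, value.1 and value.2 reconstructed from the printed output above (the original input was not shown):

```r
library(tidyr)

# Hypothetical input reconstructed from the printed output:
# one row per (x, y) pair, two value columns
data <- data.frame(
  x       = rep(c("red", "blue", "green"), each = 4),
  y       = rep(c("a", "b", "c", "d"), times = 3),
  value.1 = 1:12,
  value.2 = 13:24
)

# names_from supplies the suffixes, values_from the columns to spread;
# new names are built as "<value column>_<y>", e.g. "value.1_a"
wide <- pivot_wider(data, names_from = y, values_from = c(value.1, value.2))
wide
```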
Update
(After 6 months)
Reshaping multiple value columns to wide is now possible with dcast from data.table_1.9.5, without first using melt. At the time, this required installing the development version of data.table.
library(data.table)
dcast(setDT(data), x~y, value.var=c('value.1', 'value.2'))
# x a_value.1 b_value.1 c_value.1 d_value.1 a_value.2 b_value.2 c_value.2
#1: blue 5 6 7 8 17 18 19
#2: green 9 10 11 12 21 22 23
#3: red 1 2 3 4 13 14 15
# d_value.2
#1: 20
#2: 24
#3: 16
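In released data.table versions the generated names put the value.var first (value.1_a rather than a_value.1), matching the salt_1 style shown further down. A self-contained sketch, assuming a hypothetical `data` reconstructed from the printed output:

```r
library(data.table)

# Hypothetical input reconstructed from the printed output
data <- data.frame(
  x       = rep(c("red", "blue", "green"), each = 4),
  y       = rep(c("a", "b", "c", "d"), times = 3),
  value.1 = 1:12,
  value.2 = 13:24
)

# One row per x, one column per (value.var, y) combination;
# dcast sorts rows by the LHS, so x comes out as blue, green, red
wide <- dcast(setDT(data), x ~ y, value.var = c("value.1", "value.2"))
wide
```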
tidyr::spread() with multiple keys and values
Reshaping with multiple value variables is best done with dcast from data.table or reshape from base R.
library(data.table)
out <- dcast(setDT(df), id ~ paste0("time", time), value.var = c("x", "y"), sep = "")
out
# id xtime1 xtime2 xtime3 ytime1 ytime2 ytime3
# 1: 1 0.4334921 -0.5205570 -1.44364515 0.49288757 -1.26955148 -0.83344256
# 2: 2 0.4785870 0.9261711 0.68173681 1.24639813 0.91805332 0.34346260
# 3: 3 -1.2067665 1.7309593 0.04923993 1.28184341 -0.69435556 0.01609261
# 4: 4 0.5240518 0.7481787 0.07966677 -1.36408357 1.72636849 -0.45827205
# 5: 5 0.3733316 -0.3689391 -0.11879819 -0.03276689 0.91824437 2.18084692
# 6: 6 0.2363018 -0.2358572 0.73389984 -1.10946940 -1.05379502 -0.82691626
# 7: 7 -1.4979165 0.9026397 0.84666801 1.02138768 -0.01072588 0.08925716
# 8: 8 0.3428946 -0.2235349 -1.21684977 0.40549497 0.68937085 -0.15793111
# 9: 9 -1.1304688 -0.3901419 -0.10722222 -0.54206830 0.34134397 0.48504564
#10: 10 -0.5275251 -1.1328937 -0.68059800 1.38790593 0.93199593 -1.77498807
Using reshape, we could do:
# setDF(df)  # run this first if setDT() above has converted df to a data.table
reshape(df, idvar = "id", timevar = "time", direction = "wide")
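reshape() takes the row identifier via idvar and the column whose values become suffixes via timevar; every remaining column is treated as a value variable, and the new names are built as <variable>.<time>. A minimal sketch, assuming a hypothetical long `df` with id, time, x and y (the original used random numbers):

```r
# Hypothetical long-format data: 3 ids observed at 3 times
df <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  x    = 1:9,
  y    = 11:19
)

# One row per id; x and y are spread across the time values,
# giving columns id, x.1, y.1, x.2, y.2, x.3, y.3
wide <- reshape(df, idvar = "id", timevar = "time", direction = "wide")
wide
```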
Reshape multiple value columns to wide format
Your best option is to reshape your data to long format using melt, and then to cast it with dcast:
library(reshape2)
meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)
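fun.aggregate = sum matters when an expense_type appears more than once in a month: dcast collapses the duplicates by summing them. Base R's reshape() does not aggregate (it keeps the first matching row and warns), so an equivalent base R sketch sums first with aggregate(). The toy values are hypothetical, and the identifier columns are assumed to be month and expense_type, since melt(expensesByMonth, id.vars = 1:2) treats the first two columns as identifiers:

```r
# Hypothetical monthly expenses; "Cable" appears twice in March
expensesByMonth <- data.frame(
  month        = c("2012-02-01", "2012-02-01", "2012-03-01", "2012-03-01", "2012-03-01"),
  expense_type = c("Cable", "Charity", "Cable", "Cable", "Charity"),
  value        = c(20, 0, 30, 6, 0),
  percent      = c(0.5, 0, 0.3, 0.1, 0)
)

# Sum duplicates first (the role fun.aggregate = sum plays in dcast)
agg <- aggregate(cbind(value, percent) ~ expense_type + month,
                 data = expensesByMonth, FUN = sum)

# Then spread value and percent across months:
# columns become value.2012-02-01, percent.2012-02-01, ...
wide <- reshape(agg, idvar = "expense_type", timevar = "month", direction = "wide")
wide
```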
The first few lines of output:
expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000
Is it possible to use spread on multiple columns in tidyr similar to dcast?
One option would be to create a new 'Prod_Count' column by joining the 'Product' and 'Country' columns with paste, remove those columns with select, and reshape from 'long' to 'wide' using spread from tidyr.
library(dplyr)
library(tidyr)
sdt %>%
  mutate(Prod_Count = paste(Product, Country, sep = "_")) %>%
  select(-Product, -Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
# Year A_AI B_EI
#1 1990 0.7878674 0.2486044
#2 1991 0.2343285 -1.1694878
Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.
sdt %>%
  unite(Prod_Count, Product, Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
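With tidyr 1.0.0 or later, pivot_wider() accepts several columns in names_from and joins their values with names_sep (default "_"), so neither paste nor unite is needed. A minimal sketch with a hypothetical toy `sdt` (the original data was not shown):

```r
library(tidyr)

# Hypothetical long data: one value per Year / Product / Country
sdt <- data.frame(
  Year    = rep(1990:1991, each = 2),
  Product = rep(c("A", "B"), times = 2),
  Country = rep(c("AI", "EI"), times = 2),
  value   = c(0.5, 1.5, 2.5, 3.5)
)

# Product and Country are combined into names such as "A_AI";
# Year is kept as the row identifier
wide <- pivot_wider(sdt, names_from = c(Product, Country), values_from = value)
wide
```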
# Year A_AI B_EI
# 1 1990 0.7878674 0.2486044
# 2 1991 0.2343285 -1.1694878
How to cast multiple columns and values of a data.table?
I make the assumption that each Id maps to a unique group and get rid of that variable, but otherwise this is essentially the same as @user227710's answer.
Idg <- unique(DT[,.(Id,group)])
DT[,group:=NULL]
res <- dcast(
  melt(DT, id.vars = c("Id", "Date")),
  variable + Id ~ Date,
  value.var = "value",
  fill = 0,
  margins = "Date",
  fun.aggregate = sum
)
# and if you want the group back...
setDT(res) # needed before data.table 1.9.5, where using dcast.data.table is another option
setkey(res,Id)
res[Idg][order(variable,Id)]
which gives
variable Id 1997-01-01 1997-01-02 1997-01-03 1997-01-04 (all) group
1: Price.1 1 29 25 14 26 94 1
2: Price.2 1 4 5 6 6 21 1
3: Price.1 10 0 30 0 0 30 1
4: Price.2 10 0 8 0 0 8 1
5: Price.1 100 0 16 0 13 29 2
6: Price.2 100 0 2 0 3 5 2
7: Price.1 101 0 0 62 18 80 2
8: Price.2 101 0 0 5 15 20 2
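The fill = 0 and margins = "Date" arguments above produce zeros for missing Id/Date combinations and an "(all)" total column. A base R sketch of the same idea, one price column at a time, uses xtabs(), which sums duplicate rows and fills empty cells with 0, and addmargins(), which appends a "Sum" column. The toy DT below is hypothetical:

```r
# Hypothetical prices; Id 1 has two rows on 1997-01-01
DT <- data.frame(
  Id      = c(1, 1, 1, 10),
  Date    = c("1997-01-01", "1997-01-01", "1997-01-02", "1997-01-02"),
  Price.1 = c(20, 9, 25, 30),
  Price.2 = c(1, 3, 5, 8)
)

# For each price column: an Id x Date table of sums, plus a row-total column
tabs <- lapply(c(Price.1 = "Price.1", Price.2 = "Price.2"), function(v) {
  tab <- xtabs(as.formula(paste(v, "~ Id + Date")), data = DT)
  addmargins(tab, margin = 2)
})
tabs$Price.1
```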
can the value.var in dcast be a list or have multiple value variables?
From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the "Efficient reshaping using data.tables" vignette for more.
Here's how we could use dcast:
dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
# x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
# 1: 1 3 4 6 1 2 2
# 2: 2 10 3 9 5 3 6
# 3: 3 10 7 7 4 6 7
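Without data.table, base R's reshape() produces the same table, with "." instead of "_" as the separator and the columns interleaved by x2 (salt.1, sugar.1, salt.2, ...). A sketch, with mydf reconstructed from the printed output (the original input was not shown):

```r
# Hypothetical long input reconstructed from the dcast output
mydf <- data.frame(
  x1    = rep(1:3, each = 3),
  x2    = rep(1:3, times = 3),
  salt  = c(3, 4, 6, 10, 3, 9, 10, 7, 7),
  sugar = c(1, 2, 2, 5, 3, 6, 4, 6, 7)
)

wide <- reshape(mydf, idvar = "x1", timevar = "x2", direction = "wide")

# Optional: group all salt columns before all sugar columns, as dcast does
wide <- wide[, c("x1", paste0("salt.", 1:3), paste0("sugar.", 1:3))]
wide
```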
Casting with more than one value variable in R
You could do this in base R:
reshape(data, idvar='day', timevar='site',direction='wide')
# day value.1.a value.2.a value.1.b value.2.b
#1 1 1 5 9 6
#2 2 2 4 4 9
#3 3 5 7 2 4
#4 4 7 6 8 2
#5 5 5 2 1 5
#6 6 3 4 8 6
Reshape multiple values at once
In "reshape2", you can use recast (though in my experience, this isn't a widely known function).
library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
# id transactions_expense transactions_income amount_expense amount_income
# 1 20 25 20 95 100
# 2 30 45 50 250 300
You can also use base R's reshape:
reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
# id transactions.income amount.income transactions.expense amount.expense
# 1 20 20 100 25 95
# 3 30 50 300 45 250
Or, you can melt and dcast, like this (here with "data.table"):
library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")),
id ~ variable + type, value.var = "value")
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300
In later versions of "data.table" (from 1.9.6), dcast can do this directly. If I understand correctly, what @Arun was implementing is reshaping without first having to melt the data, which is what presently happens with recast: it is essentially a wrapper around a melt + dcast sequence of operations.
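With a data.table release that supports several value.var columns (1.9.6 or later, per the answer above), the melt step can be dropped. A sketch, with mydf reconstructed from the printed outputs (hypothetical input):

```r
library(data.table)

# Hypothetical input reconstructed from the printed outputs
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250)
)

# Cast both value columns at once; names are <value.var>_<type>,
# with the type values sorted (expense before income)
res <- dcast(as.data.table(mydf), id ~ type, value.var = c("transactions", "amount"))
res
```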
And, for thoroughness, here's the tidyr approach:
library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, transactions:amount) %>%
  unite(var2, type, var) %>%
  spread(var2, val)
# id expense_amount expense_transactions income_amount income_transactions
# 1 20 95 25 100 20
# 2 30 250 45 300 50
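In tidyr 1.0.0 and later, pivot_wider() handles both value columns in one step, and its names_glue argument (where .value stands for the name of each values_from column) reproduces the type-first naming that unite() gave. A sketch, with mydf reconstructed from the printed output (hypothetical input):

```r
library(tidyr)

# Hypothetical input reconstructed from the printed output
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250)
)

# Names come out as "<type>_<value column>", e.g. "expense_amount"
wide <- pivot_wider(mydf, names_from = type,
                    values_from = c(transactions, amount),
                    names_glue = "{type}_{.value}")
wide
```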