Can the value.var in dcast be a list or have multiple value variables?
From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the "Efficient reshaping using data.tables" vignette for more.
Here's how we could use dcast:
dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
# x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
# 1: 1 3 4 6 1 2 2
# 2: 2 10 3 9 5 3 6
# 3: 3 10 7 7 4 6 7
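Since mydf itself is not shown, here is a minimal sketch with a made-up table. The column names match the call above, but the values are invented for illustration. It assumes data.table 1.9.6 or later and shows both features mentioned: several value.var columns, and several aggregation functions passed as a list.

```r
library(data.table)

# Hypothetical data: values are invented for illustration only
mydf <- data.table(
  x1    = c(1, 1, 2, 2),
  x2    = c(1, 2, 1, 2),
  salt  = c(3, 4, 10, 3),
  sugar = c(1, 2, 5, 3)
)

# One output column per value.var / x2 combination
wide <- dcast(mydf, x1 ~ x2, value.var = c("salt", "sugar"))
names(wide)
# "x1" "salt_1" "salt_2" "sugar_1" "sugar_2"

# Several aggregation functions at once: pass them as a list
agg <- dcast(mydf, x1 ~ x2, value.var = c("salt", "sugar"),
             fun.aggregate = list(sum, mean))
```

Each function in the list is applied to each value.var column, so the second call yields one column per variable, function and x2 level.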
dcast with multiple variables in val.var option
The error occurs because reshape2::dcast is being used instead of data.table::dcast; reshape2::dcast doesn't support more than one value.var.
The documentation for ?reshape2::dcast gives
value.var - name of column which stores values, see guess_value for default strategies to figure this out.
while in ?data.table::dcast it is
value.var - Name of the column whose values will be filled to cast. Function guess() tries to, well, guess this column automatically, if none is provided. Cast multiple value.var columns simultaneously by passing their names as a character vector. See Examples.
With a small reproducible example using reshape2:
library(reshape2)
data(mtcars)
dcast(mtcars, vs + am ~ carb, fun.aggregate = sum, value.var = c('mpg', 'disp'))
Error in .subset2(x, i, exact = exact) : subscript out of bounds
In addition: Warning messages:
1: In dcast(mtcars, vs + am ~ carb, fun.aggregate = sum, value.var = c("mpg",
If we convert to data.table
library(data.table)
dcast(as.data.table(mtcars), vs + am ~ carb, fun.aggregate = sum, value.var = c('mpg', 'disp'))
# vs am mpg_1 mpg_2 mpg_3 mpg_4 mpg_6 mpg_8 disp_1 disp_2 disp_3 disp_4 disp_6 disp_8
#1: 0 0 0.0 68.6 48.9 63.1 0.0 0 0.0 1382.0 827.4 2082.0 0 0
#2: 0 1 0.0 26.0 0.0 57.8 19.7 15 0.0 120.3 0.0 671.0 145 301
#3: 1 0 61.0 47.2 0.0 37.0 0.0 0 603.1 287.5 0.0 335.2 0 0
#4: 1 1 116.4 82.2 0.0 0.0 0.0 0 336.8 291.8 0.0 0.0 0 0
In the OP's code, it would be
summary_out <- dcast(setDT(DB1),
REGION_ID + REGION_NAME ~ STATUS,
fun.aggregate = sum,
value.var = c("SALES","PROFIT"))
Why can't one have several `value.var` in `dcast`?
This question is very much related to your other question from earlier today.
@beginneR wrote in the comments that "As long as the existing data is already in long-format, I don't see any general need to melt it before casting." In my answer posted at your other question, I gave an example of when melt would be required, or rather, how to decide whether your data are long enough.
This question here is another example of when further melting would be required since point 3 in my answer is not satisfied.
To get the behavior you want, try the following:
C93L <- melt(Cars93, measure.vars = c("Price", "Weight"))
dcast(C93L, AirBags ~ DriveTrain + variable, mean, value.var = "value")
# AirBags 4WD_Price 4WD_Weight Front_Price Front_Weight
# 1 Driver & Passenger NaN NaN 26.17273 3393.636
# 2 Driver only 21.38 3623 18.69286 2996.250
# 3 None 13.88 2987 12.98571 2703.036
# Rear_Price Rear_Weight
# 1 33.20 3515.0
# 2 28.23 3463.5
# 3 14.90 3610.0
An alternative is to use aggregate to calculate the means, and then use reshape or dcast to go from "long" to "wide". Both are required since reshape does not perform any aggregation:
temp <- aggregate(cbind(Price, Weight) ~ AirBags + DriveTrain,
Cars93, mean)
# AirBags DriveTrain Price Weight
# 1 Driver only 4WD 21.38000 3623.000
# 2 None 4WD 13.88000 2987.000
# 3 Driver & Passenger Front 26.17273 3393.636
# 4 Driver only Front 18.69286 2996.250
# 5 None Front 12.98571 2703.036
# 6 Driver & Passenger Rear 33.20000 3515.000
# 7 Driver only Rear 28.23000 3463.500
# 8 None Rear 14.90000 3610.000
reshape(temp, direction = "wide",
idvar = "AirBags", timevar = "DriveTrain")
# AirBags Price.4WD Weight.4WD Price.Front Weight.Front
# 1 Driver only 21.38 3623 18.69286 2996.250
# 2 None 13.88 2987 12.98571 2703.036
# 3 Driver & Passenger NA NA 26.17273 3393.636
# Price.Rear Weight.Rear
# 1 28.23 3463.5
# 2 14.90 3610.0
# 3 33.20 3515.0
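For comparison, with data.table 1.9.6 or later the melt step can be skipped, because data.table's dcast accepts several value.var columns directly. The following is a sketch that assumes the MASS package is installed (it provides Cars93) and that the input is converted to a data.table first:

```r
library(data.table)

# Cars93 ships with MASS; convert it so that data.table's dcast method is used
C93 <- as.data.table(MASS::Cars93)

# Mean Price and mean Weight per AirBags level, spread across DriveTrain
wide <- dcast(C93, AirBags ~ DriveTrain,
              fun.aggregate = mean,
              value.var = c("Price", "Weight"))
names(wide)
```

The resulting columns are named by variable first and DriveTrain level second (for example Price_4WD), which is the reverse of the DriveTrain + variable ordering in the melt-based call.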
on dcast() argument value.var
Both reshape2 and spread have been deprecated or retired - the tidyverse now wants you to use pivot_wider. I'm not up to date on that syntax, but dcast still does what you want it to with data.table.
library(data.table)
d1 <- data.table(ID = c(11,11,11,12,12,12),
codes = c('a', 'a', 'a', 'b', 'a', 'a'),
gfreq = c(.5,.5,.5,NA,.5,.5))
dcast(d1, ID ~ codes)
#> Using 'gfreq' as value column. Use 'value.var' to override
#> Aggregate function missing, defaulting to 'length'
#> ID a b
#> 1: 11 3 0
#> 2: 12 2 1
d2 <- data.table(ID = c(11,11,11,12,12,12),
codes = c('a', 'a', 'a', 'b', 'a', 'a'))
dcast(d2, ID ~ codes)
#> Using 'codes' as value column. Use 'value.var' to override
#> Aggregate function missing, defaulting to 'length'
#> ID a b
#> 1: 11 3 0
#> 2: 12 2 1
## If you only want 1's and 0's
dcast(unique(d2), ID ~ codes,
fun.aggregate = length)
#> Using 'codes' as value column. Use 'value.var' to override
#> ID a b
#> 1: 11 1 0
#> 2: 12 1 1
Created on 2019-10-16 by the reprex package (v0.3.0)
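For readers who prefer the tidyverse route mentioned above, a rough pivot_wider equivalent might look like the following. This is a sketch rather than part of the original answer; it assumes dplyr and tidyr 1.1.0 or later (for a scalar values_fill) and reuses the same toy data as d2.

```r
library(dplyr)
library(tidyr)

d2 <- data.frame(ID = c(11, 11, 11, 12, 12, 12),
                 codes = c("a", "a", "a", "b", "a", "a"))

# Count rows per ID/code pair, then spread the counts into columns
counts <- d2 %>%
  count(ID, codes) %>%
  pivot_wider(names_from = codes, values_from = n, values_fill = 0L)

# For a 0/1 presence indicator, drop duplicate rows before counting
flags <- d2 %>%
  distinct() %>%
  count(ID, codes) %>%
  pivot_wider(names_from = codes, values_from = n, values_fill = 0L)
```

Counting first keeps the aggregation step explicit, which mirrors what fun.aggregate = length does inside dcast.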
Apply dcast multiple times for different variables
Here is an option with cSplit_e from splitstackshape:
library(splitstackshape)
library(magrittr) # provides %>%
cSplit_e(mydf, 'V1', type = 'character', fill = '0') %>%
cSplit_e('V2', type = 'character', fill = '0')
# A V1 V2 V1_x V1_y V2_u V2_v V2_w
#1: A x u 1 0 1 0 0
#2: B x v 1 0 0 1 0
#3: C y w 0 1 0 0 1
#4: D x v 1 0 0 1 0
#5: E y u 0 1 1 0 0
Or with table from base R:
do.call(cbind, lapply(2:3, function(i) table(mydf$A, mydf[[i]])))
Or the same approach in data.table syntax:
nm1 <- names(mydf)[-1]
out <- mydf[, lapply(.SD, function(x)
as.data.frame.matrix(table(A, x))), .SDcols = nm1]
mydf[, names(out) := out][]
# A V1 V2 V1.x V1.y V2.u V2.v V2.w
#1: A x u 1 0 1 0 0
#2: B x v 1 0 0 1 0
#3: C y w 0 1 0 0 1
#4: D x v 1 0 0 1 0
#5: E y u 0 1 1 0 0
dcast With multiple Ids and variables
A tidyverse solution, using gather and spread from the tidyr package:
library(dplyr)
library(tidyr) #version 1.0.0 which has pivot_wider
df1 %>%
group_by(Type) %>%
mutate(name_x = row_number()) %>%
gather(key=var, value=val, c(Score, Time)) %>%
mutate(var = paste(var, name_x, sep="_")) %>%
select(-name_x) %>%
spread(key=var, value=val)
#> # A tibble: 3 x 11
#> # Groups: Type [3]
#> id Date Type Score_1 Score_2 Score_3 Score_4 Time_1 Time_2 Time_3 Time_4
#> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 2001~ aaa 123 456 789 NA 12:12 13:12 14:12 <NA>
#> 2 2 2001~ ddd 113 145 NA NA 15:12 16:12 <NA> <NA>
#> 3 3 2001~ bbb 789 145 113 145 17:12 18:12 19:12 20:12
You can do the same with pivot_wider much more conveniently:
df1 %>%
group_by(Type) %>%
mutate(name_x = row_number()) %>%
pivot_wider(id_cols = c("id","Date", "Type"),
names_from = c("name_x"),
values_from = c("Score", "Time"))
Data:
df1 <- data.frame(id=c(1,1,1,2,2,3,3,3,3),
Date = c(rep("2001-01-13", 3), rep("2001-01-16", 2), rep("2001-01-18", 4)),
Type = c(rep("aaa",3), rep("ddd", 2), rep("bbb",4)),
Score = c(123,456,789,113,145,789,145,113,145),
Time = paste0(12:20, ":12"),
stringsAsFactors = FALSE)
reshape2: dcast when there are multiple values for one cell but keep this values
This can be done with dcast (here from data.table), though you need a row identifier.
library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
# HLA_Status HLA_Status_1 CCL24 SPP1
#1: PC 1 5.698 2.698
#2: PC 2 89.457 9.457
#3: PC 3 78.230 8.230
#4: PP 1 9.645 23.120
#5: PP 2 56.320 36.320
#6: PP 3 7.268 17.268
data
dt <- fread(" HLA_Status variable value
PP CCL24 9.645
PP CCL24 56.32
PP CCL24 7.268
PC CCL24 5.698
PC CCL24 89.457
PC CCL24 78.23
PP SPP1 23.12
PP SPP1 36.32
PP SPP1 17.268
PC SPP1 2.698
PC SPP1 9.457
PC SPP1 8.23")
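The row identifier in the call above comes from rowid(), which numbers repeated occurrences of each combination of its arguments. A small sketch with invented vectors shows what it returns:

```r
library(data.table)

# Invented vectors, for illustration only
status   <- c("PP", "PP", "PC", "PP", "PC")
variable <- c("CCL24", "CCL24", "CCL24", "SPP1", "SPP1")

# rowid() counts occurrences of each distinct combination, in order of appearance
ids <- rowid(status, variable)
ids
# 1 2 1 1 1
```

Adding this counter to the left-hand side of the formula makes every cell unique, so dcast has nothing to aggregate and keeps every value.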
dcast with value being text
Since you had dcast in your title, I'll assume data.table:
data.table::dcast(question ~ employeeid, data = df, value.var = "Answer")
# question 1 2
# 1 do you like apples? No No
# 2 do you like milk? Yes No
but an alternative:
tidyr::spread(df, employeeid, Answer)
# question 1 2
# 1 do you like apples? No No
# 2 do you like milk? Yes No
Edit: since it appears you have dupes in the data, you can find the "most-occurring" answer with:
most <- function(x) names(sort(table(x), decreasing = TRUE))[1]
data.table::dcast(question~employeeid, data=df, value.var="Answer", fun.aggregate = most)
# question 1 2
# 1 do you like apples? Yes Yes
# 2 do you like milk? No Yes
dcast function taking arguments from two value variables
Not sure if I understood your goal, but from my interpretation, a quick and dirty way is to group by cars and state first, create the new column, then dcast the new data table:
mycars <- as.data.table(mycars)
temp <- mycars[, .(z = car_PS_var(PS_mean, PS_stdv)),
by = c("cars", "state")]
dcast(temp, cars ~ state)
cars 1 2
1: A 1.449275 1.449275
2: B 4.325825 4.325825
3: C 4.545340 4.545340
Is it possible to use dcast without variable column?
With dcast, we can create the formula on the fly from an expression built with paste and rowid:
library(data.table)
dcast(dt, id ~ paste0('var_', rowid(id)))
-output
id var_1 var_2
1: 1 100 300
2: 2 200 NA
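Because dt is not defined in the answer, the following self-contained sketch uses a made-up input. The column name value is an assumption, and value.var is passed explicitly rather than relying on dcast guessing it.

```r
library(data.table)

# Hypothetical input: the column name `value` is an assumption
dt <- data.table(id = c(1, 2, 1), value = c(100, 200, 300))

# rowid(id) numbers the rows within each id; paste0() turns that into column names
wide <- dcast(dt, id ~ paste0("var_", rowid(id)), value.var = "value")
wide
```

Ids with fewer rows than the maximum are padded with NA, as in the output above.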