Why Can't One Have Several 'Value.Var' in 'Dcast'

Can the value.var in dcast be a list or have multiple value variables?

From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the Efficient reshaping using data.tables vignette for more.

Here's how we could use dcast:

dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
#    x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
# 1:  1      3      4      6       1       2       2
# 2:  2     10      3      9       5       3       6
# 3:  3     10      7      7       4       6       7
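Since mydf is not shown here, the following is only a minimal sketch with made-up data (the values are an assumption chosen for illustration). It casts two value columns and passes two aggregation functions at once, as far as I understand the documented behaviour:

```r
library(data.table)

# Hypothetical long-format data: two observations per (x1, x2) pair.
mydf <- data.frame(
  x1    = c(1, 1, 1, 1, 2, 2, 2, 2),
  x2    = c(1, 1, 2, 2, 1, 1, 2, 2),
  salt  = c(3, 5, 4, 6, 10, 2, 3, 9),
  sugar = c(1, 3, 2, 2, 5, 1, 3, 7)
)

# With a character vector in value.var and a list in fun.aggregate,
# each function is applied to each value column.
wide <- dcast(setDT(mydf), x1 ~ x2,
              value.var = c("salt", "sugar"),
              fun.aggregate = list(sum, mean))
print(wide)
```

The result should have one row per x1 and one column per combination of value column, function, and x2 level.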

Why can't one have several `value.var` in `dcast`?

This question is very much related to your other question from earlier today.

@beginneR wrote in the comments that "As long as the existing data is already in long-format, I don't see any general need to melt it before casting." In my answer posted at your other question, I gave an example of when melt would be required, or rather, how to decide whether your data are long enough.

This question here is another example of when further melting would be required since point 3 in my answer is not satisfied.

To get the behavior you want, try the following:

library(MASS)      # provides the Cars93 data set
library(reshape2)  # melt() and dcast()
C93L <- melt(Cars93, measure.vars = c("Price", "Weight"))
dcast(C93L, AirBags ~ DriveTrain + variable, mean, value.var = "value")
#              AirBags 4WD_Price 4WD_Weight Front_Price Front_Weight
# 1 Driver & Passenger       NaN        NaN    26.17273     3393.636
# 2        Driver only     21.38       3623    18.69286     2996.250
# 3               None     13.88       2987    12.98571     2703.036
#   Rear_Price Rear_Weight
# 1      33.20      3515.0
# 2      28.23      3463.5
# 3      14.90      3610.0

An alternative is to use aggregate to calculate the means, and then use reshape or dcast to go from "long" to "wide". Both are required since reshape does not perform any aggregation:

temp <- aggregate(cbind(Price, Weight) ~ AirBags + DriveTrain,
                  Cars93, mean)
temp
#              AirBags DriveTrain    Price   Weight
# 1        Driver only        4WD 21.38000 3623.000
# 2               None        4WD 13.88000 2987.000
# 3 Driver & Passenger      Front 26.17273 3393.636
# 4        Driver only      Front 18.69286 2996.250
# 5               None      Front 12.98571 2703.036
# 6 Driver & Passenger       Rear 33.20000 3515.000
# 7        Driver only       Rear 28.23000 3463.500
# 8               None       Rear 14.90000 3610.000

reshape(temp, direction = "wide", 
        idvar = "AirBags", timevar = "DriveTrain")
#              AirBags Price.4WD Weight.4WD Price.Front Weight.Front
# 1        Driver only     21.38       3623    18.69286     2996.250
# 2               None     13.88       2987    12.98571     2703.036
# 3 Driver & Passenger        NA         NA    26.17273     3393.636
#   Price.Rear Weight.Rear
# 1      28.23      3463.5
# 2      14.90      3610.0
# 3      33.20      3515.0
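To see that melting first and then casting a single value column gives the same numbers as casting several value columns directly, here is a rough sketch. Note the substitutions: it uses the built-in mtcars data instead of Cars93, and data.table's melt and dcast methods rather than reshape2, so treat it as an illustration rather than a drop-in replacement for the code above.

```r
library(data.table)

cars <- as.data.table(mtcars)

# Route 1: stack mpg and disp into a single "value" column, then cast.
long <- melt(cars, id.vars = c("vs", "am", "carb"),
             measure.vars = c("mpg", "disp"))
via_melt <- dcast(long, vs + am ~ carb + variable,
                  fun.aggregate = sum, value.var = "value")

# Route 2: cast both columns directly (needs data.table >= 1.9.6).
direct <- dcast(cars, vs + am ~ carb,
                fun.aggregate = sum, value.var = c("mpg", "disp"))

# Same sums, only the column naming differs ("1_mpg" vs "mpg_1").
print(all.equal(via_melt[["1_mpg"]], direct[["mpg_1"]]))
```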

dcast with multiple variables in val.var option

The error occurs when we are using reshape2::dcast instead of data.table::dcast because reshape2::dcast doesn't support more than one value.var.

The documentation for ?reshape2::dcast gives

value.var - name of column which stores values, see guess_value for default strategies to figure this out.

while in ?data.table::dcast it is

value.var - Name of the column whose values will be filled to cast. Function guess() tries to, well, guess this column automatically, if none is provided. Cast multiple value.var columns simultaneously by passing their names as a character vector. See Examples.

With a small reproducible example

library(reshape2)
data(mtcars)
dcast(mtcars, vs + am ~ carb, fun.aggregate = sum, value.var = c('mpg', 'disp'))

Error in .subset2(x, i, exact = exact) : subscript out of bounds
In addition: Warning messages:
1: In dcast(mtcars, vs + am ~ carb, fun.aggregate = sum, value.var = c("mpg",

If we convert to data.table

library(data.table)
dcast(as.data.table(mtcars), vs + am ~ carb, fun.aggregate = sum, value.var = c('mpg', 'disp'))
#   vs am mpg_1 mpg_2 mpg_3 mpg_4 mpg_6 mpg_8 disp_1 disp_2 disp_3 disp_4 disp_6 disp_8
#1:  0  0   0.0  68.6  48.9  63.1   0.0     0    0.0 1382.0  827.4 2082.0      0      0
#2:  0  1   0.0  26.0   0.0  57.8  19.7    15    0.0  120.3    0.0  671.0    145    301
#3:  1  0  61.0  47.2   0.0  37.0   0.0     0  603.1  287.5    0.0  335.2      0      0
#4:  1  1 116.4  82.2   0.0   0.0   0.0     0  336.8  291.8    0.0    0.0      0      0

In the OP's code, it would be

summary_out <- dcast(setDT(DB1), 
                 REGION_ID + REGION_NAME ~ STATUS,
                 fun.aggregate = sum, 
                 value.var = c("SALES","PROFIT"))

Error using dcast with multiple value.var

I encountered this same thing and it was frustrating as heck.

The problem is that when both packages are loaded, the call may be dispatched to reshape2's dcast. You need to "force" the data.table version by qualifying the call with its namespace.

The only way I was successful was running dcast as follows:

# multiple value.var
data.table::dcast(dt, x + y ~ z, fun.aggregate = sum, value.var = c("d1", "d2"))
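For anyone who wants to try this, here is a small self-contained sketch. The data are made up (only the column names follow the call above), so take the values as an assumption.

```r
library(data.table)

# Hypothetical data; column names match the call above.
dt <- data.table(
  x  = c(1, 1, 2, 2, 2),
  y  = c("a", "a", "b", "b", "b"),
  z  = c("p", "q", "p", "p", "q"),
  d1 = c(0.5, 1.5, 2.0, 3.0, 4.0),
  d2 = c(1L, 1L, 1L, 1L, 1L)
)

# The data.table:: prefix guarantees the data.table method is used,
# regardless of which other packages are attached.
wide <- data.table::dcast(dt, x + y ~ z,
                          fun.aggregate = sum,
                          value.var = c("d1", "d2"))
print(wide)
```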

reshape2: dcast when there are multiple values for one cell but keep this values

This can be done with dcast (here from data.table) though you need a row identifier.

library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
#   HLA_Status HLA_Status_1  CCL24   SPP1
#1:         PC            1  5.698  2.698
#2:         PC            2 89.457  9.457
#3:         PC            3 78.230  8.230
#4:         PP            1  9.645 23.120
#5:         PP            2 56.320 36.320
#6:         PP            3  7.268 17.268

data

dt <- fread("    HLA_Status    variable      value
     PP            CCL24       9.645
     PP            CCL24       56.32
     PP            CCL24       7.268
     PC            CCL24       5.698
     PC            CCL24       89.457
     PC            CCL24       78.23
     PP            SPP1        23.12
     PP            SPP1        36.32
     PP            SPP1        17.268
     PC            SPP1        2.698
     PC            SPP1        9.457
     PC            SPP1        8.23")

dcast data.table with multiple value.var's of different classes

An imperfect method:

inDT[, rn := rowid(id)]
Filter(function(z) !all(is.na(z)),
       dcast(inDT, rn ~ id, value.var = list("int_value", "num_value", "timestamp_value")))
#        rn int_value_int_id_1 int_value_int_id_2 num_value_num_id timestamp_value_timestamp_id
#     <int>              <int>              <int>            <num>                       <POSc>
#  1:     1               2020                  1              0.1          2021-09-23 09:15:41
#  2:     2                 NA                  2              0.2          2021-09-23 09:15:40
#  3:     3                 NA                  3              0.3          2021-09-23 09:15:39
#  4:     4                 NA                  4              0.4          2021-09-23 09:15:38
#  5:     5                 NA                  5              0.5          2021-09-23 09:15:37
#  6:     6                 NA                  6              0.6          2021-09-23 09:15:36
#  7:     7                 NA                  7              0.7          2021-09-23 09:15:35
#  8:     8                 NA                  8              0.8          2021-09-23 09:15:34
#  9:     9                 NA                  9              0.9          2021-09-23 09:15:33
# 10:    10                 NA                 10              1.0          2021-09-23 09:15:32

Note: I had to add rn, a column holding the row number within each id, because a pivot needs some key that tells it which values belong on the same output row.
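In case it helps, here is a tiny sketch of what rowid() produces and how it acts as the row key in a cast. The table and its values are hypothetical and unrelated to inDT.

```r
library(data.table)

# Hypothetical data: three values for id "a", two for id "b".
DT <- data.table(id    = c("a", "b", "a", "a", "b"),
                 value = c(10, 20, 30, 40, 50))

# rowid() numbers the rows within each id in order of appearance.
DT[, rn := rowid(id)]
print(DT$rn)  # 1 1 2 3 2

# rn decides which values share an output row; id "b" has no third
# value, so that cell is NA.
wide <- dcast(DT, rn ~ id, value.var = "value")
print(wide)
```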

dcast specific column and keep all

This might not be exactly what you want because you have a separate column for value. Then, what do you put under PPT, TMAX and TMIN?

Here's how to put the value under the appropriate column with dplyr and tidyr:

library(dplyr)
library(tidyr)
df1 %>%
  spread(element, value)
#         date year month day gridNumber    PPT    TMAX    TMIN
# 1 1899-12-15 1899    12  15     526228 0.0000 43.4782 21.7403
# 2 1899-12-16 1899    12  16     526228 0.0000 43.3297 20.7510
# 3 1899-12-17 1899    12  17     526229 0.0000 57.3625 25.8157
# 4 1899-12-18 1899    12  18     526229 0.2105      NA      NA
This can be written in one line using tidyr only:

spread(df1,element,value)
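In current tidyr, spread() is superseded by pivot_wider(). A rough equivalent is sketched below. Since df1 is not shown in the question, the long-format data here are reconstructed from the output above (an assumption), and the year, month and day columns are left out for brevity.

```r
library(tidyr)

# Hypothetical long-format input, rebuilt from the wide output shown above.
df1 <- data.frame(
  date       = rep(c("1899-12-15", "1899-12-16", "1899-12-17", "1899-12-18"),
                   times = c(3, 3, 3, 1)),
  gridNumber = rep(c(526228, 526228, 526229, 526229), times = c(3, 3, 3, 1)),
  element    = c(rep(c("PPT", "TMAX", "TMIN"), 3), "PPT"),
  value      = c(0, 43.4782, 21.7403,
                 0, 43.3297, 20.7510,
                 0, 57.3625, 25.8157,
                 0.2105)
)

# One column per element; missing combinations become NA.
wide <- pivot_wider(df1, names_from = element, values_from = value)
print(wide)
```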

dcast for numeric and character columns in R - returning length by default

We can specify length in fun.aggregate if the count per cell is what is needed:

library(data.table)
dcast(setDT(data), zip + date + calories ~ data_source, 
       value.var=c("user","price"), length)

Based on the data shown, there are no duplicates, so casting without an aggregation function would work:

dcast(setDT(data), zip + date + calories ~ data_source, value.var=c("user","price"))

If there are duplicates, make each combination unique by adding a rowid over the grouping variables:

dcast(setDT(data), rowid(zip, date, calories) + zip + date + calories 
          ~ data_source, value.var=c("user","price"))
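To make the duplicate case concrete, here is a minimal sketch with made-up data (the column names follow the calls above, the values are an assumption). It adds the row id as an explicit rn column first, which is a small variation on putting rowid() directly in the formula.

```r
library(data.table)

# Hypothetical data: the first two rows share zip, date, calories and source.
data <- data.table(
  zip         = c(1001, 1001, 1002),
  date        = c("2020-01-01", "2020-01-01", "2020-01-02"),
  calories    = c(100, 100, 200),
  data_source = c("A", "A", "B"),
  user        = c("u1", "u2", "u3"),
  price       = c(1.5, 2.5, 3.0)
)

# Counting: with duplicates, length reports how many rows fall in each cell.
counts <- dcast(data, zip + date + calories ~ data_source,
                value.var = c("user", "price"), fun.aggregate = length)
print(counts)

# Keeping the values: number the rows within each group so every
# combination on the left-hand side is unique, then cast without aggregating.
data[, rn := rowid(zip, date, calories)]
wide <- dcast(data, rn + zip + date + calories ~ data_source,
              value.var = c("user", "price"))
print(wide)
```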