How to Use "Cast" in Reshape Without Aggregation

reshape2 dcast without aggregation - problems with seq

You may need to create a sequence column

mymelt$indx <- with(mymelt, ave(seq_along(species), species, FUN=seq_along))
dcast(mymelt, sample+variable+indx~species, value.var='value', fill=0)
#        sample variable indx  M. edulis Mytilus sp.
#1  41411elba_2  cell_nr    2    5107.51        0.00
#2  41411elba_2   biovol    4 1021502.16        0.00
#3   41442bay_1  cell_nr    1       0.00        6.22
#4   41442bay_1   biovol    4       0.00     1243.04
#5   41443bay_2  cell_nr    1     599.14        0.00
#6   41443bay_2   biovol    3  114028.15        0.00
#7   41502bay_3  cell_nr    3       0.00        2.74
#8   41502bay_3   biovol    6       0.00      548.21
#9  41502elba_1  cell_nr    2       0.00        1.35
#10 41502elba_1   biovol    5       0.00      260.64

EDIT

If the dataset still has duplicates, try

 mymelt$indx <- with( mymelt,
                      ave(seq_along(species),
                          species,
                          sample,
                          FUN=seq_along
                      )
                )
 dcast(mymelt, sample+variable+indx~species, value.var='value', fill=0)

data

 mydata <- structure(
               list(sample  = c("41442bay_1", "41502elba_1", "41502bay_3", "41443bay_2", "41411elba_2"),
                    species = c("Mytilus sp.", "Mytilus sp.", "Mytilus sp.", "M. edulis", "M. edulis"),
                    cell_nr = c(6.22, 1.35, 2.74, 599.14, 5107.51),
                    biovol  = c(1243.04, 260.64, 548.21, 114028.15, 1021502.16)
               ),
               .Names    = c("sample", "species", "cell_nr", "biovol"),
               class     = "data.frame",
               row.names = c("1", "2", "3", "4", "5")
           )

Reshape in R without aggregation (for example MTurk response strings)

If your data is in a data.table ~~it's a one-liner~~ can be done as follows:

library(data.table)    
mturk.dt <- as.data.table(mturk)

mturk.dt[, as.list(
         rbind(c(Answer.Q1thing, AssignmentStatus))
         )
        , by=list(Id=Input.id, State=Input.State)]

Note that the by argument handles the name-changing too!

If you want to properly name the other columns, use setnames after the fact or, more dynamically, using setattr within the j=.. argument as follows:

After the Fact:

## Assuming 'res' is the reshaped data.table form above:
## Change the names of the six V1, V2.. columns 
setnames(res, paste0("V", 1:6), c(paste0("Answer", 1:3), paste0("Status", 1:3)))

Dynamically, in `j=..`

## Use `as.data.table` instead of `as.list`, to preserve new names
mturk.dt[, as.data.table(
         rbind(c(
              setattr(Answer.Q1thing,   "names", paste0("Answer", seq(Answer.Q1thing  )))
            , setattr(AssignmentStatus, "names", paste0("Status", seq(AssignmentStatus)))
            ))
         )
        , by=list(Id=Input.id, State=Input.State)]

       Id State Answer1 Answer2  Answer3  Status1  Status2  Status3
1: 134231    NY Myguess Myguess BadGuess Approved Approved Approved
2: 134812    CA Myguess Myguess BadGuess Approved Approved Approved

Can dcast be used without an aggregate function?

I don't think there is a way to do it directly but we can add in an additional column which will help us out

df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"), cat = c("SS", 
"SS", "SS", "SV", "SV", "SV", "SV"), val = c(220L, 222L, 223L, 
224L, 225L, 220L, 1L)), .Names = c("id", "cat", "val"), class = "data.frame", row.names = c(NA, 
-7L))

library(reshape2)
library(plyr)
# Add a variable for how many times the id*cat combination has occured
tmp <- ddply(df2, .(id, cat), transform, newid = paste(id, seq_along(cat)))
# Aggregate using this newid and toss in the id so we don't lose it
out <- dcast(tmp, id + newid ~ cat, value.var = "val")
# Remove newid if we want
out <- out[,-which(colnames(out) == "newid")]
> out
#  id  SS  SV
#1  A 220 224
#2  B 222 225
#3  C 223 220
#4  C  NA   1

Reshape DF from long to wide in R using Reshape2 without an aggregation function

We can use dcast from data.table, which can take multiple value.var columns. Convert the 'data.frame' to 'data.table' (setDT(df)), use the dcast with formula and value.var specified.

library(data.table)
dcast(setDT(df), id~gid, value.var=names(df)[2:6])

NOTE: The data.table method would be faster compared to the reshape2

Reshape data in R, cast function arguments

The OP asked for help with the arguments to the cast() function of the reshape package. However, the reshape package was superseded by the reshape2 package from the same package author. According to the package description, the reshape2 package is

A Reboot of the Reshape Package

Using reshape2, the desired result can be produced with

reshape2::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length, 
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID ABL EGFR TP53
#1                  C10   1    1    0
#2                 C939   0    0    1

BTW: The data.table package has implemented (and enhanced) dcast() as well. So, the same result can be produced with

data.table::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length, 
                  value.var = "TARGET_TYPE")

Additional columns

The OP mentioned other columns in the data frame which should be shown together with the spread or wide data. Unfortunately, the OP hasn't supplied particular sample data, so we have to consider two use cases.

Case 1: Additional columns go along with the id column

The data could look like

wc
#  PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1
#1                  C10         ABL          a
#2                  C10        EGFR          a
#3                 C939        TP53          b

Note that the values in extra_col1 are in line with PARENT_MOL_CHEMBL_ID.

This is an easy case, because the formula in dcast() accepts ... which represents all other variables not used in the formula:

reshape2::dcast(wc, ... ~ TARGET_TYPE, fun.aggregate = length, 
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID extra_col1 ABL EGFR TP53
#1                  C10          a   1    1    0
#2                 C939          b   0    0    1

The resulting data.frame does contain all other columns.

Case2: Additional columns don't go along with the id column

Now, another column is added:

wc
#  PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1 extra_col2
#1                  C10         ABL          a          1
#2                  C10        EGFR          a          2
#3                 C939        TP53          b          3

Note that extra_col2 has two different values for C10. This will cause the simple approach to fail. So, a two step approach has to be implemented: reshaping first and joining afterwards with the original data frame. The data.table package is used for both steps, now:

library(data.table)
# reshape from long to wide, result has only one row per id column
wide <- dcast(setDT(wc), PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length, 
                value.var = "TARGET_TYPE")
# right join, i.e., all rows of wc are included
wide[wc, on = "PARENT_MOL_CHEMBL_ID"]
#   PARENT_MOL_CHEMBL_ID ABL EGFR TP53 TARGET_TYPE extra_col1 extra_col2
#1:                  C10   1    1    0         ABL          a          1
#2:                  C10   1    1    0        EGFR          a          2
#3:                 C939   0    0    1        TP53          b          3

The result shows the aggregated values in wide format together with any other columns.

Using Reshape Cast

reshape and reshape2 are both deprecated packages at this point. If you use Hadley's latest version, tidyr:

spread(df, key = companyName, value = val)

  market bar foo
1      a   4   1
2      b   5   2
3      c   6   3

Casting and Melting with reshape in R

Perhaps you're interested in a tidyverse alternative

library(tidyverse)
df %>%
    gather(Year, val, -Variables, -Country) %>%
    spread(Variables, val)
#  Country Year Happiness Power
#1  France 2000      1872  1213
#2  France 2001      2345  1234
#3      UK 2000      2234  1726
#4      UK 2001      9082  6433

Or using reshape2::melt and reshape2::dcast

reshape2::dcast(
    reshape2::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"),
    Country + Year ~ Variables)
#        Country Year Happiness Power
#1  France 2000      1872  1213
#2  France 2001      2345  1234
#3      UK 2000      2234  1726
#4      UK 2001      9082  6433

Or (identically) using data.table::melt and data.table::dcast

data.table::dcast(
    data.table::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"), 
    Country + Year ~ Variables)
#  Country Year Happiness Power
#1  France 2000      1872  1213
#2  France 2001      2345  1234
#3      UK 2000      2234  1726
#4      UK 2001      9082  6433

In terms of performance/runtime, I imagine the data.table or tidyr solutions to be the most efficient. You can check by running a microbenchmark on some larger sample data.

Sample data

df <-read.table(text =
    "  Variables Country  2000  2001
1 Power     France   1213  1234
2 Happiness France   1872  2345
3 Power     UK       1726  6433
4 Happiness UK       2234  9082", header = T)
colnames(df)[3:4] <- c("2000", "2001")

Benchmark analysis

Following results from a microbenchmark analysis of the four methods, based on a (slightly) larger 78x22 sample dataset.

set.seed(2017)
df <- data.frame(
    Variables = rep(c("Power", "Happiness", "something_else"), 26),
    Country = rep(LETTERS[1:26], each = 3),
    matrix(sample(10000, 20 * 26 * 3), nrow = 26 * 3))
colnames(df)[3:ncol(df)] <- 2000:2019

library(microbenchmark)
library(tidyr)

res <- microbenchmark(
    reshape2 = {
        reshape2::dcast(
            reshape2::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"),
            Country + Year ~ Variables)
    },
    tidyr = {
        df %>%
            gather(Year, val, -Variables, -Country) %>%
            spread(Variables, val)
    },
    datatable = {
        data.table::dcast(
            data.table::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"),
            Country + Year ~ Variables)
    },
    reshape = {
        reshape::cast(reshape::melt(df), Country + variable ~ Variables)
    }
)
res
#Unit: milliseconds
#      expr       min        lq      mean    median        uq       max neval
#  reshape2  3.088740  3.449686  4.313044  3.919372  5.112560  7.856902   100
#     tidyr  4.482361  4.982017  6.215872  5.771133  6.931964 28.293377   100
# datatable  3.179035  3.511542  4.861192  4.040188  5.123103 46.010810   100
#   reshape 27.371094 30.226222 32.425667 32.504644 34.118499 41.286803   100

library(ggplot2)
autoplot(res)

Sample Image

How to Use "Cast" in Reshape Without Aggregation

reshape2 dcast without aggregation - problems with seq

EDIT

data

Reshape in R without aggregation (for example MTurk response strings)

After the Fact:

Dynamically, in `j=..`

Can dcast be used without an aggregate function?

Reshape DF from long to wide in R using Reshape2 without an aggregation function

Reshape data in R, cast function arguments

Additional columns

Case 1: Additional columns go along with the id column

Case2: Additional columns don't go along with the id column

Using Reshape Cast

Casting and Melting with reshape in R

Sample data

Benchmark analysis

Related Topics

Leave a reply

reshape2 dcast without aggregation - problems with seq

EDIT

data

Reshape in R without aggregation (for example MTurk response strings)

After the Fact:

Dynamically, in j=..

Can dcast be used without an aggregate function?

Reshape DF from long to wide in R using Reshape2 without an aggregation function

Reshape data in R, cast function arguments

Additional columns

Case 1: Additional columns go along with the id column

Case2: Additional columns don't go along with the id column

Using Reshape Cast

Casting and Melting with reshape in R

Sample data

Benchmark analysis

Related Topics

Leave a reply

Dynamically, in `j=..`