How to Use a Character Vector of Column Names in the Formula Argument of Dcast (Reshape2)

You can build the formula as a string and convert it with as.formula().

Here's an example:

library(reshape2)
## Example from `melt.data.frame`
names(airquality) <- tolower(names(airquality))
df_id <- c("month", "day")
aq <- melt(airquality, id = df_id)

## Constructing the formula
f <- as.formula(paste(paste(df_id, collapse = " + "), "~ variable"))

## Applying it....
dcast(aq, f, value.var = "value", fun.aggregate = mean)
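
The formula construction above can be wrapped in a small helper. This is only a sketch in base R: make_cast_formula is a hypothetical name, not a reshape2 function, and it assumes the column names are syntactically valid R names.

```r
# Hypothetical helper (not part of reshape2): builds "a + b ~ c + d"
# from two character vectors of column names.
make_cast_formula <- function(lhs, rhs) {
  stopifnot(is.character(lhs), length(lhs) > 0,
            is.character(rhs), length(rhs) > 0)
  as.formula(paste(paste(lhs, collapse = " + "),
                   "~",
                   paste(rhs, collapse = " + ")))
}

f <- make_cast_formula(c("month", "day"), "variable")
class(f)      # "formula"
all.vars(f)   # "month" "day" "variable"
```

The resulting f can be passed to dcast exactly like the formula built by hand above.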

reshape2: Passing args to dcast

dcast takes 'formula' as one of its arguments, and it accepts the formula as a character string. So, as an intermediate step, you can build a formula string from your column names and pass it to dcast:

data <- expand.grid(a = LETTERS[1:5], b = c("A", "B"))
data$count <- 1:10

fields <- colnames(data)
casting_formula <- sprintf("%s ~ %s", fields[1], fields[2])

dcast(data = data, value.var = "count", formula = casting_formula)

#   a A  B
# 1 A 1  6
# 2 B 2  7
# 3 C 3  8
# 4 D 4  9
# 5 E 5 10
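
To make the intermediate step visible, here is a base R sketch that inspects the string, converts it explicitly with as.formula, and cross-checks the wide layout with xtabs. The xtabs call is an assumption added for illustration, not part of the original answer; it gives the same cell values here only because each a/b combination occurs exactly once.

```r
data <- expand.grid(a = LETTERS[1:5], b = c("A", "B"))
data$count <- 1:10

fields <- colnames(data)
casting_formula <- sprintf("%s ~ %s", fields[1], fields[2])
casting_formula                    # a plain character string: "a ~ b"

# Explicit conversion, if a formula object is preferred over a string
f <- as.formula(casting_formula)
class(f)                           # "formula"

# Base R cross-check of the wide layout (sums 'count' per a/b cell)
wide <- xtabs(count ~ a + b, data = data)
wide
```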

Reshape data frame in R to wide format

You can create the formula as a string, and then use as.formula()

In lhs I'm grabbing all the column names that aren't DATA or measurement by using setdiff().

library(reshape2)
lhs <- paste0(setdiff(names(DF), c("DATA", "measurement")), collapse = "+")

dcast(DF, as.formula(paste0(lhs, "~ measurement")), fun.aggregate = mean, value.var = "DATA", drop = TRUE)
#   X  Y   A   B   C
# 1 2 -1 200 100 200
# 2 3 -3 200 450 450
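
DF is not defined in the snippet above. The sketch below uses a hypothetical DF, invented here so the setdiff step can be run on its own; its values are only chosen to be consistent with the output shown, and the real data may differ. The dcast call is guarded so the block also runs when reshape2 is not installed.

```r
# Hypothetical input, made up for illustration
DF <- data.frame(
  X = c(2, 2, 2, 3, 3, 3),
  Y = c(-1, -1, -1, -3, -3, -3),
  measurement = c("A", "B", "C", "A", "B", "C"),
  DATA = c(200, 100, 200, 200, 450, 450)
)

# Every column except the value column and the column to spread
id_cols <- setdiff(names(DF), c("DATA", "measurement"))
lhs <- paste0(id_cols, collapse = "+")
f <- as.formula(paste0(lhs, "~ measurement"))
f

# Only attempt the cast when reshape2 is available
if (requireNamespace("reshape2", quietly = TRUE)) {
  print(reshape2::dcast(DF, f, fun.aggregate = mean, value.var = "DATA"))
}
```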

Make the `drop` argument in `dcast` only look at the RHS of the formula

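This is implemented in the data.table development version v1.9.7 (commit 2113, closes #1512). drop can be a logical vector of length two: with c(TRUE, FALSE), missing combinations are dropped for the LHS of the formula but kept for the RHS.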

require(data.table) # v1.9.7, commit 2113+
dcast(DT, ... ~ v2, value.var = "v3", drop = c(TRUE, FALSE))
#       v1 ID  1 2  3  4  5  6
# 1: 1.105  1 NA 3  2 NA  2 NA
# 2: 2.012  2  5 4 NA NA NA  3

reshape2: multiple results of aggregation function?

This question has multiple answers, due to the flexibility of the 'reshape2' and 'plyr' packages. I will show one of the easiest examples to understand here:

library(reshape2)
library(plyr)

aqm <- melt(airquality, id=c("Month", "Day"), na.rm=TRUE)
aqm_ply <- ddply(aqm, .(Month, variable), summarize, min=min(value), max=max(value))
aqm_melt <- melt(aqm_ply, id=c("Month", "variable"), variable.name="variable2")
dcast(aqm_melt, Month ~ variable + variable2)

#   Month Ozone_min Ozone_max Solar.R_min Solar.R_max Wind_min Wind_max Temp_min Temp_max
# 1     5         1       115           8         334      5.7     20.1       56       81
# 2     6        12        71          31         332      1.7     20.7       65       93
# 3     7         7       135           7         314      4.1     14.9       73       92
# 4     8         9       168          24         273      2.3     15.5       72       97
# 5     9         7        96          14         259      2.8     16.6       63       93

Step 1: Let's break it down into steps. The definition of 'aqm' stays unchanged, and we work from that melted data. This will make the example easier to understand.

aqm <- melt(airquality, id=c("Month", "Day"), na.rm=TRUE)

#     Month Day variable value
# 1       5   1    Ozone  41.0
# 2       5   2    Ozone  36.0
# 3       5   3    Ozone  12.0
# 4       5   4    Ozone  18.0
# ...
# 612     9  30     Temp  68.0

Step 2: Now, we want to replace the 'value' column with 'min' and 'max' columns. We can accomplish this with the 'ddply' function from the 'plyr' package (data frame as input, data frame as output, hence "dd"-ply). We first specify the data.

ddply(aqm,

And then we specify the variables we want to use to group our data, 'Month' and 'variable'. We use the . function to refer to these variables by name, instead of referring to the values they contain.

ddply(aqm, .(Month, variable),

Now we need to choose an aggregating function. We choose the summarize function here, because we have columns ('Day' and 'value') that we don't want to include in our final data. The summarize function will strip away all of the original, non-grouping columns.

ddply(aqm, .(Month, variable), summarize,

Finally, we specify the calculation to do for each group. We can refer to the columns of the original data frame ('aqm'), even though they will not be contained in our final data frame. This is how it looks:

aqm_ply <- ddply(aqm, .(Month, variable), summarize, min=min(value), max=max(value))

#    Month variable  min   max
#  1     5    Ozone  1.0 115.0
#  2     5  Solar.R  8.0 334.0
#  3     5     Wind  5.7  20.1
#  4     5     Temp 56.0  81.0
#  5     6    Ozone 12.0  71.0
#  6     6  Solar.R 31.0 332.0
#  7     6     Wind  1.7  20.7
#  8     6     Temp 65.0  93.0
#  9     7    Ozone  7.0 135.0
# 10     7  Solar.R  7.0 314.0
# 11     7     Wind  4.1  14.9
# 12     7     Temp 73.0  92.0
# 13     8    Ozone  9.0 168.0
# 14     8  Solar.R 24.0 273.0
# 15     8     Wind  2.3  15.5
# 16     8     Temp 72.0  97.0
# 17     9    Ozone  7.0  96.0
# 18     9  Solar.R 14.0 259.0
# 19     9     Wind  2.8  16.6
# 20     9     Temp 63.0  93.0
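
As a cross-check that needs neither 'plyr' nor 'reshape2', the same per-group minimum and maximum can be computed in base R. This is an alternative sketch, not the method used in this answer; the names long, mins, maxs and agg are made up here, and the rows come out sorted alphabetically by variable rather than in the order shown above.

```r
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Stack the four measurement columns into long form by hand
long <- data.frame(
  Month    = rep(airquality$Month, times = length(vars)),
  variable = rep(vars, each = nrow(airquality)),
  value    = unlist(airquality[vars], use.names = FALSE)
)
long <- long[!is.na(long$value), ]   # same effect as na.rm = TRUE in melt

# One aggregate() call per statistic, then join on the grouping columns
mins <- aggregate(value ~ Month + variable, data = long, FUN = min)
maxs <- aggregate(value ~ Month + variable, data = long, FUN = max)
names(mins)[3] <- "min"
names(maxs)[3] <- "max"
agg <- merge(mins, maxs, by = c("Month", "variable"))
head(agg)
```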

Step 3: We can see that the data is vastly reduced, since the ddply function has aggregated the rows into one per group. Now we need to melt the data again, so we can get our second variable for the final data frame. Note that we need to specify a new variable.name argument, so we don't have two columns named "variable".

aqm_melt <- melt(aqm_ply, id=c("Month", "variable"), variable.name="variable2")

#    Month variable variable2 value
#  1     5    Ozone       min   1.0
#  2     5  Solar.R       min   8.0
#  3     5     Wind       min   5.7
#  4     5     Temp       min  56.0
#  5     6    Ozone       min  12.0
# ...
# 37     9    Ozone       max  96.0
# 38     9  Solar.R       max 259.0
# 39     9     Wind       max  16.6
# 40     9     Temp       max  93.0

Step 4: And we can finally wrap it all up by casting our data into the final form.

dcast(aqm_melt, Month ~ variable + variable2)

#   Month Ozone_min Ozone_max Solar.R_min Solar.R_max Wind_min Wind_max Temp_min Temp_max
# 1     5         1       115           8         334      5.7     20.1       56       81
# 2     6        12        71          31         332      1.7     20.7       65       93
# 3     7         7       135           7         314      4.1     14.9       73       92
# 4     8         9       168          24         273      2.3     15.5       72       97
# 5     9         7        96          14         259      2.8     16.6       63       93

Hopefully, this example will give you enough understanding to get you started. Be aware that 'dplyr', a data frame-optimized successor to the 'plyr' package, is also available, so you may want to convert your code to that package.

How to reshape data from long to wide format

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
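
dat1 is not defined in the snippet above. The sketch below uses a hypothetical dat1 with columns name, numbers and value, made up for illustration, to show what the call produces.

```r
# Hypothetical long-format input, invented for illustration
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, times = 2),
  value   = c(0.34, -0.70, 1.47, 0.12, -0.90, 1.32, 0.56, -0.28)
)

# One row per 'name'; one 'value.<numbers>' column per level of 'numbers'
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
names(wide)   # "name" "value.1" "value.2" "value.3" "value.4"
wide
```

reshape names the new columns by pasting the value column name and the timevar level together with a dot; the sep argument changes that separator.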

Prevent reshape2 from converting column headings to numbers

Include as.is=TRUE in your melt function:

dat <- melt(mdat, as.is = TRUE)
colnames(dat) <- c("from", "to", "km")

With as.is = TRUE, the dimension names are left as character strings through the melting process instead of being converted with type.convert.


