`dcast` with empty RHS

This is now possible using the rowid function:

dcast(DT, id ~ rowid(id), value.var = "var")
#    id          1          2          3
# 1:  6  1.1050942  0.1271620  1.3051373
# 2:  7 -0.5441056 -0.6866828 -0.8083762
# 3:  8 -0.6812820 -1.1934716 -1.3913903
# 4:  9 -0.3462497 -0.8229276 -1.0884394
# 5: 10 -0.4600681  0.6173795 -1.0125658

See ?rowid for more options, examples, and explanation.
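
As a minimal, self-contained sketch of the same idea (the toy `DT` below is made up for illustration and is not the data from the question): `rowid(id)` numbers the rows within each `id`, and that counter stands in for the missing RHS variable.

```r
library(data.table)

# Hypothetical toy data: three measurements per id, no column to cast on
DT <- data.table(id  = rep(c(6L, 7L), each = 3L),
                 var = c(1.1, 0.1, 1.3, -0.5, -0.7, -0.8))

# rowid() counts rows within each id
idx <- rowid(DT$id)

# Use that counter as the RHS of the formula
wide <- dcast(DT, id ~ rowid(id), value.var = "var")
wide
```

Each `id` becomes one row, and the columns are named after the within-group row number.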

Make the `drop` argument in `dcast` only look at the RHS of the formula

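This was just implemented in the data.table development version v1.9.7 (commit 2113), which closes #1512. Passing drop = c(TRUE, FALSE) includes the missing combinations of the formula's RHS only: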

require(data.table) # v1.9.7, commit 2113+
dcast(DT, ... ~ v2, value.var = "v3", drop = c(TRUE, FALSE))
#       v1 ID  1  2  3  4  5  6
# 1: 1.105  1 NA  3  2 NA  2 NA
# 2: 2.012  2  5  4 NA NA NA  3
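
A small sketch of the difference, assuming a made-up table in which the factor `grp` has an unused level `"b"` (the names `id`, `grp`, and `val` are hypothetical, not from the question):

```r
library(data.table)

# Hypothetical data: level "b" of grp never occurs
DT <- data.table(
  id  = c(1L, 1L, 2L),
  grp = factor(c("a", "c", "a"), levels = c("a", "b", "c")),
  val = c(10, 20, 30)
)

# Default (drop = TRUE): only observed levels become columns
wide_default <- dcast(DT, id ~ grp, value.var = "val")

# drop = c(TRUE, FALSE): keep every RHS level, leave the LHS alone
wide_all_rhs <- dcast(DT, id ~ grp, value.var = "val", drop = c(TRUE, FALSE))

names(wide_default)
names(wide_all_rhs)
```

The second call should gain a column for the unused level, filled with NA.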

R: include factors with no entries when using dcast

Try adding drop = FALSE to your dcast call, so that unused factor levels are not dropped:

dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)

  id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1    1    0    0    0    0    0    0    0
2  2    0    1    0    0    0    0    0    0
3  3    1    0    0    0    0    0    0    0
4  4    0    0    0    0    0    1    0    0
5  5    0    0    0    0    0    1    0    0
6  6    0    0    0    0    0    0    1    0

As for your aside: yes, you just need to tell dcast how to aggregate by passing a function. In this case you want length:

data2 <- dataDF[,1:3]
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)
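
To see what `fun.aggregate = length` does, here is a rough sketch using data.table's `dcast` rather than reshape2 (the table and its columns are invented for the example):

```r
library(data.table)

# Hypothetical data: id 1 has two "a" rows, id 2 has one "b" row,
# and level "c" never occurs
DT <- data.table(id = c(1L, 1L, 2L),
                 x  = factor(c("a", "a", "b"), levels = c("a", "b", "c")))

# Count rows per id/x cell; drop = FALSE keeps the unused level "c",
# and fill = 0L puts a zero in cells that have no rows
counts <- dcast(DT, id ~ x, value.var = "x",
                fun.aggregate = length, fill = 0L, drop = FALSE)
counts
```

Each cell then holds the number of rows for that combination instead of a single value.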

For your edit, I'd use tidyr and dplyr rather than reshape2:

library(tidyr)
library(dplyr)

dataDF %>%
  left_join(expand.grid(x = levels(dataDF$x), y = levels(dataDF$y)), .) %>%
  unite(z, x, y) %>%
  spread(z, value, fill = 0) %>%
  na.omit

First we complete all combinations of x and y by joining against expand.grid, then we unite them into one column, z, then we spread them out, and finally we remove the rows that have NAs in the id columns:

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0

Transpose / reshape dataframe without timevar from long to wide format

Assuming your data is in the object dataset:

library(plyr)
library(reshape2) # provides dcast
## Add a medication index
data_with_index <- ddply(dataset, .(Name), mutate,
                         index = paste0('medication', seq_along(Name)))
dcast(data_with_index, Name ~ index, value.var = 'MedName')

##    Name   medication1    medication2      medication3
## 1 Name1 atenolol 25mg   aspirin 81mg sildenafil 100mg
## 2 Name2 atenolol 50mg enalapril 20mg             <NA>
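
If you prefer data.table over plyr, the same index can probably be built inline with `rowid()` and its `prefix` argument. This is only a sketch: the long-format input below is reconstructed from the output above, and its row order is an assumption.

```r
library(data.table)

# Long-format data reconstructed from the printed result (row order assumed)
dataset <- data.table(
  Name    = c("Name1", "Name1", "Name1", "Name2", "Name2"),
  MedName = c("atenolol 25mg", "aspirin 81mg", "sildenafil 100mg",
              "atenolol 50mg", "enalapril 20mg")
)

# rowid() numbers the rows within each Name; prefix turns 1, 2, 3
# into medication1, medication2, medication3
wide <- dcast(dataset, Name ~ rowid(Name, prefix = "medication"),
              value.var = "MedName")
wide
```

This avoids creating the intermediate index column by hand.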

Rearrange DT by id, so that all different values are put in columns next to each other

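We can use dcast with rowid, then reorder the columns so that the values sharing a row index sit next to each other: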

library(data.table)
res <- dcast(DT, id ~ rowid(id), value.var = c("X", "Y", "A"))
# Reorder by the numeric suffix (as integers, so 10 sorts after 2)
cols <- names(res)[-1]
res[, c("id", cols[order(as.integer(sub("\\D+", "", cols)))]), with = FALSE]

R spread data frame

Let x be your data frame:

library(data.table)
library(lubridate)
dt <- data.table(x)
# Dates should not be factors
dt[, Date.y := ymd(Date.y)]
setorder(dt, Account.Name, -Date.y)
dt[, col_index := 0:(.N-1L), by = Account.Name]
dt_casted <- dcast(dt, Account.Name ~ col_index, value.var = "EI")

Note that I didn't use the "date_0" format, because I believe you will want the columns sorted, and "date_10" would sort in the wrong place relative to "date_2". It is better to keep the index numeric, or to pad it with leading zeros.
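
A quick base R illustration of the ordering issue (the index values are arbitrary):

```r
# Column-style labels built from a 0-based index
idx <- c(0L, 2L, 10L)

unpadded <- paste0("date_", idx)        # date_0, date_2, date_10
padded   <- sprintf("date_%02d", idx)   # date_00, date_02, date_10

sort(unpadded)  # "date_10" ends up before "date_2"
sort(padded)    # order matches the numeric index
```

Two digits of padding are enough here; a wider index would need a wider format such as `%03d`.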

How to change type of target column when doing := by group in a data.table in R?

Because the class of 'x' is 'integer', we can convert it to 'numeric' before assigning 'mean(y)' to it. This may be useful whenever we replace 'x' with the mean of any numeric variable (including 'x' itself).

db[, x:= as.numeric(x)][, x:= mean(y), by=id][]

Or assign to a new column, and change the column name afterwards:

setnames(db[, x1:= mean(y),by=id][,x:=NULL],'x1', 'x')

Or we can assign 'x' to 'NULL' and then create 'x' as the mean of 'y' (@David Arenburg's suggestion):

db[, x:=NULL][, x:= mean(y), by= id][]
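
For reference, here is the first approach on a made-up `db` (the values are invented; only the column names follow the question):

```r
library(data.table)

# Hypothetical data: x is integer, y is double
db <- data.table(id = c(1L, 1L, 2L, 2L),
                 x  = c(1L, 2L, 3L, 4L),
                 y  = c(0.5, 1.5, 2.5, 4.5))

# Convert x first, then overwrite it with the group mean of y
db[, x := as.numeric(x)][, x := mean(y), by = id]

class(db$x)
db$x
```

After the conversion, the grouped assignment can store the non-integer means in 'x'.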

