`dcast` with empty RHS
This is now possible using the `rowid` function:
dcast(DT, id ~ rowid(id), value.var = "var")
# id 1 2 3
# 1: 6 1.1050942 0.1271620 1.3051373
# 2: 7 -0.5441056 -0.6866828 -0.8083762
# 3: 8 -0.6812820 -1.1934716 -1.3913903
# 4: 9 -0.3462497 -0.8229276 -1.0884394
# 5: 10 -0.4600681 0.6173795 -1.0125658
See `?rowid` for more options, examples, and explanation.
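To make the role of `rowid` concrete, here is a minimal sketch on hypothetical toy data (the table `DT` and its values below are invented for illustration and are not the data from the answer above). It assumes a data.table version that provides `rowid` (1.9.8 or later).

```r
library(data.table)

# Hypothetical toy data: three observations per id
DT <- data.table(id = rep(c("a", "b"), each = 3), var = 1:6)

# rowid() numbers the rows within each run of identical id values
idx <- rowid(DT$id)
idx
# 1 2 3 1 2 3

# Using that counter as the RHS gives one column per within-id position
wide <- dcast(DT, id ~ rowid(id), value.var = "var")
names(wide)
# "id" "1" "2" "3"
```

The counter plays the part of the missing "time" variable, so each id ends up on a single row.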
Make the `drop` argument in `dcast` only look at the RHS of the formula
Just implemented in data.table development version v1.9.7, commit 2113, closes #1512.
require(data.table) # v1.9.7, commit 2113+
dcast(DT, ... ~ v2, value.var = "v3", drop = c(TRUE, FALSE))
# v1 ID 1 2 3 4 5 6
# 1: 1.105 1 NA 3 2 NA 2 NA
# 2: 2.012 2 5 4 NA NA NA 3
R: include factors with no entries when using dcast
Try adding `drop = FALSE` to your `dcast` call, so that unused factor levels are not dropped:
dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)
id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1 1 1 0 0 0 0 0 0 0
2 2 0 1 0 0 0 0 0 0
3 3 1 0 0 0 0 0 0 0
4 4 0 0 0 0 0 1 0 0
5 5 0 0 0 0 0 1 0 0
6 6 0 0 0 0 0 0 1 0
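The effect of `drop = FALSE` can also be seen on a smaller case. The sketch below uses hypothetical data and data.table's `dcast` rather than reshape2's, so treat it as an illustration of the idea and not as the exact call from the answer; the level `"C"` is declared on the factor but never observed.

```r
library(data.table)

# Hypothetical data: level "C" exists on the factor but has no rows
d <- data.table(
  id    = 1:3,
  x     = factor(c("A", "B", "A"), levels = c("A", "B", "C")),
  value = 1
)

# Default (drop = TRUE): only the observed levels become columns
names(dcast(d, id ~ x, value.var = "value", fill = 0))

# drop = FALSE: the unused level "C" gets a column too, filled with 0
wide <- dcast(d, id ~ x, value.var = "value", fill = 0, drop = FALSE)
names(wide)
```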
For your aside: yes, we just need to tell `dcast` how to aggregate by passing a function. In this case you want `length`:
data2 <- dataDF[,1:3]
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)
For your edit, I'd use `tidyr` and `dplyr` rather than `reshape2`:
library(tidyr)
library(dplyr)
dataDF %>% left_join(expand.grid(x = levels(dataDF$x), y = levels(dataDF$y)), .) %>%
unite(z, x, y) %>%
spread(z, value, fill = 0) %>%
na.omit
First we complete all combinations of x and y using `expand.grid` and merging, then we `unite` them into one column, z, then we `spread` them out, and finally we remove the rows with NAs in the id columns:
id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1 1 1 1 0 0 0 0 0 0 0
2 2 2 0 1 0 0 0 0 0 0
3 3 3 1 0 0 0 0 0 0 0
4 4 1 0 0 0 0 0 1 0 0
5 5 2 0 0 0 0 0 1 0 0
6 6 3 0 0 0 0 0 0 1 0
Transpose / reshape dataframe without timevar from long to wide format
Assuming your data is in the object `dataset`:
library(plyr)
library(reshape2) # provides dcast
## Add a medication index
data_with_index <- ddply(dataset, .(Name), mutate,
index = paste0('medication', 1:length(Name)))
dcast(data_with_index, Name ~ index, value.var = 'MedName')
## Name medication1 medication2 medication3
## 1 Name1 atenolol 25mg aspirin 81mg sildenafil 100mg
## 2 Name2 atenolol 50mg enalapril 20mg <NA>
Rearrange DT by id, so that all different values are put in columns next to each other
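We can use `dcast` with `rowid`, then reorder the columns so that values with the same index sit next to each other: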
library(data.table)
res <- dcast(DT, id~rowid(id), value.var = c("X", "Y", "A"))
res[, c("id", names(res)[-1][order(sub("\\D+", "", names(res)[-1]))]), with = FALSE]
R spread data frame
Let `x` be your data frame:
library(data.table)
library(lubridate)
dt <- data.table(x)
# date should not be factors
dt[, Date.y := ymd(Date.y)]
setorder(dt, Account.Name, -Date.y)
dt[, col_index := 0:(.N-1L), by = Account.Name]
dt_casted <- dcast(dt, Account.Name ~ col_index, value.var = "EI")
Note that I didn't use the "date_0" format because I believe you will want the columns sorted, and "date_10" sorts before "date_2" when compared as text. It is better to keep the index numeric, or to pad it with leading zeros.
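The ordering problem is easy to reproduce in base R. This is a small sketch with hypothetical labels; `method = "radix"` is used only so the comparison does not depend on the locale.

```r
# Hypothetical column labels built from a 0-based index
idx    <- 0:10
plain  <- paste0("date_", idx)        # "date_0", ..., "date_10"
padded <- sprintf("date_%02d", idx)   # "date_00", ..., "date_10"

# Text sorting puts "date_10" ahead of "date_2"
sort(plain, method = "radix")[1:3]
# "date_0" "date_1" "date_10"

# Zero-padded labels sort in numeric order
sort(padded, method = "radix")[1:3]
# "date_00" "date_01" "date_02"
```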
How to change type of target column when doing := by group in a data.table in R?
Because the class of 'x' is 'integer', we can convert the column to 'numeric' before assigning 'mean(y)' to it. This is useful whenever we replace 'x' with the mean of a numeric variable (including 'x' itself).
db[, x:= as.numeric(x)][, x:= mean(y), by=id][]
Or assign to a new column and change the column name afterwards:
setnames(db[, x1:= mean(y),by=id][,x:=NULL],'x1', 'x')
Or we can assign 'x' to 'NULL' and then create 'x' as the mean of 'y' (@David Arenburg's suggestion):
db[, x:=NULL][, x:= mean(y), by= id][]
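As a quick check of the first approach, the sketch below builds a hypothetical `db` (the values are invented for illustration) and confirms that the column type changes before the grouped assignment.

```r
library(data.table)

# Hypothetical data: x is integer, y is numeric
db <- data.table(
  id = c(1L, 1L, 2L, 2L),
  x  = 1:4,
  y  = c(0.5, 1.5, 2.5, 4.5)
)
class(db$x)
# "integer"

# Convert x to numeric first, then overwrite it with the group mean of y
db[, x := as.numeric(x)][, x := mean(y), by = id]
class(db$x)
# "numeric"
db$x
# 1.0 1.0 3.5 3.5
```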