R reshape2 'Aggregation function missing: defaulting to length'
Thanks to @akrun, who pointed it out.
Well, there's a good chance your data has duplicate rows that look either like this:
student test score
Adam Exam1 80
Adam Exam1 85
Adam Exam2 90
John Exam1 70
John Exam2 60
Or like this:
student class test score
Adam Biology Exam1 80
Adam Theology Exam1 85
Adam Theology Exam2 90
John Biology Exam1 70
John Theology Exam2 60
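In both tables Adam has two Exam1 scores, so dcast(data, student ~ test, value.var='score') has to fit two values into one cell. Because no fun.aggregate is given, reshape2 falls back to length and reports the count instead of the score. When the duplicates are really separated by another variable, as class does in the second table, add that variable to the formula so every cell is unique: dcast(data, student + class ~ test, value.var='score')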
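A quick way to confirm this before casting is to list the rows whose key combination appears more than once. The sketch below is base R only. It rebuilds the two example tables as data1 and data2, and find_dup_keys is a hypothetical helper name, not part of reshape2:

```r
# The two example tables from above
data1 <- data.frame(
  student = c("Adam", "Adam", "Adam", "John", "John"),
  test    = c("Exam1", "Exam1", "Exam2", "Exam1", "Exam2"),
  score   = c(80, 85, 90, 70, 60),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  student = c("Adam", "Adam", "Adam", "John", "John"),
  class   = c("Biology", "Theology", "Theology", "Biology", "Theology"),
  test    = c("Exam1", "Exam1", "Exam2", "Exam1", "Exam2"),
  score   = c(80, 85, 90, 70, 60),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every row whose combination of `keys`
# occurs more than once, i.e. the rows dcast would have to aggregate
find_dup_keys <- function(df, keys) {
  k <- df[keys]
  df[duplicated(k) | duplicated(k, fromLast = TRUE), , drop = FALSE]
}

find_dup_keys(data1, c("student", "test"))           # both Adam/Exam1 rows
find_dup_keys(data2, c("student", "test"))           # the same two rows
find_dup_keys(data2, c("student", "class", "test"))  # no rows: class separates them
```

If the helper returns rows for the variables on both sides of your formula, dcast will aggregate them.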
Reshaping data in R with multiple variable levels - aggregate function missing warning
The data.table package extended dcast to work with rowid and to allow multiple value.var columns, so:
library(data.table)
dcast(setDT(DF), id ~ rowid(id), value.var=setdiff(names(DF), "id"))
id visit.date_1 visit.date_2 visit.id_1 visit.id_2 bill.num_1 bill.num_2 dx.code_1 dx.code_2 FY_1 FY_2 Dx.num_1 Dx.num_2
1: 1 1/2/12 3/4/12 203 506 1234 4567 409 512 2012 2013 1 1
2: 2 5/6/18 5/6/18 222 222 3452 3452 488 122 2018 2018 1 2
3: 3 2/9/14 <NA> 567 NA 6798 NA 923 NA 2014 NA 1 NA
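The answer does not show DF itself. The sketch below rebuilds a DF that is consistent with the printed result (the values are read back from the output above, so treat it as a reconstruction, not the asker's original data), so the call can be run end to end:

```r
library(data.table)

# Long data reconstructed from the wide output above: one row per visit
DF <- data.frame(
  id         = c(1, 1, 2, 2, 3),
  visit.date = c("1/2/12", "3/4/12", "5/6/18", "5/6/18", "2/9/14"),
  visit.id   = c(203, 506, 222, 222, 567),
  bill.num   = c(1234, 4567, 3452, 3452, 6798),
  dx.code    = c(409, 512, 488, 122, 923),
  FY         = c(2012, 2013, 2018, 2018, 2014),
  Dx.num     = c(1, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# rowid(id) numbers the visits within each id (1, 2, ...), so every
# id/visit pair is unique and dcast has nothing to aggregate;
# id 3 has only one visit, so its *_2 columns are filled with NA
wide <- dcast(setDT(DF), id ~ rowid(id), value.var = setdiff(names(DF), "id"))
wide
```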
Can dcast be used without an aggregate function?
I don't think there is a way to do it directly, but we can add an extra column that makes each row unique:
df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"), cat = c("SS",
"SS", "SS", "SV", "SV", "SV", "SV"), val = c(220L, 222L, 223L,
224L, 225L, 220L, 1L)), .Names = c("id", "cat", "val"), class = "data.frame", row.names = c(NA,
-7L))
library(reshape2)
library(plyr)
# Add a counter for how many times each id*cat combination has occurred so far
tmp <- ddply(df2, .(id, cat), transform, newid = paste(id, seq_along(cat)))
# Cast using this newid (plus id, so we don't lose it); every cell is now unique, so nothing is aggregated
out <- dcast(tmp, id + newid ~ cat, value.var = "val")
# Remove newid if we want
out <- out[,-which(colnames(out) == "newid")]
out
# id SS SV
#1 A 220 224
#2 B 222 225
#3 C 223 220
#4 C NA 1
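With data.table the plyr step can be replaced by rowid(), the same within-group counter used in the previous answer. A sketch under that assumption, reusing df2 from above (the helper column n and the result out2 are illustrative names):

```r
library(data.table)

df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df2)
# Counter within each id/cat pair: 1 for the first occurrence, 2 for the second, ...
dt[, n := rowid(id, cat)]
# id + n identifies each output row uniquely, so no aggregation happens
out2 <- dcast(dt, id + n ~ cat, value.var = "val")
# Drop the helper column, mirroring the removal of newid above
out2[, n := NULL]
out2
```

This gives the same four rows as the ddply version, with C's second SV value (1) on its own row and NA under SS.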
dcast for numeric and character columns in R - returning length by default
We can specify length in fun.aggregate if the length is needed:
library(data.table)
dcast(setDT(data), zip + date + calories ~ data_source,
value.var=c("user","price"), length)
Based on the data shown, there are no duplicates, so it works without an aggregation function:
dcast(setDT(data), zip + date + calories ~ data_source, value.var=c("user","price"))
If there are duplicates, make the combinations unique by adding rowid of the grouping variables to the formula:
dcast(setDT(data), rowid(zip, date, calories) + zip + date + calories
~ data_source, value.var=c("user","price"))
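To see both behaviours on something concrete, here is a small hypothetical data table with the same column names (the question's own data is not shown, so every value below is made up). It contains one collision: two web rows for the same zip, date and calories. The rowid() call is a variation on the one above: it also counts within data_source, so sources that don't collide stay on the same row.

```r
library(data.table)

# Hypothetical data: rows 2 and 3 share zip/date/calories/data_source
data <- data.table(
  zip         = c(10001, 10001, 10001, 10002),
  date        = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"),
  calories    = c(200, 200, 200, 350),
  data_source = c("app", "web", "web", "app"),
  user        = c("u1", "u2", "u3", "u4"),
  price       = c(5, 6, 7, 8)
)

# Counting: length reports how many rows landed in each cell
# (2 for web at zip 10001, 0 for the empty web cell at zip 10002)
counts <- dcast(data, zip + date + calories ~ data_source,
                value.var = c("user", "price"), fun.aggregate = length)

# Keeping the values: the counter splits the two web rows onto separate
# lines, while app and web for the same key share the first line
wide_src <- dcast(data,
                  rowid(zip, date, calories, data_source) + zip + date + calories ~ data_source,
                  value.var = c("user", "price"))
counts
wide_src
```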