R Reshape2 'Aggregation Function Missing: Defaulting to Length'


Thanks to @akrun who pointed it out.

Well, there's a high chance that your data has duplicate rows that look either like this:

student  test   score
Adam     Exam1  80
Adam     Exam1  85
Adam     Exam2  90
John     Exam1  70
John     Exam2  60

Or like this:

student  class     test   score
Adam     Biology   Exam1  80
Adam     Theology  Exam1  85
Adam     Theology  Exam2  90
John     Biology   Exam1  70
John     Theology  Exam2  60

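The message appears when you cast with a formula whose variables do not identify each row uniquely, for example dcast(data, student ~ test, value.var='score'): Adam has two Exam1 scores in both tables, so dcast has to aggregate them and falls back to length. With the second table, adding class to the formula makes every cell unique again: dcast(data, student + class ~ test, value.var='score')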
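
As a minimal sketch of how to find the offending rows and pick the aggregation yourself, assume a hypothetical data frame `scores` that mirrors the first table above (the name and the choice of `mean` are illustrative, not part of the original answer):

```r
library(reshape2)

# Hypothetical data copied from the first table above
scores <- data.frame(
  student = c("Adam", "Adam", "Adam", "John", "John"),
  test    = c("Exam1", "Exam1", "Exam2", "Exam1", "Exam2"),
  score   = c(80, 85, 90, 70, 60)
)

# Count the rows in each student/test cell; a count above 1 is what
# forces dcast to aggregate
cell_counts <- aggregate(score ~ student + test, data = scores, FUN = length)
dupes <- cell_counts[cell_counts$score > 1, ]

# State the aggregation explicitly instead of falling back to length
wide <- dcast(scores, student ~ test, value.var = "score", fun.aggregate = mean)
```

Here `dupes` lists the Adam/Exam1 cell, and `wide` holds the mean of Adam's two Exam1 scores rather than their count. Whether `mean`, `max`, or something else is appropriate depends on what the duplicates represent.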

Reshaping data in R with multiple variable levels - aggregate function missing warning

The data.table package extends dcast to accept rowid() in the formula and several value.var columns at once, so:

library(data.table)
dcast(setDT(DF), id ~ rowid(id), value.var=setdiff(names(DF), "id"))

   id visit.date_1 visit.date_2 visit.id_1 visit.id_2 bill.num_1 bill.num_2 dx.code_1 dx.code_2 FY_1 FY_2 Dx.num_1 Dx.num_2
1:  1       1/2/12       3/4/12        203        506       1234       4567       409       512 2012 2013        1        1
2:  2       5/6/18       5/6/18        222        222       3452       3452       488       122 2018 2018        1        2
3:  3       2/9/14         <NA>        567         NA       6798         NA       923        NA 2014   NA        1       NA
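
For a smaller, self-contained illustration of the same idea, here is a sketch with a hypothetical table `visits` (the data and the name are made up for the example). rowid(id) numbers the rows within each id, and that counter becomes the suffix of the wide column names:

```r
library(data.table)

# Hypothetical long data: id 1 has two visits, id 2 has one
visits <- data.table(
  id       = c(1, 1, 2),
  visit.id = c(203, 506, 222),
  FY       = c(2012, 2013, 2018)
)

# Within-id row counter: 1, 2 for id 1 and 1 for id 2
counter <- visits[, rowid(id)]

# One output row per id, one column per value.var and counter value
wide <- dcast(visits, id ~ rowid(id), value.var = c("visit.id", "FY"))
```

Because every id/counter pair is unique, no aggregation function is needed, and ids with fewer visits are padded with NA.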

Can dcast be used without an aggregate function?

I don't think there is a way to do it directly, but we can add an extra column that will help us out:

df2 <- structure(list(id = c("A", "B", "C", "A", "B", "C", "C"), cat = c("SS", 
"SS", "SS", "SV", "SV", "SV", "SV"), val = c(220L, 222L, 223L,
224L, 225L, 220L, 1L)), .Names = c("id", "cat", "val"), class = "data.frame", row.names = c(NA,
-7L))

library(reshape2)
library(plyr)
# Number each repeat of an id/cat combination so every row gets a unique key
tmp <- ddply(df2, .(id, cat), transform, newid = paste(id, seq_along(cat)))
# Cast on id + newid so repeated cells land on separate rows; keep id so we don't lose it
out <- dcast(tmp, id + newid ~ cat, value.var = "val")
# Drop the helper column newid if it is no longer needed
out <- out[, colnames(out) != "newid"]
out
# id SS SV
#1 A 220 224
#2 B 222 225
#3 C 223 220
#4 C NA 1
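
If you would rather not load plyr, the same counter can be built with base R's ave(). This is only a sketch of an alternative, not the original answer's method; the helper column name `newid` is kept from above and df2 is repeated so the block runs on its own:

```r
library(reshape2)

# Same data as df2 above
df2 <- data.frame(
  id  = c("A", "B", "C", "A", "B", "C", "C"),
  cat = c("SS", "SS", "SS", "SV", "SV", "SV", "SV"),
  val = c(220L, 222L, 223L, 224L, 225L, 220L, 1L)
)

# Count occurrences within each id/cat group: 1 for the first, 2 for the second, ...
df2$newid <- ave(seq_along(df2$id), df2$id, df2$cat, FUN = seq_along)

# id + newid is now unique per cell, so dcast does not need to aggregate
out <- dcast(df2, id + newid ~ cat, value.var = "val")

# Drop the helper column
out <- out[, colnames(out) != "newid"]
```

The result has the same four rows as the plyr version: the second SV value for C ends up on its own row with NA under SS.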

dcast for numeric and character columns in R - returning length by default

We can pass length as fun.aggregate explicitly if the count is what is needed:

library(data.table)
dcast(setDT(data), zip + date + calories ~ data_source,
      value.var = c("user", "price"), fun.aggregate = length)

Based on the data shown, there are no duplicates, so the call works without an aggregation function:

dcast(setDT(data), zip + date + calories ~ data_source, value.var=c("user","price"))

If there are duplicates, make each combination unique by adding rowid() over the grouping variables to the formula:

dcast(setDT(data), rowid(zip, date, calories) + zip + date + calories 
~ data_source, value.var=c("user","price"))

