Object not found error with ddply inside a function
You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible:
myFunction <- function(x, y) {
  NewColName <- "a"
  z <- do.call("ddply", list(x, y, summarize, Ave = call("mean", as.symbol(NewColName), na.rm = TRUE)))
  return(z)
}
myFunction(d.f,sv)
b Ave
1 0 1.5
2 1 3.5
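To see what the call(...) piece contributes, here is a minimal base R sketch. The data frame d.f below is a hypothetical stand-in (the original d.f is not shown), and expr is a name introduced only for illustration. It shows that call() builds an unevaluated expression in which the column name has already been filled in, so nothing needs to look up NewColName later:

```r
# Hypothetical stand-in for the d.f used above (the real data is not shown)
d.f <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1))

NewColName <- "a"

# Build the unevaluated expression mean(a, na.rm = TRUE)
expr <- call("mean", as.symbol(NewColName), na.rm = TRUE)
print(expr)

# Evaluate it with the data frame's columns in scope
overall <- eval(expr, d.f)
print(overall)
```

Because the expression already contains the symbol a rather than NewColName, summarize only has to find a column named a in each piece of the data.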
R - Object not found error when using ddply
One option would be using ave from base R:
test$ecdf_val <- with(test, ave(yearly_test_count, country,
FUN = function(x) ecdf(x)(x)*length(x)))
Object not found using ddply
The column names are 'C1' and 'C2', not c.C1 and c.C2:
plyr::ddply(dat_oe, c("subjects"), plyr::summarise,
C1 = mean(C1), C2 = mean(C2))
-output
# subjects C1 C2
#1 0_1_K05914 -0.00389484 5.64273e-04
#2 0_2_K06757 0.00760754 -2.54041e-04
#3 0_3_K06768 0.00268124 1.02475e-03
#4 0_4_K07479 -0.00302735 1.64945e-03
#5 0_5_K05811 0.00546419 -2.87371e-03
#6 0_6_K06786 0.00246136 -6.49155e-05
For multiple columns, we can use colwise
plyr::ddply(dat_oe, c("subjects"), plyr::colwise(mean, c("C1", "C2")))
-output
# subjects C1 C2
#1 0_1_K05914 -0.00389484 5.64273e-04
#2 0_2_K06757 0.00760754 -2.54041e-04
#3 0_3_K06768 0.00268124 1.02475e-03
#4 0_4_K07479 -0.00302735 1.64945e-03
#5 0_5_K05811 0.00546419 -2.87371e-03
#6 0_6_K06786 0.00246136 -6.49155e-05
With dplyr (version >= 1.0), we can use across:
library(dplyr)
dat_oe %>%
  group_by(subjects) %>%
  summarise(across(c(C1, C2), mean), .groups = 'drop')
The .(subjects) and .(C1, C2) forms should also work. Because dplyr was loaded as well, c("subjects") and c("C1", "C2") were used here.
data
dat_oe <- structure(list(subjects = c("0_1_K05914", "0_2_K06757", "0_3_K06768",
"0_4_K07479", "0_5_K05811", "0_6_K06786"), PHENO = c(1L, 2L,
NA, 1L, 1L, 1L), AGE = c(-1.0912233, -0.2053317, 0, 1.2711544,
-0.6482775, 0.8282086), SEX = c(2L, 1L, NA, 1L, 1L, 1L), mean_HBA1C = c(0.15392621,
-0.30112172, -4.54101273, -0.09165522, -0.19277698, 0.24782498
), DDuration = c(2.5936581, 0.807564, 0, 1.691769, 0.8606163,
0.3300932), REN_INSF = c(0L, 0L, NA, 0L, 0L, 0L), C1 = c(-0.00389484,
0.00760754, 0.00268124, -0.00302735, 0.00546419, 0.00246136),
C2 = c(0.000564273, -0.000254041, 0.00102475, 0.00164945,
-0.00287371, -6.49155e-05), C3 = c(0.0105879, 0.000225929,
0.00397415, -0.00575519, -0.0134994, 0.00205081), C4 = c(0.00523132,
0.00701527, 0.0102865, 0.00229313, 0.00587083, -0.00726134
), C5 = c(-0.00652487, 0.00365001, -0.000763843, 0.00242112,
0.013513, -0.00206848), C6 = c(0.000303767, -0.00130774,
0.0023347, -0.00214576, 0.0104223, 0.00592337), C7 = c(-0.00292409,
0.00437073, -0.00981626, -0.00560128, 0.00447568, 0.000567228
)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))
ddply error when the aggregation function is defined within another function
Try:
result <- ddply(dfx, .(group, sex), here(summarize), max_age = helper(age))
From the help page for here:
This function captures the current context, making it easier to use **ply with functions that do special evaluation and need access to the environment where ddply was called from.
summarize is one such special function.
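A minimal sketch of the situation here() is meant for, assuming plyr is installed. The data frame dfx, the wrapper function and the body of helper are hypothetical, chosen only so the example runs on its own; the point is that helper exists only inside the enclosing function:

```r
library(plyr)

# Hypothetical toy data standing in for dfx
dfx <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  sex   = c("M", "M", "F", "M", "F", "F"),
  age   = c(30, 35, 41, 25, 52, 47)
)

wrapper <- function(df) {
  # helper is local to wrapper, so plain summarize cannot see it
  helper <- function(x) max(x)
  # here() makes summarize evaluate where ddply was called from
  ddply(df, .(group, sex), here(summarize), max_age = helper(age))
}

result <- wrapper(dfx)
print(result)
```

Replacing here(summarize) with plain summarize in this sketch would be expected to reproduce the "could not find function" style of error that the question describes.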
ddply not returning values from function split by variable
What you need to be doing:
ddply(adhd_p, "pid", summarise,
hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))
Why you need to be doing it:
When you call ddply, the function works within the .data (adhd_p in your case) as the local namespace. This is similar to calling attach(adhd_p); naming a column without referencing the data frame explicitly still resolves to the correct column.
When you supply the summarise argument, the function splits up the vectors in the local namespace based on the id columns supplied (in this case, pid). So, if you reference columns without referencing the data frame explicitly, as above, calculations will be done with the portion of the sdt column corresponding to each pid. However, if you reference the column and data frame explicitly (adhd_p$sdt in your case), it just pulls in the entire vector from the global namespace and doesn't split it appropriately.
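The difference can be seen on a small example. This is a sketch that assumes plyr is installed and is run at the top level of a script; the toy adhd_p below is hypothetical and only mimics the pid and sdt columns:

```r
library(plyr)

# Hypothetical toy data with the same column names as adhd_p
adhd_p <- data.frame(
  pid = c(1, 1, 1, 2, 2, 2),
  sdt = c("Hit", "Hit", "Miss", "Hit", "Miss", "Miss")
)

# Bare column name: sdt is the slice belonging to each pid
per_group <- ddply(adhd_p, "pid", summarise, hits = sum(sdt == "Hit"))

# Explicit adhd_p$sdt: the whole column is used for every pid
whole_col <- ddply(adhd_p, "pid", summarise, hits = sum(adhd_p$sdt == "Hit"))

print(per_group)
print(whole_col)
```

With the bare name, each pid gets its own count; with adhd_p$sdt, every pid reports the same total for the full data frame.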
Edit: the code below is both less messy and won't raise an error if one of the values is missing:
ddply(adhd_p, .(pid, time), summarise,
      hitrate = sum(sdt == "Hit") / (sum(sdt == "Hit") + sum(sdt == "Miss")),
      falsealarmrate = sum(sdt == "False Alarm") / (sum(sdt == "False Alarm") + sum(sdt == "Correct Reject")))
I do not understand error object not found inside the function
The problem is that splom evaluates its groups argument in a nonstandard way. A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
  eval(substitute(splom(data, groups = .id), list(.id = id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
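To make the substitute() step less opaque, the following base R sketch inspects the call it builds without evaluating it, so lattice is not needed here. The name cl is introduced only for illustration:

```r
id <- iris[, 5]

# Replace the placeholder .id with the value of id; nothing is evaluated yet
cl <- substitute(splom(data, groups = .id), list(.id = id))

# The result is an unevaluated call to splom
print(class(cl))

# The groups argument now holds the factor itself, not a name to look up
print(is.factor(cl[["groups"]]))

# data was not in the substitution list, so it is still a plain symbol
print(identical(cl[["data"]], quote(data)))
```

Since the groups value is embedded in the call itself, splom no longer has to find a variable by name when eval() runs the call inside f.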