Object Not Found Error with ddply Inside a Function

Object not found error with ddply inside a function

You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible:

myFunction <- function(x, y) {
  NewColName <- "a"
  z <- do.call("ddply", list(x, y, summarize, Ave = call("mean", as.symbol(NewColName), na.rm = TRUE)))
  return(z)
}

myFunction(d.f, sv)
#   b Ave
# 1 0 1.5
# 2 1 3.5
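The answer does not show d.f or sv, so here is a minimal runnable sketch with made-up inputs. The data values and the choice of "b" as the grouping column are assumptions, picked only so that the result matches the output printed above.

```r
library(plyr)

# Hypothetical inputs: the original answer does not define d.f or sv.
d.f <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1))
sv <- "b"

myFunction <- function(x, y) {
  NewColName <- "a"
  # call("mean", as.symbol("a"), na.rm = TRUE) builds the unevaluated
  # expression mean(a, na.rm = TRUE); do.call then passes it on to ddply,
  # so summarize sees a column name rather than the variable NewColName.
  do.call("ddply", list(x, y, summarize,
                        Ave = call("mean", as.symbol(NewColName), na.rm = TRUE)))
}

res <- myFunction(d.f, sv)
print(res)
```

Because the column name is turned into a symbol before ddply is called, summarize never has to look up NewColName itself, which is the lookup that fails in the original question.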

R - Object not found error when using ddply

One option would be using ave from base R

test$ecdf_val <- with(test, ave(yearly_test_count, country,
                                FUN = function(x) ecdf(x)(x) * length(x)))
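To see what this computes, here is a small sketch on made-up data. Only the column names come from the snippet above; the values are assumptions.

```r
# Hypothetical data; column names follow the snippet above.
test <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  yearly_test_count = c(30, 10, 20, 5, 7)
)

# Within each country, ecdf(x)(x) is the share of values <= x;
# multiplying by the group size turns that share back into a count.
test$ecdf_val <- with(test, ave(yearly_test_count, country,
                                FUN = function(x) ecdf(x)(x) * length(x)))
print(test)
```

ave returns a vector the same length as its input, so the result can be assigned straight back as a new column without any merge step.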

Object not found using ddply

The column names are 'C1' and 'C2', not c.C1 and c.C2

plyr::ddply(dat_oe, c("subjects"), plyr::summarise,
C1 = mean(C1), C2 = mean(C2))

-output

#  subjects          C1           C2
#1 0_1_K05914 -0.00389484 5.64273e-04
#2 0_2_K06757 0.00760754 -2.54041e-04
#3 0_3_K06768 0.00268124 1.02475e-03
#4 0_4_K07479 -0.00302735 1.64945e-03
#5 0_5_K05811 0.00546419 -2.87371e-03
#6 0_6_K06786 0.00246136 -6.49155e-05

For multiple columns, we can use colwise

plyr::ddply(dat_oe, c("subjects"), plyr::colwise(mean, c("C1", "C2")))

-output

# subjects          C1           C2
#1 0_1_K05914 -0.00389484 5.64273e-04
#2 0_2_K06757 0.00760754 -2.54041e-04
#3 0_3_K06768 0.00268124 1.02475e-03
#4 0_4_K07479 -0.00302735 1.64945e-03
#5 0_5_K05811 0.00546419 -2.87371e-03
#6 0_6_K06786 0.00246136 -6.49155e-05

With dplyr (version >= 1.0), we can use across

library(dplyr)
dat_oe %>%
  group_by(subjects) %>%
  summarise(across(c(C1, C2), mean), .groups = 'drop')

The .(subjects) and .(C1, C2) forms should also work. Because dplyr was loaded as well, we used c("subjects") and c("C1", "C2") instead.

data

dat_oe <- structure(list(subjects = c("0_1_K05914", "0_2_K06757", "0_3_K06768", 
"0_4_K07479", "0_5_K05811", "0_6_K06786"), PHENO = c(1L, 2L,
NA, 1L, 1L, 1L), AGE = c(-1.0912233, -0.2053317, 0, 1.2711544,
-0.6482775, 0.8282086), SEX = c(2L, 1L, NA, 1L, 1L, 1L), mean_HBA1C = c(0.15392621,
-0.30112172, -4.54101273, -0.09165522, -0.19277698, 0.24782498
), DDuration = c(2.5936581, 0.807564, 0, 1.691769, 0.8606163,
0.3300932), REN_INSF = c(0L, 0L, NA, 0L, 0L, 0L), C1 = c(-0.00389484,
0.00760754, 0.00268124, -0.00302735, 0.00546419, 0.00246136),
C2 = c(0.000564273, -0.000254041, 0.00102475, 0.00164945,
-0.00287371, -6.49155e-05), C3 = c(0.0105879, 0.000225929,
0.00397415, -0.00575519, -0.0134994, 0.00205081), C4 = c(0.00523132,
0.00701527, 0.0102865, 0.00229313, 0.00587083, -0.00726134
), C5 = c(-0.00652487, 0.00365001, -0.000763843, 0.00242112,
0.013513, -0.00206848), C6 = c(0.000303767, -0.00130774,
0.0023347, -0.00214576, 0.0104223, 0.00592337), C7 = c(-0.00292409,
0.00437073, -0.00981626, -0.00560128, 0.00447568, 0.000567228
)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))

ddply error when the aggregation function is defined within another function

Try:

result <- ddply(dfx, .(group, sex), here(summarize), max_age = helper(age))

From the help page for here:

This function captures the current context, making it easier to use **ply with functions that do special evaluation and need access to the environment where ddply was called from.

summarize is one such special function.
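A minimal sketch of the situation, assuming a wrapper function that defines helper locally. The data values and the wrapper name oldest_by_group are hypothetical; dfx, group, sex, age and helper mirror the call above.

```r
library(plyr)

# Hypothetical data; the original question's dfx is not shown.
dfx <- data.frame(
  group = c("A", "A", "B", "B"),
  sex   = c("F", "F", "M", "M"),
  age   = c(21, 34, 45, 29)
)

oldest_by_group <- function(d) {
  # helper exists only inside this function's environment.
  helper <- function(x) max(x)
  # here() wraps summarize so that helper(age) is evaluated with access
  # to the environment where ddply was called.
  ddply(d, .(group, sex), here(summarize), max_age = helper(age))
}

result <- oldest_by_group(dfx)
print(result)
```

Without here(), summarize looks for helper along its own evaluation chain and does not reach the body of oldest_by_group, which is what produces the error in the question.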

ddply not returning values from function split by variable

What you need to be doing:

ddply(adhd_p, "pid", summarise,
hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

Why you need to be doing it:

When you call ddply, the function works within the .data (adhd_p in your case) as the local namespace. This is similar to calling attach(adhd_p); calling the name of a column without referencing the dataframe explicitly still calls the correct column.

When you supply the summarise argument, the function splits up vectors in the local namespace based on the id columns supplied (in this case, pid). So, if you reference columns without referencing the dataframe explicitly as above, calculations will be done with the portion of the sdt column corresponding to each pid. However, if you reference the column and dataframe explicitly (adhd_p$sdt in your case), it just pulls in the entire vector from the global namespace and doesn't split it appropriately.
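The difference can be sketched in base R. This is an analogy for the per-group evaluation described above, not plyr's actual implementation, and the data values are made up; only the names adhd_p, pid and sdt come from the question.

```r
# Hypothetical data; only the column names pid and sdt come from the text.
adhd_p <- data.frame(
  pid = c(1, 1, 1, 2, 2),
  sdt = c("Hit", "Miss", "Hit", "Hit", "Miss")
)

# Base-R analogy: split the rows by pid, then evaluate an expression
# inside each piece.
pieces <- split(adhd_p, adhd_p$pid)

# Bare column name: resolves to the piece's own sdt column.
n_group <- sapply(pieces, function(piece) with(piece, sum(sdt == "Hit")))

# Explicit data frame reference: ignores the split and sees every row.
n_whole <- sapply(pieces, function(piece) with(piece, sum(adhd_p$sdt == "Hit")))

print(n_group)  # one count per pid
print(n_whole)  # the overall count, repeated for each pid
```

The second result is the same for every pid, which is the symptom to look for when a grouped summary returns identical values in every row.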

Edit: the code below is both less messy and won't raise an error if one of the values is missing:

ddply(adhd_p, .(pid, time), summarise,
      hitrate = sum(sdt == "Hit") / (sum(sdt == "Hit") + sum(sdt == "Miss")),
      falsealarmrate = sum(sdt == "False Alarm") / (sum(sdt == "False Alarm") + sum(sdt == "Correct Reject")))

I do not understand error object not found inside the function

The problem is that splom evaluates its groups argument in a nonstandard way. A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:

f <- function(data, id)
  eval(substitute(splom(data, groups = .id), list(.id = id)))

# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
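To see what substitute builds before eval runs it, here is a small base R sketch. The id vector is a made-up stand-in for the grouping argument, and splom is never actually called, so lattice does not need to be loaded.

```r
# Hypothetical grouping vector, standing in for the id argument.
id <- factor(c("setosa", "virginica"))

# substitute() replaces the placeholder .id with the value of id and
# returns the call unevaluated.
built <- substitute(splom(data, groups = .id), list(.id = id))

print(built[[1]])                   # the function named in the constructed call
print(identical(built$groups, id))  # groups now holds the values themselves
```

Because the values are placed directly into the call, splom no longer has to resolve a variable name in an environment it cannot see.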

