Sending in Column Name to ddply from Function

There has got to be a better way. And I couldn't figure out how to make it work with summarise.

my.fun <- function(df, count.column) { 
  ddply(df, .(x), function(d) sum(d[[count.column]]))
}

dat <- data.frame(x=letters[1:2], y=1:10)

> my.fun(dat, 'y')
  x V1
1 a 25
2 b 30
>

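The V1 column name appears because the anonymous function returns a bare value. A minimal sketch, assuming the goal is just a named result column: return a one-row data frame instead. The names my.fun.named and total are placeholders chosen for illustration, not something from the original post.

```r
library(plyr)

# Sketch: same grouping as my.fun above, but the anonymous function returns
# a one-row data frame, so the result column gets a name instead of V1.
# `my.fun.named` and `total` are illustrative names, not from the question.
my.fun.named <- function(df, count.column) {
  ddply(df, .(x), function(d) data.frame(total = sum(d[[count.column]])))
}

dat <- data.frame(x = letters[1:2], y = 1:10)
res <- my.fun.named(dat, "y")
print(res)
```

Because count.column is looked up inside an ordinary closure, this avoids the scoping problems that tend to show up when summarise is called from within a function.
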
How to specify a column name in ddply via character variable?

Thank you all for putting effort into answering my question. With your suggestions, I found a solution. Below is the code for what I was trying to achieve: grouping by sample_id and condition, and passing the state column name through a variable.

state_mark <- c("pPCLg2", "STAT1", "STAT5", "AKT")

for(state in state_mark){
    dat_state <- dat_clust_stim[,c("sample_id", "condition", state)]

    # I had to use !!ensym() to convert a character to a symbol.
    dat_med <- group_by(dat_state, sample_id, condition) %>% 
               summarise(med = median(!!ensym(state)))

    dat_med <- ungroup(dat_med)
    x <- dat_med[dat_med$condition == "case", "med"]
    y <- dat_med[dat_med$condition == "control", "med"]
    t_test <- t.test(x$med, y$med)
}
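As an alternative to !!ensym(), dplyr's .data pronoun accepts a column name held in a string. This is only a sketch: toy is made-up data standing in for dat_clust_stim, median_by_sample is a hypothetical helper name, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up data standing in for dat_clust_stim (values are illustrative only).
toy <- data.frame(
  sample_id = rep(c("s1", "s2", "s3", "s4"), each = 3),
  condition = rep(c("case", "control"), each = 6),
  STAT1     = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# Hypothetical helper: median of the column named by `state`,
# per sample_id and condition.
median_by_sample <- function(df, state) {
  df %>%
    group_by(sample_id, condition) %>%
    summarise(med = median(.data[[state]]), .groups = "drop")
}

res <- median_by_sample(toy, "STAT1")
print(res)
```

.data[[state]] looks the column up by its string name inside each group, so no conversion to a symbol is needed.
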

Set column name ddply

Perhaps you are looking for summarize (or mutate or transform, depending on what you want to do).

A small example:

set.seed(1)
data <- data.frame(col1 = c(1, 2, 2, 3, 3, 4),
                   col2 = c(1, 2, 2, 1, 2, 1),
                   z = rnorm(6))
ddply(data,.(col1,col2), summarize, 
      number = length(z), newcol = mean(z))
#   col1 col2 number     newcol
# 1    1    1      1 -0.6264538
# 2    2    2      2 -0.3259926
# 3    3    1      1  1.5952808
# 4    3    2      1  0.3295078
# 5    4    1      1 -0.8204684

ddply + summarise function column name input

Although this is probably not the intended usage for summarize and there must be much better approaches to your problem, the direct answer to your question is to use get:

ddply(t.df.l, c("day","variable"), summarise, cor(get(colnames(t.df)[2]), value))

Edit: here is one approach that is, in my opinion, better suited to your problem:

ddply(t.df.l, c("day", "variable"), function(x)cor(x["X1"], x["value"]))

Above, "X1" can also be replaced by 2, or by the name of a variable holding "X1", etc. It depends on how you want to access the column programmatically.
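To make the "variable holding the column name" case concrete, here is a small sketch. The data frame toy and the names col_name and r are invented for illustration. It uses [[ ]] rather than [ ] so that cor() receives plain vectors.

```r
library(plyr)

# Made-up data: within day 1 the two columns rise together,
# within day 2 they move in opposite directions.
toy <- data.frame(
  day   = rep(1:2, each = 4),
  X1    = rep(1:4, times = 2),
  value = c(2, 4, 6, 8, 8, 6, 4, 2)
)

col_name <- "X1"  # the column to correlate against `value`

res <- ddply(toy, "day", function(d) {
  data.frame(r = cor(d[[col_name]], d[["value"]]))
})
print(res)
```
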

How can I use variable names to refer to data frame columns with ddply?

The arguments to ddply are expressions which are evaluated in the context of each piece the original data frame is split into. Your df[myval] addresses the whole data frame, so you cannot pass it as-is (by the way, why do you need the as.numeric(as.character()) calls? They are completely useless here).

The easiest way is to write your own function that does everything inside and pass the column name down to it, e.g.

df <- ddply(df, 
            .(year), 
            .fun = function(x, colname) transform(x, cum_sales = cumsum(x[,colname])), 
            colname = "sales")
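For anyone who wants to run this, here is a self-contained sketch with made-up data (sales_df and its values are invented). It is a variant of the call above that assigns the new column directly instead of going through transform, which sidesteps name clashes if the data happens to contain a column called x or colname.

```r
library(plyr)

# Made-up data for illustration only.
sales_df <- data.frame(
  year  = c(2010, 2010, 2011, 2011),
  sales = c(1, 2, 3, 4)
)

# Extra named arguments to ddply (here `colname`) are passed on to .fun.
res <- ddply(sales_df, .(year),
             .fun = function(piece, colname) {
               # cumulative sum within this year's piece
               piece$cum_sales <- cumsum(piece[[colname]])
               piece
             },
             colname = "sales")
print(res)
```
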

ddply aggregated column names

You can use summarise:

agg_data <- ddply(raw_data, .(id, date, classification), summarise, "no_entries" = nrow(piece))

or you can use length(<column_name>) if nrow(piece) doesn't work. For instance, here's an example that should be runnable by anyone:

ddply(baseball, .(year), summarise, newColumn = nrow(piece))

ddply(baseball, .(year), summarise, newColumn = length(year))

EDIT

Or, as Joshua comments, the all-caps version, NROW, does the checking for you.
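A quick sketch of the difference, in case it helps: nrow() returns NULL for a plain vector, while NROW() treats the vector as a one-column matrix. The data frame raw below is made up for illustration.

```r
library(plyr)

# nrow() has no answer for a plain vector; NROW() counts its elements.
v <- c(10, 20, 30)
print(is.null(nrow(v)))  # TRUE
print(NROW(v))           # 3

# Made-up data: count the entries per id using NROW on a column.
raw <- data.frame(id = c(1, 1, 2), value = c(5, 6, 7))
counts <- ddply(raw, .(id), summarise, no_entries = NROW(value))
print(counts)
```
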

How to pass columns as parameters to sum() in ddply?

You could try dplyr

library(dplyr)
library(lazyeval)
mydf %>% 
    group_by(colors) %>% 
   summarise_(sum_val=interp(~sum(var), var=as.name(mycol)))
#   colors sum_val
#1   Blue       5
#2  Green       9
#3    Red       7

Or using ddply from plyr

library(plyr)
ddply(mydf, .(colors), summarize,
   sum_val=eval(substitute(sum(var), list(var=as.name(mycol)))) )
#   colors sum_val
#1   Blue       5
#2  Green       9
#3    Red       7

Regarding the error in one of the code snippets,

ddply(Cars,'model', summarize, sum(get(mycol)))
#Error: object 'weight1' not found

the Cars object is not defined in the example, but the call below works for the example data.

ddply(mydf,'colors', summarize, sum_val=sum(get(mycol)))
#  colors sum_val
#1   Blue       5
#2  Green       9
#3    Red       7
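Note that summarise_() is deprecated in current dplyr in favour of tidy evaluation. A sketch of the same grouped sum with rlang's sym() and !!, assuming a recent dplyr: the values in mydf below are invented so that they reproduce the sums shown above, the column name weight1 is taken from the error message, and sum_by_color is a hypothetical helper name.

```r
library(dplyr)

# Invented data chosen to reproduce the sums shown above.
mydf <- data.frame(
  colors  = c("Blue", "Blue", "Green", "Green", "Red", "Red"),
  weight1 = c(2, 3, 4, 5, 3, 4)
)
mycol <- "weight1"

# Hypothetical helper: sum the column named by `col_name` within each colour.
sum_by_color <- function(df, col_name) {
  df %>%
    group_by(colors) %>%
    summarise(sum_val = sum(!!rlang::sym(col_name)))
}

res <- sum_by_color(mydf, mycol)
print(res)
```
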
