How to use write.table() and ddply together

Is it possible to use write.table() and ddply, together?

Continuing from Joshua's answer, the plyr function to use is d_ply which does not expect to return anything. You can do something like this:

library(plyr)

d_ply(df, .(a),
      function(sdf) write.csv(sdf,
                              file = paste(sdf$a[[1]], ".csv", sep = "")))

The file argument to write.csv is constructed such that each subset gets a different filename.
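For a version that can be run end to end, here is a minimal sketch. The data frame, its column names, and the use of tempdir() as the output location are assumptions made for illustration; they are not part of the original question.

```r
library(plyr)

# Hypothetical example data: column `a` defines the groups.
df <- data.frame(a = c("x", "x", "y"), v = 1:3)

# Assumption: write into a temporary directory so the working
# directory is left untouched.
out_dir <- tempdir()

# d_ply() calls the function once per group and discards the return value.
d_ply(df, .(a), function(sdf) {
  write.csv(sdf,
            file = file.path(out_dir, paste0(sdf$a[[1]], ".csv")),
            row.names = FALSE)
})

# One file per distinct value of `a` should now exist.
written <- file.exists(file.path(out_dir, c("x.csv", "y.csv")))
```

Reading one of the files back with read.csv() is a quick way to check that each file holds only the rows of its own group.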

R Dynamically build list in data.table (or ddply)

Another way is to use .SDcols to group the columns on which you'd like to perform the same operation. Say you need columns a, d, e summed by type, the mean taken of b, g, and the median of c, f:

# constructing an example data.table:
library(data.table)
set.seed(45)
dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3), a = sample(9),
                 b = rnorm(9), c = runif(9), d = sample(9), e = sample(9),
                 f = runif(9), g = rnorm(9))

#     type a          b         c d e         f          g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
# 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
# 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
# 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
# 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
# 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
# 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758

# 1) set key
setkey(dt, "type")

# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))

# 3) now use these ids with the .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]

# 4) merge them.
dt1[dt2[dt3]]

#     type  a  d  e          b          g         c         f
# 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317

If you have many columns, building a list like the one in your question can become cumbersome.
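When there are many column groups, the per-group steps can be driven from a small specification instead of being written out one by one. The following is only a sketch of that idea: the `spec` list, the toy data, and the use of Reduce() with merge() to combine the pieces are assumptions for illustration, not something taken from the answer above.

```r
library(data.table)

# Hypothetical toy data with deterministic values.
dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = 1:9,
                 b = as.numeric(9:1),
                 c = c(2, 4, 6, 1, 3, 5, 9, 8, 7))

# Hypothetical specification: which function applies to which columns.
spec <- list(
  list(fun = sum,    cols = "a"),
  list(fun = mean,   cols = "b"),
  list(fun = median, cols = "c")
)

# One grouped summary per entry of the specification.
parts <- lapply(spec, function(s) {
  dt[, lapply(.SD, s$fun), by = type, .SDcols = s$cols]
})

# Combine the pieces on the grouping column.
res <- Reduce(function(x, y) merge(x, y, by = "type"), parts)
res
```

Adding another group of columns then only requires one more entry in `spec`.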

Creating a nested table with ddply

Try

prop.table(with(g, table(response, Item, School)), margin = 2) 

This gives a 4x10x20 array (responses, items, schools). You can use as.data.frame on the result to convert it if needed.
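To see the shape of the result on something small, here is a sketch with made-up data. The values in `g` and the column name `prop` are assumptions; only the column names response, Item and School come from the call above.

```r
# Hypothetical long-format data, one row per answer.
g <- data.frame(
  response = c("yes", "no", "yes", "yes"),
  Item     = c("q1", "q1", "q2", "q2"),
  School   = c("A", "A", "A", "B")
)

tab   <- with(g, table(response, Item, School))

# margin = 2: proportions sum to 1 within each Item.
props <- prop.table(tab, margin = 2)

# Flatten the array to one row per (response, Item, School) cell.
long <- as.data.frame(props, responseName = "prop")
head(long)
```

With margin = 2 the proportions are taken over all responses and schools within an item; a different margin changes what each proportion is relative to.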

How to subset data for a specific column with ddply?

With plyr, you can do it as follows:

ddply(df,
      .(Condition), summarise,
      N = length(Response),
      nAccurate = sum(Accuracy),
      RT = mean(RT[Accuracy==1]))

This gives:

   Condition N nAccurate     RT
1:         1 6         4 127.50
2:         2 6         4 300.25

If you use data.table, then this is an alternative way:

library(data.table)
setDT(df)[, .(N = .N,
              nAccurate = sum(Accuracy),
              RT = mean(RT[Accuracy==1])),
          by = Condition]
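The key step in both versions is indexing RT by the condition Accuracy == 1 inside the summary, so that only accurate trials enter the mean. A small self-contained sketch with plyr follows; the data values are invented for illustration and are not the data from the question.

```r
library(plyr)

# Hypothetical data: three trials per condition.
df <- data.frame(
  Condition = c(1, 1, 1, 2, 2, 2),
  Accuracy  = c(1, 0, 1, 1, 1, 0),
  RT        = c(100, 900, 200, 300, 500, 50)
)

res <- ddply(df, .(Condition), summarise,
             N         = length(Accuracy),
             nAccurate = sum(Accuracy),
             RT        = mean(RT[Accuracy == 1]))  # accurate trials only
res
```

In this toy data the inaccurate trials (RT of 900 and 50) are excluded, so the mean RT is 150 for condition 1 and 400 for condition 2.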

Aggregate sum and mean in R with ddply

Another solution uses dplyr. First apply both aggregate functions to every variable you want to aggregate. From the resulting variables, select only the desired function/variable combinations.

library(dplyr)
library(ggplot2)

diamonds %>%
  group_by(cut) %>%
  summarise_each(funs(sum, mean), x:z, price) %>%
  select(cut, matches("[xyz]_sum"), price_mean)
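Recent dplyr releases deprecate summarise_each() and funs(). Assuming dplyr 1.0.0 or later is installed, roughly the same result can be sketched with across(), which also avoids computing combinations that are then thrown away. The object name `res` is only for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

res <- diamonds %>%
  group_by(cut) %>%
  summarise(
    # sum of x, y and z, named x_sum, y_sum, z_sum
    across(c(x, y, z), sum, .names = "{.col}_sum"),
    # mean of price only
    price_mean = mean(price)
  )
res
```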

Write list of data.frames to separate CSV files with lapply

Try this:

sapply(names(df.daily),
       function (x) write.table(df.daily[[x]], file = paste(x, "txt", sep = ".")))

You should see the names ("1", "2", "3") printed one by one; the NULLs returned alongside them are evidence that the side effect of writing to disk was carried out. (Edit: changed [] to [[]].)
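If the printed NULLs are unwanted, the call can be wrapped in invisible(). The sketch below assumes a small named list standing in for df.daily and writes into tempdir(); both are illustrative choices, not part of the original question.

```r
# Hypothetical named list of data frames standing in for df.daily.
df.daily <- list(`1` = data.frame(x = 1:2),
                 `2` = data.frame(x = 3:4))

# Assumption: write into a temporary directory.
out_dir <- tempdir()

# invisible() suppresses the list of NULLs that lapply() would print.
invisible(lapply(names(df.daily), function(x) {
  write.table(df.daily[[x]],
              file = file.path(out_dir, paste(x, "txt", sep = ".")))
}))

written <- file.exists(file.path(out_dir, c("1.txt", "2.txt")))
```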

plyr package writing the same function over multiple columns

The plyr-centred approach is to use colwise, for example:

ddply(data, .(TYPE), colwise(sum))
  TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1    1      319.8977      60.80317
2    2      621.6745      37.05863

You can pass the column names as the argument .cols if you want only a subset.

You can also use numcolwise or catcolwise to act on numeric or categorical columns only.
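As a sketch of both variants, the example below uses invented data with the same column names as above. It restricts colwise() to one column through its `.cols` argument and uses numcolwise() to pick up every numeric column; the object names `all_num` and `only_a` are hypothetical.

```r
library(plyr)

# Hypothetical data; TYPE is kept as character so it is never summed.
data <- data.frame(
  TYPE          = c("1", "1", "2", "2"),
  A_MEAN_WEIGHT = c(1, 2, 3, 4),
  B_MEAN_WEIGHT = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Sum every numeric column within each TYPE.
all_num <- ddply(data, .(TYPE), numcolwise(sum))

# Sum only the named column within each TYPE.
only_a <- ddply(data, .(TYPE), colwise(sum, .cols = "A_MEAN_WEIGHT"))
```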

Note that you could use sapply in place of the most basic use of colwise:

ddply(data, .(TYPE), sapply, FUN = 'mean') 

The idiomatic data.table approach is to use lapply(.SD, fun), for example:

dt <- data.table(data)
dt[, lapply(.SD, sum), by = TYPE]
   TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1:    2      621.6745      37.05863
2:    1      319.8977      60.80317

(How) can I use ddply to summarize a dataframe grouped by two factors?

Just remove the c in the .variables argument, so your code is:

library(plyr)
ddply(ExampleData, .(Condition, Block), summarize, Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))

By the way, you might want to switch to using dplyr instead of plyr.
https://blog.rstudio.com/2014/01/17/introducing-dplyr/

If you were to do this in dplyr:

summarize(group_by(ExampleData, Condition, Block), Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))

You could also use the pipe operator, so this becomes:

ExampleData %>%
  group_by(Condition, Block) %>%
  summarise(Average = mean(Var1, na.rm = TRUE),
            SD = sd(Var1),
            N = length(Var1),
            Med = median(Var1))

How to replicate a ddply behavior that uses a custom function with dplyr?

As shown in ?do, you can refer to a group with . in your expression. The following will replicate your ddply output:

iris %>% group_by(Species) %>% do(.[1:5, ])

# Source: local data frame [15 x 5]
# Groups: Species
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 7.0 3.2 4.7 1.4 versicolor
# 7 6.4 3.2 4.5 1.5 versicolor
# 8 6.9 3.1 4.9 1.5 versicolor
# 9 5.5 2.3 4.0 1.3 versicolor
# 10 6.5 2.8 4.6 1.5 versicolor
# 11 6.3 3.3 6.0 2.5 virginica
# 12 5.8 2.7 5.1 1.9 virginica
# 13 7.1 3.0 5.9 2.1 virginica
# 14 6.3 2.9 5.6 1.8 virginica
# 15 6.5 3.0 5.8 2.2 virginica

More generally, to apply a custom function to groups with dplyr, you can do something like the following (thanks @docendodiscimus):

iris %>% group_by(Species) %>% do(mm(.))
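The function mm comes from the original question and is not shown here. As a stand-in, the sketch below defines a hypothetical mm that takes one group's data frame and returns a data frame, which is the shape do() expects when used this way.

```r
library(dplyr)

# Hypothetical custom function: receives the rows of one group
# and returns a data frame (here, the first two rows).
mm <- function(d) head(d, 2)

res <- iris %>% group_by(Species) %>% do(mm(.))
nrow(res)
```

Any function that maps a data frame to a data frame can be substituted for mm in the same way.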

