Is it possible to use write.table() and ddply, together?
Continuing from Joshua's answer, the plyr function to use is d_ply, which does not expect to return anything. You can do something like this:
d_ply(df, .(a),
function(sdf) write.csv(sdf,
file=paste(sdf$a[[1]],".csv",sep="")))
The file argument to write.csv is constructed such that each subset gets a different filename.
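For comparison, the same idea can be sketched in base R without plyr. This is only an illustration under assumptions: the toy data frame, the `value` column, and the temporary output directory are invented here and are not part of the original answer.

```r
# Toy data frame; the grouping column `a` mirrors the snippet above.
df <- data.frame(a = c("x", "x", "y", "z"), value = 1:4)

# Assumption: write into a temporary directory rather than the working directory.
out_dir <- tempdir()

# split() returns a named list with one data frame per value of `a`.
groups <- split(df, df$a)

# Write each subset to "<group>.csv"; the loop runs only for its side effect.
for (key in names(groups)) {
  write.csv(groups[[key]],
            file = file.path(out_dir, paste0(key, ".csv")),
            row.names = FALSE)
}

# Paths of the files that were written, one per group.
written <- file.path(out_dir, paste0(names(groups), ".csv"))
```

As with d_ply, nothing useful is returned; the file name is built from the group key so that each subset lands in its own file.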
R Dynamically build list in data.table (or ddply)
Another way is to use .SDcols to group together the columns on which you'd like to perform the same operation. Let's say that you require columns a, d, e to be summed by type, whereas b, g should have their mean taken and c, f their median. Then:
library(data.table)
# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9),
b = rnorm(9), c=runif(9), d=sample(9), e=sample(9),
f = runif(9), g=rnorm(9))
# type a b c d e f g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3 1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597 1.7216593
# 4: bye 8 -0.2260640 0.3924327 8 2 0.1271187 0.4360063
# 5: bye 7 -1.0720503 0.3256450 7 8 0.5774691 0.7571990
# 6: bye 5 -0.7131021 0.4855804 6 9 0.2687791 1.5398858
# 7: ok 1 -0.4680549 0.8476840 2 4 0.5633317 1.5393945
# 8: ok 4 0.4183264 0.4402595 4 1 0.7592801 2.1829996
# 9: ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758
# 1) set key
setkey(dt, "type")
# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))
# 3) now use these ids with the .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]
# 4) merge them.
dt1[dt2[dt3]]
# type a d e b g c f
# 1: bye 20 21 19 -0.6704055 0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3: ok 14 11 10 -0.5104907 0.9090061 0.5080116 0.5633317
If you have many columns, building a list like the one you have by hand can be cumbersome.
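One way to keep this manageable is to drive the .SDcols approach from a single mapping of function name to column names. The following is a minimal sketch, assuming a small hand-made table; the names `ops`, `parts`, and `result` are hypothetical and chosen only for this illustration.

```r
library(data.table)

# Small example table (values chosen by hand for illustration).
dt <- data.table(type = rep(c("hello", "bye"), each = 2),
                 a = 1:4, b = c(1, 3, 5, 7), c = c(2, 4, 6, 8))

# Hypothetical mapping: function name -> columns it should be applied to.
ops <- list(sum = "a", mean = "b", median = "c")

# One grouped summary per function, each restricted to its own columns.
parts <- lapply(names(ops), function(fn_name) {
  fn <- match.fun(fn_name)
  cols <- ops[[fn_name]]
  dt[, lapply(.SD, fn), by = type, .SDcols = cols]
})

# Merge the partial results on the grouping column.
result <- Reduce(function(x, y) merge(x, y, by = "type"), parts)
```

Adding another operation or column then only means editing `ops`; the grouping and merging code stays the same.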
Creating a nested table with ddply
Try
prop.table(with(g, table(response, Item, School)), margin = 2)
This gives a 4x10x20 array (responses, items, schools). You can use as.data.frame on the result for conversion if needed.
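To see what the margin argument does on a three-way table, here is a small sketch with invented data. The column names follow the snippet above; the values, and the comparison with margin = c(2, 3), are assumptions added for illustration.

```r
# Toy survey data; the values are invented.
g <- data.frame(
  response = c("yes", "no", "yes", "yes", "no", "no", "yes", "no"),
  Item     = c("i1", "i1", "i1", "i1", "i2", "i2", "i2", "i2"),
  School   = c("s1", "s1", "s2", "s2", "s1", "s1", "s2", "s2")
)

counts <- with(g, table(response, Item, School))

# margin = 2: proportions sum to 1 within each Item (over all responses and schools).
by_item <- prop.table(counts, margin = 2)

# margin = c(2, 3): proportions sum to 1 within each Item-by-School combination.
by_item_school <- prop.table(counts, margin = c(2, 3))

# Long format: one row per response/Item/School combination, proportion in Freq.
long <- as.data.frame(by_item_school)
```

Which margin is appropriate depends on whether the proportions should be taken per item overall or per item within each school.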
How to subset data for a specific column with ddply?
With plyr, you can do it as follows:
ddply(df,
.(Condition), summarise,
N = length(Response),
nAccurate = sum(Accuracy),
RT = mean(RT[Accuracy==1]))
This gives:
Condition N nAccurate RT
1: 1 6 4 127.50
2: 2 6 4 300.25
If you use data.table, then this is an alternative way:
library(data.table)
setDT(df)[, .(N = .N,
nAccurate = sum(Accuracy),
RT = mean(RT[Accuracy==1])),
by = Condition]
Aggregate sum and mean in R with ddply
Another solution uses dplyr. First, apply both aggregate functions to every variable you want to aggregate. Then, from the resulting variables, select only the desired function/variable combinations.
library(dplyr)
library(ggplot2)
diamonds %>%
group_by(cut) %>%
summarise_each(funs(sum, mean), x:z, price) %>%
select(cut, matches("[xyz]_sum"), price_mean)
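In current dplyr releases, summarise_each() and funs() are deprecated. A rough equivalent of the same two-step idea can be written with across(), assuming dplyr 1.0.0 or later. The sketch below uses the built-in mtcars data instead of diamonds so that only dplyr is needed; the choice of columns is arbitrary.

```r
library(dplyr)

result <- mtcars %>%
  group_by(cyl) %>%
  # Apply both functions to every selected column; results are named <column>_<function>.
  summarise(across(c(mpg, hp), list(sum = sum, mean = mean))) %>%
  # Keep only the desired function/variable combinations.
  select(cyl, mpg_sum, hp_mean)
```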
Write list of data.frames to separate CSV files with lapply
Try this:
sapply(names(df.daily),
function (x) write.table(df.daily[[x]], file=paste(x, "txt", sep=".") ) )
You should see the names ("1", "2", "3") printed one by one; the NULLs printed alongside them are the return values of write.table, which is called here only for its side effect of writing the files to disk. (Edit: changed [] to [[]].)
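A self-contained sketch of this behaviour, assuming a toy list in place of the real df.daily and a temporary directory as the output location (both invented for illustration):

```r
# Toy named list of data frames standing in for df.daily.
df.daily <- list("1" = data.frame(x = 1:2),
                 "2" = data.frame(x = 3:4),
                 "3" = data.frame(x = 5:6))

# Assumption: write to a temporary directory.
out_dir <- tempdir()

# write.table() returns NULL, so the collected result is a named list of NULLs.
res <- sapply(names(df.daily), function(x) {
  write.table(df.daily[[x]], file = file.path(out_dir, paste(x, "txt", sep = ".")))
})

# Paths of the files that were written, one per list element.
paths <- file.path(out_dir, paste(names(df.daily), "txt", sep = "."))
```

If the printed NULLs are distracting, wrapping the call in invisible() suppresses them without changing what is written.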
plyr package writing the same function over multiple columns
The plyr-centred approach is to use colwise, e.g.
ddply(data, .(TYPE), colwise(sum))
TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1 1 319.8977 60.80317
2 2 621.6745 37.05863
You can pass the column names as the .cols argument if you want only a subset.
You can also use numcolwise or catcolwise to act on numeric or categorical columns only.
Note that you could use sapply in place of the most basic use of colwise:
ddply(data, .(TYPE), sapply, FUN = 'mean')
The idiomatic data.table approach is to use lapply(.SD, fun), e.g.
dt <- data.table(data)
dt[,lapply(.SD, sum) ,by = TYPE]
TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1: 2 621.6745 37.05863
2: 1 319.8977 60.80317
(How) can I use ddply to summarize a dataframe grouped by two factors?
Just remove the c in the .variables argument, so your code is:
library(plyr)
ddply(ExampleData, .(Condition, Block), summarize, Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
By the way, you might want to switch to using dplyr instead of plyr.
https://blog.rstudio.com/2014/01/17/introducing-dplyr/
If you were to do this in dplyr:
summarize(group_by(ExampleData, Condition, Block), Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
You could also use the pipe operator, so this becomes:
ExampleData %>%
group_by(Condition, Block) %>%
summarise(Average=mean(Var1, na.rm=TRUE),
SD=sd(Var1),
N=length(Var1),
Med =median(Var1))
How to replicate a ddply behavior that uses a custom function with dplyr?
As shown in ?do, you can refer to a group with . in your expression. The following will replicate your ddply output:
iris %>% group_by(Species) %>% do(.[1:5, ])
# Source: local data frame [15 x 5]
# Groups: Species
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 7.0 3.2 4.7 1.4 versicolor
# 7 6.4 3.2 4.5 1.5 versicolor
# 8 6.9 3.1 4.9 1.5 versicolor
# 9 5.5 2.3 4.0 1.3 versicolor
# 10 6.5 2.8 4.6 1.5 versicolor
# 11 6.3 3.3 6.0 2.5 virginica
# 12 5.8 2.7 5.1 1.9 virginica
# 13 7.1 3.0 5.9 2.1 virginica
# 14 6.3 2.9 5.6 1.8 virginica
# 15 6.5 3.0 5.8 2.2 virginica
More generally, to apply a custom function to groups with dplyr, you can do something like the following (thanks @docendodiscimus):
iris %>% group_by(Species) %>% do(mm(.))
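The function mm is not defined in this excerpt. As a minimal sketch, assume a hypothetical stand-in that takes one group's data frame and returns a data frame, here simply its first five rows:

```r
library(dplyr)

# Hypothetical stand-in for mm(): receives one group as a data frame, returns a data frame.
mm <- function(d) head(d, 5)

# do() calls mm() once per group and row-binds the results.
res <- iris %>% group_by(Species) %>% do(mm(.))
```

Any function with that shape can be dropped in the same way. In recent dplyr versions do() is marked as superseded, and group_modify() is one documented alternative.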