Doing a plyr operation on every row of a data frame in R
Just treat it like an array and work on each row:
adply(df, 1, transform, max = max(x, y))
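When the per-row function has a vectorized counterpart, the row-by-row pass can be skipped altogether. A minimal base-R sketch, assuming a hypothetical df with numeric columns x and y (the data is made up for illustration), uses pmax for the element-wise maximum:

```r
# Hypothetical data; x and y stand in for the question's columns.
df <- data.frame(x = c(1, 5, 3), y = c(4, 2, 3))

# pmax() takes the parallel (element-wise) maximum of its arguments,
# which is the same per-row result that max(x, y) gives one row at a time.
df$max <- pmax(df$x, df$y)
df
```

This is an alternative to the adply call, not a description of what adply does internally.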
Applying a function to every row of a table using dplyr?
As of dplyr 0.2 (I think), rowwise() is implemented, so the answer to this problem becomes:
iris %>%
  rowwise() %>%
  mutate(Max.Len = max(Sepal.Length, Petal.Length))
Non-rowwise alternative
Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.
The most straightforward way I have found is based on one of Hadley's examples using pmap:
iris %>%
  mutate(Max.Len = purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))
Using this approach, you can give an arbitrary number of arguments to the function (.f) inside pmap.
pmap is a good conceptual approach because it reflects the fact that when you're doing row-wise operations you're actually working with tuples from a list of vectors (the columns in a data frame).
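The "tuples from a list of vectors" idea can also be sketched in base R. This is an analogue using mapply rather than pmap itself, and the choice of three iris columns is only for illustration:

```r
# Columns to combine row by row; any number of columns works.
cols <- iris[c("Sepal.Length", "Petal.Length", "Sepal.Width")]

# mapply() walks the columns in parallel, so FUN receives one value
# from each column per call, i.e. one row "tuple" at a time.
max_len <- do.call(mapply, c(list(FUN = max), unname(as.list(cols))))

head(max_len)
```

For max specifically, pmax gives the same result without a per-row function call; the mapply form is the one that generalizes to functions with no vectorized version.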
Call apply-like function on each row of dataframe with multiple arguments from each row
You can call apply on a subset of the original data's columns.
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )
Or, if your function is just sum, use the vectorized version:
rowSums(dat[,c('x','z')])
[1] 6 8
If you want to use testFunc
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))
EDIT: To access columns by name rather than by index, you can do something like this:
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))
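One caveat with apply is that it converts the data frame to a matrix first, so columns of mixed types would all be coerced to a common type (typically character). A hedged alternative sketch, reusing dat and testFunc from above, passes the columns to mapply directly:

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# Pass the columns directly; mapply() calls testFunc(x[i], z[i]) for each i,
# with no intermediate matrix conversion.
res <- mapply(testFunc, dat$x, dat$z)
res
```

Here res holds one value per row of dat.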
Apply a function to every row of a matrix or a data frame
You simply use the apply() function:
R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16
R>
This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
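As a small sketch of passing extra arguments, the coefficients of the silly function can be turned into parameters. The names weighted, a and b are hypothetical, chosen only for this illustration:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# a and b are hypothetical extra parameters; apply() forwards them to FUN.
weighted <- function(x, a, b) a * x[1] + b * x[2]
res <- apply(M, 1, weighted, a = 2, b = 1)
res
```

With a = 2 and b = 1 this reproduces the inline function used above.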
Applying a function on each row of a data frame in R
You may have to use lapply instead of apply to force the result to be a list.
> rhymesWithBrave <- function(x) substring(x,nchar(x)-2) =="ave"
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+ if(rhymesWithBrave(dfr[i,"name"])) dfr[i,] else NULL,
+ dfr))
  id size name
1  1  100 dave
But in this case, subset would be more appropriate:
> subset(dfr,rhymesWithBrave(name))
  id size name
1  1  100 dave
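To make the comparison reproducible, here is a self-contained sketch of both approaches. The contents of dfr are an assumption: only the "dave" row mirrors the output shown, and the other rows are invented.

```r
# Hypothetical stand-in for dfr; only the "dave" row mirrors the output shown.
dfr <- data.frame(id = 1:3, size = c(100, 150, 120),
                  name = c("dave", "alice", "bob"),
                  stringsAsFactors = FALSE)

rhymesWithBrave <- function(x) substring(x, nchar(x) - 2) == "ave"

# Row by row: keep a row or return NULL, then bind the survivors.
by_row <- do.call(rbind, lapply(seq_len(nrow(dfr)), function(i) {
  if (rhymesWithBrave(dfr[i, "name"])) dfr[i, ] else NULL
}))

# Vectorized: the predicate runs once over the whole column.
by_subset <- subset(dfr, rhymesWithBrave(name))
```

Both objects contain the same single row; the subset form works because rhymesWithBrave is already vectorized over its input.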
If you want to perform additional transformations before returning the result, you can go back to the lapply approach above:
> add100tosize <- function(x) within(x,size <- size+100)
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+ if(rhymesWithBrave(dfr[i,"name"])) add100tosize(dfr[i,])
+ else NULL,dfr))
  id size name
1  1  200 dave
Or, in this simple case, apply the function to the output of subset.
> add100tosize(subset(dfr,rhymesWithBrave(name)))
  id size name
1  1  200 dave
UPDATE:
To select rows that do not fall between start and end, you might construct a different function (note: when summing logical vectors, TRUE values are converted to 1s and FALSE values to 0s):
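# mapply gives one logical column per (start, end) interval; x is passed
# through MoreArgs so that the whole vector is tested against each interval.
test <- function(x)
  rowSums(mapply(function(start, end, x) x >= start & x <= end,
                 start = c(100, 250, 698, 1988),
                 end = c(200, 400, 1520, 2147),
                 MoreArgs = list(x = x))) == 0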
subset(dfr,test(size))
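A runnable sketch of the interval test follows, under two assumptions: the dfr below is invented data, and the function is renamed outside_all (a hypothetical name) with x handed to mapply explicitly through MoreArgs.

```r
# Hypothetical data; only the size column matters here.
dfr <- data.frame(id = 1:4, size = c(100, 220, 500, 2000))

outside_all <- function(x) {
  # One logical column per interval, one row per element of x.
  inside <- mapply(function(start, end, x) x >= start & x <= end,
                   start = c(100, 250, 698, 1988),
                   end = c(200, 400, 1520, 2147),
                   MoreArgs = list(x = x))
  # TRUE counts as 1, so a row sum of 0 means "in no interval".
  rowSums(inside) == 0
}

kept <- subset(dfr, outside_all(size))
kept
```

Sizes 100 and 2000 fall inside an interval and are dropped; 220 and 500 fall in the gaps and are kept. Note that rowSums needs a matrix, so this sketch assumes x has more than one element.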
Creating a data frame with lapply (and plyr package)
I think this deals with what you're trying to do. Basically, I think you need to read all the data in once, then deal with that data.frame. There are several questions dealing with how to read it all in; here is how I would do it so I maintain a record of which file each row in the data.frame comes from, which can also be used for grouping:
filenames <- list.files(".", pattern="^[2-3].txt")
import <- mdply(filenames, read.table, header = TRUE, quote = "\"")
import$file <- filenames[import$X1]
Now import is a big data frame with all your files in it (I'm assuming your pattern recognition etc. for reading in files is correct). You can then do summaries based on whatever criteria you like.
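The same read-once-and-tag idea can be sketched in base R without plyr. Everything about the files here is an assumption: the sketch writes two tiny files to a temporary directory so it can run on its own, and it uses a stricter pattern with the dot escaped.

```r
# Create two small files in a temporary directory so the sketch is runnable.
dir <- tempfile("import_demo")
dir.create(dir)
write.table(data.frame(Score = c(1, 2)), file.path(dir, "2.txt"), row.names = FALSE)
write.table(data.frame(Score = c(3, 4)), file.path(dir, "3.txt"), row.names = FALSE)

filenames <- list.files(dir, pattern = "^[2-3]\\.txt$")

# Read each file and tag its rows with the file they came from.
pieces <- lapply(filenames, function(f) {
  d <- read.table(file.path(dir, f), header = TRUE)
  d$file <- f
  d
})
import <- do.call(rbind, pieces)
import
```

The file column then serves as a grouping variable for later summaries.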
I'm not sure what you're trying to achieve in line 3 of your code above, but for the ddply below that, you just need to do:
ddply(import[import$Reqresponse==9,],.(Condition,Reqresponse,file),summarise,Score=mean(Score))
There's so much going on in the rest of your code that it's hard to make out exactly what you want.
I think the important thing is that to make this efficient, and easier to follow, you need to read your data in once, then work on that dataset - making subsets if necessary, doing summary stats or whatever else it is.
As an example of how you can work with this, here's an attempt at your problem of handling trials (rows?) that have reqresponse == 9 and the two that follow. There are probably more efficient ways of doing this, but it is loosely based on how you were doing it, to show briefly how to work with the larger data frame. Now modified to remove the first two trials of each file:
import.clean <- ddply(import, .(file), function(x) {
  index <- which(x$reqresponse == 9)
  if(length(index) > 0) {
    index <- unique(c(index, index + 1, index + 2, 1, 2))
  }
  else index <- c(1,2)
  x <- x[-index,]
  return(x)
})
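The per-file cleaning step can be checked on a tiny example. This is a base-R sketch using split and lapply in place of ddply; the data, the helper name drop_trials and the trial column are all hypothetical.

```r
# Hypothetical data: two files of six trials, one trial with reqresponse == 9.
import <- data.frame(file = rep(c("a.txt", "b.txt"), each = 6),
                     trial = rep(1:6, 2),
                     reqresponse = c(1, 1, 9, 1, 1, 1,  1, 1, 1, 1, 1, 1))

drop_trials <- function(x) {
  index <- which(x$reqresponse == 9)
  # Flagged trials, the two after each, and the first two trials of the file.
  # With no flagged trials this reduces to c(1, 2).
  index <- unique(c(index, index + 1, index + 2, 1, 2))
  # Discard positions past the end of this file's rows.
  index <- index[index <= nrow(x)]
  x[-index, , drop = FALSE]
}

cleaned <- do.call(rbind, lapply(split(import, import$file), drop_trials))
cleaned$trial
```

In file a.txt, trials 1 to 5 are removed (trial 3 is flagged); in b.txt only the first two go.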
Is there a way to apply plyr's count() function to every column individually?
You can use lapply:
lapply(df, plyr::count)
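This works because a data frame is a list of columns. A rough base-R analogue, assuming a made-up df and using table instead of plyr::count (so the output layout is similar but not identical), looks like this:

```r
# Hypothetical data frame with two categorical columns.
df <- data.frame(a = c("x", "y", "x"), b = c("u", "u", "u"),
                 stringsAsFactors = FALSE)

# lapply() visits each column in turn and returns one result per column.
counts <- lapply(df, function(col) {
  as.data.frame(table(value = col), responseName = "freq")
})
counts$a
```

The result is a named list with one frequency table per column.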
R: converting each row of a data frame into a list item
Just use split. It's a few times faster than your adply line.
> system.time(myList <- alply( df, 1, function(x) data.frame(x) ))
user system elapsed
7.53 0.00 7.57
> system.time( splitList <- split(df, 1:NROW(df)) )
user system elapsed
1.73 0.00 1.74
>
I suspect the parallel backend on adply is only for function evaluation (not splitting and re-combining).
UPDATE:
If you can convert your data.frame to a matrix, the solution below will be über-fast. You may be able to use split, but it will drop names and return a vector in each list element.
> m <- as.matrix(df)
> system.time( matrixList <- lapply(1:NROW(m), function(i) m[i,,drop=FALSE]) )
user system elapsed
0.02 0.00 0.02
> str(matrixList[[1]])
num [1, 1:2] -0.0956 -1.5887
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "a" "b"
> system.time( matrixSplitList <- split(m, 1:NROW(m)) )
user system elapsed
0.01 0.00 0.02
> str(matrixSplitList[[1]])
num [1:2] -0.0956 -1.5887
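The three shapes discussed above can be compared side by side on a small example. The data is invented (three random rows, with an arbitrary seed), and the matrix split groups by row(m) explicitly rather than relying on recycling of 1:NROW(m):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(a = rnorm(3), b = rnorm(3))
m <- as.matrix(df)

# Data frame split: each element is a one-row data frame with column names.
dfList <- split(df, seq_len(nrow(df)))

# Matrix via lapply: each element is a 1 x 2 matrix, column names kept.
matrixList <- lapply(seq_len(nrow(m)), function(i) m[i, , drop = FALSE])

# Matrix split: grouping by row index yields plain unnamed vectors.
matrixSplitList <- split(m, row(m))

str(dfList[[1]])
str(matrixList[[1]])
str(matrixSplitList[[1]])
```

Which form is appropriate depends on whether downstream code needs the column names.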