Efficient apply or mapply for multiple matrix arguments by row
Splitting the matrices isn't the biggest contributor to evaluation time.
set.seed(21)
matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)
system.time( scores <- mapply(t.test.stat,
split(matrixA, row(matrixA)), split(matrixB, row(matrixB))) )
# user system elapsed
# 1.57 0.00 1.58
smA <- split(matrixA, row(matrixA))
smB <- split(matrixB, row(matrixB))
system.time( scores <- mapply(t.test.stat, smA, smB) )
# user system elapsed
# 1.14 0.00 1.14
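Since R 3.6.0, base R also has asplit(), which splits a matrix along a margin directly. The small sketch below uses a hypothetical toy function `f` as a stand-in for t.test.stat and checks that asplit(x, 1) feeds mapply the same row vectors as split(x, row(x)).

```r
set.seed(1)
m1 <- matrix(rnorm(4 * 5), nrow = 5)
m2 <- matrix(rnorm(3 * 5), nrow = 5)

# hypothetical stand-in for t.test.stat: difference of the two row means
f <- function(a, b) mean(a) - mean(b)

# split() by row index: a named list holding m1[i, ] for each row i
by_split <- mapply(f, split(m1, row(m1)), split(m2, row(m2)), USE.NAMES = FALSE)

# asplit() over MARGIN = 1: the same rows, returned as a list-array
by_asplit <- mapply(f, asplit(m1, 1), asplit(m2, 1), USE.NAMES = FALSE)

identical(by_split, by_asplit)
```

Either way, the split is cheap compared with the per-row function calls.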
Look at the output from Rprof to see that most of the time is, not surprisingly, spent evaluating t.test.stat (mean, var, etc.). Basically, there's quite a bit of overhead from function calls.
Rprof()
scores <- mapply(t.test.stat, smA, smB)
Rprof(NULL)
summaryRprof()
You may be able to find faster generalized solutions, but none will approach the speed of the vectorized solution below.
Since your function is simple, you can take advantage of the vectorized rowMeans function to do this almost instantaneously (though it's a bit messy):
system.time({
ncA <- NCOL(matrixA)
ncB <- NCOL(matrixB)
ans <- (rowMeans(matrixA)-rowMeans(matrixB)) /
sqrt( rowMeans((matrixA-rowMeans(matrixA))^2)*(ncA/(ncA-1))/ncA +
rowMeans((matrixB-rowMeans(matrixB))^2)*(ncB/(ncB-1))/ncB )
})
# user system elapsed
# 0 0 0
head(ans)
# [1] 0.8272511 -1.0965269 0.9862844 -0.6026452 -0.2477661 1.1896181
UPDATE
Here's a "cleaner" version using a rowVars function:
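rowVars <- function(x, na.rm=FALSE, dims=1L) {
  # mean squared deviation from the row mean, rescaled by n/(n-1) to give the
  # sample variance; n is NCOL(x), so the correction assumes no NAs are dropped
  rowMeans((x-rowMeans(x, na.rm, dims))^2, na.rm, dims)*(NCOL(x)/(NCOL(x)-1))
}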
ans <- (rowMeans(matrixA)-rowMeans(matrixB)) /
sqrt( rowVars(matrixA)/NCOL(matrixA) + rowVars(matrixB)/NCOL(matrixB) )
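The question's t.test.stat isn't reproduced in this answer. Assuming it is the Welch (unequal-variance) two-sample t statistic, which is what the vectorized expression computes, the sketch below checks on a smaller example that the row-by-row and vectorized versions agree, that rowVars matches apply(x, 1, var), and that the result matches the statistic reported by t.test() for one row. The definition of t.test.stat here is that assumption, not the original.

```r
set.seed(21)
mA <- matrix(rnorm(5 * 100), nrow = 100)
mB <- matrix(rnorm(4 * 100), nrow = 100)

# ASSUMED definition (the question's version is not shown): Welch t statistic
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# sample variance of each row, as in the answer above
rowVars <- function(x, na.rm = FALSE, dims = 1L) {
  rowMeans((x - rowMeans(x, na.rm, dims))^2, na.rm, dims) * (NCOL(x) / (NCOL(x) - 1))
}

# row-by-row: one t.test.stat call per row pair
loop_ans <- mapply(t.test.stat,
                   split(mA, row(mA)), split(mB, row(mB)),
                   USE.NAMES = FALSE)

# vectorized: whole-matrix row means and row variances
vec_ans <- (rowMeans(mA) - rowMeans(mB)) /
  sqrt(rowVars(mA) / NCOL(mA) + rowVars(mB) / NCOL(mB))

all.equal(loop_ans, vec_ans)
all.equal(rowVars(mA), apply(mA, 1, var))
# t.test() defaults to the Welch statistic (var.equal = FALSE)
all.equal(vec_ans[1], unname(t.test(mA[1, ], mB[1, ])$statistic))
```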
Using mapply to apply a function with two arguments to every row of two matrices
To apply the function to each row, split each matrix by row and pass the pieces as lists:
mapply(f, asplit(M1, 1), asplit(M2, 1))
Note that mapply on a matrix (or a vector) will loop over each element, i.e. the unit is a single element, whereas in a data.frame/data.table/tibble the unit is a column. By splitting by row (asplit with MARGIN = 1), we get a list of vectors, and the unit becomes a list element.
As @Adam mentioned in the comments, the result may need to be transposed (not clear without testing with f).
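A small sketch of both points, using a hypothetical f that returns three summaries per pair of rows. On the raw matrices, mapply walks the individual elements. With asplit(..., 1) there is one call per row, and mapply stacks each row's result as a column, so the output is transposed relative to the input and t() restores one row per input row.

```r
M1 <- matrix(1:6, nrow = 2)    # 2 rows, 3 columns
M2 <- matrix(7:12, nrow = 2)

# hypothetical f: three summaries for one pair of rows
f <- function(a, b) c(sum = sum(a + b), min = min(a, b), max = max(a, b))

# on raw matrices, mapply iterates over the 6 elements, not the 2 rows
length(mapply(function(a, b) a + b, M1, M2))   # 6

# row-wise: one call per row; each result becomes a column of `res`
res <- mapply(f, asplit(M1, 1), asplit(M2, 1))
dim(res)   # 3 2: rows are sum/min/max, columns are the input rows
t(res)     # one row per input row
```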
mapply - passing row and column of element as argument
There is no need for loops or *apply functions. You can just use plain matrix operations:
nI <- nrow(cluster)
nJ <- ncol(cluster)
# row means repeated across the columns, column means repeated down the rows
cluster.I <- matrix(rowMeans(cluster), nI, nJ, byrow = FALSE)
cluster.J <- matrix(colMeans(cluster), nI, nJ, byrow = TRUE)
cluster.IJ <- matrix(mean(cluster), nI, nJ)
residue.mat <- (cluster - cluster.I - cluster.J - cluster.IJ) /
  (cluster.N * cluster.M)
(You did not explain what cluster.N and cluster.M are, but I assume they are scalars.)
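To check that the matrix expressions reproduce an element-by-element computation, the sketch below compares them with mapply over row(cluster) and col(cluster), which passes each element's row and column index. As the answer does, it assumes cluster.N and cluster.M are scalars; the values used here are arbitrary. It uses the same residue formula as above, with cluster.J built from the column means. The per-element helper residue_ij is hypothetical.

```r
set.seed(1)
cluster <- matrix(rnorm(12), nrow = 3)
cluster.N <- 2    # arbitrary scalars (assumption, for illustration only)
cluster.M <- 5

nI <- nrow(cluster)
nJ <- ncol(cluster)

# vectorized version
cluster.I  <- matrix(rowMeans(cluster), nI, nJ, byrow = FALSE)
cluster.J  <- matrix(colMeans(cluster), nI, nJ, byrow = TRUE)
cluster.IJ <- matrix(mean(cluster), nI, nJ)
residue.mat <- (cluster - cluster.I - cluster.J - cluster.IJ) /
  (cluster.N * cluster.M)

# hypothetical per-element helper taking a row index i and column index j
residue_ij <- function(i, j) {
  (cluster[i, j] - mean(cluster[i, ]) - mean(cluster[, j]) - mean(cluster)) /
    (cluster.N * cluster.M)
}

# mapply walks row(cluster) and col(cluster) in column-major order,
# so refilling a matrix column by column restores the original layout
residue.loop <- matrix(mapply(residue_ij, row(cluster), col(cluster)), nI, nJ)

all.equal(residue.mat, residue.loop)
```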
R: using mapply for a function of two vectors
If dim(M1) and dim(M2) are identical, then you can simply do:
rowSums(M1 != M2, na.rm = TRUE)
Your attempt with mapply didn't work because m-by-n matrices are stored as vectors of length m*n, and mapply handles them as such. To accomplish this with mapply, you would need to split each matrix into a list of row vectors:
mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
vapply would be better, though:
vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)
In any case, just use rowSums.
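The question's Hamming isn't shown here. The sketch below assumes a hypothetical Hamming(x, y) that counts the positions where two vectors differ, ignoring NAs to match the rowSums call, and shows that the three approaches return the same counts. rowSums returns doubles, while the other two return integers.

```r
# hypothetical Hamming: number of positions where x and y differ (NAs ignored)
Hamming <- function(x, y) sum(x != y, na.rm = TRUE)

M1 <- rbind(c(1, 0, 1, 1),
            c(0, 0, NA, 1),
            c(1, 1, 1, 1))
M2 <- rbind(c(1, 1, 1, 0),
            c(0, 0, 1, 1),
            c(0, 0, 0, 0))

rowSums(M1 != M2, na.rm = TRUE)                                        # 2 0 4
mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))                        # 2 0 4
vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)   # 2 0 4
```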
Call apply-like function on each row of dataframe with multiple arguments from each row
You can apply apply to a subset of the original data.
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )
Or, if your function is just sum, use the vectorized version:
rowSums(dat[,c('x','z')])
# [1] 6 8
If you want to use a two-argument function such as testFunc:
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))
EDIT: To access columns by name rather than by index, you can do something like this:
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))
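One caveat: apply() first converts the data frame to a matrix with as.matrix(). If the selected columns have different types, such as a character column alongside numeric ones, every value reaches the function as character. The sketch below uses dat2, a hypothetical mixed-type data frame, to show this. It also shows an alternative that avoids the conversion: pass the columns straight to mapply, which walks them in parallel and lets you match them to testFunc's arguments by name.

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# columns passed whole; mapply takes one element from each per call
mapply(testFunc, a = dat$x, b = dat$z)   # 6 8

# hypothetical mixed-type data frame: apply() sees a character matrix
dat2 <- data.frame(id = c("a", "b"), x = c(1, 2))
apply(dat2, 1, function(r) class(r["x"]))   # "character" "character"
```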