Efficient Apply or Mapply for Multiple Matrix Arguments by Row

Splitting the matrices isn't the biggest contributor to evaluation time.

set.seed(21)
matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)

system.time( scores <- mapply(t.test.stat,
                              split(matrixA, row(matrixA)),
                              split(matrixB, row(matrixB))) )
# user system elapsed
# 1.57 0.00 1.58
smA <- split(matrixA, row(matrixA))
smB <- split(matrixB, row(matrixB))
system.time( scores <- mapply(t.test.stat, smA, smB) )
# user system elapsed
# 1.14 0.00 1.14

Look at the output from Rprof to see that most of the time is, not surprisingly, spent evaluating t.test.stat (mean, var, etc.). In other words, the cost comes from the overhead of thousands of small function calls, not from the split.

Rprof()
scores <- mapply(t.test.stat, smA, smB)
Rprof(NULL)
summaryRprof()

You may be able to find faster generalized solutions, but none will approach the speed of the vectorized solution below.

Since your function is simple, you can take advantage of the vectorized rowMeans function to do this almost instantaneously (though it's a bit messy):

system.time({
  ncA <- NCOL(matrixA)
  ncB <- NCOL(matrixB)
  ans <- (rowMeans(matrixA) - rowMeans(matrixB)) /
    sqrt( rowMeans((matrixA - rowMeans(matrixA))^2) * (ncA / (ncA - 1)) / ncA +
          rowMeans((matrixB - rowMeans(matrixB))^2) * (ncB / (ncB - 1)) / ncB )
})
# user system elapsed
# 0 0 0
head(ans)
# [1] 0.8272511 -1.0965269 0.9862844 -0.6026452 -0.2477661 1.1896181

UPDATE

Here's a "cleaner" version using a rowVars function:

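rowVars <- function(x, na.rm = FALSE) {
  # Number of values per row that enter the variance (non-NA count when na.rm = TRUE)
  n <- if (na.rm) rowSums(!is.na(x)) else NCOL(x)
  # Sum of squared deviations from the row mean, divided by n - 1
  rowSums((x - rowMeans(x, na.rm = na.rm))^2, na.rm = na.rm) / (n - 1)
}
ans <- (rowMeans(matrixA) - rowMeans(matrixB)) /
  sqrt( rowVars(matrixA) / NCOL(matrixA) + rowVars(matrixB) / NCOL(matrixB) )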
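
To check that the vectorized formula reproduces the row-by-row result, the two can be compared directly. This is only a sketch: t.test.stat is not shown in this answer, so the definition below is an assumed stand-in (Welch's t statistic), and the matrices are smaller than the originals to keep the check quick.

```r
set.seed(21)
matrixA <- matrix(rnorm(5 * 1000), nrow = 1000)
matrixB <- matrix(rnorm(4 * 1000), nrow = 1000)

# Hypothetical stand-in for the question's t.test.stat (an assumption):
# Welch's t statistic for two samples.
t.test.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Row-by-row reference result.
slow <- mapply(t.test.stat,
               split(matrixA, row(matrixA)),
               split(matrixB, row(matrixB)))

# Vectorized result: per-row sample variances, then the same formula.
varA <- rowSums((matrixA - rowMeans(matrixA))^2) / (ncol(matrixA) - 1)
varB <- rowSums((matrixB - rowMeans(matrixB))^2) / (ncol(matrixB) - 1)
fast <- (rowMeans(matrixA) - rowMeans(matrixB)) /
  sqrt(varA / ncol(matrixA) + varB / ncol(matrixB))

# mapply keeps the names created by split(), so drop them before comparing.
all.equal(unname(slow), fast)
```

If the real t.test.stat computes something other than Welch's statistic, the vectorized expression has to be adapted accordingly.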

Using mapply to apply a function with two arguments to every row of two matrices

To apply a function to each row, split each matrix by row and pass the resulting lists to mapply:

mapply(f, asplit(M1, 1), asplit(M2, 1))

Note that mapply on a matrix (or a vector) loops over each element, i.e. the unit is a single element, whereas for a data.frame, data.table, or tibble the unit is a column. Splitting by row (asplit with MARGIN = 1) gives a list of vectors, so the unit becomes a list element, i.e. one row.

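As @Adam mentioned in the comments, the result may need to be transposed: when f returns a vector, mapply binds the per-row results as columns. (Whether this applies is not clear without testing with f.)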
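
A small sketch of the difference, using toy matrices and throwaway anonymous functions in place of f (all of them are illustrative assumptions, not the question's data):

```r
M1 <- matrix(1:6, nrow = 2)                   # rows: (1, 3, 5) and (2, 4, 6)
M2 <- matrix(c(1, 0, 0, 1, 1, 1), nrow = 2)   # rows: (1, 0, 1) and (0, 1, 1)

# Passing the matrices directly: the function is called once per element.
per_element <- mapply(function(a, b) length(a), M1, M2)

# Splitting by row first: the function is called once per row.
per_row <- mapply(function(a, b) sum(a * b), asplit(M1, 1), asplit(M2, 1))

# When the function returns a vector, mapply binds the results as columns,
# so t() restores one output row per input row.
by_col <- mapply(function(a, b) a + b, asplit(M1, 1), asplit(M2, 1))
by_row <- t(by_col)

length(per_element)   # 6 calls, each seeing a single element
per_row               # one value per row
dim(by_col)           # 3 x 2: rows of the input ended up as columns
dim(by_row)           # 2 x 3 after transposing
```

asplit requires R 3.6.0 or later; on older versions split(M1, row(M1)) gives an equivalent list of rows.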

mapply - passing row and column of element as argument

There is no need for loops or *apply functions. You can just use plain matrix operations:

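nI <- nrow(cluster)   # base R has nrow()/ncol(), not nrows()/ncols()
nJ <- ncol(cluster)
cluster.I <- matrix(rowMeans(cluster), nI, nJ, byrow = FALSE)  # row means, repeated across columns
cluster.J <- matrix(colMeans(cluster), nI, nJ, byrow = TRUE)   # column means, repeated down rows
cluster.IJ <- matrix(mean(cluster), nI, nJ)                    # grand mean everywhere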

residue.mat <- (cluster - cluster.I - cluster.J - cluster.IJ) /
(cluster.N * cluster.M)

(You did not explain what cluster.N and cluster.M are but I assume they are scalars)
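
The way the mean matrices line up with the data can be checked on a toy matrix. This sketch assumes that cluster.I is meant to hold row means and cluster.J column means; the names row.part and col.part and the data are made up for illustration.

```r
cluster <- matrix(1:6, nrow = 2)   # toy data: rows (1, 3, 5) and (2, 4, 6)
nI <- nrow(cluster)
nJ <- ncol(cluster)

# Row means fill column by column, so every row is constant.
row.part <- matrix(rowMeans(cluster), nI, nJ, byrow = FALSE)

# Column means fill row by row, so every column is constant.
col.part <- matrix(colMeans(cluster), nI, nJ, byrow = TRUE)

# The same centering can be written with sweep(), which subtracts a
# per-row (MARGIN = 1) or per-column (MARGIN = 2) statistic.
all.equal(cluster - row.part, sweep(cluster, 1, rowMeans(cluster)))
all.equal(cluster - col.part, sweep(cluster, 2, colMeans(cluster)))
```

Building the full matrices makes the alignment explicit; sweep() is an alternative when the intermediate matrices are not needed elsewhere.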

R: using mapply for a function of two vectors

If dim(M1) and dim(M2) are identical, then you can simply do:

rowSums(M1 != M2, na.rm = TRUE)

Your attempt with mapply didn't work because m-by-n matrices are stored as m*n-length vectors, and mapply handles them as such. To accomplish this with mapply, you would need to split each matrix into a list of row vectors:

mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))

vapply would be better, though, because it checks that every result is a single integer:

vapply(seq_len(nrow(M1)), function(i) Hamming(M1[i, ], M2[i, ]), 0L)

In any case, just use rowSums.
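
The three approaches can be compared on a small example. Hamming is not defined in this answer, so the version below is an assumed stand-in that counts the positions where two vectors differ; the matrices are toy data.

```r
# Hypothetical stand-in for the question's Hamming function (an assumption).
Hamming <- function(x, y) sum(x != y)

M1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)
M2 <- M1
M2[1, 2] <- 0                                 # row 1 now differs in one position
M2[2, c(1, 3)] <- 0                           # row 2 now differs in two positions

via_rowSums <- rowSums(M1 != M2, na.rm = TRUE)
via_mapply  <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
via_vapply  <- vapply(seq_len(nrow(M1)),
                      function(i) Hamming(M1[i, ], M2[i, ]), 0L)

# All three give one distance per row.
via_rowSums
via_mapply
via_vapply
```

Note that rowSums returns doubles while the other two return integers here, so compare them with == or all.equal rather than identical.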

Call apply-like function on each row of dataframe with multiple arguments from each row

You can call apply on a subset of the original data:

dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )

Or, if your function is just sum, use the vectorized version:

rowSums(dat[,c('x','z')])
# [1] 6 8

If you want to use testFunc:

testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))

EDIT: To access columns by name and not by index, you can do something like this:

testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))
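
As a possible alternative, the columns can be passed to mapply directly, which avoids converting the data frame to a matrix first. This is a sketch using the same toy dat and testFunc as above; whether it suits a real function depends on what that function expects.

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# Row-wise via apply: each row arrives as a named vector.
via_apply <- apply(dat[, c('x', 'z')], 1, function(r) testFunc(r['x'], r['z']))

# Row-wise via mapply: columns are walked in parallel, one pair per call.
via_mapply <- mapply(testFunc, dat$x, dat$z)

# If the function is already vectorized (as a + b is), no loop is needed.
via_vectorized <- with(dat, testFunc(x, z))

unname(via_apply)
via_mapply
via_vectorized
```

Because apply coerces the data frame to a matrix, mixed column types would all become character there; mapply on the individual columns keeps each column's own type.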
