In R, match function for rows or columns of matrix

match will work on lists of atomic vectors. So to match rows of one matrix to another, you could do:

match(data.frame(t(x)), data.frame(t(y)))

t transposes the rows into columns, then data.frame creates a list of the columns in the transposed matrix.
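As a minimal sketch of the idea, assuming two small made-up numeric matrices x and y (they are not from the question), the call should return, for each row of x, the index of the first identical row of y, or NA when there is none:

```r
# Hypothetical example data: rows of x are looked up among the rows of y
x <- matrix(c(1, 2,
              3, 4,
              5, 6), nrow = 3, byrow = TRUE)
y <- matrix(c(5, 6,
              1, 2,
              9, 9), nrow = 3, byrow = TRUE)

# After t() and data.frame(), each original row is one list element,
# so match() compares whole rows rather than single cells
idx <- match(data.frame(t(x)), data.frame(t(y)))
idx
#[1]  2 NA  1
```

As far as I know, lists are compared through a character representation of their elements, so it is safer if both matrices have the same storage mode (both double or both integer).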

How would I identify which columns and rows match between two data matrices?

You might want to have a look at the %in% operator in R. According to your question, you might want something like this:

m1[,1] %in% m2[,1]

You can then pair it with functions such as mean or sum which will help you to find the percentage as required:

sum(m1[,1] %in% m2[,1])
#[1] 5
mean(m1[,1] %in% m2[,1])
#[1] 0.625

EDIT: As the OP requested in the comments, here is how to list the entries that match and those that do not. There are various methods for this, my personal favourite being the which function:

m1[which(m1[,1] %in% m2[,1]),]
#[1] "Taxon1" "Taxon3" "Taxon4" "Taxon6" "Taxon7"
m1[which(!(m1[,1] %in% m2[,1])),]
#[1] "Taxon2" "Taxon5" "Taxon8"
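Since the original m1 and m2 are not shown, here is a small self-contained sketch with made-up one-column matrices (the taxon names are assumptions chosen to reproduce the counts above):

```r
# Hypothetical data: m1 has 8 taxa, m2 shares 5 of them
m1 <- matrix(paste0("Taxon", 1:8), ncol = 1)
m2 <- matrix(paste0("Taxon", c(1, 3, 4, 6, 7, 9)), ncol = 1)

hit <- m1[, 1] %in% m2[, 1]   # TRUE where an entry of m1 also occurs in m2

n_hit    <- sum(hit)          # number of matches
frac_hit <- mean(hit)         # proportion of m1 found in m2

matched   <- m1[which(hit), 1]    # entries present in both
unmatched <- m1[which(!hit), 1]   # entries only in m1
```

Here n_hit is 5 and frac_hit is 0.625, and unmatched holds "Taxon2", "Taxon5" and "Taxon8".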

Again, this is only one method out of many (I can count three right now), so I suggest you explore the other options.

How would I match the rows of one matrix to the rows in another matrix, regardless of the column order?

We can sort by row on each dataset

x1 <- t(apply(X, 1, sort))
y1 <- t(apply(Y, 1, sort))

and then do a match on the pasted rows of each dataset to return the row index of the match

#[1] 1 4 9
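The matching step could look roughly like the sketch below. The matrices X and Y are made up, and building the keys with apply(..., paste, collapse = " ") is my assumption about one way to "paste the rows"; the original call is not shown.

```r
# Hypothetical data: rows of X are permutations of some rows of Y
X <- matrix(c(3, 1, 2,
              6, 5, 4), nrow = 2, byrow = TRUE)
Y <- matrix(c(1, 2, 3,
              7, 8, 9,
              4, 5, 6), nrow = 3, byrow = TRUE)

# Sort within each row so that column order no longer matters
x1 <- t(apply(X, 1, sort))
y1 <- t(apply(Y, 1, sort))

# Collapse each sorted row into a single string key, then match the keys
key_x <- apply(x1, 1, paste, collapse = " ")
key_y <- apply(y1, 1, paste, collapse = " ")
row_idx <- match(key_x, key_y)
row_idx
#[1] 1 3
```

Each element of row_idx is the row of Y that contains the same values as the corresponding row of X.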

Match rows between two matrices

A[apply(A, 1, function(x) all(B[1,] %in% x)),]   
# [,1] [,2] [,3] [,4]
#[1,] 121 114 117 200
#[2,] 413 121 719 117
#[3,] 117 428 121 211
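For a runnable illustration, assuming made-up matrices A and B (only the first row of B is used, as in the line above), the expression keeps every row of A that contains all values of B[1, ] in any position:

```r
# Hypothetical data: B[1, ] holds the values that must all appear in a row of A
A <- matrix(c(121, 114, 117, 200,
                1,   2,   3,   4,
              413, 121, 719, 117), nrow = 3, byrow = TRUE)
B <- matrix(c(121, 117), nrow = 1)

# For each row of A, test whether every value of B[1, ] occurs somewhere in it
keep <- apply(A, 1, function(x) all(B[1, ] %in% x))

# drop = FALSE keeps the result a matrix even if only one row matches
A[keep, , drop = FALSE]
```

Rows 1 and 3 of this A are kept, because both contain 121 and 117.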

Match list to rows of matrix in R

With only a few columns, we can reduce computation by taking advantage of columns that have more than one distinct non-zero value (no match is possible) or no non-zero values (nothing to check):

ff = function(a, b) {
    i = seq_len(nrow(b)) # starting candidate matches: every row of b
    for(j in seq_len(ncol(a))) {
        aj = a[, j]
        nzaj = aj[aj != 0L]
        if(!length(nzaj)) next # if all(a[, j] == 0) save some operations
        if(sum(tabulate(nzaj) > 0L) > 1L) return(integer()) # more than one distinct non-zero value in a column: no row of b can match
        i = i[b[i, j] == nzaj[[1L]]] # update candidate matches
    }
    return(i)
}

lapply(a, function(x) ff(x, b))
#[1] 3 4
#[1] 6
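To make the behaviour concrete, here is a self-contained sketch on made-up data, treating 0 in a as "no constraint". The function is repeated so the example runs on its own, and the list a and matrix b are assumptions, not the data from the question.

```r
# Same logic as ff above, repeated so this example is self-contained
ff <- function(a, b) {
    i <- seq_len(nrow(b))                      # all rows of b are candidates
    for (j in seq_len(ncol(a))) {
        nzaj <- a[, j][a[, j] != 0L]           # non-zero values in column j
        if (!length(nzaj)) next                # column imposes no constraint
        if (sum(tabulate(nzaj) > 0L) > 1L) return(integer())  # conflicting values
        i <- i[b[i, j] == nzaj[[1L]]]          # keep candidates that agree
    }
    i
}

# Hypothetical data: b holds the rows to search, a is a list of patterns
b <- rbind(c(1, 2, 3),
           c(1, 3, 3),
           c(2, 2, 1))
a <- list(rbind(c(1, 0, 0), c(0, 0, 3)),   # column 1 == 1 and column 3 == 3
          rbind(c(2, 0, 0), c(0, 2, 0)),   # column 1 == 2 and column 2 == 2
          rbind(c(1, 0, 0), c(2, 0, 0)))   # column 1 must be both 1 and 2

res <- lapply(a, function(x) ff(x, b))
```

For this input res should contain rows 1 and 2 for the first pattern, row 3 for the second, and an empty integer vector for the third.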

With data of your actual size:

a2 = replicate(300L, matrix(sample(0:3, 20 * 5, TRUE, c(0.97, 0.01, 0.01, 0.01)), 20, 5), simplify = FALSE)
b2 = matrix(sample(1:3, 15 * 5, TRUE), 15, 5)
identical(OP(a2, b2), lapply(a2, function(x) ff(x, b2)))
#[1] TRUE
microbenchmark::microbenchmark(OP(a2, b2), lapply(a2, function(x) ff(x, b2)), times = 50)
#Unit: milliseconds
#                              expr        min         lq       mean     median         uq       max neval cld
#                        OP(a2, b2) 686.961815 730.840732 760.029859 753.790094 785.310056 863.04577    50   b
# lapply(a2, function(x) ff(x, b2))   8.110542   8.450888   9.381802   8.949924   9.872826  15.51568    50  a

OP is:

OP = function (a, b) 
{
    temp = Map(function(y) t(y), Map(function(a) apply(a, 1, 
        function(x) {
            apply(b, 1, function(y) identical(x[x != 0], y[x != 0]))
        }), a))
    lapply(temp, function(x) which(apply(x, 2, prod) == 1))
}

returning matrix column indices matching value(s) in R

res <- arrayInd(match(values, mat), .dim = dim(mat))
res[res[, 1] != seq_len(nrow(res)), 2] <- NA
# [,1] [,2]
# [1,] 1 2
# [2,] 2 1
# [3,] 3 3
# [4,] 2 NA
# [5,] 5 4
# [6,] 6 1
# [7,] 7 10
# [8,] 3 NA
# [9,] 9 1
#[10,] 10 1
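A small sketch of what these two lines do, assuming a made-up 3 x 3 matrix mat and a vector values with one entry per row (both are hypothetical): match gives the linear index of the first occurrence of each value, arrayInd turns it into a row/column pair, and the column is set to NA when the hit is not in the expected row.

```r
# Hypothetical data: values[i] is looked up in mat, expected in row i
mat <- matrix(1:9, nrow = 3)      # rows are (1,4,7), (2,5,8), (3,6,9)
values <- c(4, 2, 7)

# Linear index of the first occurrence, converted to (row, column)
res <- arrayInd(match(values, mat), .dim = dim(mat))

# A hit counts only if it lies in row i for values[i]; otherwise column is NA
res[res[, 1] != seq_len(nrow(res)), 2] <- NA
res
#     [,1] [,2]
#[1,]    1    2
#[2,]    2    1
#[3,]    1   NA
```

Note that match only reports the first occurrence in column-major order, so a value that also appears earlier in another row ends up as NA.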

R match rowwise values with column names in multiple columns and get column value

Another option in base R is split-unsplit:

data$New_Col <- unsplit(Map(`[`, 
                            data[paste0("Name_", LETTERS[1:4])],
                            split(seq_len(nrow(data)), data$PartName)),
                        data$PartName)

It scales better than indexing the data frame with a matrix of the form cbind(i, j). The latter approach has significant overhead due to an intermediate coercion of the data frame to matrix, which involves a deep copy of all of the variables.

If you do go with split-unsplit, then make sure that PartName is a factor with suitable levels, as you need the second and third arguments of Map to correspond elementwise. In this case, it would be good practice to start with:

data$PartName <- factor(data$PartName, levels = LETTERS[1:4])
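Here is a minimal sketch on a made-up data frame with only two Name_ columns (the values and the reduced number of levels are assumptions for illustration):

```r
# Hypothetical data: pick Name_A or Name_B per row according to PartName
data <- data.frame(Name_A = c(1, 2, 3, 4),
                   Name_B = c(10, 20, 30, 40),
                   PartName = c("A", "B", "B", "A"))

# Levels in the same order as the Name_ columns, so Map pairs them correctly
data$PartName <- factor(data$PartName, levels = c("A", "B"))

# Row indices per level, e.g. A -> rows 1 and 4, B -> rows 2 and 3
idx <- split(seq_len(nrow(data)), data$PartName)

# Subset each Name_ column by its level's rows, then restore row order
data$New_Col <- unsplit(Map(`[`, data[c("Name_A", "Name_B")], idx),
                        data$PartName)
data$New_Col
#[1]  1 20 30  4
```

Because Map pairs its arguments by position, the order of the factor levels has to follow the order of the selected columns.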

For the curious:

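n <- 1e+06L
r <- 25L
# NOTE: parts of the next calls are missing from the source; as.data.frame(replicate(...)),
# sample.int(...) and data.table::as.data.table(...) are assumed reconstructions
x <- as.data.frame(replicate(r, rnorm(n), simplify = FALSE))
names(x) <- paste0("Name_", LETTERS[1:r])
x$PartName <- LETTERS[1:r][sample.int(r, n, TRUE)]

y <- data.table::as.data.table(x)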

# f1: index the data frame with a two-column (row, column) matrix
f1 <- function(x) {
    n <- length(x)
    i <- seq_len(nrow(x))
    j <- match(x$PartName, sub("^Name_", "", names(x)[-n]))
    x[-n][cbind(i, j)]
}
# f2: split-unsplit
f2 <- function(x) {
    nms <- names(x)[-length(x)]
    g <- factor(x$PartName, levels = sub("^Name_", "", nms))
    unsplit(Map(`[`, x[nms], split(seq_len(nrow(x)), g)), g)
}
# f3: data.table grouped assignment (expects a data.table)
f3 <- function(x) {
    x[, New_Col := .SD[[paste0("Name_", .BY[[1L]])]], by = PartName]
}

bench::mark(f1(x), f2(x), f3(y), iterations = 100L, check = FALSE, filter_gc = FALSE)
## # A tibble: 3 × 13
##   expression      min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory     time       gc
##   <bch:expr> <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>     <list>     <list>
## 1 f1(x)        86.1ms  92.3ms      10.9   225.1MB    6.95    100    64      9.21s <NULL> <Rprofmem> <bench_tm> <tibble>
## 2 f2(x)        43.4ms  45.8ms      21.2    61.1MB    3.60    100    17      4.73s <NULL> <Rprofmem> <bench_tm> <tibble>
## 3 f3(y)        77.9ms  79.7ms      12.4    21.1MB    0.247   100     2      8.08s <NULL> <Rprofmem> <bench_tm> <tibble>

Extracting rows and columns of a matrix if row names and column names have a partial match

An easier option is to reshape to 'long' by converting to data.frame from table, and then subset the rows based on the values of 'Var1' and 'Var2'

out <- subset(as.data.frame(as.table(a)), Var1 == sub("\\d+", "", Var2),
              select = c(Var2, Freq))
with(out, setNames(Freq, Var2))
      aaa1       aaa2       aaa3       bbb1       bbb2       bbb3       ccc1       ccc2       ccc3 
0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160 

Or with row/column indexing

i1 <- match( sub("\\d+", "", colnames(a)), rownames(a))
a[cbind(i1, seq_along(i1))]
[1] 0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160
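Both options can be tried on a small made-up matrix; the dimnames and values below are assumptions, chosen so that each column name is a row name followed by a digit:

```r
# Hypothetical data: column "aaa1" belongs to row "aaa", "bbb1" to row "bbb"
a <- matrix(1:6, nrow = 2,
            dimnames = list(c("aaa", "bbb"), c("aaa1", "aaa2", "bbb1")))

# Row/column indexing: for each column, find the row whose name is its prefix
i1 <- match(sub("\\d+", "", colnames(a)), rownames(a))
by_index <- a[cbind(i1, seq_along(i1))]    # one (row, column) pair per column

# Long format: keep the cells whose row name equals the column-name prefix
long <- as.data.frame(as.table(a))
out <- subset(long, Var1 == sub("\\d+", "", Var2), select = c(Var2, Freq))

by_index
#[1] 1 3 6
```

The Freq column of out holds the same three values, one per column of a.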

