R: Compare All the Columns Pairwise in Matrix

A non-vectorized (but perhaps more memory-efficient) way of doing this:

# Fancy way.
similarity.matrix<-apply(matrix,2,function(x)colSums(x==matrix))
diag(similarity.matrix)<-0

# More understandable. But verbose.
similarity.matrix<-matrix(nrow=ncol(matrix),ncol=ncol(matrix))
for(col in 1:ncol(matrix)){
  matches<-matrix[,col]==matrix
  match.counts<-colSums(matches)
  match.counts[col]<-0 # Set the same column comparison to zero.
  similarity.matrix[,col]<-match.counts
}
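
As a quick check of what the result looks like, here is a minimal sketch on a made-up 4 x 3 matrix. The name m and its values are assumptions for illustration (the name also avoids masking the base function matrix). Because the comparison is symmetric, every pair shows up twice, so the upper triangle is enough if you want each pair only once.

```r
# Toy data (hypothetical): three columns, four rows.
m <- cbind(a = c(1, 2, 3, 4),
           b = c(1, 2, 0, 4),
           c = c(9, 2, 3, 0))

# For every pair of columns, count the rows in which both hold the same value.
sim <- apply(m, 2, function(x) colSums(x == m))
diag(sim) <- 0  # a column always matches itself, so zero out the diagonal

sim

# Each unordered pair once, in the order (a,b), (a,c), (b,c).
pair.counts <- sim[upper.tri(sim)]
pair.counts
```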

R: Compare all columns in a matrix against each other in a loop

vcoef <- numeric(3)
for (i in 1:3) {
  vcoef[i] <- coef(lm(y ~ frame[, i]))[2]
}

outer(vcoef, vcoef, "-")
#----------
          [,1]        [,2]        [,3]
[1,] 0.0000000 -0.15208933 -0.17302592
[2,] 0.1520893  0.00000000 -0.02093659
[3,] 0.1730259  0.02093659  0.00000000

If you didn't want the redundant information you could get all the pairwise differences with combn:

> combcos  <- combn(vcoef,2)
> combcos[1, ] -combcos[2, ]
[1] -0.15208933 -0.17302592 -0.02093659
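
To see which difference belongs to which pair, the combn result can be labelled. The sketch below uses a made-up named vector v in place of vcoef (the values and names are assumptions, not the coefficients from the question) and also checks that the same numbers sit in the upper triangle of the outer() matrix.

```r
# Hypothetical slope values standing in for vcoef.
v <- c(a = 1.5, b = 1.25, c = 1)

# All non-redundant pairwise differences, as in the combn approach.
pairs <- combn(v, 2)
diffs <- pairs[1, ] - pairs[2, ]

# Label each difference with the pair it was computed from.
names(diffs) <- combn(names(v), 2, paste, collapse = " - ")
diffs

# The full matrix from outer() holds the same values above its diagonal.
full <- outer(v, v, "-")
full[upper.tri(full)]
```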

Pairwise comparison of dataframe column entries in R

Here's a tidyverse approach:

library(tidyverse)
df_dat %>% 
  pivot_longer(-code) %>%
  group_by(code) %>%
  mutate(value = case_when(
    sum(!is.na(value)) == n() ~ "exists",
    !is.na(value) & is.na(lag(value)) ~ "new",
    is.na(value) & !is.na(lag(value)) ~ "closed",
    TRUE ~ NA_character_
  )) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value)

Result

# A tibble: 2 x 34
   code yr_1986 yr_1987 yr_1988 yr_1989 yr_1990 yr_1991 yr_1992 yr_1993 yr_1994 yr_1995 yr_1996 yr_1997 yr_1998 yr_1999 yr_2000 yr_2001 yr_2002 yr_2003 yr_2004 yr_2005 yr_2006 yr_2007 yr_2008 yr_2009 yr_2010 yr_2011 yr_2012 yr_2013 yr_2014 yr_2015 yr_2016 yr_2017 yr_2018
  <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1     1 NA      NA      NA      NA      NA      new     closed  NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA     
2 10000 exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists  exists

Parallelize column pairwise matrix comparison

There are many ways to parallelise in R; some options are parallel, foreach and future. Given your code, the fewest changes are needed with the future-based package future.apply, as it provides the function future_apply. You have to set a plan to tell future that the work should be calculated in parallel. Older versions of future offered plan(multiprocess), which picked separate R sessions or forking depending on your OS; it has since been deprecated, so use plan(multisession) (separate R sessions, works on every OS) or plan(multicore) (forking, not available on Windows). This leads to the code (and already speeds up a toy example on my machine):

library(future.apply)
plan(multisession)
for(stat in c("kendall", "spearman")){

  # -- kendall tau and spearman 
  stats.vec <- future_apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
}
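
The snippet relies on pairwise.permuts, which is not shown, and it overwrites stats.mtx on every pass of the loop. The sketch below is one guess at a compatible setup: it assumes pairwise.permuts holds one column per ordered pair of column indices, uses made-up data for db.mtx.rnk, and keeps one matrix per method in a list called results (a hypothetical name). It uses plain apply so that it runs without extra packages; under these assumptions, swapping apply for future_apply after setting a plan should give the parallel version.

```r
# Made-up data standing in for db.mtx.rnk (assumption).
set.seed(1)
db.mtx.rnk <- matrix(rnorm(40), nrow = 10, ncol = 4,
                     dimnames = list(NULL, paste0("c", 1:4)))
n <- ncol(db.mtx.rnk)

# Assumed layout: a 2 x n^2 matrix, one column per ordered pair (i, j),
# with the first index varying fastest so the result reshapes column-wise.
pairwise.permuts <- t(as.matrix(expand.grid(seq_len(n), seq_len(n))))

results <- list()
for (stat in c("kendall", "spearman")) {
  stats.vec <- apply(pairwise.permuts, 2, function(x)
    cor(db.mtx.rnk[, x[1]], db.mtx.rnk[, x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = n,
                      dimnames = list(colnames(db.mtx.rnk), colnames(db.mtx.rnk)))
  results[[stat]] <- stats.mtx  # keep each method instead of overwriting
}

# Sanity check against the built-in matrix form of cor().
all.equal(results$kendall, cor(db.mtx.rnk, method = "kendall"))
```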

Pairwise comparison of columns for selecting rows in case both rows are not zero

Try this:

do.call(rbind,
        lapply(combn(names(data), 2, simplify = FALSE),
               function(x) {
                 test <- Reduce("*", data[, x]) != 0
                 data.frame(setNames(as.list(x), c("user1", "user2")), 
                                      topics = if (any(test)) topicVector[test] else 0)
                 })
)

This

1. creates all combinations of column names in a list,
2. iterates over that list,
3. checks where the product (calculated with a Reduce("*", inputList) construct) of the two columns is non-zero and uses this for subsetting the topics vector,
4. combines the results with the do.call(rbind, listOfResults) construct.
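
The core of the steps above can be tried on a small made-up example. The data frame data, the vector topicVector, and the list name shared are all assumptions for illustration; the sketch stops before the rbind step and simply lists the shared topics per pair.

```r
# Hypothetical input: one column per user, one row per topic.
data <- data.frame(alice = c(1, 0, 2),
                   bob   = c(3, 0, 0),
                   carol = c(0, 4, 5))
topicVector <- c("t1", "t2", "t3")

# All pairs of column names, as a list of length-2 character vectors.
pairs <- combn(names(data), 2, simplify = FALSE)

# The product of two columns is non-zero only where both entries are non-zero.
shared <- lapply(pairs, function(x) topicVector[Reduce("*", data[, x]) != 0])
names(shared) <- vapply(pairs, paste, character(1), collapse = "-")
shared
```

A pair with no topic in common ends up with an empty vector here, which is the case the answer's code replaces with 0.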

R Pairwise comparison of matrix columns ignoring empty values

The following is the answer from akrun.

First, change the blank cells to NA:

is.na(my.matrix) <- my.matrix==''

Then ignore the NAs when computing match.counts:

similarity.matrix <- matrix(nrow=ncol(my.matrix), ncol=ncol(my.matrix))

for(col in 1:ncol(my.matrix)){
  matches <- my.matrix[,col] == my.matrix
  match.counts <- colSums(matches, na.rm=TRUE)
  match.counts[col] <- 0 
  similarity.matrix[,col] <- match.counts
}

Which did indeed give me my desired output:

    V1  V2  V3  V4  V5  V6
1   0   0   0   0   0   1
2   0   0   0   0   0   0
3   0   0   0   0   2   1
4   0   0   0   0   0   0
5   0   0   2   0   0   1
6   1   0   1   0   1   0

thank you.
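
To make the effect of the NA conversion visible, here is a minimal sketch on a made-up character matrix in which one row is blank everywhere. The helper count.matches is a hypothetical wrapper around the loop above, and the data are assumptions for illustration.

```r
# Toy character matrix (hypothetical); row 2 is blank in every column.
my.matrix <- cbind(c("a", "", "b"),
                   c("a", "", "c"),
                   c("x", "", "b"))

# Hypothetical helper: pairwise match counts with the diagonal set to zero.
count.matches <- function(m) {
  sapply(seq_len(ncol(m)), function(col) {
    counts <- colSums(m[, col] == m, na.rm = TRUE)
    counts[col] <- 0
    counts
  })
}

naive <- count.matches(my.matrix)    # "" == "" is TRUE, so blanks count as matches

is.na(my.matrix) <- my.matrix == ""  # turn blank cells into NA
cleaned <- count.matches(my.matrix)  # comparisons with NA are dropped by na.rm

naive
cleaned
```

In this toy case every count in cleaned is exactly one lower than in naive, because the single all-blank row no longer contributes.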

Perform pairwise comparison of matrix

You can find the pairs with combn and use apply to create the result:

apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
##          [,1]      [,2]       [,3]      [,4]      [,5]      [,6]
## [1,] 3.217841 -2.568691  0.0668021 -5.786532 -3.151039 2.6354931
## [2,] 2.891622 -2.455487 -0.1124344 -5.347109 -3.004056 2.3430526
## [3,] 2.046909 -2.244467 -0.5667373 -4.291376 -2.613647 1.6777297
## [4,] 1.434025 -2.099483 -1.1635060 -3.533508 -2.597531 0.9359770
## [5,] 1.068941 -1.971652 -1.6384254 -3.040593 -2.707366 0.3332266
## [6,] 1.004582 -1.626496 -1.3764430 -2.631078 -2.381025 0.2500530

You can add appropriate names with another apply. Here the column names are very long, which impairs the formatting, but the labels tell what differences are in each column:

x <- apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
colnames(x) <- apply(combn(ncol(d), 2), 2, function(x) paste(names(d)[x], collapse=' - '))
> x
     Transportation.services - Recreational.goods.and.vehicles Transportation.services - Recreation.services
[1,]                                                  3.217841                                     -2.568691
[2,]                                                  2.891622                                     -2.455487
[3,]                                                  2.046909                                     -2.244467
[4,]                                                  1.434025                                     -2.099483
[5,]                                                  1.068941                                     -1.971652
[6,]                                                  1.004582                                     -1.626496
     Transportation.services - Other.services Recreational.goods.and.vehicles - Recreation.services
[1,]                                0.0668021                                             -5.786532
[2,]                               -0.1124344                                             -5.347109
[3,]                               -0.5667373                                             -4.291376
[4,]                               -1.1635060                                             -3.533508
[5,]                               -1.6384254                                             -3.040593
[6,]                               -1.3764430                                             -2.631078
     Recreational.goods.and.vehicles - Other.services Recreation.services - Other.services
[1,]                                        -3.151039                            2.6354931
[2,]                                        -3.004056                            2.3430526
[3,]                                        -2.613647                            1.6777297
[4,]                                        -2.597531                            0.9359770
[5,]                                        -2.707366                            0.3332266
[6,]                                        -2.381025                            0.2500530
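
The same steps can be checked on a small made-up data frame. In this sketch d and its values are assumptions, and idx is a hypothetical name used so that combn is computed once and reused for both the differences and the labels.

```r
# Toy data frame (hypothetical) with three numeric columns.
d <- data.frame(a = c(5, 6), b = c(1, 2), c = c(0, 10))

idx <- combn(ncol(d), 2)  # one column of indices per pair of columns

# Column-wise differences for every pair, then matching labels.
x <- apply(idx, 2, function(p) d[, p[1]] - d[, p[2]])
colnames(x) <- apply(idx, 2, function(p) paste(names(d)[p], collapse = " - "))
x
```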
