R: Compare all the columns pairwise in matrix
A non-vectorized (but perhaps more memory-efficient) way of doing this:
# Fancy way.
similarity.matrix<-apply(matrix,2,function(x)colSums(x==matrix))
diag(similarity.matrix)<-0
# More understandable. But verbose.
similarity.matrix<-matrix(nrow=ncol(matrix),ncol=ncol(matrix))
for(col in 1:ncol(matrix)){
matches<-matrix[,col]==matrix
match.counts<-colSums(matches)
match.counts[col]<-0 # Set the same column comparison to zero.
similarity.matrix[,col]<-match.counts
}
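As a quick sanity check, here is a minimal sketch on a made-up 3x3 character matrix. The name m and its values are assumptions for illustration and do not come from the question; the matrix is named m rather than matrix so that it does not shadow base matrix().

```r
# Hypothetical toy data, filled column by column:
# column 1 = a, b, c; column 2 = a, b, d; column 3 = x, b, c.
m <- matrix(c("a", "b", "c",
              "a", "b", "d",
              "x", "b", "c"),
            nrow = 3)

# For each column x, count the rows in which every column of m equals x.
similarity.matrix <- apply(m, 2, function(x) colSums(x == m))
diag(similarity.matrix) <- 0  # ignore each column's comparison with itself

similarity.matrix
```

Columns 1 and 2 agree in two rows, columns 1 and 3 in two rows, and columns 2 and 3 in one row, so the result is symmetric with those counts off the diagonal.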
R Compare all columns in a matrix against each in loop
vcoef <- numeric(3)
for(i in 1:3) {
vcoef[i] <- coef( lm(y~frame[,i]))[2]
}
outer(vcoef, vcoef, "-")
#----------
[,1] [,2] [,3]
[1,] 0.0000000 -0.15208933 -0.17302592
[2,] 0.1520893 0.00000000 -0.02093659
[3,] 0.1730259 0.02093659 0.00000000
If you didn't want the redundant information you could get all the pairwise differences with combn:
> combcos <- combn(vcoef,2)
> combcos[1, ] -combcos[2, ]
[1] -0.15208933 -0.17302592 -0.02093659
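To keep track of which difference is which, the same combn call can be run on the names. This is only a sketch: the coefficient values and the names a, b, c below are invented for illustration.

```r
# Hypothetical named coefficients standing in for vcoef.
vcoef <- c(a = 0.5, b = 0.75, c = 1.0)

# Each column of combcos is one pair of coefficients.
combcos <- combn(vcoef, 2)
diffs <- combcos[1, ] - combcos[2, ]

# Build matching labels from the pairs of names.
names(diffs) <- apply(combn(names(vcoef), 2), 2, paste, collapse = " - ")

diffs
```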
Pairwise comparison of dataframe column entries in r
Here's a tidyverse approach:
library(tidyverse)
df_dat %>%
pivot_longer(-code) %>%
group_by(code) %>%
mutate(value = case_when(
sum(!is.na(value)) == n() ~ "exists",
!is.na(value) & is.na(lag(value)) ~ "new",
is.na(value) & !is.na(lag(value)) ~ "closed",
TRUE ~ NA_character_
)) %>%
ungroup() %>%
pivot_wider(names_from = name, values_from = value)
Result
# A tibble: 2 x 34
code yr_1986 yr_1987 yr_1988 yr_1989 yr_1990 yr_1991 yr_1992 yr_1993 yr_1994 yr_1995 yr_1996 yr_1997 yr_1998 yr_1999 yr_2000 yr_2001 yr_2002 yr_2003 yr_2004 yr_2005 yr_2006 yr_2007 yr_2008 yr_2009 yr_2010 yr_2011 yr_2012 yr_2013 yr_2014 yr_2015 yr_2016 yr_2017 yr_2018
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 NA NA NA NA NA new closed NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 10000 exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists exists
Parallelize column pairwise matrix comparison
There are many ways to parallelise in R; some options are parallel, foreach and future. Given your code, the fewest changes are needed with the future-based package future.apply, as it provides the function future_apply. You have to call plan(multisession) to tell future that the computation should run in parallel; multisession runs the work in separate background R sessions. (The plan(multiprocess) strategy, which picked separate R sessions or forking depending on your OS, is deprecated in recent versions of future.) This leads to the following code (which already speeds up a toy example on my machine):
library(future.apply)
plan(multisession)
for(stat in c("kendall", "spearman")){
# -- kendall tau and spearman
stats.vec <- future_apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
colnames(stats.mtx) <- colnames(db.mtx.rnk)
rownames(stats.mtx) <- colnames(db.mtx.rnk)
}
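As a cross-check that needs no parallel backend, base R's cor() accepts a whole matrix and returns all pairwise column correlations in one call, for both method = "kendall" and method = "spearman". The sketch below uses random data as a stand-in for db.mtx.rnk (the dimensions and column names are assumptions) and keeps one result matrix per method in a named list. Whether this is fast enough for the real data is not something the sketch can show.

```r
set.seed(1)
# Hypothetical stand-in for db.mtx.rnk: 20 rows, 4 named columns.
db.mtx.rnk <- matrix(rnorm(80), ncol = 4,
                     dimnames = list(NULL, paste0("s", 1:4)))

stat.names <- c("kendall", "spearman")

# One full pairwise correlation matrix per method, stored by method name.
stats.list <- lapply(setNames(stat.names, stat.names),
                     function(stat) cor(db.mtx.rnk, method = stat))

# Look up a single pair by column name.
stats.list$kendall["s1", "s2"]
```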
Pairwise comparison of columns for selecting rows in case both rows are not zero
Try this:
do.call(rbind,
lapply(combn(names(data), 2, simplify = FALSE),
function(x) {
test <- Reduce("*", data[, x]) != 0
data.frame(setNames(as.list(x), c("user1", "user2")),
topics = if (any(test)) topicVector[test] else 0)
})
)
This
- creates all combinations of column names in a list,
- iterates over that list,
- checks whether the product of the two columns (calculated with a Reduce("*", inputList) construct) is non-zero and uses this for subsetting the topics vector,
- combines the results with the do.call(rbind, listOfResults) construct.
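The steps above can be traced on a small made-up input. The data frame, its column names u1 to u3, and the contents of topicVector are assumptions for illustration; rows stand for topics and columns for users.

```r
# Hypothetical data: three topics (rows) and three users (columns).
data <- data.frame(u1 = c(1, 0, 2),
                   u2 = c(3, 0, 0),
                   u3 = c(0, 5, 1))
topicVector <- c("t1", "t2", "t3")

result <- do.call(rbind,
  lapply(combn(names(data), 2, simplify = FALSE),
         function(x) {
           # TRUE where both users have a non-zero entry for the topic.
           test <- Reduce("*", data[, x]) != 0
           data.frame(setNames(as.list(x), c("user1", "user2")),
                      topics = if (any(test)) topicVector[test] else 0)
         })
)

result
```

Here u1 and u2 share only the first topic, u1 and u3 share only the third, and u2 and u3 share none, so the last pair gets the placeholder 0. Because that placeholder is numeric while the topic names are character, rbind coerces the topics column to a common type.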
R Pairwise comparison of matrix columns ignoring empty values
The following is the answer from akrun.
First, change the blank cells to NA:
is.na(my.matrix) <- my.matrix==''
Then drop the NAs when computing match.counts:
similarity.matrix <- matrix(nrow=ncol(my.matrix), ncol=ncol(my.matrix))
for(col in 1:ncol(my.matrix)){
matches <- my.matrix[,col] == my.matrix
match.counts <- colSums(matches, na.rm=TRUE)
match.counts[col] <- 0
similarity.matrix[,col] <- match.counts
}
This did indeed give me my desired output:
V1 V2 V3 V4 V5 V6
1 0 0 0 0 0 1
2 0 0 0 0 0 0
3 0 0 0 0 2 1
4 0 0 0 0 0 0
5 0 0 2 0 0 1
6 1 0 1 0 1 0
thank you.
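To see why the NA step matters, here is a small sketch with an invented matrix (its values are assumptions for illustration). Without the conversion, two blank cells in the same row count as a match; after it, comparisons involving a blank become NA and are dropped by na.rm = TRUE. The sketch uses apply instead of the loop, which is a substitution for brevity and gives the same counts.

```r
# Hypothetical character matrix with blank cells, filled column by column:
# column 1 = a, "", c; column 2 = a, "", c; column 3 = "", b, c.
my.matrix <- matrix(c("a", "",  "c",
                      "a", "",  "c",
                      "",  "b", "c"),
                    nrow = 3)

# Before the conversion, the shared blank in row 2 counts as a match.
matches.before <- sum(my.matrix[, 1] == my.matrix[, 2])

is.na(my.matrix) <- my.matrix == ""   # blanks become NA

# Count matches per pair of columns, ignoring comparisons that are NA.
similarity.matrix <- apply(my.matrix, 2,
                           function(x) colSums(x == my.matrix, na.rm = TRUE))
diag(similarity.matrix) <- 0

similarity.matrix
```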
Perform pairwise comparison of matrix
You can find the pairs with combn and use apply to create the result:
apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 3.217841 -2.568691 0.0668021 -5.786532 -3.151039 2.6354931
## [2,] 2.891622 -2.455487 -0.1124344 -5.347109 -3.004056 2.3430526
## [3,] 2.046909 -2.244467 -0.5667373 -4.291376 -2.613647 1.6777297
## [4,] 1.434025 -2.099483 -1.1635060 -3.533508 -2.597531 0.9359770
## [5,] 1.068941 -1.971652 -1.6384254 -3.040593 -2.707366 0.3332266
## [6,] 1.004582 -1.626496 -1.3764430 -2.631078 -2.381025 0.2500530
You can add appropriate names with another apply. Here the column names are very long, which impairs the formatting, but the labels tell what differences are in each column:
x <- apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
colnames(x) <- apply(combn(ncol(d), 2), 2, function(x) paste(names(d)[x], collapse=' - '))
> x
Transportation.services - Recreational.goods.and.vehicles Transportation.services - Recreation.services
[1,] 3.217841 -2.568691
[2,] 2.891622 -2.455487
[3,] 2.046909 -2.244467
[4,] 1.434025 -2.099483
[5,] 1.068941 -1.971652
[6,] 1.004582 -1.626496
Transportation.services - Other.services Recreational.goods.and.vehicles - Recreation.services
[1,] 0.0668021 -5.786532
[2,] -0.1124344 -5.347109
[3,] -0.5667373 -4.291376
[4,] -1.1635060 -3.533508
[5,] -1.6384254 -3.040593
[6,] -1.3764430 -2.631078
Recreational.goods.and.vehicles - Other.services Recreation.services - Other.services
[1,] -3.151039 2.6354931
[2,] -3.004056 2.3430526
[3,] -2.613647 1.6777297
[4,] -2.597531 0.9359770
[5,] -2.707366 0.3332266
[6,] -2.381025 0.2500530
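With short column names the labelled result is easier to read. A minimal sketch, assuming a made-up data frame d with columns a, b and c (not the data from the question):

```r
# Hypothetical data frame with three numeric columns.
d <- data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200))

# Each column of idx holds one pair of column indices.
idx <- combn(ncol(d), 2)

# Difference of the two columns in each pair, one result column per pair.
x <- apply(idx, 2, function(p) d[, p[1]] - d[, p[2]])
colnames(x) <- apply(idx, 2, function(p) paste(names(d)[p], collapse = " - "))

x
```

Computing idx once and reusing it for both the differences and the labels keeps the two in the same order.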