Create a co-occurrence matrix from dummy-coded observations
This will do the trick:
X <- as.matrix(X)
out <- crossprod(X) # Same as: t(X) %*% X
diag(out) <- 0 # (b/c you don't count co-occurrences of an aspect with itself)
out
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    1    0
# [2,]    0    0    2    1
# [3,]    1    2    0    1
# [4,]    0    1    1    0
To get the results into a data.frame exactly like the one you showed, you can then do something like:
nms <- paste("X", 1:4, sep="")
dimnames(out) <- list(nms, nms)
out <- as.data.frame(out)
Constructing a co-occurrence matrix in python pandas
This is simple linear algebra: multiply the transpose of the matrix by the matrix itself. (Your example contains strings, so convert them to integers first.)
>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
If, as in the R answer, you want to zero the diagonal, you can use NumPy's fill_diagonal:
>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
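The original question's data is not reproduced here, so the following is only a minimal, self-contained sketch of the same steps on a small made-up frame. The column names are borrowed from the output above, but the values are hypothetical. It also fills the diagonal on an explicit copy rather than through `coocc.values`, because whether an in-place write through `.values` reaches the DataFrame depends on the pandas version and its copy-on-write setting.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy-coded data (NOT the data from the question):
# rows are observations, columns are items, values are "0"/"1" strings.
df = pd.DataFrame(
    {
        "Dop":   ["1", "1", "0", "1"],
        "Snack": ["1", "0", "1", "0"],
        "Trans": ["0", "1", "1", "1"],
    }
)

# Convert the strings to integers, then multiply the transpose by the matrix.
X = df.astype(int)
coocc = X.T.dot(X)

# Zero the diagonal on a copy and rebuild the labelled frame.
values = coocc.to_numpy(copy=True)
np.fill_diagonal(values, 0)
coocc_offdiag = pd.DataFrame(values, index=coocc.index, columns=coocc.columns)

print(coocc)
print(coocc_offdiag)
```

Before the diagonal is zeroed, each diagonal entry is simply the number of rows in which that item appears, and the off-diagonal entries count the rows in which both items appear together.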
Finding the number of co-occurrences of multiple binary variables in R
Try crossprod (if df is a data frame, convert it with as.matrix first):
> crossprod(df)
   v1 v2 v3
v1  2  2  2
v2  2  4  4
v3  2  4  6
Convert dummy-coded matrix to adjacency matrix
You can use the outer function:
count1s <- function(x, y) colSums(x == 1 & y == 1)
n <- 1:ncol(data)
mat <- outer(n, n, function(x, y) count1s(data[, x], data[, y]))
diag(mat) <- 0
dimnames(mat) <- list(colnames(data), colnames(data))
mat
#  A B C D
#A 0 1 2 0
#B 1 0 0 1
#C 2 0 0 0
#D 0 1 0 0
Convert co-occurrence dataframe to square matrix
What you described in words sounds like ordinary matrix multiplication followed by setting the diagonal to 0:
temp <- t(as.matrix(d)) %*% as.matrix(d)
diag(temp) <- 0
> temp
  A B C D E F G H
A 0 6 1 0 0 0 0 3
B 6 0 1 0 0 0 0 3
C 0 1 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0
H 3 3 0 0 0 0 0 0
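The crossprod function is probably even faster (crossprod(as.matrix(d)) computes the same product, whereas tcrossprod would give as.matrix(d) %*% t(as.matrix(d)) instead), and either of these methods should outperform your nested loop.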
How to create a logical AND contingency table in R?
sapply(df, function(x) sapply(df, function(y) sum(x * y)))
#OR
t(df) %*% as.matrix(df)
#      typeA typeB typeC
#typeA     4     3     2
#typeB     3     4     2
#typeC     2     2     4
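The two R expressions give the same table because entry (i, j) of the matrix product is exactly the sum of the elementwise product of columns i and j. For readers following along in Python, here is a small NumPy check of that equivalence. The data is made up for illustration and is not the data from the question.

```python
import numpy as np

# Hypothetical 0/1 data: 5 observations (rows) by 3 types (columns).
X = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
])

n_cols = X.shape[1]

# Pairwise version: entry (i, j) is the sum of the elementwise product of
# column i and column j, i.e. the number of rows where both are 1.
pairwise = np.array(
    [[int(np.sum(X[:, i] * X[:, j])) for j in range(n_cols)]
     for i in range(n_cols)]
)

# Matrix version: the same table from a single product.
product = X.T @ X

print(np.array_equal(pairwise, product))
```

The pairwise version makes the "logical AND" reading explicit, while the matrix product is usually the better choice once the number of columns grows.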
Co-occurrence (matrix) of values based on group and time
Here you go:
library(data.table)
library(magrittr)
options(stringsAsFactors = F)
dat <- read.table(text="Group ID Time
Trx1 A 1980
Trx1 B 1980
Trx1 C 1980
Trx2 E 1980
Trx2 B 1980
Trx3 B 1981
Trx3 C 1981
Trx4 C 1983
Trx4 E 1983
Trx4 B 1983
Trx5 F 1984
Trx5 B 1984
Trx5 C 1984
Trx6 A 1986", header=T)
str(dat)
dat = as.data.table(dat)
priorYears = 3
unqIDs = unique(dat$ID)
results = data.table(ID = character(), year = numeric(), total = numeric(), diff = numeric(), repeatSum = numeric())
for (i in seq_len(nrow(dat))) {
  endYear = dat$Time[i]
  startYear = endYear - priorYears
  this.ID = dat$ID[i]
  this.group = dat$Group[i]
  # Date filtering: keep rows from the priorYears years before endYear (endYear itself excluded)
  subset.DT = dat[dat$Time >= startYear & dat$Time < endYear]
  # Keep projects where my current ID collaborated
  groupsToKeep = subset.DT$Group[subset.DT$ID == this.ID] %>% unique
  subset.DT = subset.DT[subset.DT$Group %in% groupsToKeep,]
  # Calculations
  unqMembers = unique(subset.DT$ID) %>% .[. != this.ID]
  currentMembers = dat$ID[dat$Group == this.group] %>% .[. != this.ID]
  total = length(which(subset.DT$ID != this.ID))
  diff = length(unqMembers)
  repeatSum = sum(table(subset.DT$ID)[currentMembers], na.rm = TRUE)
  # Add results
  results = rbind(results, data.frame(ID = this.ID, year = endYear, total, diff, repeatSum))
}