Create a Co-Occurrence Matrix from Dummy-Coded Observations

This will do the trick:

X <- as.matrix(X)
out <- crossprod(X) # Same as: t(X) %*% X
diag(out) <- 0 # (b/c you don't count co-occurrences of an aspect with itself)
out
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    1    0
# [2,]    0    0    2    1
# [3,]    1    2    0    1
# [4,]    0    1    1    0

To get the results into a data.frame exactly like the one you showed, you can then do something like:

nms <- paste("X", 1:4, sep="")
dimnames(out) <- list(nms, nms)
out <- as.data.frame(out)
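
If it helps to see what crossprod is actually computing, here is a minimal sketch in plain Python (the other language used on this page). The crossprod helper and the matrix X below are hypothetical, written only for illustration; X is not the asker's data, just a small input that happens to give the same counts as the output above.

```python
def crossprod(X):
    """Sketch of t(X) %*% X for a 0/1 matrix given as a list of rows."""
    n_cols = len(X[0])
    # Entry (i, j) adds up row[i] * row[j] over all rows, i.e. the number
    # of observations in which columns i and j are both 1.
    return [
        [sum(row[i] * row[j] for row in X) for j in range(n_cols)]
        for i in range(n_cols)
    ]

# Hypothetical dummy-coded observations (rows) over four aspects (columns).
X = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]

out = crossprod(X)

# Zero the diagonal: an aspect does not co-occur with itself.
for k in range(len(out)):
    out[k][k] = 0

for row in out:
    print(row)
```

Each off-diagonal entry is just a count of rows where both columns are 1, which is why a single matrix product replaces a double loop over column pairs.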

Constructing a co-occurrence matrix in Python pandas

It's simple linear algebra: multiply the transpose of the matrix by the matrix itself (your example contains strings, so don't forget to convert them to integers first):

>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4

If, as in the R answer, you want to reset the diagonal, you can use numpy's fill_diagonal:

>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
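
One caveat: filling the diagonal through coocc.values relies on that array being a writable view of the frame, which may not hold in every pandas version (with copy-on-write the array can be read-only). A more defensive sketch, assuming a small made-up frame rather than the asker's data, is to copy the array, edit the copy, and rebuild the frame:

```python
import numpy as np
import pandas as pd

# Hypothetical dummy-coded data; column names borrowed from the question.
df = pd.DataFrame({
    "Dop":   [1, 1, 0, 1],
    "Snack": [1, 0, 1, 0],
    "Trans": [1, 1, 0, 0],
})

# Transpose times the frame itself gives the co-occurrence counts.
coocc = df.T.dot(df)

# Work on an explicit copy so the edit never depends on view semantics.
arr = coocc.to_numpy(copy=True)
np.fill_diagonal(arr, 0)
result = pd.DataFrame(arr, index=coocc.index, columns=coocc.columns)

print(result)
```

This leaves coocc itself untouched, so you keep the per-column totals on its diagonal if you need them later.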

Finding the number of co-occurrences of multiple binary variables in R

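Try crossprod (converting with as.matrix first in case df is a data frame). The diagonal holds each variable's own count of 1s:

> crossprod(as.matrix(df))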
   v1 v2 v3
v1  2  2  2
v2  2  4  4
v3  2  4  6

Convert dummy-coded matrix to adjacency matrix

You can use the outer function:

count1s <- function(x, y) colSums(x == 1 & y == 1)
n <- 1:ncol(data)
mat <- outer(n, n, function(x, y) count1s(data[, x], data[, y]))
diag(mat) <- 0
dimnames(mat) <- list(colnames(data), colnames(data))
mat

#   A B C D
# A 0 1 2 0
# B 1 0 0 1
# C 2 0 0 0
# D 0 1 0 0
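
The same pairwise "both are 1" count can be sketched in plain Python, visiting each unordered pair of columns once. The data dictionary below is hypothetical, picked only so that the counts line up with the table above; it is not the asker's data.

```python
from itertools import combinations

# Hypothetical dummy-coded columns (one list of 0/1 values per variable).
data = {
    "A": [1, 1, 1, 0],
    "B": [1, 0, 0, 1],
    "C": [0, 1, 1, 0],
    "D": [0, 0, 0, 1],
}

names = list(data)

# Start from an all-zero square table, so the diagonal is already 0.
mat = {a: {b: 0 for b in names} for a in names}

# Count rows where both columns equal 1, and mirror the count across
# the diagonal because the adjacency matrix is symmetric.
for a, b in combinations(names, 2):
    both = sum(1 for x, y in zip(data[a], data[b]) if x == 1 and y == 1)
    mat[a][b] = mat[b][a] = both

for name in names:
    print(name, [mat[name][other] for other in names])
```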

Convert co-occurrence dataframe to square matrix

What you described in words sounds like ordinary matrix multiplication followed by setting the diagonal to 0:

temp <- t(as.matrix(d)) %*% as.matrix(d)
diag(temp) <- 0


> temp
  A B C D E F G H
A 0 6 1 0 0 0 0 3
B 6 0 1 0 0 0 0 3
C 0 1 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0
H 3 3 0 0 0 0 0 0

crossprod(as.matrix(d)) computes the same product, t(d) %*% d, and is probably even faster (tcrossprod would give d %*% t(d) instead), but either of these methods will surely outperform your nested loop.

How to create a logical AND contingency table in R?

sapply(df, function(x) sapply(df, function(y) sum(x * y)))
# OR
t(df) %*% as.matrix(df)
#       typeA typeB typeC
# typeA     4     3     2
# typeB     3     4     2
# typeC     2     2     4

Co-occurrence (matrix) of values based on group and time

Here you go:

library(data.table)
library(magrittr)
options(stringsAsFactors = FALSE) # only matters in R < 4.0, where the default was TRUE

dat <- read.table(text="Group ID Time
Trx1 A 1980
Trx1 B 1980
Trx1 C 1980
Trx2 E 1980
Trx2 B 1980
Trx3 B 1981
Trx3 C 1981
Trx4 C 1983
Trx4 E 1983
Trx4 B 1983
Trx5 F 1984
Trx5 B 1984
Trx5 C 1984
Trx6 A 1986", header = TRUE)

str(dat)
dat = as.data.table(dat)

priorYears = 3
unqIDs = unique(dat$ID)


results = data.table(ID = character(), year = numeric(), total = numeric(), diff = numeric(), repeatSum = numeric())

for (i in seq_len(nrow(dat))) {

  endYear = dat$Time[i]
  startYear = endYear - priorYears
  this.ID = dat$ID[i]
  this.group = dat$Group[i]

  # Date filtering: keep rows from the prior window, excluding the current year
  subset.DT = dat[dat$Time >= startYear & dat$Time < endYear]

  # Keep projects where my current ID collaborated
  groupsToKeep = subset.DT$Group[subset.DT$ID == this.ID] %>% unique
  subset.DT = subset.DT[subset.DT$Group %in% groupsToKeep, ]

  # Calculations
  unqMembers = unique(subset.DT$ID) %>% .[. != this.ID]
  currentMembers = dat$ID[dat$Group == this.group] %>% .[. != this.ID]

  total = length(which(subset.DT$ID != this.ID))
  diff = length(unqMembers)
  repeatSum = sum(table(subset.DT$ID)[currentMembers], na.rm = TRUE)

  # Add results
  results = rbind(results, data.frame(ID = this.ID, year = endYear, total, diff, repeatSum))

}
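
For anyone following along in Python, here is a rough port of the same loop using only the standard library. It is a sketch of my reading of the logic above, not a tested replacement: the function name prior_collaboration and the dictionary layout of the results are made up for illustration, and the rows are the same toy data as in the R code.

```python
from collections import Counter

# Same toy data as in the R example: (Group, ID, Time).
rows = [
    ("Trx1", "A", 1980), ("Trx1", "B", 1980), ("Trx1", "C", 1980),
    ("Trx2", "E", 1980), ("Trx2", "B", 1980),
    ("Trx3", "B", 1981), ("Trx3", "C", 1981),
    ("Trx4", "C", 1983), ("Trx4", "E", 1983), ("Trx4", "B", 1983),
    ("Trx5", "F", 1984), ("Trx5", "B", 1984), ("Trx5", "C", 1984),
    ("Trx6", "A", 1986),
]

def prior_collaboration(rows, prior_years=3):
    """Hypothetical helper mirroring the R loop, one result per input row."""
    results = []
    for group, this_id, end_year in rows:
        start_year = end_year - prior_years
        # Rows inside the look-back window, excluding the current year.
        window = [r for r in rows if start_year <= r[2] < end_year]
        # Earlier groups in which the current ID took part.
        groups_to_keep = {g for g, i, _ in window if i == this_id}
        subset = [r for r in window if r[0] in groups_to_keep]
        # Past collaborators (everyone in those groups except this ID).
        others = [i for _, i, _ in subset if i != this_id]
        counts = Counter(others)
        # Members of the current group other than this ID.
        current = [i for g, i, _ in rows if g == group and i != this_id]
        results.append({
            "ID": this_id,
            "year": end_year,
            "total": len(others),    # all past co-memberships
            "diff": len(counts),     # distinct past collaborators
            "repeatSum": sum(counts[m] for m in current),
        })
    return results

results = prior_collaboration(rows)
for r in results:
    print(r)
```

For example, for B in Trx3 (1981) the window holds the 1980 rows, B's earlier groups are Trx1 and Trx2, and the past collaborators are A, C and E, so total and diff are both 3, while repeatSum is 1 because only C is also in Trx3.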

