Union of Intersecting Vectors in a List in R

This is essentially a graph problem, so I like to use the igraph library for it. Using your sample data, you can do:

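library(igraph)
# build an edge list: consecutive elements of each vector become an edge
el <- do.call("rbind", lapply(data, embed, 2))
# make an undirected graph (graph_from_edgelist() is the current name of graph.edgelist())
gg <- graph_from_edgelist(el, directed = FALSE)
# partition the graph into disjoint sets (components() is the current name of clusters())
split(V(gg)$name, components(gg)$membership)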

# $`1`
# [1] "b" "a" "c" "d" "n"
#
# $`2`
# [1] "h" "g" "k" "i"

And we can view the results with

V(gg)$color <- c("green", "purple")[components(gg)$membership]
plot(gg)
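
If igraph is not available, the same grouping can be sketched in base R by folding each vector into every group it overlaps. This is only an illustrative alternative, not the method used above; the `data` list and the helper name `merge_overlapping` are assumptions made for the example.

```r
# Hypothetical sample data (the original question's data is not shown here)
data <- list(c("a", "b", "c"), c("c", "d"), c("g", "h"),
             c("h", "i", "k"), c("d", "n"))

# Hypothetical helper: merge vectors that share at least one element
merge_overlapping <- function(sets) {
  merged <- list()
  for (s in sets) {
    # which already-built groups share an element with s?
    hits <- vapply(merged, function(m) length(intersect(m, s)) > 0, logical(1))
    # fold s and every group it touches into a single group
    s <- unique(c(unlist(merged[hits]), s))
    merged <- c(merged[!hits], list(s))
  }
  merged
}

merge_overlapping(data)
# [[1]]
# [1] "g" "h" "i" "k"
#
# [[2]]
# [1] "a" "b" "c" "d" "n"
```

Because the groups built so far are always pairwise disjoint, one pass over the list is enough.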


Find all intersecting vectors in a list of vectors in R

Given that the list elements should be partitioned according to:

  • List elements with empty intersections w.r.t. all the other list components,
  • List elements with a non-empty intersection w.r.t. some other list component,

a way to achieve this in base R is as follows:

## find set components w/ empty intersections w/ all other components
isUnique <- sapply(seq_along(sets), function(i) length(intersect(sets[[i]], unlist(sets[-i]))) < 1)

## empty intersect components
sets[isUnique]
#> $e
#> [1] "e45" "e55" "e65"
#>
#> $j
#> [1] "j1" "j2" "j3"

## non-empty intersect components
sets[!isUnique]
#> $b
#> [1] "b4" "b5" "b6"
#>
#> $c
#> [1] "c2" "c3" "b4" "b5" "c6"
#>
#> $d
#> [1] "d1" "d2"
#>
#> $f
#> [1] "f4" "f5" "d1" "f6"
#>
#> $g
#> [1] "g1" "g2"
#>
#> $h
#> [1] "h5" "h6" "h7"
#>
#> $i
#> [1] "i9" "h5" "g1" "h6" "i8" "i7"
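
For a self-contained run, the `sets` list can be reconstructed from the printed output above (the element order is an assumption). The sketch below also lists, for each component, the names of the components it overlaps with; the `partners` object is a hypothetical addition for illustration.

```r
# `sets` rebuilt from the printed results; the ordering is assumed
sets <- list(
  b = c("b4", "b5", "b6"),
  c = c("c2", "c3", "b4", "b5", "c6"),
  d = c("d1", "d2"),
  e = c("e45", "e55", "e65"),
  f = c("f4", "f5", "d1", "f6"),
  g = c("g1", "g2"),
  h = c("h5", "h6", "h7"),
  i = c("i9", "h5", "g1", "h6", "i8", "i7"),
  j = c("j1", "j2", "j3")
)

# TRUE where a component shares nothing with any other component
isUnique <- vapply(seq_along(sets), function(i) {
  length(intersect(sets[[i]], unlist(sets[-i]))) == 0
}, logical(1))

names(sets)[isUnique]
# [1] "e" "j"

# for each component, the names of the components it intersects
partners <- lapply(seq_along(sets), function(i) {
  others <- sets[-i]
  overlaps <- vapply(others, function(s) length(intersect(s, sets[[i]])) > 0, logical(1))
  names(others)[overlaps]
})
names(partners) <- names(sets)

partners$i
# [1] "g" "h"
```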

Find the union and intersection of grouped variables

We can remove duplicates and combine the sorted values two at a time like this (the native pipe |> and the \(x) lambda shorthand require R 4.1.0 or later):

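f <- function(x, y, sep, max){
  m <- paste0("\\", max)
  gsub(m, "", c(x, y)) |>
    strsplit(sep, fixed = TRUE) |>
    unlist(use.names = FALSE) |>
    as.numeric() |>  # convert first so that sort() orders numerically, not as text
    sort() |>
    unique() |>
    (\(.) tapply(., gl(length(.), 2, length(.)), paste, collapse = sep, simplify = TRUE))() |>
    (\(.) .[!is.na(.)])() |>
    as.character() |>
    (\(.) {.[length(.)] <- paste0(.[length(.)], max) ; .})()
}

# for older R versions
f <- function(x, y, sep, max){
  # strip the open-ended marker (e.g. "+") before splitting
  x <- gsub(paste0("\\", max), "", c(x, y))
  # convert to numeric before sorting so values are ordered numerically, not as text
  x <- sort(unique(as.numeric(unlist(strsplit(x, sep, fixed = TRUE), use.names = FALSE))))
  # pair consecutive values; unused factor levels come back as NA
  x <- tapply(x, gl(length(x), 2L, length(x)), paste, collapse = sep, simplify = TRUE)
  x <- as.character(x[!is.na(x)])
  # re-attach the open-ended marker to the last range
  x[length(x)] <- paste0(x[length(x)], max)
  x
}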

f(example1, example2, "--", "+")
# [1] "18--23" "24--25" "26--30" "31--50" "51--65" "66+"
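
Since example1 and example2 are not shown, here is a step-by-step sketch of the same idea on hypothetical inputs chosen to be consistent with the output above. It assumes the combined inputs yield an odd number of distinct endpoints, so that the last one is the open-ended bound.

```r
# Hypothetical inputs; the original example1/example2 are not shown
example1 <- c("18--25", "26--30", "31--50", "51+")
example2 <- c("18--23", "24--30", "31--65", "66+")

# 1. drop the open-ended marker and split every range into its endpoints
bounds <- gsub("+", "", c(example1, example2), fixed = TRUE)
bounds <- unlist(strsplit(bounds, "--", fixed = TRUE))

# 2. keep the distinct endpoints in numeric order
bounds <- sort(unique(as.numeric(bounds)))

# 3. pair consecutive endpoints: odd positions start a range, even positions end it
starts <- bounds[seq(1, length(bounds), by = 2)]
ends   <- bounds[seq(2, length(bounds), by = 2)]

# 4. the last start has no matching end, so it becomes the open-ended range
ranges <- c(paste(head(starts, -1), ends, sep = "--"),
            paste0(tail(starts, 1), "+"))
ranges
# [1] "18--23" "24--25" "26--30" "31--50" "51--65" "66+"
```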

Find any intersection in a list of character vectors

For each element of the list, count the values it contains using table, then keep only those values that occur in more than one list element.

vals <- unique(unlist(l))

intersect_vals <- names(Filter(function(x) x > 1,
rowSums(sapply(l, function(x) table(factor(x, vals))) > 0)))

intersect_vals
#[1] "a"
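
As a shorter variant of the same counting idea, each vector can be de-duplicated first and the pooled values tabulated once. The list `l` below is hypothetical, since the original data is not shown.

```r
# Hypothetical list of character vectors
l <- list(c("a", "b"), c("a", "c"), c("d", "e"))

# de-duplicate within each vector so a value is counted once per list element
tab <- table(unlist(lapply(l, unique)))

# values that appear in more than one list element
intersect_vals <- names(tab)[as.vector(tab) > 1]
intersect_vals
# [1] "a"
```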

Intersecting many pairs of integer vectors

Try representing l as a 1/0 matrix:

max.val = max(sapply(l, max))
mat = do.call(rbind, lapply(l, function(x) {z = rep(0, max.val); z[x] = 1; z}))

Now you can easily compute the sizes of all pairwise intersections and unions up front:

pair_intsct = mat %*% t(mat)

pair_union = outer(rowSums(mat), rowSums(mat), '+') - pair_intsct
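
A small worked example may make the matrix trick easier to follow. The list `l` is hypothetical, and the sketch assumes all values are positive integers so they can be used as column indices.

```r
# Hypothetical list of positive-integer vectors
l <- list(c(1L, 2L, 3L), c(2L, 3L, 4L), c(5L, 6L))

max.val <- max(unlist(l))

# one row per vector; column j is 1 when the vector contains the value j
mat <- do.call(rbind, lapply(l, function(x) {
  z <- integer(max.val)
  z[x] <- 1L
  z
}))

# entry [i, j] counts the values shared by vectors i and j
pair_intsct <- mat %*% t(mat)          # tcrossprod(mat) gives the same result

# |A union B| = |A| + |B| - |A intersect B|
pair_union <- outer(rowSums(mat), rowSums(mat), "+") - pair_intsct

pair_intsct[1, 2]
# [1] 2
pair_union[1, 2]
# [1] 4
```

Dividing `pair_intsct` by `pair_union` element-wise would give the pairwise Jaccard similarities.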

How to make in R matrix of intersections and unions over categories?

Maybe something like this?

days <- levels(allt$day)

f <- function(x, y) {
xids <- allt$id[allt$day == x]
yids <- allt$id[allt$day == y]
length(intersect(xids, yids)) / length(union(xids, yids))
}
f <- Vectorize(f)

outer(days, days, f)

#      [,1] [,2] [,3]
# [1,]  1.0  0.5  0.2
# [2,]  0.5  1.0  0.5
# [3,]  0.2  0.5  1.0

Optionally, pipe the result into magrittr's set_colnames(days) and set_rownames(days) to label the rows and columns.
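
Because allt is not shown, here is a runnable sketch on a hypothetical data frame with the same column names. It labels the matrix with base R's dimnames<- rather than magrittr; the helper name `jaccard` is an assumption.

```r
# Hypothetical data: which ids were seen on which day
allt <- data.frame(
  day = factor(c("mon", "mon", "tue", "tue", "wed")),
  id  = c(1, 2, 2, 3, 3)
)

days <- levels(allt$day)

# size of the intersection over size of the union for two days
jaccard <- function(x, y) {
  xids <- allt$id[allt$day == x]
  yids <- allt$id[allt$day == y]
  length(intersect(xids, yids)) / length(union(xids, yids))
}

# outer() passes whole vectors, so the scalar function must be vectorized
m <- outer(days, days, Vectorize(jaccard))
dimnames(m) <- list(days, days)

m["tue", "wed"]
# [1] 0.5
```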

UNION ALL or INTERSECT ALL equivalent in R?

vecsets::vintersect() does exactly what you want for intersection:

Unlike the base::intersect function, if the vectors have repeated elements in common, the intersection returns as many of these elements as are in whichever vector has fewer of them.

vecsets::vintersect(x,y)
# [1] 1 1 2

Unfortunately, vecsets::vunion() follows a different definition than yours, which seems to be just concatenation, as pointed out by others:

c(x,y)
# [1] 1 2 3 3 1 1 1 2
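
If installing vecsets is not an option, the INTERSECT ALL behaviour can be sketched in base R. The helper name `multiset_intersect` and the inputs x and y are assumptions; the inputs were chosen to be consistent with the outputs shown above, and the sketch does not handle NA values.

```r
# Hypothetical inputs, consistent with the outputs shown above
x <- c(1, 2, 3, 3, 1)
y <- c(1, 1, 2)

# Hypothetical helper: keep each common value as many times as it
# appears in whichever vector has fewer copies of it
multiset_intersect <- function(x, y) {
  vals <- intersect(x, y)
  counts <- vapply(vals, function(v) min(sum(x == v), sum(y == v)), integer(1))
  rep(vals, counts)
}

multiset_intersect(x, y)
# [1] 1 1 2

# UNION ALL is plain concatenation
c(x, y)
# [1] 1 2 3 3 1 1 1 2
```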

R: unshared elements across multiple vectors (opposite to intersect)

My approach would be to combine all the vectors first, then count how often each value occurs with the table function, and finally take the length of the entries that occur exactly once:

temp = c(a,b,c)
temp_table = table(temp)
length(temp_table[temp_table == 1])

Use names if you want to see the unshared elements themselves:

names(temp_table[temp_table == 1])
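
One caveat: a value repeated inside a single vector is counted more than once, so it would be missed even if no other vector contains it. A minimal sketch that guards against this by de-duplicating each vector first, using hypothetical data:

```r
# Hypothetical vectors; "v" is repeated inside the third one only
vecs <- list(
  a = c("x", "y", "z"),
  b = c("y", "z", "w"),
  c = c("z", "v", "v")
)

# count each value at most once per vector
tab <- table(unlist(lapply(vecs, unique)))

# values that occur in exactly one vector
unshared <- names(tab)[as.vector(tab) == 1]
unshared
# [1] "v" "w" "x"

length(unshared)
# [1] 3
```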

How to find common elements from multiple vectors?

There might be a cleverer way to go about this, but

intersect(intersect(a,b),c)

will do the job.

EDIT: More cleverly, and more conveniently if you have a lot of arguments:

Reduce(intersect, list(a,b,c))
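
A quick check on hypothetical vectors shows that both forms agree, and that the Reduce version works unchanged on a list of any length:

```r
# Hypothetical vectors
a <- c(1, 2, 3, 4)
b <- c(2, 3, 4, 5)
c <- c(3, 4, 6)

nested <- intersect(intersect(a, b), c)
reduced <- Reduce(intersect, list(a, b, c))

reduced
# [1] 3 4

identical(nested, reduced)
# [1] TRUE
```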

