Merge Lists That Share Common Elements

Merge lists that share common elements

You can see your list as a notation for a Graph, ie ['a','b','c'] is a graph with 3 nodes connected to each other. The problem you are trying to solve is finding connected components in this graph.

You can use NetworkX for this, which has the advantage that it's pretty much guaranteed to be correct:

l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]

import networkx
from networkx.algorithms.components.connected import connected_components

def to_graph(l):
G = networkx.Graph()
for part in l:
# each sublist is a bunch of nodes
G.add_nodes_from(part)
# it also imlies a number of edges:
G.add_edges_from(to_edges(part))
return G

def to_edges(l):
"""
treat `l` as a Graph and returns it's edges
to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)]
"""
it = iter(l)
last = next(it)

for current in it:
yield last, current
last = current

G = to_graph(l)
print connected_components(G)
# prints [['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p'], ['k']]

To solve this efficiently yourself you have to convert the list into something graph-ish anyways, so you might as well use networkX from the start.

How to merge lists with common elements in a list of lists?

You need to use recursion:

def group(d, _start, _c = [], _seen = [], _used=[]):
r = [i for i in d if any(c in _start for c in i) and i not in _seen and i not in _used]
if not r:
yield set(_c)
for i in d:
if i != _start and i not in _used:
yield from group(d, i, _c=[], _seen=[], _used=_used+[i, *r])
else:
yield from group(d, _start, _c=_c+[i for b in r for i in b], _seen=_seen+r, _used=_used+r)

data = [[1, 7, 3], [1, 7, 5], [2, 0, 4], [2, 0, 6], [3, 7, 1], [3, 7, 5], [4, 0, 2], [4, 0, 6], [5, 7, 1], [5, 7, 3], [6, 0, 2], [6, 0, 4]]
result = list(map(list, {tuple(i) for i in group(data, data[0], _seen=[data[0]]) if i}))

Output:

[[0, 2, 4, 6], [1, 3, 5, 7]]

How do I merge lists with common elements, where these elements themselves are lists/tuples?

Here is a solution that uses O(n) additional space which tracks all sub-lists that are already added using a hash table (thus, all lookups would just be O(1)).

added = set()
merged = []

for item in data:
filtered = list(filter(lambda value: value not in added, map(tuple, item)))
if filtered:
added.update(filtered)
merged.append(list(map(list, filtered)))

print(merged)

Output

[[[0, 1], [2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]], [[12, 13], [14, 15], [16, 17], [18, 19], [20, 21]]]

Update

The solution above only prevents duplicate list items from occurring again on the next list. Here, we wouldn't only prevent them from occurring again but also merge them on existing ones. As a result, the time complexity of this algorithm is somewhat O(n^2).

merged = []

for item in data:
item_set = set(map(tuple, item))
for result in merged:
if result & item_set:
result.update(item_set)
break
else:
merged.append(item_set)

merged = [sorted(map(list, item)) for item in merged]
print(merged)

Merge List of lists where sublists have common elements

The following code should solve your problem:

def merge_subs(lst_of_lsts):
res = []
for row in lst_of_lsts:
for i, resrow in enumerate(res):
if row[0]==resrow[0]:
res[i] += row[1:]
break
else:
res.append(row)
return res

Note that the elsebelongs to the inner for and is executed if the loop is exited without hitting the break.

merging sets which have even one element in common R

Here is another approach which constructs a matrix showing which elements of the list intersect with each other and uses the igraph package to deduce the groups:

library(igraph)
## Construct the matrix
m = sapply(lista,function(x) sapply(lista,function(y) length(intersect(x,y))>0))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] TRUE TRUE FALSE FALSE TRUE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE FALSE
[3,] FALSE FALSE TRUE TRUE FALSE FALSE
[4,] FALSE FALSE TRUE TRUE FALSE FALSE
[5,] TRUE FALSE FALSE FALSE TRUE TRUE
[6,] FALSE FALSE FALSE FALSE TRUE TRUE

## Determine the groups of the graph constructed from m
groups = groups(components(graph_from_adjacency_matrix(m)))
$`1`
[1] 1 2 5 6

$`2`
[1] 3 4

## Get the unique elements of each group
res = lapply(groups,function(x) sort(unique(unlist(lista[x]))))
$`1`
[1] 1 2 4 6 7 8 9 10 11 12 19 32 34 35 36 37 38 39 40

$`2`
[1] 13 14 16 20 21 23 25 26 27 28 29 30 31

How to merge multiple lists into 1 but only those elements that were in all of the initial lists?

Given some lists xs1, ..., xs5:

xss = [xs1, xs2, xs3, xs4, xs5]
sets = [set(xs) for xs in xss]
merged = set.intersection(*sets)

This has the property that merged may be in any order.



Related Topics



Leave a reply



Submit