Pythonic Way to Create Union of All Values Contained in Multiple Lists

Pythonic way to create union of all values contained in multiple lists

set.union does what you want:

>>> results_list = [[1,2,3], [1,2,4]]
>>> results_union = set().union(*results_list)
>>> print(results_union)
set([1, 2, 3, 4])

You can also do this with more than two lists.

Taking the union of 2 lists of lists in Python

You should be able to create a set, add each element of each list to that set, then create a list from that set, something like:

new_set = set()
for item in list1:
set.add(item)
for item in list2:
set.add(item)
final_list = list(set)

It won't be guaranteed to be in order but it will have only unique elements. Of course, if you don't care about order, you may as well be using sets exclusively so you don't have to concern yourself with duplicates.


If you want to maintain order, you just maintain both a final list and the set, and only add to the final list if it's not already in the set:

dupe_set = set()
final_list = []
for item in list1:
if item not in dupe_set:
set.add(item)
final_list.append(item)
for item in list1:
if item not in dupe_set:
set.add(item)
final_list.append(item)

Whatever method you choose, this is the sort of thing that cries out for a function/method to do the heavy lifting, since it's quite likely you may want to do this more than once in your code (sometimes, it's worth a separate method even if only done once, as that can aid readability).

For example, these are the equivalent methods for the two options shown above, able to handle any number of input lists:

# Call with x = join_lists_uniq_any_order([list1, list2, ...])

def join_lists_uniq_any_order(list_of_lists):
# Process each list, adding each item to (initially empty) set.

new_set = set()
for one_list in list_of_lists:
for item in one_list:
set.add(item)

return list(set)

# Call with x = join_lists_uniq_keep_order([list1, list2, ...])

def join_lists_uniq_keep_order(list_of_lists):
dupe_set = set()
final_list = []

# Process each list, adding each item if not yet seen.

for one_list in list_of_lists:
for item in one_list:
if item not in dupe_set:
set.add(item)
final_list.append(item)

What is the preferred way to compose a set from multiple lists in Python

For small lists, set(A + B + C) works fine. For larger lists, the following is more efficient because it does not create a temporary list:

myset = set(A)
myset.update(B)
myset.update(C)

A different approach uses itertools.chain, which is also efficient because it does not create temporary list:

import itertools
myset = set(itertools.chain(A, B, C))

How to do union of several lists?

If I understand correctly what you are trying to do, you can use the set.update method with an arbitrary number of iterable arguments.

>>> lists = [[1,2,3], [3,4,5], [5,6,7]]
>>> result = set()
>>> result.update(*lists)
>>>
>>> result
{1, 2, 3, 4, 5, 6, 7}

edit: with your sample data:

>>> list_a = ['abc','bcd','dcb']
>>> list_b = ['abc','xyz','ASD']
>>> list_c = ['AZD','bxd','qwe']
>>>
>>> result = set()
>>> result.update(list_a, list_b, list_c)
>>> result
{'ASD', 'xyz', 'qwe', 'bxd', 'AZD', 'bcd', 'dcb', 'abc'}

join list of lists in python

import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))

This gives

['a', 'b', 'c']

Fastest method to update all list entries with union of all intersecting entries

This problem is about creating disjoint sets and so I would use union-find methods.

Now Python is not particularly known for being fast, but for the sake of showing the algorithm, here is an implementation of a DisjointSet class without libraries:

class DisjointSet:
class Element:
def __init__(self):
self.parent = self
self.rank = 0

def __init__(self):
self.elements = {}

def find(self, key):
el = self.elements.get(key, None)
if not el:
el = self.Element()
self.elements[key] = el
else: # Path splitting algorithm
while el.parent != el:
el, el.parent = el.parent, el.parent.parent
return el

def union(self, key=None, *otherkeys):
if key is not None:
root = self.find(key)
for otherkey in otherkeys:
el = self.find(otherkey)
if el != root:
# Union by rank
if root.rank < el.rank:
root, el = el, root
el.parent = root
if root.rank == el.rank:
root.rank += 1

def groups(self):
result = { el: [] for el in self.elements.values()
if el.parent == el }
for key in self.elements:
result[self.find(key)].append(key)
return result

Here is how you could use it for this particular problem:

def solve(lists):
disjoint = DisjointSet()
for lst in lists:
disjoint.union(*lst)

groups = disjoint.groups()
return [lst and groups[disjoint.find(lst[0])] for lst in lists]

Example call:

data = [
[0, 5, 101],
[8, 9, 19, 21],
[],
[78, 79],
[5, 7, 63, 64]
]
result = solve(data)

The result will be:

[[0, 5, 101, 7, 63, 64], [8, 9, 19, 21], [], [78, 79], [0, 5, 101, 7, 63, 64]]

Note that I added an empty list in the input list, so to illustrate that this boundary case remains unaltered.

NB: There are libraries out there that provide union-find/disjoint set functionality, each with a slightly different API, but I suppose that using one of those can give a better performance.



Related Topics



Leave a reply



Submit