Pythonic way to create union of all values contained in multiple lists
set.union
does what you want:
>>> results_list = [[1,2,3], [1,2,4]]
>>> results_union = set().union(*results_list)
>>> print(results_union)
set([1, 2, 3, 4])
You can also do this with more than two lists.
Taking the union of 2 lists of lists in Python
You should be able to create a set, add each element of each list to that set, then create a list from that set, something like:
new_set = set()
for item in list1:
set.add(item)
for item in list2:
set.add(item)
final_list = list(set)
It won't be guaranteed to be in order but it will have only unique elements. Of course, if you don't care about order, you may as well be using sets exclusively so you don't have to concern yourself with duplicates.
If you want to maintain order, you just maintain both a final list and the set, and only add to the final list if it's not already in the set:
dupe_set = set()
final_list = []
for item in list1:
if item not in dupe_set:
set.add(item)
final_list.append(item)
for item in list1:
if item not in dupe_set:
set.add(item)
final_list.append(item)
Whatever method you choose, this is the sort of thing that cries out for a function/method to do the heavy lifting, since it's quite likely you may want to do this more than once in your code (sometimes, it's worth a separate method even if only done once, as that can aid readability).
For example, these are the equivalent methods for the two options shown above, able to handle any number of input lists:
# Call with x = join_lists_uniq_any_order([list1, list2, ...])
def join_lists_uniq_any_order(list_of_lists):
# Process each list, adding each item to (initially empty) set.
new_set = set()
for one_list in list_of_lists:
for item in one_list:
set.add(item)
return list(set)
# Call with x = join_lists_uniq_keep_order([list1, list2, ...])
def join_lists_uniq_keep_order(list_of_lists):
dupe_set = set()
final_list = []
# Process each list, adding each item if not yet seen.
for one_list in list_of_lists:
for item in one_list:
if item not in dupe_set:
set.add(item)
final_list.append(item)
What is the preferred way to compose a set from multiple lists in Python
For small lists, set(A + B + C)
works fine. For larger lists, the following is more efficient because it does not create a temporary list:
myset = set(A)
myset.update(B)
myset.update(C)
A different approach uses itertools.chain
, which is also efficient because it does not create temporary list:
import itertools
myset = set(itertools.chain(A, B, C))
How to do union of several lists?
If I understand correctly what you are trying to do, you can use the set.update
method with an arbitrary number of iterable arguments.
>>> lists = [[1,2,3], [3,4,5], [5,6,7]]
>>> result = set()
>>> result.update(*lists)
>>>
>>> result
{1, 2, 3, 4, 5, 6, 7}
edit: with your sample data:
>>> list_a = ['abc','bcd','dcb']
>>> list_b = ['abc','xyz','ASD']
>>> list_c = ['AZD','bxd','qwe']
>>>
>>> result = set()
>>> result.update(list_a, list_b, list_c)
>>> result
{'ASD', 'xyz', 'qwe', 'bxd', 'AZD', 'bcd', 'dcb', 'abc'}
join list of lists in python
import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))
This gives
['a', 'b', 'c']
Fastest method to update all list entries with union of all intersecting entries
This problem is about creating disjoint sets and so I would use union-find methods.
Now Python is not particularly known for being fast, but for the sake of showing the algorithm, here is an implementation of a DisjointSet
class without libraries:
class DisjointSet:
class Element:
def __init__(self):
self.parent = self
self.rank = 0
def __init__(self):
self.elements = {}
def find(self, key):
el = self.elements.get(key, None)
if not el:
el = self.Element()
self.elements[key] = el
else: # Path splitting algorithm
while el.parent != el:
el, el.parent = el.parent, el.parent.parent
return el
def union(self, key=None, *otherkeys):
if key is not None:
root = self.find(key)
for otherkey in otherkeys:
el = self.find(otherkey)
if el != root:
# Union by rank
if root.rank < el.rank:
root, el = el, root
el.parent = root
if root.rank == el.rank:
root.rank += 1
def groups(self):
result = { el: [] for el in self.elements.values()
if el.parent == el }
for key in self.elements:
result[self.find(key)].append(key)
return result
Here is how you could use it for this particular problem:
def solve(lists):
disjoint = DisjointSet()
for lst in lists:
disjoint.union(*lst)
groups = disjoint.groups()
return [lst and groups[disjoint.find(lst[0])] for lst in lists]
Example call:
data = [
[0, 5, 101],
[8, 9, 19, 21],
[],
[78, 79],
[5, 7, 63, 64]
]
result = solve(data)
The result will be:
[[0, 5, 101, 7, 63, 64], [8, 9, 19, 21], [], [78, 79], [0, 5, 101, 7, 63, 64]]
Note that I added an empty list in the input list, so to illustrate that this boundary case remains unaltered.
NB: There are libraries out there that provide union-find/disjoint set functionality, each with a slightly different API, but I suppose that using one of those can give a better performance.
Related Topics
Stratified Train/Test-Split in Scikit-Learn
Converting "Yield From" Statement to Python 2.7 Code
Sphinx's Autodoc's Automodule Having Apparently No Effect
How Does Keras Calculate the Accuracy
Unbuffered Stdout in Python (As in Python -U) from Within the Program
Numpy: Get Random Set of Rows from 2D Array
Passing Csrftoken with Python Requests
Scraping Ajax Pages Using Python
Handling Multiple Requests in Flask
How to Find First Non-Zero Value in Every Column of a Numpy Array
Finding the Consecutive Zeros in a Numpy Array
How to Stop Numpy from Multithreading
How to Check If All Items in a List Are There in Another List
Setting Django Up to Use MySQL