How to Split a List into Pairs in All Possible Ways

Split a list into all pairs in all possible ways

You can use itertools.permutations and filter out duplicates using frozenset (judging by the output below, this assumes l1 = [0, 1, 2, 3, 4]):

In [173]: d = {frozenset([frozenset(x[:2]), frozenset(x[2:4]), x[-1]])
     ...:      for x in itertools.permutations(l1, len(l1))}

In [174]: d2 = [sorted(x, key=lambda x: (not isinstance(x, frozenset), x)) for x in d]

In [175]: sorted([[tuple(x[0]), tuple(x[1]), x[-1]] for x in d2])
Out[175]:
[[(0, 4), (2, 3), 1],
[(1, 2), (0, 3), 4],
[(1, 2), (0, 4), 3],
[(1, 2), (3, 4), 0],
[(1, 3), (0, 2), 4],
[(1, 3), (0, 4), 2],
[(1, 3), (2, 4), 0],
[(1, 4), (0, 2), 3],
[(1, 4), (0, 3), 2],
[(2, 3), (0, 1), 4],
[(2, 3), (1, 4), 0],
[(2, 4), (0, 1), 3],
[(2, 4), (0, 3), 1],
[(3, 4), (0, 1), 2],
[(3, 4), (0, 2), 1]]
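As a sanity check on the 15 rows above, here is a more direct sketch (the helper name pair_pair_single is mine, not from the answer): pick the singleton first, then pair off the remaining four elements, which gives 5 × 3 = 15 splits.

```python
from itertools import combinations

def pair_pair_single(seq):
    """All ways to split a 5-element sequence of distinct items into
    two pairs and a singleton (hypothetical helper for illustration)."""
    out = []
    for single in seq:
        rest = [x for x in seq if x != single]
        for pair in combinations(rest, 2):
            other = tuple(x for x in rest if x not in pair)
            # keep only one ordering of the two pairs to avoid double counting
            if pair < other:
                out.append([pair, other, single])
    return out

splits = pair_pair_single([0, 1, 2, 3, 4])
# 5 choices of singleton x 3 ways to pair the remaining four = 15
assert len(splits) == 15
```

This avoids generating all 120 permutations and deduplicating afterwards, at the cost of only working for this specific 2+2+1 shape.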

How to split a list into n groups in all possible combinations of group length and elements within group?

We can use the basic recursive algorithm from this answer and modify it to produce partitions of a particular length without having to generate and filter out unwanted partitions.

def sorted_k_partitions(seq, k):
    """Returns a list of all unique k-partitions of `seq`.

    Each partition is a list of parts, and each part is a tuple.

    The parts in each individual partition will be sorted in shortlex
    order (i.e., by length first, then lexicographically).

    The overall list of partitions will then be sorted by the length
    of their first part, the length of their second part, ...,
    the length of their last part, and then lexicographically.
    """
    n = len(seq)
    groups = []  # a list of lists, currently empty

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            if n - i > k - len(groups):
                for group in groups:
                    group.append(seq[i])
                    yield from generate_partitions(i + 1)
                    group.pop()

            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

    result = generate_partitions(0)

    # Sort the parts in each partition in shortlex order
    result = [sorted(ps, key=lambda p: (len(p), p)) for ps in result]
    # Sort partitions by the length of each part, then lexicographically.
    result = sorted(result, key=lambda ps: (*map(len, ps), ps))

    return result

There's quite a lot going on here, so let me explain.

First, we start with a procedural implementation of the same aforementioned recursive algorithm:

def partitions(seq):
    """-> a list of all unique partitions of `seq` in no particular order.

    Each partition is a list of parts, and each part is a tuple.
    """
    n = len(seq)
    groups = []  # a list of lists, currently empty

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            for group in groups:
                group.append(seq[i])
                yield from generate_partitions(i + 1)
                group.pop()

            groups.append([seq[i]])
            yield from generate_partitions(i + 1)
            groups.pop()

    if n > 0:
        return list(generate_partitions(0))
    else:
        return [[()]]

The main algorithm is in the nested generate_partitions function. Basically, it walks through the sequence, and for each item, it: 1) puts the item into each of the current groups (a.k.a. parts) in the working set and recurses; 2) puts the item into its own, new group.

When we reach the end of the sequence (i == n), we yield a (deep) copy of the working set that we've been building up.
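As a quick check of this recursion (my addition, using the function as defined above): the number of partitions of an n-element set is the Bell number B(n), so for n = 1..4 we should see 1, 2, 5, and 15 partitions.

```python
def partitions(seq):
    """All unique partitions of `seq` (condensed copy of the function above)."""
    n = len(seq)
    groups = []

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            # put seq[i] into each existing group in turn...
            for group in groups:
                group.append(seq[i])
                yield from generate_partitions(i + 1)
                group.pop()
            # ...or start a new group with it
            groups.append([seq[i]])
            yield from generate_partitions(i + 1)
            groups.pop()

    return list(generate_partitions(0)) if n else [[()]]

# Bell numbers B(1)..B(4) = 1, 2, 5, 15
assert [len(partitions(list(range(n)))) for n in (1, 2, 3, 4)] == [1, 2, 5, 15]
```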

Now, to get partitions of a particular length, we could simply filter or group the results for the ones we're looking for and be done with it, but this approach performs a lot of unnecessary work (i.e. recursive calls) if we just wanted partitions of some length k.

Note that in the function above, the length of a partition (i.e. the # of groups) is increased whenever:

# this adds a new group (or part) to the partition
groups.append([seq[i]])
yield from generate_partitions(i + 1)
groups.pop()

...is executed. Thus, we limit the size of a partition by simply putting a guard on that block, like so:

def partitions(seq, k):
    ...

    def generate_partitions(i):
        ...

            # only add a new group if the total number would not exceed k
            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

Adding the new parameter and just that line to the partitions function will now cause it to only generate partitions of length up to k. This is almost what we want. The problem is that the for loop still sometimes generates partitions of length less than k.

In order to prune those recursive branches, we need to execute the for loop only when we can be sure that there are enough remaining elements in the sequence to expand the working set to a total of k groups. The number of remaining elements (those that haven't yet been placed into a group) is n - i, or len(seq) - i, and k - len(groups) is the number of new groups we still need in order to produce a valid k-partition. If n - i <= k - len(groups), then we cannot spare an item for one of the current groups; we must create a new group.

So we simply add another guard, this time to the other recursive branch:

def generate_partitions(i):
    ...

        # only add to current groups if the number of remaining items
        # exceeds the number of required new groups.
        if n - i > k - len(groups):
            for group in groups:
                group.append(seq[i])
                yield from generate_partitions(i + 1)
                group.pop()

        # only add a new group if the total number would not exceed k
        if len(groups) < k:
            groups.append([seq[i]])
            yield from generate_partitions(i + 1)
            groups.pop()

And with that, you have a working k-partition generator. You could probably collapse some of the recursive calls even further (for example, if there are 3 remaining items and we need 3 more groups, then you already know that each item must go into its own group), but I wanted to show the function as a slight modification of the basic algorithm which generates all partitions.
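One way to sanity-check the pruned generator (my addition): the number of k-partitions of an n-element set is the Stirling number of the second kind S(n, k), so for n = 4 the counts over k = 1..4 should be 1, 7, 6, 1.

```python
def sorted_k_partitions(seq, k):
    # condensed copy of the function above, docstring omitted
    n = len(seq)
    groups = []

    def generate_partitions(i):
        if i >= n:
            yield list(map(tuple, groups))
        else:
            if n - i > k - len(groups):
                for group in groups:
                    group.append(seq[i])
                    yield from generate_partitions(i + 1)
                    group.pop()
            if len(groups) < k:
                groups.append([seq[i]])
                yield from generate_partitions(i + 1)
                groups.pop()

    result = [sorted(ps, key=lambda p: (len(p), p))
              for ps in generate_partitions(0)]
    return sorted(result, key=lambda ps: (*map(len, ps), ps))

# Stirling numbers of the second kind: S(4, 1..4) = 1, 7, 6, 1
assert [len(sorted_k_partitions([1, 2, 3, 4], k)) for k in (1, 2, 3, 4)] == [1, 7, 6, 1]
```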

The only thing left to do is sort the results. Unfortunately, rather than figuring out how to directly generate the partitions in the desired order (an exercise for a smarter dog), I cheated and just sorted post-generation.

def sorted_k_partitions(seq, k):
    ...
    result = generate_partitions(0)

    # Sort the parts in each partition in shortlex order
    result = [sorted(ps, key=lambda p: (len(p), p)) for ps in result]
    # Sort partitions by the length of each part, then lexicographically.
    result = sorted(result, key=lambda ps: (*map(len, ps), ps))

    return result

Somewhat self-explanatory, except for the key functions. The first one:

key = lambda p: (len(p), p) 

says to sort a sequence by length, then by the sequence itself (which, in Python, are ordered lexicographically by default). The p stands for "part". This is used to sort the parts/groups within a partition. This key means that, for example, (4,) precedes (1, 2, 3), so that [(1, 2, 3), (4,)] is sorted as [(4,), (1, 2, 3)].

key = lambda ps: (*map(len, ps), ps) 
# or for Python versions <3.5: lambda ps: tuple(map(len, ps)) + (ps,)

The ps here stands for "parts", plural. This one says to sort a sequence by the lengths of each of its elements (which must be sequences themselves), then (lexicographically) by the sequence itself. This is used to sort the partitions with respect to each other, so that, for example, [(4,), (1, 2, 3)] precedes [(1, 2), (3, 4)].
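The effect of both keys can be seen in isolation with the examples from the text:

```python
# shortlex key for parts: shorter parts first, ties broken lexicographically
part_key = lambda p: (len(p), p)
assert sorted([(1, 2, 3), (4,)], key=part_key) == [(4,), (1, 2, 3)]

# partition key: compare the tuple of part lengths first, then the parts.
# (1, 3) < (2, 2) as length signatures, so [(4,), (1, 2, 3)] sorts first.
partition_key = lambda ps: (*map(len, ps), ps)
assert sorted([[(1, 2), (3, 4)], [(4,), (1, 2, 3)]], key=partition_key) == \
    [[(4,), (1, 2, 3)], [(1, 2), (3, 4)]]
```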

The following:

seq = [1, 2, 3, 4]

for k in 1, 2, 3, 4:
    for groups in sorted_k_partitions(seq, k):
        print(k, groups)

produces:

1 [(1, 2, 3, 4)]
2 [(1,), (2, 3, 4)]
2 [(2,), (1, 3, 4)]
2 [(3,), (1, 2, 4)]
2 [(4,), (1, 2, 3)]
2 [(1, 2), (3, 4)]
2 [(1, 3), (2, 4)]
2 [(1, 4), (2, 3)]
3 [(1,), (2,), (3, 4)]
3 [(1,), (3,), (2, 4)]
3 [(1,), (4,), (2, 3)]
3 [(2,), (3,), (1, 4)]
3 [(2,), (4,), (1, 3)]
3 [(3,), (4,), (1, 2)]
4 [(1,), (2,), (3,), (4,)]

Splitting a list into all combinations

There may be more performant solutions that avoid a complete extra iteration of the list for the other half, but that should be rather negligible:

from itertools import combinations

L = [0, 1, 2, 3, 4, 5]  # matches the output below
l = [[x, tuple(y for y in L if y not in x)] for x in combinations(L, 3)]
[[(0, 1, 2), (3, 4, 5)],
[(0, 1, 3), (2, 4, 5)],
[(0, 1, 4), (2, 3, 5)],
[(0, 1, 5), (2, 3, 4)],
[(0, 2, 3), (1, 4, 5)],
[(0, 2, 4), (1, 3, 5)],
[(0, 2, 5), (1, 3, 4)],
[(0, 3, 4), (1, 2, 5)],
[(0, 3, 5), (1, 2, 4)],
[(0, 4, 5), (1, 2, 3)],
[(1, 2, 3), (0, 4, 5)],
[(1, 2, 4), (0, 3, 5)],
[(1, 2, 5), (0, 3, 4)],
[(1, 3, 4), (0, 2, 5)],
[(1, 3, 5), (0, 2, 4)],
[(1, 4, 5), (0, 2, 3)],
[(2, 3, 4), (0, 1, 5)],
[(2, 3, 5), (0, 1, 4)],
[(2, 4, 5), (0, 1, 3)],
[(3, 4, 5), (0, 1, 2)]]

This depends on the absence of duplicates in the original list. Otherwise you'd have to work with the indexes instead. The following modification uses the same approach, but uses the list indexes for the combinations and can thus handle duplicates in the original list:

indexes = ((x, tuple(y for y in range(len(L)) if y not in x))
           for x in combinations(range(len(L)), 3))
l = [[tuple(L[a] for a in A), tuple(L[b] for b in B)] for A, B in indexes]
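A small self-contained sketch of this index-based variant with duplicate values (the sample data and pair size are mine): because the combinations are taken over indexes, repeated values are handled correctly.

```python
from itertools import combinations

L = ['a', 'a', 'b', 'b']  # duplicate values on purpose
indexes = ((x, tuple(y for y in range(len(L)) if y not in x))
           for x in combinations(range(len(L)), 2))
l = [[tuple(L[a] for a in A), tuple(L[b] for b in B)] for A, B in indexes]

# C(4, 2) = 6 index splits, even though the values repeat
assert len(l) == 6
assert l[0] == [('a', 'a'), ('b', 'b')]
```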

All possibilities to split a list into two lists

itertools has product() which can be used to generate the masks, and zip() can combine the lists for easy filtering (the original answer used Python 2's itertools.izip(); in Python 3 the built-in zip() does the same thing lazily). As a bonus, since they return iterators, they don't use much memory.

from itertools import product

facs = ['one', 'two', 'three']

l1 = []
l2 = []
for pattern in product([True, False], repeat=len(facs)):
    l1.append([x[1] for x in zip(pattern, facs) if x[0]])
    l2.append([x[1] for x in zip(pattern, facs) if not x[0]])
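A self-contained Python 3 version of the same idea (using the built-in zip in place of izip), with a check that every element lands in exactly one half and that there is one split per boolean mask:

```python
from itertools import product

facs = ['one', 'two', 'three']
halves = []
for pattern in product([True, False], repeat=len(facs)):
    ins = [x for keep, x in zip(pattern, facs) if keep]
    outs = [x for keep, x in zip(pattern, facs) if not keep]
    halves.append((ins, outs))

assert len(halves) == 2 ** len(facs)  # one split per mask
# the two halves are always complementary
assert all(sorted(a + b) == sorted(facs) for a, b in halves)
```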

python - split a list in pairs and unique elements

Maybe a simpler solution would be a recursive one.
Just create every combination with the first element and move to sublists without it.

def partition(L):
    if len(L) <= 1:
        return [L]

    partitions = [[L[0]] + p for p in partition(L[1:])]
    for i in range(1, len(L)):  # range, not Python 2's xrange
        partitions.extend([[(L[0], L[i])] + p
                           for p in partition(L[1:i] + L[i+1:])])

    return partitions
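A quick check (my addition): the number of ways to split n elements into pairs and singletons is the telephone (involution) number T(n), which is 1, 2, 4, 10 for n = 1..4.

```python
def partition(L):
    # same recursion as above, written for Python 3
    if len(L) <= 1:
        return [L]
    partitions = [[L[0]] + p for p in partition(L[1:])]
    for i in range(1, len(L)):
        partitions.extend([[(L[0], L[i])] + p
                           for p in partition(L[1:i] + L[i+1:])])
    return partitions

# telephone numbers T(1)..T(4) = 1, 2, 4, 10
assert [len(partition(list(range(n)))) for n in (1, 2, 3, 4)] == [1, 2, 4, 10]
```

The counts follow the recurrence T(n) = T(n-1) + (n-1)·T(n-2): the first element is either a singleton or paired with one of the other n-1 elements, which is exactly what the code does.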

Fastest way to get sets of all mutually exclusive pairs that can be formed from a list in python?

Simple recursive version:

def all_pairings(l):
if len(l) == 0:
return [[]]
else:
return [[(l[0],l[i])] + p for i in range(1,len(l)) for p in all_pairings(l[1:i]+l[i+1:])]

all_pairings('')
# [[]]

all_pairings('ab')
# [[('a', 'b')]]

all_pairings('abcd')
# [[('a', 'b'), ('c', 'd')], [('a', 'c'), ('b', 'd')], [('a', 'd'), ('b', 'c')]]

all_pairings('abcdef')
# [[('a', 'b'), ('c', 'd'), ('e', 'f')], [('a', 'b'), ('c', 'e'), ('d', 'f')],
# [('a', 'b'), ('c', 'f'), ('d', 'e')], [('a', 'c'), ('b', 'd'), ('e', 'f')],
# [('a', 'c'), ('b', 'e'), ('d', 'f')], [('a', 'c'), ('b', 'f'), ('d', 'e')],
# [('a', 'd'), ('b', 'c'), ('e', 'f')], [('a', 'd'), ('b', 'e'), ('c', 'f')],
# [('a', 'd'), ('b', 'f'), ('c', 'e')], [('a', 'e'), ('b', 'c'), ('d', 'f')],
# [('a', 'e'), ('b', 'd'), ('c', 'f')], [('a', 'e'), ('b', 'f'), ('c', 'd')],
# [('a', 'f'), ('b', 'c'), ('d', 'e')], [('a', 'f'), ('b', 'd'), ('c', 'e')],
# [('a', 'f'), ('b', 'e'), ('c', 'd')]]

Fastest way to split a list into a minimum amount of sets, enumerating all possible solutions

This way takes time linear in the number of list elements, and its output is the same (same sets, in the same order) regardless of the input list's order. It's basically a more "eager" variation of your code:

def split(xs):
    from collections import defaultdict
    x2count = defaultdict(int)
    result = []
    for x in xs:
        x2count[x] += 1
        count = x2count[x]
        if count > len(result):
            result.append(set())
        result[count - 1].add(x)
    return result

Then, e.g.,

xs = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
import random
random.shuffle(xs)
print(split(xs))

displays

[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 2, 3, 4, 7, 8}, {8, 2, 4}, {4}]

Finding all answers is bound to be annoying ;-) But straightforward enough. Once you know there are 4 sets in the result, you have a hairy kind of Cartesian product to compute. You know that, e.g., 7 appears twice, so there are comb(4, 2) == 6 ways to pick the two result sets 7 goes in. For each of those ways, you know, e.g., that 8 appears 3 times, so there are comb(4, 3) == 4 ways to pick the three result sets 8 goes in. Now we're up to 6 * 4 = 24 partial results. Repeat similarly for all other original integers. itertools.combinations() can be used to do the choosing.
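That raw product can be sketched directly for the example input above (the 884736 figure is my own arithmetic, not from the answer; it counts ordered assignments before any deduplication, so it is much larger than the number of genuinely distinct results):

```python
from collections import Counter
from math import comb  # Python 3.8+

xs = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
c = Counter(xs)
n = max(c.values())  # 4 result sets, as split() produced above

raw = 1
for count in c.values():
    # ways to choose which of the n sets this value goes into
    raw *= comb(n, count)

assert raw == 884736  # before discarding duplicate/isomorphic assignments
```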

Unclear: consider input [1, 1, 2]. The output here is [{1, 2}, {1}]. Do you, or do you not, consider that to be the same as [{1}, {1, 2}]? That is, do you consider an output to be a sequence of sets (in which case they're different), or as a set of sets (in which case they're the same)? A straightforward Cartesian product takes the "it's a sequence" view.

Finding all of 'em

Here's a way. As sketched, it computes a Cartesian product of all ways of distributing each element across the number of required output sets. But rather than use itertools.product() for this, it does it recursively, one element at a time. This allows it to check partial results so far for isomorphism, and decline to extend any partial solution that's isomorphic to a partial solution it already extended.

Toward that end, it views a partial solution as a set of sets. For technical reasons, Python requires using a frozenset for a set that's going to be used in turn as a set element.

Caution: this generator yields the same result object every time. That's for efficiency. If you don't like that, you could, e.g., replace

            yield result

with

            yield result[:]

instead.
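To see why the caution matters, here is a tiny standalone illustration (mine, not from the answer) of a generator that yields the same mutated object every time:

```python
def gen():
    buf = []
    for i in range(3):
        buf.append(i)
        yield buf  # the same list object every time

snapshots = list(gen())
# every entry aliases the one list, so all "snapshots" show the final state
assert all(s is snapshots[0] for s in snapshots)
assert snapshots == [[0, 1, 2], [0, 1, 2], [0, 1, 2]]

copies = [buf[:] for buf in gen()]  # copying at the call site fixes it
assert copies == [[0], [0, 1], [0, 1, 2]]
```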

EDIT: note that I replaced the line

            sig = frozenset(map(frozenset, result))

with

            sig = frozenset(Counter(map(frozenset, result)).items())

That's because you really aren't viewing the result as a set of sets, but as a multiset of sets (a given set can appear more than once in a result, and the number of times it appears is significant). In fancier test cases than were given here, that can make a real difference.

A Counter is the closest thing Python has to a builtin multiset type, but there is no "frozen" Counter workalike akin to frozenset. So instead we turn the Counter into a sequence of (set, count) 2-tuples and put those tuples into a frozenset. Keeping the counts alongside the sets accounts for the fact that the number of times a set appears in a result is significant.

def allsplit(xs):
    from collections import Counter
    from itertools import combinations
    c = Counter(xs)
    n = max(c.values())
    result = [set() for i in range(n)]
    pairs = list(c.items())
    pin = len(pairs)

    def inner(pi):
        if pi >= pin:
            yield result
            return
        elt, count = pairs[pi]
        seen = set()
        for ixs in combinations(range(n), count):
            for i in ixs:
                result[i].add(elt)
            sig = frozenset(Counter(map(frozenset, result)).items())
            if sig not in seen:
                yield from inner(pi + 1)
                seen.add(sig)
            for i in ixs:
                result[i].remove(elt)

    return inner(0)

Example:

>>> for x in allsplit([1, 1, 2, 3, 8, 4, 4]):
...     print(x)

[{1, 2, 3, 4, 8}, {1, 4}]
[{1, 2, 3, 4}, {8, 1, 4}]
[{1, 2, 4, 8}, {1, 3, 4}]
[{1, 2, 4}, {1, 3, 4, 8}]

For your original example, it finds 36992 unique ways to partition the input.


