Alternative Way to Split a List into Groups of N

Alternative way to split a list into groups of n

A Python recipe (In Python 2.6, use itertools.izip_longest):

def grouper(n, iterable, fillvalue=None):
"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return itertools.zip_longest(*args, fillvalue=fillvalue)

Example usage:

>>> list(grouper(3, range(9)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> list(grouper(3, range(10)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

If you want the last group to be shorter than the others instead of padded with fillvalue, then you could e.g. change the code like this:

>>> def mygrouper(n, iterable):
... args = [iter(iterable)] * n
... return ([e for e in t if e != None] for t in itertools.zip_longest(*args))
...
>>> list(mygrouper(3, range(9)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
>>> list(mygrouper(3, range(10)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Splitting a list into N parts of approximately equal length

This code is broken due to rounding errors. Do not use it!!!

assert len(chunkIt([1,2,3], 10)) == 10  # fails

Here's one that could work:

def chunkIt(seq, num):
avg = len(seq) / float(num)
out = []
last = 0.0

while last < len(seq):
out.append(seq[int(last):int(last + avg)])
last += avg

return out

Testing:

>>> chunkIt(range(10), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(range(11), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(range(12), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

How to split a list into n groups in all possible combinations of group length and elements within group?

We can use the basic recursive algorithm from this answer and modify it to produce partitions of a particular length without having to generate and filter out unwanted partitions.

def sorted_k_partitions(seq, k):
"""Returns a list of all unique k-partitions of `seq`.

Each partition is a list of parts, and each part is a tuple.

The parts in each individual partition will be sorted in shortlex
order (i.e., by length first, then lexicographically).

The overall list of partitions will then be sorted by the length
of their first part, the length of their second part, ...,
the length of their last part, and then lexicographically.
"""
n = len(seq)
groups = [] # a list of lists, currently empty

def generate_partitions(i):
if i >= n:
yield list(map(tuple, groups))
else:
if n - i > k - len(groups):
for group in groups:
group.append(seq[i])
yield from generate_partitions(i + 1)
group.pop()

if len(groups) < k:
groups.append([seq[i]])
yield from generate_partitions(i + 1)
groups.pop()

result = generate_partitions(0)

# Sort the parts in each partition in shortlex order
result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result]
# Sort partitions by the length of each part, then lexicographically.
result = sorted(result, key = lambda ps: (*map(len, ps), ps))

return result

There's quite a lot going on here, so let me explain.

First, we start with a procedural, bottom-up (teminology?) implementation of the same aforementioned recursive algorithm:

def partitions(seq):
"""-> a list of all unique partitions of `seq` in no particular order.

Each partition is a list of parts, and each part is a tuple.
"""
n = len(seq)
groups = [] # a list of lists, currently empty

def generate_partitions(i):
if i >= n:
yield list(map(tuple, groups))
else:
for group in groups
group.append(seq[i])
yield from generate_partitions(i + 1)
group.pop()

groups.append([seq[i]])
yield from generate_partitions(i + 1)
groups.pop()

if n > 0:
return list(generate_partitions(0))
else:
return [[()]]

The main algorithm is in the nested generate_partitions function. Basically, it walks through the sequence, and for each item, it: 1) puts the item into each of current groups (a.k.a parts) in the working set and recurses; 2) puts the item in its own, new group.

When we reach the end of the sequence (i == n), we yield a (deep) copy of the working set that we've been building up.

Now, to get partitions of a particular length, we could simply filter or group the results for the ones we're looking for and be done with it, but this approach performs a lot of unnecessary work (i.e. recursive calls) if we just wanted partitions of some length k.

Note that in the function above, the length of a partition (i.e. the # of groups) is increased whenever:

            # this adds a new group (or part) to the partition
groups.append([seq[i]])
yield from generate_partitions(i + 1)
groups.pop()

...is executed. Thus, we limit the size of a partition by simply putting a guard on that block, like so:

def partitions(seq, k):
...

def generate_partitions(i):
...

# only add a new group if the total number would not exceed k
if len(groups) < k:
groups.append([seq[i]])
yield from generate_partitions(i + 1)
groups.pop()

Adding the new parameter and just that line to the partitions function will now cause it to only generate partitions of length up to k. This is almost what we want. The problem is that the for loop still sometimes generates partitions of length less than k.

In order to prune those recursive branches, we need to only execute the for loop when we can be sure that we have enough remaining elements in our sequence to expand the working set to a total of k groups. The number of remaining elements--or elements that haven't yet been placed into a group--is n - i (or len(seq) - i). And k - len(groups) is the number of new groups that we need to add to produce a valid k-partition. If n - i <= k - len(groups), then we cannot waste an item by adding it one of the current groups--we must create a new group.

So we simply add another guard, this time to the other recursive branch:

    def generate_partitions(i):
...

# only add to current groups if the number of remaining items
# exceeds the number of required new groups.
if n - i > k - len(groups):
for group in groups:
group.append(seq[i])
yield from generate_partitions(i + 1)
group.pop()

# only add a new group if the total number would not exceed k
if len(groups) < k:
groups.append([seq[i]])
yield from generate_partitions(i + 1)
groups.pop()

And with that, you have a working k-partition generator. You could probably collapse some of the recursive calls even further (for example, if there are 3 remaining items and we need 3 more groups, then you already know that you must split each item into their own group), but I wanted to show the function as a slight modification of the basic algorithm which generates all partitions.

The only thing left to do is sort the results. Unfortunately, rather than figuring out how to directly generate the partitions in the desired order (an exercise for a smarter dog), I cheated and just sorted post-generation.

def sorted_k_partitions(seq, k):
...
result = generate_partitions(0)

# Sort the parts in each partition in shortlex order
result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result]
# Sort partitions by the length of each part, then lexicographically.
result = sorted(result, key = lambda ps: (*map(len, ps), ps))

return result

Somewhat self-explanatory, except for the key functions. The first one:

key = lambda p: (len(p), p) 

says to sort a sequence by length, then by the sequence itself (which, in Python, are ordered lexicographically by default). The p stands for "part". This is used to sort the parts/groups within a partition. This key means that, for example, (4,) precedes (1, 2, 3), so that [(1, 2, 3), (4,)] is sorted as [(4,), (1, 2, 3)].

key = lambda ps: (*map(len, ps), ps) 
# or for Python versions <3.5: lambda ps: tuple(map(len, ps)) + (ps,)

The ps here stands for "parts", plural. This one says to sort a sequence by the lengths of each of its elements (which must be sequence themselves), then (lexicographically) by the sequence itself. This is used to sort the partitions with respect to each other, so that, for example, [(4,), (1, 2, 3)] precedes [(1, 2), (3, 4)].

The following:

seq = [1, 2, 3, 4]

for k in 1, 2, 3, 4:
for groups in sorted_k_partitions(seq, k):
print(k, groups)

produces:

1 [(1, 2, 3, 4)]
2 [(1,), (2, 3, 4)]
2 [(2,), (1, 3, 4)]
2 [(3,), (1, 2, 4)]
2 [(4,), (1, 2, 3)]
2 [(1, 2), (3, 4)]
2 [(1, 3), (2, 4)]
2 [(1, 4), (2, 3)]
3 [(1,), (2,), (3, 4)]
3 [(1,), (3,), (2, 4)]
3 [(1,), (4,), (2, 3)]
3 [(2,), (3,), (1, 4)]
3 [(2,), (4,), (1, 3)]
3 [(3,), (4,), (1, 2)]
4 [(1,), (2,), (3,), (4,)]

How to split a list into n groups in all possible combinations of group length and elements within group in R?

If you want an easy implementation for the similar objective, you can try listParts from package partitions, e.g.,

> x <- 4

> partitions::listParts(x)
[[1]]
[1] (1,2,3,4)

[[2]]
[1] (1,2,4)(3)

[[3]]
[1] (1,2,3)(4)

[[4]]
[1] (1,3,4)(2)

[[5]]
[1] (2,3,4)(1)

[[6]]
[1] (1,4)(2,3)

[[7]]
[1] (1,2)(3,4)

[[8]]
[1] (1,3)(2,4)

[[9]]
[1] (1,4)(2)(3)

[[10]]
[1] (1,2)(3)(4)

[[11]]
[1] (1,3)(2)(4)

[[12]]
[1] (2,4)(1)(3)

[[13]]
[1] (2,3)(1)(4)

[[14]]
[1] (3,4)(1)(2)

[[15]]
[1] (1)(2)(3)(4)

where x is the number of elements in the set, and all partitions denotes the indices of elements.


If you want to choose the number of partitions, below is a user function that may help

f <- function(x, n) {
res <- listParts(x)
subset(res, lengths(res) == n)
}

such that

> f(x, 2)
[[1]]
[1] (1,2,4)(3)

[[2]]
[1] (1,2,3)(4)

[[3]]
[1] (1,3,4)(2)

[[4]]
[1] (2,3,4)(1)

[[5]]
[1] (1,4)(2,3)

[[6]]
[1] (1,2)(3,4)

[[7]]
[1] (1,3)(2,4)

> f(x, 3)
[[1]]
[1] (1,4)(2)(3)

[[2]]
[1] (1,2)(3)(4)

[[3]]
[1] (1,3)(2)(4)

[[4]]
[1] (2,4)(1)(3)

[[5]]x
[1] (2,3)(1)(4)

[[6]]
[1] (3,4)(1)(2)

Update
Given x <- LETTERS[1:4], we can run

res <- rapply(listParts(length(x)), function(v) x[v], how = "replace")

such that

> res
[[1]]
[1] (A,B,C,D)

[[2]]
[1] (A,B,D)(C)

[[3]]
[1] (A,B,C)(D)

[[4]]
[1] (A,C,D)(B)

[[5]]
[1] (B,C,D)(A)

[[6]]
[1] (A,D)(B,C)

[[7]]
[1] (A,B)(C,D)

[[8]]
[1] (A,C)(B,D)

[[9]]
[1] (A,D)(B)(C)

[[10]]
[1] (A,B)(C)(D)

[[11]]
[1] (A,C)(B)(D)

[[12]]
[1] (B,D)(A)(C)

[[13]]
[1] (B,C)(A)(D)

[[14]]
[1] (C,D)(A)(B)

[[15]]
[1] (A)(B)(C)(D)

How do I split a list into equally-sized chunks?

Here's a generator that yields evenly-sized chunks:

def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in range(0, len(lst), n):
yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

For Python 2, using xrange instead of range:

def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in xrange(0, len(lst), n):
yield lst[i:i + n]

Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

For Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

Splitting a list into uneven groups?

This solution keeps track of how many items you've written. It will crash if the sum of the numbers in the second_list is longer than mylist

total = 0
listChunks = []
for j in range(len(second_list)):
chunk_mylist = mylist[total:total+second_list[j]]
listChunks.append(chunk_mylist)
total += second_list[j]

After running this, listChunks is a list containing sublists with the lengths found in second_list.

How do you split a list of Size N, into K groups as evenly as possible

You could use numpy.array_split:

import numpy as np 

arr = range(6)
np.array_split(arr, 4)

>>[array([0, 1]), array([2, 3]), array([4]), array([5])]

Python split a list into several lists for each new line character

I would recommend splitting your list with more_itertools.split_at.

Because your original list ends with the separator, '\n', splitting it will result in the final item of your list being an empty sublist. The if check excludes this.

from more_itertools import split_at

original = [
'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
'27\n',
'\n',
'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
'27\n',
'\n'
]

processed = [
[item.rstrip() for item in sublist]
for sublist in split_at(original, lambda i: i == '\n')
if sublist
]

print(processed)

Output (line break added for clarity):

[['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']]


Related Topics



Leave a reply



Submit