Slicing a List into N Nearly-Equal-Length Partitions

Splitting a list into N parts of approximately equal length

This code is broken due to rounding errors. Do not use it!!!

assert len(chunkIt([1,2,3], 10)) == 10  # fails

Here's one that could work:

def chunkIt(seq, num):
avg = len(seq) / float(num)
out = []
last = 0.0

while last < len(seq):
out.append(seq[int(last):int(last + avg)])
last += avg

return out

Testing:

>>> chunkIt(range(10), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(range(11), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(range(12), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

Slicing a list into n nearly-equal-length partitions

def partition(lst, n):
division = len(lst) / float(n)
return [ lst[int(round(division * i)): int(round(division * (i + 1)))] for i in xrange(n) ]

>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3, 4], [5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]

Python 3 version:

def partition(lst, n):
division = len(lst) / n
return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]

How to randomly partition a list into n nearly equal parts?

Call random.shuffle() on the list before partitioning it.

Splitting a list into N parts of approximately equal length

This code is broken due to rounding errors. Do not use it!!!

assert len(chunkIt([1,2,3], 10)) == 10  # fails

Here's one that could work:

def chunkIt(seq, num):
avg = len(seq) / float(num)
out = []
last = 0.0

while last < len(seq):
out.append(seq[int(last):int(last + avg)])
last += avg

return out

Testing:

>>> chunkIt(range(10), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(range(11), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(range(12), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

How do I split a list into equally-sized chunks?

Here's a generator that yields evenly-sized chunks:

def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in range(0, len(lst), n):
yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

For Python 2, using xrange instead of range:

def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in xrange(0, len(lst), n):
yield lst[i:i + n]

Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

For Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order

This approach defines partition boundaries that divide the array in roughly equal numbers of elements, and then repeatedly searches for better partitionings until it can't find any more. It differs from most of the other posted solutions in that it looks to find an optimal solution by trying multiple different partitionings. The other solutions attempt to create a good partition in a single pass through the array, but I can't think of a single pass algorithm that's guaranteed optimal.

The code here is an efficient implementation of this algorithm, but it can be hard to understand so a more readable version is included as an addendum at the end.

def partition_list(a, k):
if k <= 1: return [a]
if k >= len(a): return [[x] for x in a]
partition_between = [(i+1)*len(a)/k for i in range(k-1)]
average_height = float(sum(a))/k
best_score = None
best_partitions = None
count = 0

while True:
starts = [0]+partition_between
ends = partition_between+[len(a)]
partitions = [a[starts[i]:ends[i]] for i in range(k)]
heights = map(sum, partitions)

abs_height_diffs = map(lambda x: abs(average_height - x), heights)
worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
worst_height_diff = average_height - heights[worst_partition_index]

if best_score is None or abs(worst_height_diff) < best_score:
best_score = abs(worst_height_diff)
best_partitions = partitions
no_improvements_count = 0
else:
no_improvements_count += 1

if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
return best_partitions
count += 1

move = -1 if worst_height_diff < 0 else 1
bound_to_move = 0 if worst_partition_index == 0\
else k-2 if worst_partition_index == k-1\
else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
else worst_partition_index
direction = -1 if bound_to_move < worst_partition_index else 1
partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
print 'Partitioning {0} into {1} partitions'.format(a, k)
p = partition_list(a, k)
print 'The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

There may be some modifications to make depending on what you are doing with this. For example, to determine whether the best partitioning has been found, this algorithm stops when there is no height difference among partitions, it doesn't find anything better than the best thing it's seen for more than 5 iterations in a row, or after 100 total iterations as a catch-all stopping point. You may need to adjust those constants or use a different scheme. If your heights form a complex landscape of values, knowing when to stop can get into classic problems of trying to escape local maxima and things like that.

Output

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]

Edit

Added the new test case, [95, 15, 75, 25, 85, 5], which this method handles correctly.

Addendum

This version of the algorithm is easier to read and understand, but is a bit longer due to taking less advantage of built-in Python features. It seems to execute in a comparable or even slightly faster amount of time, however.

#partition list a into k partitions
def partition_list(a, k):
#check degenerate conditions
if k <= 1: return [a]
if k >= len(a): return [[x] for x in a]
#create a list of indexes to partition between, using the index on the
#left of the partition to indicate where to partition
#to start, roughly partition the array into equal groups of len(a)/k (note
#that the last group may be a different size)
partition_between = []
for i in range(k-1):
partition_between.append((i+1)*len(a)/k)
#the ideal size for all partitions is the total height of the list divided
#by the number of paritions
average_height = float(sum(a))/k
best_score = None
best_partitions = None
count = 0
no_improvements_count = 0
#loop over possible partitionings
while True:
#partition the list
partitions = []
index = 0
for div in partition_between:
#create partitions based on partition_between
partitions.append(a[index:div])
index = div
#append the last partition, which runs from the last partition divider
#to the end of the list
partitions.append(a[index:])
#evaluate the partitioning
worst_height_diff = 0
worst_partition_index = -1
for p in partitions:
#compare the partition height to the ideal partition height
height_diff = average_height - sum(p)
#if it's the worst partition we've seen, update the variables that
#track that
if abs(height_diff) > abs(worst_height_diff):
worst_height_diff = height_diff
worst_partition_index = partitions.index(p)
#if the worst partition from this run is still better than anything
#we saw in previous iterations, update our best-ever variables
if best_score is None or abs(worst_height_diff) < best_score:
best_score = abs(worst_height_diff)
best_partitions = partitions
no_improvements_count = 0
else:
no_improvements_count += 1
#decide if we're done: if all our partition heights are ideal, or if
#we haven't seen improvement in >5 iterations, or we've tried 100
#different partitionings
#the criteria to exit are important for getting a good result with
#complex data, and changing them is a good way to experiment with getting
#improved results
if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
return best_partitions
count += 1
#adjust the partitioning of the worst partition to move it closer to the
#ideal size. the overall goal is to take the worst partition and adjust
#its size to try and make its height closer to the ideal. generally, if
#the worst partition is too big, we want to shrink the worst partition
#by moving one of its ends into the smaller of the two neighboring
#partitions. if the worst partition is too small, we want to grow the
#partition by expanding the partition towards the larger of the two
#neighboring partitions
if worst_partition_index == 0: #the worst partition is the first one
if worst_height_diff < 0: partition_between[0] -= 1 #partition too big, so make it smaller
else: partition_between[0] += 1 #partition too small, so make it bigger
elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
if worst_height_diff < 0: partition_between[-1] += 1 #partition too small, so make it bigger
else: partition_between[-1] -= 1 #partition too big, so make it smaller
else: #the worst partition is in the middle somewhere
left_bound = worst_partition_index - 1 #the divider before the partition
right_bound = worst_partition_index #the divider after the partition
if worst_height_diff < 0: #partition too big, so make it smaller
if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the right bigger
partition_between[right_bound] -= 1
else: #the partition on the left is smaller than the one on the right, so make the one on the left bigger
partition_between[left_bound] += 1
else: #partition too small, make it bigger
if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
partition_between[left_bound] -= 1
else: #the partition on the left is smaller than the one on the right, so make the one on the right smaller
partition_between[right_bound] += 1

def print_best_partition(a, k):
#simple function to partition a list and print info
print ' Partitioning {0} into {1} partitions'.format(a, k)
p = partition_list(a, k)
print ' The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

Split a string into N equal parts?

import textwrap
print(textwrap.wrap("123456789", 2))
#prints ['12', '34', '56', '78', '9']

Note: be careful with whitespace etc - this may or may not be what you want.

"""Wrap a single paragraph of text, returning a list of wrapped lines.

Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
"""

Best way to separate range into n equal ranges in Python

It is surely simpler to calculate the start and stop numbers directly rather than slicing a range object to get them:

def ranges(N, nb):
step = N / nb
return ["{},{}".format(round(step*i), round(step*(i+1))) for i in range(nb)]

This is not as much more efficient than your code that it might look because slicing a range object takes only O(1) time, so your existing code is already asymptotically optimal. My version likely improves performance by some constant factor, but it may be small. I do think my version is also a lot clearer though, which may be more important than whatever performance change there might be.



Related Topics



Leave a reply



Submit