How to Randomly Partition a List into N Nearly Equal Parts

How to randomly partition a list into n nearly equal parts?

Call random.shuffle() on the list before partitioning it.
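
For instance, here's a minimal sketch of that approach (random_partition is a made-up helper name): shuffle a copy, then cut it into n nearly equal parts.

import random

def random_partition(lst, n):
    # Shuffle a copy so the caller's list is left untouched, then cut at
    # i * len(lst) // n, giving parts whose sizes differ by at most one.
    shuffled = lst[:]
    random.shuffle(shuffled)
    return [shuffled[i * len(shuffled) // n:(i + 1) * len(shuffled) // n]
            for i in range(n)]

print(random_partition(list(range(10)), 3))
# e.g. [[7, 2, 0], [9, 4, 1], [5, 8, 3, 6]]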

Splitting a list into N parts of approximately equal length

This often-copied version is broken due to rounding errors (the float step avg accumulates drift). Do not use it!!!

def chunkIt(seq, num):
    avg = len(seq) / float(num)
    out = []
    last = 0.0

    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg

    return out

assert len(chunkIt([1, 2, 3], 10)) == 10  # fails: float drift yields an 11th chunk

Here's one that does work, a corrected version that computes each cut point with integer arithmetic so the result always has exactly num chunks:

def chunkIt(seq, num):
    out = []
    last = 0
    for i in range(1, num + 1):
        cut = i * len(seq) // num  # exact cut point, no float drift
        out.append(seq[last:cut])
        last = cut
    return out

Testing:

>>> chunkIt(list(range(10)), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(list(range(11)), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(list(range(12)), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

How to split a list into two random parts

While using sets is cooler, you can just as well shuffle the list of 12 players and slice it:

import random

all_players = list(range(12))

random.shuffle(all_players)

print(all_players[:6])
print(all_players[6:])

Output:

[3, 7, 10, 11, 0, 2]
[4, 8, 5, 6, 9, 1]

Especially if you need to do this multiple times, this avoids creating new sets and lists over and over; instead you keep one 12-element list as the data store.
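
For example, a minimal sketch of that reuse pattern (the number of re-draws is arbitrary):

import random

all_players = list(range(12))  # built once, reused for every draw

for _ in range(5):  # however many times you need new teams
    random.shuffle(all_players)  # reshuffles in place; no new list is built
    team1, team2 = all_players[:6], all_players[6:]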


Timings:

import random
from timeit import timeit

for l in range(12, 30, 2):

    def shuffle():
        all_players = list(range(l))
        random.shuffle(all_players)
        return all_players[: l // 2], all_players[l // 2 :]

    def sets():
        all_players = set(range(l))
        # As of Python 3.11 random.sample() no longer accepts a set,
        # so pass a list of the players instead.
        team1 = set(random.sample(list(all_players), l // 2))
        return team1, all_players - team1

    print(l, timeit(shuffle, number=10000))
    print(l, timeit(sets, number=10000), "\n")

Output:

12 0.27789219999999994   # shuffle marginally faster
12 0.2809480000000001 # sets

14 0.3270378999999999 # still less memory but slower
14 0.3056880999999998 # sets faster

[...]

26 0.6052818999999996
26 0.4748621000000002

28 0.6143755999999998
28 0.49672119999999964

Split a list into n randomly sized chunks

np.split is still the way to go. If you pass in a sequence of integers, split will treat them as cut points. Generating random cut points is easy. You can do something like

import numpy as np

P = 10
I = 5

data = np.arange(P) + 1
indices = np.random.randint(P, size=I - 1)

You want I - 1 cut points to get I chunks. The indices need to be sorted, and duplicates need to be removed; np.unique does both for you. You may end up with fewer than I chunks this way, since duplicate cut points collapse into one:

indices = np.unique(indices)
result = np.split(data, indices)

If you absolutely need to have exactly I chunks, choose the cut points without replacement. That can be implemented, for example, via np.random.shuffle:

indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
indices.sort()
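
With those distinct, sorted cut points, np.split is now guaranteed to return exactly I chunks, none of them empty (a short continuation of the example above):

result = np.split(data, indices)
assert len(result) == I                 # exactly I chunks
assert all(len(c) > 0 for c in result)  # no empty chunks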

Slicing a list into n nearly-equal-length partitions

def partition(lst, n):
    division = len(lst) / float(n)
    return [lst[int(round(division * i)): int(round(division * (i + 1)))]
            for i in xrange(n)]

>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3], [4, 5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]

Python 3 version:

def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]
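
Note that Python 3's round() rounds halves to the nearest even integer (banker's rounding) rather than away from zero, so half-way cut points can land differently than in the Python 2 version:

>>> partition([1, 2, 3, 4, 5], 2)
[[1, 2], [3, 4, 5]]   # Python 2 gives [[1, 2, 3], [4, 5]]: round(2.5) is 2 here, 3.0 there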

How to divide a list into n equal parts, python

One-liner returning a list of lists, given a list and the chunk size:

>>> lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)]

Testing:

>>> x = list(range(20, 36))
>>> x
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

>>> lol(x, 4)
[[20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31], [32, 33, 34, 35]]

>>> lol(x, 7)
[[20, 21, 22, 23, 24, 25, 26], [27, 28, 29, 30, 31, 32, 33], [34, 35]]

Update:

I think what the question is really asking for is a function which, given a list and a number, returns a list containing $(number) lists, with the items of the original list evenly distributed. So your example of lol(x, 7) should really return [[20,21,22], [23,24,25], [26,27], [28,29], [30,31], [32,33], [34,35]]. – markrian

Well, in this case, you can try:

def slice_list(input, size):
    input_size = len(input)
    slice_size = input_size // size   # base size of every slice
    remain = input_size % size        # this many slices get one extra item
    result = []
    iterator = iter(input)
    for i in range(size):
        result.append([])
        for j in range(slice_size):
            result[i].append(next(iterator))
        if remain:
            result[i].append(next(iterator))
            remain -= 1
    return result

I'm sure this can be improved but I'm feeling lazy. :-)

>>> slice_list(x, 7)
[[20, 21, 22], [23, 24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35]]
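
For the record, here is one way it could be improved (my sketch, not part of the original answer), using itertools.islice and divmod to hand the first remain slices one extra item each:

from itertools import islice

def slice_list(input, size):
    it = iter(input)
    q, r = divmod(len(input), size)
    # The first r slices get q + 1 items, the remaining ones get q.
    return [list(islice(it, q + 1 if i < r else q)) for i in range(size)]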

Randomly dividing a list into chunks of two or three items

As a more general approach, you can use the following function:

from itertools import count
import random

def my_split(lst, chunks):
    def chunk_creator():
        # Slice off randomly sized chunks until we run past the end of
        # the list, then yield the running total covered before the
        # final (overshooting) chunk.
        total = 0
        while total <= len(lst):
            x = random.choice(chunks)
            yield lst[total: x + total]
            total += x
        yield total - x

    def chunk_finder():
        # Retry until the chunk sizes cover the list exactly.
        for _ in count():
            chunk = list(chunk_creator())
            total = chunk.pop(-1)
            if total == len(lst):
                return chunk[:-1]  # drop the trailing empty slice

    if max(chunks) <= len(lst):
        return chunk_finder()
    else:
        return None

Demo:

>>> L = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> my_split(L, (2, 3))
[[1, 1], [1, 1], [1, 1], [1, 1, 1], [1, 1, 1]]
>>> my_split(L, (2, 3))
[[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]

Explanation:

This function consists of two sub-functions. The first, chunk_creator, creates the desired chunks based on the length of your list and returns them through a generator. Note that its final yielded value is the total variable, which is the sum of the preceding chunk sizes.

The second function, chunk_finder, finds the desired chunks for us by going through an infinite loop (itertools.count()) and checking whether the value of total is equal to the length of the input list.
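
The same rejection-sampling idea can be written a bit more compactly (a sketch; my_split2 is not from the original answer): draw chunk sizes until they sum exactly to the list length, then slice once at the accumulated cut points.

import random
from itertools import accumulate

def my_split2(lst, chunks):
    # Keep drawing sizes until they sum exactly to len(lst); like my_split,
    # this retries forever if no combination of the given sizes fits.
    while True:
        sizes, total = [], 0
        while total < len(lst):
            x = random.choice(chunks)
            sizes.append(x)
            total += x
        if total == len(lst):
            break
    cuts = [0] + list(accumulate(sizes))
    return [lst[a:b] for a, b in zip(cuts, cuts[1:])]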

Partition array into N random chunks of different sizes with Numpy

One way to ensure that every element of a is contained in exactly one chunk would be to create a random permutation of a first and then split it with np.split.

In order to get an array of splitting indices for np.split from chunk_size you can use np.cumsum.

Example

>>> import numpy as np
>>> np.random.seed(13)
>>> a = np.arange(20)
>>> b = np.random.permutation(a)
>>> b
array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13,
3, 17, 9, 4, 2, 6, 19, 10, 16, 18])

>>> chunk_size = [10, 5, 3, 2]
>>> np.cumsum(chunk_size)
array([10, 15, 18, 20])

>>> np.split(b, np.cumsum(chunk_size))
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([ 3, 17, 9, 4, 2]), array([ 6, 19, 10]), array([16, 18]),
array([], dtype=int64)]

You could avoid the trailing empty array by omitting the last value in chunk_size, as it is implied by the size of a and the sum of the previous values:

>>> np.split(b, np.cumsum(chunk_size[:-1]))  # [10, 5, 3] -- 2 is implied
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([ 3, 17, 9, 4, 2]), array([ 6, 19, 10]), array([16, 18])]
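
If the chunk sizes themselves should also be random rather than given up front like chunk_size above (an assumption about the use case, not part of the original answer), you can skip building sizes and draw distinct interior cut points directly:

import numpy as np

rng = np.random.default_rng(13)
a = np.arange(20)
n = 4

# n - 1 distinct cut points strictly inside the array split a random
# permutation of a into n non-empty, randomly sized chunks.
cuts = np.sort(rng.choice(np.arange(1, len(a)), size=n - 1, replace=False))
chunks = np.split(rng.permutation(a), cuts)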

Randomly divide df in list of df into equal subsets

The problem causing your unequal group sizes is a cut thing: cut always needs a hard interval border on one side, and I don't really know how to arrange that in your case.
You can solve your problem with gl instead; just ignore the warnings.
Randomize the generated levels before you apply them, and you're there.

set.seed(0L)
AB_df = data.frame(replicate(2,sample(0:130,1624,rep=TRUE)))
BC_df = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
DE_df = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
FG_df = data.frame(replicate(2,sample(0:130,1729,rep=TRUE)))

AB_pc = data.frame(replicate(2,sample(0:130,1624,rep=TRUE)))
BC_pc = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
DE_pc = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
FG_pc = data.frame(replicate(2,sample(0:130,1729,rep=TRUE)))

df_list = list(AB_df, BC_df, DE_df, FG_df, AB_pc, BC_pc, DE_pc, FG_pc)
names(df_list) = c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")

#the number of groups you want to generate
subs <- 4

splittedList <- lapply(df_list,
                       function(df){
                         idx <- gl(n = subs, round(nrow(df)/subs))
                         split(df, sample(idx)) # randomize the groups
                       })
#> Warning in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...):
#> data length is not a multiple of split variable

#> Warning in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...):
#> data length is not a multiple of split variable

## the groups are appr. equally sized:
lapply(splittedList,function(l){sapply(l,nrow)})
#> $AB_df
#> 1 2 3 4
#> 406 406 406 406
#>
#> $BC_df
#> 1 2 3 4
#> 414 414 414 414
#>
#> $DE_df
#> 1 2 3 4
#> 414 414 414 414
#>
#> $FG_df
#> 1 2 3 4
#> 432 432 433 432
#>
#> $AB_pc
#> 1 2 3 4
#> 406 406 406 406
#>
#> $BC_pc
#> 1 2 3 4
#> 414 414 414 414
#>
#> $DE_pc
#> 1 2 3 4
#> 414 414 414 414
#>
#> $FG_pc
#> 1 2 3 4
#> 432 432 433 432

## and the sizes are right:
sapply(df_list,nrow)
#> AB_df BC_df DE_df FG_df AB_pc BC_pc DE_pc FG_pc
#> 1624 1656 1656 1729 1624 1656 1656 1729

sapply(splittedList,function(l){sum(sapply(l,nrow))})
#> AB_df BC_df DE_df FG_df AB_pc BC_pc DE_pc FG_pc
#> 1624 1656 1656 1729 1624 1656 1656 1729

