How to randomly partition a list into n nearly equal parts?
Call random.shuffle() on the list before partitioning it.
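For instance (a minimal sketch; the helper name partition_random is mine, not from the answer), shuffle a copy first and then cut it into n contiguous slices:

```python
import random

def partition_random(lst, n):
    """Shuffle a copy of lst, then slice it into n nearly equal parts."""
    shuffled = lst[:]                 # copy, so the caller's list is untouched
    random.shuffle(shuffled)
    q, r = divmod(len(shuffled), n)   # the first r parts get one extra item
    parts, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        parts.append(shuffled[start:start + size])
        start += size
    return parts

parts = partition_random(list(range(10)), 3)  # e.g. parts of sizes 4, 3, 3
```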
Splitting a list into N parts of approximately equal length
Be warned that float-stepping code like the snippet below suffers from rounding errors when num exceeds len(seq):
assert len(chunkIt([1,2,3], 10)) == 10  # fails
For num <= len(seq), though, it works:
def chunkIt(seq, num):
    avg = len(seq) / float(num)
    out = []
    last = 0.0
    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out
Testing:
>>> chunkIt(range(10), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(range(11), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(range(12), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
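If the float stepping worries you, an integer-only variant (my sketch, not part of the original answer) computes the same nearly equal boundaries without any rounding drift and always returns exactly num chunks:

```python
def chunk_exact(seq, num):
    # Boundary i is len(seq) * i // num, so every boundary is an exact
    # integer and chunk sizes differ by at most one.
    return [seq[len(seq) * i // num : len(seq) * (i + 1) // num]
            for i in range(num)]

chunk_exact(list(range(10)), 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Unlike the float version, it also behaves when num > len(seq): the surplus chunks simply come out empty.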
How to split a list into two random parts
While using sets is cooler, you can just as well shuffle the list of 12 players and slice it:
import random
all_players = list(range(12))
random.shuffle(all_players)
print(all_players[:6])
print(all_players[6:])
Output:
[3, 7, 10, 11, 0, 2]
[4, 8, 5, 6, 9, 1]
Especially if you need to do this multiple times, this avoids creating new sets and lists over and over: you keep a single 12-element list as your data store.
Timings:
import random
from timeit import timeit

for l in range(12, 30, 2):
    def shuffle():
        all_players = list(range(l))
        random.shuffle(all_players)
        return all_players[: l // 2], all_players[l // 2 :]

    def sets():
        all_players = set(range(l))
        # sorted() because random.sample() no longer accepts sets (Python 3.11+)
        team1 = set(random.sample(sorted(all_players), l // 2))
        return team1, all_players - team1

    print(l, timeit(shuffle, number=10000))
    print(l, timeit(sets, number=10000), "\n")
Output:
12 0.27789219999999994 # shuffle marginally faster
12 0.2809480000000001 # sets
14 0.3270378999999999 # still less memory but slower
14 0.3056880999999998 # sets faster
[...]
26 0.6052818999999996
26 0.4748621000000002
28 0.6143755999999998
28 0.49672119999999964
Split a list into n randomly sized chunks
np.split is still the way to go. If you pass in a sequence of integers, split will treat them as cut points. Generating random cut points is easy; you can do something like
P = 10
I = 5
data = np.arange(P) + 1
indices = np.random.randint(P, size=I - 1)
You want I - 1 cut points to get I chunks. The indices need to be sorted, and duplicates need to be removed; np.unique does both for you. Note that you may end up with fewer than I chunks this way:
result = np.split(data, indices)
If you absolutely need to have I chunks, sample the cut points without replacement. That can be implemented, for example, via np.random.shuffle:
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
indices.sort()
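Putting those pieces together (a sketch using the same names as above) yields exactly I non-empty chunks:

```python
import numpy as np

P, I = 10, 5
data = np.arange(P) + 1

# Sample I - 1 distinct interior cut points (1 .. P-1) without replacement,
# so no chunk can be empty and no cut point is duplicated.
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = np.sort(indices[:I - 1])

chunks = np.split(data, indices)   # exactly I chunks
```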
Slicing a list into n nearly-equal-length partitions
def partition(lst, n):
    division = len(lst) / float(n)
    return [lst[int(round(division * i)): int(round(division * (i + 1)))]
            for i in xrange(n)]
>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3], [4, 5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]
Python 3 version:
def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]
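A quick sanity check of the Python 3 version (the function repeated here so the snippet is self-contained):

```python
def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]

# 10 parts, every item kept in order, sizes differing by at most one
parts = partition(list(range(105)), 10)
```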
How to divide a list into n equal parts, python
One-liner returning a list of lists, given a list and the chunk size:
>>> lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)]
Testing:
>>> x = range(20, 36)
>>> print x
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
>>> lol(x, 4)
[[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]]
>>> lol(x, 7)
[[20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33],
[34, 35]]
Update:
I think what the question is really asking for is a function which, given a list and a number, returns a list containing that many lists, with the items of the original list evenly distributed. So your example of lol(x, 7) should really return [[20,21,22], [23,24,25], [26,27], [28,29], [30,31], [32,33], [34,35]]. – markrian
Well, in this case, you can try:
def slice_list(input, size):
    input_size = len(input)
    slice_size = input_size // size   # integer division (the original used Python 2's /)
    remain = input_size % size
    result = []
    iterator = iter(input)
    for i in range(size):
        result.append([])
        for j in range(slice_size):
            result[i].append(next(iterator))   # next(it) rather than Python 2's it.next()
        if remain:
            result[i].append(next(iterator))
            remain -= 1
    return result
I'm sure this can be improved but I'm feeling lazy. :-)
>>> slice_list(x, 7)
[[20, 21, 22], [23, 24, 25],
[26, 27], [28, 29],
[30, 31], [32, 33],
[34, 35]]
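The same even distribution can be written more compactly with itertools.islice (my sketch, not the original answer's code): compute the quotient and remainder once, then let the first remainder groups take one extra item:

```python
from itertools import islice

def slice_list_compact(input, size):
    q, r = divmod(len(input), size)
    it = iter(input)
    # islice consumes the shared iterator, so the groups are contiguous.
    return [list(islice(it, q + (1 if i < r else 0))) for i in range(size)]

slice_list_compact(list(range(20, 36)), 7)
# [[20, 21, 22], [23, 24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35]]
```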
Randomly dividing a list into chunks of two or three items
As a more general approach, you can use the following function:
from itertools import count
import random

def my_split(lst, chunks):
    def chunk_creator():
        total = 0
        while total <= len(lst):
            x = random.choice(chunks)
            yield lst[total: x + total]   # the original referenced the global L by mistake
            total += x
        yield total - x

    def chunk_finder():
        for _ in count():
            chunk = list(chunk_creator())
            total = chunk.pop(-1)
            if total == len(lst):
                return chunk[:-1]

    if max(chunks) <= len(lst):
        return chunk_finder()
    else:
        return None
Demo:
>>> L = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> my_split(L, (2, 3))
[[1, 1], [1, 1], [1, 1], [1, 1, 1], [1, 1, 1]]
>>> my_split(L, (2, 3))
[[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
Explanation:
This function consists of two sub-functions. The first, chunk_creator, creates the desired chunks based on the length of your list and returns them via a generator. Note that its final yielded value is the total variable, which is the sum of the preceding chunk sizes.
The second function, chunk_finder, finds a fitting sequence of chunks for us by going through an infinite loop (itertools.count()) and checking whether the value of total equals the length of the input list.
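The same retry idea can be written without generators (a sketch of mine, assuming some combination of the given chunk sizes can sum to len(lst), so the retries terminate): draw sizes until they sum exactly to the list length, then slice.

```python
import random

def my_split_simple(lst, chunks):
    if min(chunks) > len(lst):
        return None
    while True:
        sizes, total = [], 0
        while total < len(lst):
            size = random.choice(chunks)
            sizes.append(size)
            total += size
        if total == len(lst):   # exact fit found; otherwise retry
            break
    out, start = [], 0
    for size in sizes:
        out.append(lst[start:start + size])
        start += size
    return out

my_split_simple([1] * 12, (2, 3))   # e.g. chunks of lengths 3, 2, 2, 3, 2
```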
Partition array into N random chunks of different sizes with Numpy
One way to ensure that every element of a is contained in exactly one chunk is to create a random permutation of a first and then split it with np.split.
To get an array of splitting indices for np.split from chunk_size, you can use np.cumsum.
Example
>>> import numpy as np
>>> np.random.seed(13)
>>> a = np.arange(20)
>>> b = np.random.permutation(a)
>>> b
array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13,
3, 17, 9, 4, 2, 6, 19, 10, 16, 18])
>>> chunk_size = [10, 5, 3, 2]
>>> np.cumsum(chunk_size)
array([10, 15, 18, 20])
>>> np.split(b, np.cumsum(chunk_size))
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([ 3, 17, 9, 4, 2]), array([ 6, 19, 10]), array([16, 18]),
array([], dtype=int64)]
You could avoid the trailing empty array by omitting the last value in chunk_size, as it is implied by the size of a and the sum of the previous values:
>>> np.split(b, np.cumsum(chunk_size[:-1])) # [10, 5, 3] -- 2 is implied
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([ 3, 17, 9, 4, 2]), array([ 6, 19, 10]), array([16, 18])]
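If you also want the chunk sizes themselves to be random rather than fixed like [10, 5, 3, 2], one sketch is to draw distinct interior cut points and let np.split's segments define the sizes:

```python
import numpy as np

rng = np.random.default_rng(13)
a = np.arange(20)
N = 4  # desired number of chunks

# N - 1 distinct cut points strictly inside the array -> N non-empty chunks.
cuts = np.sort(rng.choice(np.arange(1, len(a)), size=N - 1, replace=False))
chunks = np.split(rng.permutation(a), cuts)
```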
Randomly divide df in list of df into equal subsets
The problem behind your unequal group sizes is how cut picks its interval borders: it always needs a hard border on one side, and I don't see a good way to control that in your case.
You can solve your problem with gl instead; just ignore the warnings. Randomize the generated levels before you apply them, and you're there.
set.seed(0L)
AB_df = data.frame(replicate(2,sample(0:130,1624,rep=TRUE)))
BC_df = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
DE_df = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
FG_df = data.frame(replicate(2,sample(0:130,1729,rep=TRUE)))
AB_pc = data.frame(replicate(2,sample(0:130,1624,rep=TRUE)))
BC_pc = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
DE_pc = data.frame(replicate(2,sample(0:130,1656,rep=TRUE)))
FG_pc = data.frame(replicate(2,sample(0:130,1729,rep=TRUE)))
df_list = list(AB_df, BC_df, DE_df, FG_df, AB_pc, BC_pc, DE_pc, FG_pc)
names(df_list) = c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")
#the number of groups you want to generate
subs <- 4
splittedList <- lapply(df_list,
                       function(df){
                         idx <- gl(n = subs, round(nrow(df)/subs))
                         split(df, sample(idx)) # randomize the groups
                       })
#> Warning in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...):
#> data length is not a multiple of split variable
#> Warning in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...):
#> data length is not a multiple of split variable
## the groups are appr. equally sized:
lapply(splittedList,function(l){sapply(l,nrow)})
#> $AB_df
#> 1 2 3 4
#> 406 406 406 406
#>
#> $BC_df
#> 1 2 3 4
#> 414 414 414 414
#>
#> $DE_df
#> 1 2 3 4
#> 414 414 414 414
#>
#> $FG_df
#> 1 2 3 4
#> 432 432 433 432
#>
#> $AB_pc
#> 1 2 3 4
#> 406 406 406 406
#>
#> $BC_pc
#> 1 2 3 4
#> 414 414 414 414
#>
#> $DE_pc
#> 1 2 3 4
#> 414 414 414 414
#>
#> $FG_pc
#> 1 2 3 4
#> 432 432 433 432
## and the sizes are right:
sapply(df_list,nrow)
#> AB_df BC_df DE_df FG_df AB_pc BC_pc DE_pc FG_pc
#> 1624 1656 1656 1729 1624 1656 1656 1729
sapply(splittedList,function(l){sum(sapply(l,nrow))})
#> AB_df BC_df DE_df FG_df AB_pc BC_pc DE_pc FG_pc
#> 1624 1656 1656 1729 1624 1656 1656 1729