Algorithm for Distributing a Number Between a Certain Number of Chunks

Distribute a quantity randomly

For this problem, the first thing I would try is this:

  1. Generate N-1 random numbers between 0 and 1
  2. Sort them
  3. Raise them to the xth power
  4. Multiply each of the N differences between 0, the successive numbers, and 1 by the quantity you want to distribute. All these differences add up to 1, so you'll end up distributing exactly the target quantity.

A nice advantage of this method is that you can adjust the parameter x to get an aesthetically pleasing distribution of chunks. Natural explosions won't produce a uniform distribution of chunk sizes, so you'll want to play with this.
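If it helps, here is a short Python sketch of those four steps (the function name and the default exponent are my own choices, not part of the original recipe):

import random

def random_partition(total, n, x=2.0):
    # Steps 1-3: N-1 uniform variates raised to the x-th power, sorted.
    # (Raising to a power preserves the ordering of values in [0, 1],
    # so sorting before or after the exponentiation is equivalent.)
    cuts = sorted(random.random() ** x for _ in range(n - 1))
    # Step 4: the n gaps between 0, the cut points, and 1 sum to 1,
    # so scaling them by `total` distributes exactly `total`.
    bounds = [0.0] + cuts + [1.0]
    return [(hi - lo) * total for lo, hi in zip(bounds, bounds[1:])]

print(random_partition(100, 5))  # e.g. five random chunks that sum to 100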

Evenly distribute amount with limitations

Here is an O(n log n) version of the algorithm you proposed, which is O(n^2).

ordered_nodes = sorted(nodes, key=lambda node: node.max_value)
n = len(ordered_nodes)
remaining = x
for i, node in enumerate(ordered_nodes):
    node.value = min(remaining / (n - i), node.max_value)
    remaining -= node.value

Note that if remaining > 0 at the end, then every maximal value has been hit. The O(n log n) runtime comes from the sorting. And the result is the same as with your algorithm, which is optimal with respect to the variance.

I use enumerate as in Python: it iterates through the objects while counting them, starting at 0.
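For concreteness, here is how the snippet above might be exercised end to end; the Node class is just a hypothetical stand-in for whatever your node objects look like:

class Node:
    def __init__(self, max_value):
        self.max_value = max_value
        self.value = 0.0

nodes = [Node(5), Node(1), Node(4)]
x = 8  # the total amount to distribute

# the loop from above
ordered_nodes = sorted(nodes, key=lambda node: node.max_value)
n = len(ordered_nodes)
remaining = x
for i, node in enumerate(ordered_nodes):
    node.value = min(remaining / (n - i), node.max_value)
    remaining -= node.value

print([node.value for node in nodes])  # [3.5, 1, 3.5], which sums to 8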

Parallel algorithm for set splitting

I think you're trying to do something more complicated than necessary. Do you actually need an exact solution (global optimum)? Regarding a heuristic solution, I had to do something along these lines in the past, so here's my take on it:

Reformulate the problem as follows: you have a vector with a given mean (the 'global mean') and you want to break it into chunks such that the mean of each individual chunk is as close as possible to the 'global mean'.

Just divide it into chunks randomly and then iteratively swap elements between the chunks until you get acceptable results. You can experiment with different ways of doing this; here I simply reshuffle the elements of the chunks with the minimum and maximum 'chunk-mean'.

In general, the bigger the chunks are, the easier the problem becomes, because the first random split already gives you a not-so-bad solution (think sample means).

How big is your input list? I tested this with a 100,000-element input (uniformly distributed integers). With 50 chunks of 2,000 elements you get the result instantly; with 2,000 chunks of 50 elements you have to wait under a minute.

import numpy as np

my_numbers = np.random.randint(10000, size=100000)
chunks = 50
iter_limit = 10000
desired_mean = my_numbers.mean()
acceptable_range = 0.1

split = np.array_split(my_numbers, chunks)

for i in range(iter_limit):
    split_means = np.array([array.mean() for array in split])  # this can be optimized, some of the means are known
    current_min = split_means.min()
    current_max = split_means.max()
    mean_diff = split_means.ptp()
    if i % 100 == 0 or mean_diff <= acceptable_range:
        print("Iter: {}, Desired: {}, Min {}, Max {}, Range {}".format(i, desired_mean, current_min, current_max, mean_diff))
    if mean_diff <= acceptable_range:
        print('Acceptable solution found')
        break
    min_index = split_means.argmin()
    max_index = split_means.argmax()
    # pop the higher index first so the lower index stays valid
    if max_index < min_index:
        merged = np.hstack((split.pop(min_index), split.pop(max_index)))
    else:
        merged = np.hstack((split.pop(max_index), split.pop(min_index)))
    reshuffle_range = mean_diff + 1
    while reshuffle_range > mean_diff:
        # this while just ensures that you're not getting a worse split; it's either the same or better
        np.random.shuffle(merged)
        modified_arrays = np.array_split(merged, 2)
        reshuffle_range = np.array([array.mean() for array in modified_arrays]).ptp()
    split += modified_arrays

Largest divisor such that two numbers divided by it round to the same value?

Here I would like to give another heuristic, which is different from btilly's.

The task is to find integers m and n such that m / n <= j < k <= (m + 1) / n, with n as big as possible (but not exceeding M).

Intuitively, it is preferable that the fraction m / n is close to j. This leads to the idea of using continued fractions.

The algorithm that I propose is quite simple:

  1. calculate all the continued fraction convergents of j using minus signs (so that the convergents always approach j from below), until the denominator exceeds M;
  2. for each such fraction m / n, find the biggest integer i >= 0 such that k <= (m * i + 1) / (n * i) and n * i <= M, and replace the fraction m / n with (m * i) / (n * i);
  3. among all the fractions in 2, find the one with biggest denominator.

The algorithm is not symmetric in j and k, so there is a mirror-image k-version, which in general does not give the same answer; you can then take the bigger of the two results.
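To make the three steps concrete, here is a rough Python sketch of the j-procedure (the function name, the eps tolerance, and the use of floating point are my choices; exact arithmetic via fractions.Fraction would be more robust, and M is assumed to be a positive integer):

from math import ceil, floor

def heuristic_j(j, k, M, eps=1e-9):
    # Step 1: minus-sign continued fraction convergents m/n of j.
    a0 = floor(j)
    convergents = [(a0, 1)]
    p_prev, q_prev, p, q = 1, 0, None, None
    x = j - a0
    while x > eps:
        x = 1.0 / x
        a = ceil(x - eps)          # next minus-CF digit (always >= 2)
        if p is None:
            p, q = a, 1
        else:
            p, q, p_prev, q_prev = a * p - p_prev, a * q - q_prev, p, q
        if p > M:                  # p is the denominator of the convergent
            break
        convergents.append((a0 * p + q, p))
        x = a - x                  # remainder for the next level
    # Steps 2 and 3: scale each convergent, keep the biggest denominator.
    best = (0, 1)
    for m, n in convergents:
        # biggest i with k <= (m*i + 1)/(n*i), i.e. i * (k*n - m) <= 1,
        # subject to n*i <= M
        d = k * n - m
        i = M // n if d <= 0 else min(M // n, int(floor(1.0 / d)))
        if i >= 1 and n * i > best[1]:
            best = (m * i, n * i)
    return best

print(heuristic_j(0.6, 0.65, 10))  # (6, 10), as in the example below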


Example: I will use btilly's example, j = 0.6 and k = 0.65, but with M = 10.

I will first go through the j-procedure. To calculate the continued fraction expansion of j, we compute:

  0.6
= 0 + 0.6
= 0 + 1 / (2 - 0.3333)
= 0 + 1 / (2 - 1 / (3 - 0))

Since 0.6 is a rational number, the expansion terminates in finitely many steps. The corresponding fractions are:

0 = 0 / 1
0 + 1 / 2 = 1 / 2
0 + 1 / (2 - 1 / 3) = 3 / 5

Computing the corresponding i values in step 2, we replace the three fractions with:

0 / 1 = 0 / 1
1 / 2 = 3 / 6
3 / 5 = 6 / 10

The biggest denominator is given by 6 / 10. (Check: 6 / 10 <= 0.6 and 0.65 <= 7 / 10, so the pair satisfies the requirement.)


Continuing with the example above, the corresponding k-procedure goes as follows:

  0.65
= 1 - 0.35
= 1 - 1 / (3 - 0.1429)
= 1 - 1 / (3 - 1 / (7 - 0))

Hence the corresponding fractions:

1 = 1 / 1
1 - 1 / 3 = 2 / 3
1 - 1 / (3 - 1 / 7) = 13 / 20

Applying step 2, we get:

1 / 1 = 2 / 2
2 / 3 = 6 / 9
13 / 20 = 0 / 0 (discarded, because the denominator 20 is already bigger than M = 10)

The biggest denominator is given by 6 / 9. Taking the bigger of the two results, the final answer is 6 / 10.


EDIT: experimental results.

To my surprise, the algorithm works better than I thought.

I did the following experiment, with the bound M ignored (equivalently, one can take M big enough).

In every round, I generate a pair (j, k) of uniformly distributed random numbers in the interval [0, 1) with j < k. If the difference k - j is smaller than 1e-4, I discard the pair, making the round ineffective. Otherwise I calculate the true result trueN using the naive algorithm and the heuristic result heurN using my algorithm, and add them to the statistics. This runs for 1e6 rounds.
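For reference, the naive algorithm used to compute trueN can be as simple as the following scan (this is my reconstruction, not necessarily the exact code used in the experiment, and it shares the usual floating point caveats near boundaries):

from math import floor

def naive_best_n(j, k, M):
    # Largest n <= M such that some integer m satisfies
    # m/n <= j < k <= (m + 1)/n; scan denominators downward.
    for n in range(M, 0, -1):
        m = floor(n * j)           # the largest m with m/n <= j
        if k <= (m + 1) / n:
            return n
    return 0                       # no valid denominator up to M

print(naive_best_n(0.6, 0.65, 10))  # 10, matching the 6/10 found above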

Here is the result:

effective rounds   =   999789
sum of trueN       = 14013312
sum of heurN       = 13907575
correct percentage = 99.2262 %
average quotient   = 0.999415

The correct percentage is the percentage of effective rounds such that trueN is equal to heurN, and the average quotient is the average of the quotient heurN / trueN for all effective rounds.

Thus the method gives the correct answer in more than 99% of cases.

I also did experiments with smaller M values, and the results are similar.

Splitting values into similarly distributed evenly sized groups

This is a variation on what @m.raynal came up with that will work well even when n is just a fairly small multiple of k.

  1. Sort the elements from smallest to largest.
  2. Create k empty groups.
  3. Put the groups into a priority queue ordered from fewest elements to most, breaking ties from largest sum to smallest. (So the next group out is always the one with the largest sum among those with the fewest elements.)
  4. For each element, take a group off of the priority queue, add the element to it, and put the group back in the priority queue.

In practice this means that the first k elements go to groups randomly, the next k elements go in reverse order. And then it gets clever about keeping things balanced.

Depending on your application, the fact that the bottom two values are spaced predictably far apart could be a problem. If that is the case, you could complicate this by going "middle out", but that scheme is much more involved.
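Here is a compact sketch of steps 1-4 using Python's heapq (the function name is mine; the heap key encodes "fewest elements first, then largest sum first"):

import heapq

def split_into_groups(values, k):
    # One heap entry per group: (element_count, -group_sum, group_id).
    # The min-heap therefore always yields the group with the fewest
    # elements, breaking ties in favor of the largest sum.
    heap = [(0, 0.0, i) for i in range(k)]
    heapq.heapify(heap)
    groups = [[] for _ in range(k)]
    for v in sorted(values):                 # step 1: smallest to largest
        count, neg_sum, i = heapq.heappop(heap)
        groups[i].append(v)                  # step 4: add to that group
        heapq.heappush(heap, (count + 1, neg_sum - v, i))
    return groups

print(split_into_groups(list(range(1, 11)), 2))
# [[1, 4, 5, 8, 9], [2, 3, 6, 7, 10]] -- group sums 27 and 28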


