Random Weighted Choice

A weighted version of random.choice

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also pass the keyword replace=False so that drawn items are not replaced.

Random weighted choice

Your algorithm is nearly correct. However, the test should be < instead of <=:

if (randomNumber < broker.Weight)

This is because the range of the random number includes 0 but excludes totalWeight. With <=, a broker with weight 0 would still have a small chance of being selected, which is not at all what you want. This accounts for broker A having more hits than broker D.

Other than that, your algorithm is fine and in fact the canonical way of solving this problem.
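
As a rough illustration, the corrected loop might look like this in Python. This is only a sketch under assumptions: the weights are non-negative integers with a positive sum, and the names pick_weighted, items and weights are hypothetical, not taken from the question.

```python
import random

def pick_weighted(items, weights, rng=random):
    """Linear-scan weighted pick; weights are non-negative integers (assumed)."""
    total = sum(weights)
    # randrange(total) yields 0 <= r < total: 0 is inclusive, total is exclusive
    r = rng.randrange(total)
    for item, weight in zip(items, weights):
        # Strict '<' so that an item with weight 0 can never be chosen
        if r < weight:
            return item
        r -= weight
    raise AssertionError("unreachable when the total weight is positive")
```

With <= in place of <, a draw of r == 0 would select the first item even if its weight were 0.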

Weighted random selection from array

Compute the discrete cumulative distribution function (CDF) of your list, or in simple terms the array of cumulative sums of the weights. Then generate a random number between 0 (inclusive) and the sum of all weights (exclusive; the sum might be 1 in your case), do a binary search to find where this random number falls in your discrete CDF array, and take the value corresponding to that entry. This is your weighted random pick.
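
A minimal Python sketch of this idea, using the standard-library bisect and itertools modules. The function names build_cdf and sample_from_cdf are illustrative only, and the weights are assumed to be non-negative with a positive sum.

```python
import bisect
import itertools
import random

def build_cdf(weights):
    """Return the list of cumulative sums of the weights."""
    return list(itertools.accumulate(weights))

def sample_from_cdf(values, cdf, rng=random):
    """Pick one value with probability proportional to its weight."""
    # Random number in [0, total), where total is the last cumulative sum
    r = rng.random() * cdf[-1]
    # Binary search: first index whose cumulative sum is strictly greater than r
    idx = bisect.bisect_right(cdf, r)
    # Guard against a floating-point edge case where r rounds up to the total
    return values[min(idx, len(values) - 1)]
```

Building the CDF costs O(n) once; each subsequent draw is O(log n), so it pays off when many draws are made from the same weights.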

Random weighted choice

Generate a Cumulative Distribution Function for each ID1 thus:

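from collections import defaultdict

cdfs = defaultdict(list)
for id1, id2, val in d:
    # Running total so far for this id1 (0 if this is its first entry)
    prevtotal = cdfs[id1][-1][0] if cdfs[id1] else 0
    newtotal = prevtotal + val
    cdfs[id1].append((newtotal, id2))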

So you will have

cdfs = { 701 : [ (0.2,1), (0.5,2), (1.0,3) ], 
         702 : [ (0.2,1), (0.5,2) ],
         703 : [ (0.5,3) ] }

Then generate a random number and search for it in the list.

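import random

def func(id1):
    # Total weight for this id1 is the last cumulative sum
    total = cdfs[id1][-1][0]
    rand = random.random() * total
    # Return the first id2 whose cumulative upper bound exceeds rand
    for upper, id2 in cdfs[id1]:
        if upper > rand:
            return id2
    return None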

Speed up random weighted choice without replacement in python

This is just a comment on jdhesa's answer. The question was whether it is useful to handle the case where only one weight is increased. Yes, it is!

Example

import numba as nb
import numpy as np

@nb.njit(parallel=True)
def numba_choice_opt(population, weights, k,wc,b_full_wc_calc,ind,value):
    # Get cumulative weights
    if b_full_wc_calc:
        acc=0
        for i in range(weights.shape[0]):
            acc+=weights[i]
            wc[i]=acc
    #Increase only one weight (faster than recalculating the cumulative  weight)
    else:
        weights[ind]+=value
        for i in nb.prange(ind,wc.shape[0]):
            wc[i]+=value

    # Total of weights
    m = wc[-1]
    # Arrays of sample and sampled indices
    sample = np.empty(k, population.dtype)
    sample_idx = np.full(k, -1, np.int32)
    # Sampling loop
    i = 0
    while i < k:
        # Pick random weight value
        r = m * np.random.rand()
        # Get corresponding index
        idx = np.searchsorted(wc, r, side='right')
        # Check index was not selected before
        # If not using Numba you can just do `np.isin(idx, sample_idx)`
        already_sampled = False
        for j in range(i):
            if sample_idx[j] == idx:
                already_sampled = True
                break
        if already_sampled:
            # Reject the duplicate and draw again
            continue
        # Save sampled value and index
        sample[i] = population[idx]
        sample_idx[i] = idx
        i += 1
    return sample

Example

np.random.seed(0)
population = np.random.randint(100, size=1_000_000)
weights = np.random.rand(len(population))
k = 10
wc = np.empty_like(weights)

#Initial calculation 
%timeit numba_choice_opt(population, weights, k,wc,True,0,0)
#1.41 ms ± 9.21 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Increase weights[100] by 3 and calculate
%timeit numba_choice_opt(population, weights, k,wc,False,100,3)
#213 µs ± 6.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#For comparison
#Please note that it is the memory allocation of wc which makes
#this so much slower than the initial calculation from above
%timeit numba_choice(population, weights, k)
#4.23 ms ± 64.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Weighted random sample without replacement in python

You can use np.random.choice with replace=False as follows:

np.random.choice(vec,size,replace=False, p=P)

where vec is your population and P is the weight vector.

For example:

import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)

Pandas Random Weighted Choice

Stack the DataFrame:

stacked = df.stack()

Normalize the weights (so that they add up to 1):

weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this part is not necessary. See the other comment.

And then use sample:

stacked.sample(1, weights=weights)
Out: 
1  2    12
dtype: int64

# Or without normalization, stacked.sample(1, weights=stacked)

The DataFrame.sample method lets you sample from either rows or columns. Consider this:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out: 
    0  1   2  3   4   5
1  24  3  12  6  21  15

It selects one row (the first row with 40% chance, the second with 30% chance etc.)

This is also possible:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out: 
   1
0  5
1  3
2  9
3  1
4  2
5  6

Same process but 40% chance is associated with the first column and we are selecting from columns. However, your question seems to imply that you don't want to select rows or columns - you want to select the cells inside. Therefore, I changed the dimension from 2D to 1D.

df.stack()

Out: 
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64

So if I now sample from this, I will both sample a row and a column. For example:

df.stack().sample()
Out: 
1  0    24
dtype: int64

selects row 1 and column 0.

Weighted random selection with and without replacement

One of the fastest ways to draw many samples with replacement from an unchanging list is the alias method. The core intuition is that we can create a set of equal-sized bins for the weighted list that can be indexed very efficiently through bit operations, avoiding a binary search. It turns out that, done correctly, we only need to store two items from the original list per bin, and thus can represent the split with a single percentage.

Let us take the example of five equally weighted choices, (a:1, b:1, c:1, d:1, e:1).

To create the alias lookup:

1. Normalize the weights so that they sum to 1.0: (a:0.2 b:0.2 c:0.2 d:0.2 e:0.2). This is the probability of choosing each variable.
2. Find the smallest power of 2 greater than or equal to the number of variables, and create this number of partitions, |p|. Each partition represents a probability mass of 1/|p|. In this case, we create 8 partitions, each able to contain 0.125.
3. Take the variable with the least remaining weight, and place as much of its mass as possible in an empty partition. In this example, a fills the first partition: (p1{a|null,1.0},p2,p3,p4,p5,p6,p7,p8) with (a:0.075, b:0.2 c:0.2 d:0.2 e:0.2) left to be assigned.
4. If the partition is not filled, take the variable with the most weight, and fill the rest of the partition with that variable.
5. Repeat steps 3 and 4 until none of the weight from the original list remains to be assigned to a partition.

For example, if we run another iteration of 3 and 4, we see

(p1{a|null,1.0},p2{a|b,0.6},p3,p4,p5,p6,p7,p8) with (a:0, b:0.15 c:0.2 d:0.2 e:0.2) left to be assigned

At runtime:

1. Get a U(0,1) random number, say binary 0.001100000.
2. Bit-shift it left by lg2(|p|) bits to find the partition index. Here we shift it by 3, yielding 001.1, or position 1, and thus partition 2.
3. If the partition is split, use the fractional portion of the shifted random number to decide the split. In this case, the value is 0.5, and 0.5 < 0.6, so return a.

Here is some code and another explanation, but unfortunately it doesn't use the bitshifting technique, nor have I actually verified it.
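
For a concrete feel, here is a hedged Python sketch of the alias idea. Note that it is the common Vose-style variant, not the exact scheme described above: it uses one bin per item (rather than a power-of-two number of partitions) and a multiplication instead of a bit shift to find the bin. The names build_alias_table and alias_draw are illustrative, and the weights are assumed to be non-negative with a positive sum.

```python
import random

def build_alias_table(weights):
    """Build (prob, alias) tables: bin i keeps item i with probability prob[i]."""
    n = len(weights)
    total = sum(weights)
    # Scale so that the average weight is exactly 1 (one full bin per item)
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bin s holds all of item s; the remainder of the bin is given to item l
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover bins are (up to rounding error) exactly full of their own item
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """Draw one index in O(1): pick a bin, then decide between its two items."""
    u = rng.random() * len(prob)
    i = min(int(u), len(prob) - 1)   # bin index (guarded against rounding)
    frac = u - i                     # fractional part decides the split
    return i if frac < prob[i] else alias[i]
```

Table construction is O(n) and each draw is O(1), which is why the method suits many draws from a fixed list.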

Weighted probability random choice array

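One trick is to build a new array in which each item of the original array is repeated as many times as its weight, and then pick uniformly at random from that new array. This requires the weights to be non-negative integers.

You can adjust the outcome by:

increasing the weight of an item you want to show up more often
decreasing the weight of an item you want to show up less often.

See the demo below: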

const array = [ 1, 2 ,3 ,4 ,5, 6, 7, 8 ]
const weight = [ 8, 7, 6, 5, 4, 3, 2, 1 ];

let randomArray = [];
array.forEach((item, index) => {
   var clone = Array(weight[index]).fill(item);
   randomArray.push(...clone);
});

const result = randomArray[~~(Math.random() * randomArray.length)]

console.log('random value:', result);
