C++ Function for Picking from a List Where Each Element Has a Distinct Probability

C++ function for picking from a list where each element has a distinct probability

float p = (rand() / static_cast<float>(RAND_MAX)) * total_probability;
s* current = &sArray[0];
while ( (p -= current->probability) > 0)
++current;
// `current` now points to your chosen target

Picking a random item based on probabilities

"I have various different sizes of cups of coffee. The larger they are, the more I want to charge for them. I'm having trouble actually figuring out how to assign prices".

This isn't just a programming problem - you've specified that probability increases with value, but you haven't said how it increases with value. Normally, coffee shops don't charge in direct proportion to the amount of coffee. You can't assign probabilities in proportion to value, because some of your values are negative, but probabilities cannot be negative.

Sounds like you need to nail down the problem a bit more before you can write any code.

If you really don't care how probability relates to value, other than that they increase in order of value, then one easy way would be:

  • sort your array
  • assign a probability of 1 to the first element, 2 to the second, and so on.
  • now, your probabilities don't add up to 1, which is a problem. So divide each probability by the total of all the probabilities you have assigned: (1 + 2 + .. + n) = n(n+1)/2. This is called "normalization".

Given your list of probabilities, which add up to 1, the easiest way to repeatedly choose one is generally to calculate the cumulative probabilities, which I will demonstrate with an example:

value (sorted):           -12     -3      127    1000000
assigned probability: 0.1 0.2 0.3 0.4
cumulative probability: 0.1 0.3 0.6 1.0

The cumulative probability is defined as the sum of all the probabilities up to that point.

Now, from your random number generator you need a random (floating-point) value between 0 and 1. If it lies between 0 and 0.1, you've picked -12. If it lies between 0.1 and 0.3, you've picked -3, and so on. To figure out which range it lies in, you could walk linearly through your array, or you could do a binary search.

You could skip the normalization step and the use of floating-point, if you wanted. Assign "cumulative probabilities" (1, 3, 6, 10 ...) , but make it understood that the actual probability is the stored integer value divided by n(n+1)/2. Then choose a random integer from 0 to n(n+1)/2 - 1. If it's less than 1, you've selected the first value, else if less than 3 the second, and so on. This may or may not make the code clearer, and your RNG may or may not do well choosing integer values from a large range.

Note that you could have assigned probabilities (0.001, 0.002, 0.003, 0.994) instead of (0.1, 0.2, 0.3, 0.4), and still satisfied your requirement that "the higher the value, the higher the probability".

How to randomly pick element from an array with different probabilities in C++

You are looking for std::discrete_distribution. Forget about rand().

#include <random>
#include <vector>

struct Point {};

int main() {
std::mt19937 gen(std::random_device{}());

std::vector<double> chances{1.0, 2.0, 3.0};
// Initialize to same length.
std::vector<Point> points(chances.size());
// size_t is suitable for indexing.
std::discrete_distribution<std::size_t> d{chances.begin(), chances.end()};

auto sampled_value = points[d(gen)];
}

Conveniently for you, the weights do not have to sum to 1.

Choosing random event with different probabilities

The simplest solution would be to sum and normalize the ranges (as an example, the sum of the values is 195, your first event would get the range [0, 45/195=0.23[, the second one [0.23, 0.23+0.076=0.31], and so on), then extract a number from [0,1[ and look to what range it belongs.

Pick random element from list with probability

Define a class for your items like this:

class Items<T>
{
public double Probability { get; set; }
public T Item { get; set; }
}

then initialize it

var initial = new List<Items<string>>
{
new Items<string> {Probability = 74 / 100.0, Item = "A"},
new Items<string> {Probability = 15 / 100.0, Item = "B"},
new Items<string> {Probability = 7 / 100.0, Item = "C"},
new Items<string> {Probability = 4 / 100.0, Item = "D"},
};

then you need to convert it to aggregate a sum of probabilities from 0 to 1

var converted = new List<Items<string>>(initial.Count);
var sum = 0.0;
foreach (var item in initial.Take(initial.Count - 1))
{
sum += item.Probability;
converted.Add(new Items<string> {Probability = sum, Item = item.Item});
}
converted.Add(new Items<string> {Probability = 1.0, Item = initial.Last().Item});

now you can pick an item from converted collection with respect to probability:

var rnd = new Random();
while (true)
{
var probability = rnd.NextDouble();
var selected = converted.SkipWhile(i => i.Probability < probability).First();
Console.WriteLine($"Selected item = {selected.Item}");
}

NOTE: my implementation have O(n) complexity. You can optimize it with binary search (because values in converted collection are sorted)

Random numbers with different probabilities

You can easily implement this using the rand function:

bool TrueFalse = (rand() % 100) < 75;

The rand() % 100 will give you a random number between 0 and 100, and the probability of it being under 75 is, well, 75%. You can substitute the 75 for any probability you want.

How to pick a number based on probability?

p1 + p2 + ... + pn = 1
p1 = p2 * x
p2 = p3 * x
...
p_n-1 = pn * x

Solving this gives you:

p1 + p2 + ... + pn = 1
(p2 * x) + (p3 * x) + ... + (pn * x) + pn = 1
((p3*x) * x) + ((p4*x) * x) + ... + ((p_n-1*x) * x) + pn = 1
....
pn* (x^(n-1) + x^(n-2) + ... +x^1 + x^0) = 1
pn*(1-x^n)/(1-x) = 1
pn = (1-x)/(1-x^n)

This gives you the probability you need to set to pn, and from it you can calculate the probabilities for all other p1,p2,...p_n-1

Now, you can use a "black box" RNG that chooses a number with a distribution, like those in the threads you mentioned.

A simple approach to do it is to set an auxillary array:

aux[i] = p1 + p2 + ... + pi

Now, draw a random number with uniform distribution between 0 to aux[n], and using binary search (aux array is sorted), get the first value, which matching value in aux is greater than the random uniform number you got


Original answer, for substraction (before question was editted):

For n items, you need to solve the equation:

p1 + p2 + ... + pn = 1
p1 = p2 + x
p2 = p3 + x
...
p_n-1 = pn + x

Solving this gives you:

p1 + p2 + ... + pn = 1
(p2 + x) + (p3 + x) + ... + (pn + x) + pn = 1
((p3+x) + x) + ((p4+x) + x) + ... + ((p_n-1+x) + x) + pn = 1
....
pn* ((n-1)x + (n-2)x + ... +x + 0) = 1
pn* x = n(n-1)/2
pn = n(n-1)/(2x)

This gives you the probability you need to set to pn, and from it you can calculate the probabilities for all other p1,p2,...p_n-1

Now, you can use a "black box" RNG that chooses a number with a distribution, like those in the threads you mentioned.


Be advised, this is not guaranteed you will have a solution such that 0<p_i<1 for all i, but you cannot guarantee one given from your requirements, and it is going to depend on values of n and x to fit.

Picking a random option, where each option has a different probability of being picked

As a first approximation to a more efficient algorithm, if you compute the cumulative distribution function (which is just one pass over the distribution function, computing a running sum), then you can find the position of the randomly chosen integer using a binary search instead of a linear search. This will help if you have a lot of options, since it reduces the search time from O(#options) to O(log #options).

There is an O(1) solution, though. Here's the basic outline.

Let's say we have N options, 1...N, with weights ω1...ωN, where all of the ω values are at least 0. For simplicity, we scale the weights so their mean is 1, or in other words, their sum is N. (We just multiply them by N/Σω. We don't actually have to do this, but it makes the next couple of paragraphs easier to type without MathJax.)

Now, create a vector of N elements, where each element has a two option identifiers (lo and hi) and a cutoff p. The option identifiers are just integers 1...N, and p will be computed as a real number in the range (0, 1.0) inclusive.

We proceed to fill in the vector as follows. For each element i in turn:

  • If some ωj is exactly 1.0, then we set:

       loi = j

       hii = j

       pi = 1.0

    And we remove ωj from the list of weights.

  • Otherwise, there must be some ωj < 1.0 and some ωk > 1.0. (That's because the average weight is 1.0, and none of them have the average value. Some some of them must have less and some of them more, because it is impossible for all elements to be greater than the average or all elements to be less than the average.) Now, we set:

       loi = j

       hii = k

       pi = ωj

       ωk = ωk - (1 - ωj)

    And once again, we remove ωj from the weights.

Note that in both cases, we have removed one weight, and we have reduced the sum of the weights by 1.0. So the average weight is still 1.0.

We continue in this fashion until the entire vector is filled. (The last element will have p = 1.0).

Given this vector, we can select a weighted random option as follows:

  • Generate a random integer i in the range 1...N and a random floating point value r in the range (0, 1.0]. If r < pi then we select option loi; otherwise, we select option hii.

It should be clear why this works from the construction of the vector. The weights of each above-average-weight option are distributed amongst the various vector elements, while each below-average-weight option is assigned to one part of some vector element with a corresponding probability of selection.

In a real implementation, we would map the range of weights onto integer values, and make the total weights close to the maximum integer (it has to be a multiple of N, so there will be some slosh.) We can then select a slot and select the weight inside the slot from a single random integer. In fact, we can modify the algorithm to avoid the division by forcing the number of slots to be a power of 2 by adding some 0-weighted options. Because the integer arithmetic will not work out perfectly, a bit of fiddling around will be necessary, but the end result can be made to be statistically correct, modulo the characteristics of the PRNG being used, and it will execute almost as fast as a simple unweighted selection of N options (one shift and a couple of comparisons extra), at the cost of a vector occupying less than 6N storage elements (counting the possibility of having to almost double the number of slots).



Related Topics



Leave a reply



Submit