Random Number with Probabilities

Yours is a pretty good way already and works well with any range.

Just thinking: another possibility is to get rid of the fractions by multiplying with a constant multiplier, and then build an array with the size of this multiplier. Multiplying by 10 you get

P(1) = 2
P(2) = 3
P(3) = 5

Then you create an array with the inverse values -- '1' goes into elements 1 and 2, '2' into 3 to 5, and so on:

P = (1,1, 2,2,2, 3,3,3,3,3);

and then you can pick a random element from this array instead.

(Add.) Using the probabilities from the example in kiruwka's comment:

int[] numsToGenerate           = new int[]    { 1,   2,    3,   4,    5   };
double[] discreteProbabilities = new double[] { 0.1, 0.25, 0.3, 0.25, 0.1 };

the smallest multiplier that leads to all-integers is 20, which gives you

2, 5, 6, 5, 2

and so the length of numsToGenerate would be 20, with the following values:

1 1
2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4
5 5

The distribution is exactly the same: the chance of '1', for example, is now 2 out of 20 -- still 0.1.

This is based on your original probabilities all adding up to 1. If they do not, multiply the total by this same factor (which is then going to be your array length as well).
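As a rough sketch of the steps above, here is one way to build the expanded array in Python. The function name build_lookup_table is made up for illustration, and parsing each probability through str() is an assumption that only suits short decimals such as 0.25.

```python
from fractions import Fraction
from math import gcd
import random


def build_lookup_table(values, probabilities):
    """Expand values into a flat list in which each value appears
    in proportion to its probability."""
    # Go through str() so that 0.1 becomes exactly 1/10 instead of the
    # nearest binary float (an assumption that suits short decimals).
    exact = [Fraction(str(p)) for p in probabilities]

    # The smallest multiplier that turns every probability into an integer
    # is the least common multiple of the denominators.
    multiplier = 1
    for f in exact:
        multiplier = multiplier * f.denominator // gcd(multiplier, f.denominator)

    table = []
    for value, f in zip(values, exact):
        table.extend([value] * int(f * multiplier))
    return table


table = build_lookup_table([1, 2, 3, 4, 5], [0.1, 0.25, 0.3, 0.25, 0.1])
print(len(table))            # 20
print(table.count(3))        # 6
print(random.choice(table))  # one weighted draw
```

The table grows with the multiplier, so this probably only pays off when the probabilities reduce to small fractions.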

Random number with Probabilities in C#

The method

public void addNumber(int val, double dist)

Is not correctly translated, since you are missing the following lines:

if (this.distribution.get(value) != null) {
    distSum -= this.distribution.get(value);
}

Those lines should cover the case when you call the following (based on your example code):

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(1, 0.5d);

When addNumber is called twice with the same first argument, the missing code checks whether that value is already present in the dictionary; if it is, the "old" probability is subtracted from distSum before the new one is stored.

Your method should look like this:

public void addNumber(int val, double dist)
{
    if (distribution.TryGetValue(val, out var oldDist)) //get the old "dist" value, based on the "val"
    {
        distribution.Remove(val); //remove the old entry
        distSum -= oldDist; //subtract the old "dist" value from "distSum"
    }

    distribution.Add(val, dist); //add the "val" with the current "dist" value to the dictionary
    distSum += dist; //add the current "dist" value to "distSum"
}
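The same bookkeeping can be sketched in Python (not C#) to make the effect of a repeated call visible. The class name WeightedNumbers and its attribute names are invented for this illustration and are not part of the original code.

```python
class WeightedNumbers:
    """Keeps value -> probability pairs plus a running sum of the probabilities."""

    def __init__(self):
        self.distribution = {}
        self.dist_sum = 0.0

    def add_number(self, value, dist):
        old = self.distribution.get(value)
        if old is not None:
            # The value was added before: take its old share out of the sum.
            self.dist_sum -= old
        self.distribution[value] = dist  # insert or overwrite
        self.dist_sum += dist


numbers = WeightedNumbers()
numbers.add_number(1, 0.2)
numbers.add_number(1, 0.5)
print(numbers.distribution)  # {1: 0.5}
print(numbers.dist_sum)      # 0.5
```

Without the subtraction, the sum would end up at 0.7 although the dictionary only holds 0.5.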

Now to your second method

public int getDistributedRandomNumber()

Instead of initializing a new instance of Random every time this method is called, you should initialize it only once, so change the line

double rand = new Random().NextDouble();

to this

double rand = _random.NextDouble();

and initialize the field _random outside of a method inside the class declaration like this

public class DistributedRandomNumberGenerator
{
    private Dictionary<Int32, Double> distribution;
    private double distSum;
    private Random _random = new Random();        

    ... rest of your code
}

This will prevent new Random().NextDouble() from producing the same number over and over again if called in a loop.
You can read about this problem here: Random number generator only generating one random number

As a side note, private fields in C# are conventionally named with a leading underscore. You should consider renaming distribution to _distribution; the same applies to distSum.

double ratio = 1.0f / distSum;//why is ratio needed?

The ratio is needed because the method tries to do its job with whatever information you have provided. Imagine you only call this:

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
int random = drng.getDistributedRandomNumber();

With those lines you told the class you want to have the number 1 in 20% of the cases, but what about the other 80%?

And that's where the ratio variable comes into play: it calculates a scaling factor based on the sum of the probabilities you have given, e.g.

double ratio = 1.0f / distSum;

In the latest example, drng.addNumber(1, 0.2d), distSum will be 0.2, which corresponds to a probability of 20%.

double ratio = 1.0f / 0.2;

The ratio is 5.0, because 100% / 5 = 20%.

Now let's have a look at how the code reacts when the ratio is 5

double tempDist = 0;
foreach (Int32 i in distribution.Keys)
{
    tempDist += distribution[i];

    if (rand / ratio <= tempDist)
    {
        return i;
    }
}

rand will always be a value greater than or equal to 0.0 and less than 1.0 (that's how NextDouble works), so let's assume rand is 0.254557522132321.

Now let's look at what happens step by step

double tempDist = 0; //initialize with 0 
foreach (Int32 i in distribution.Keys) //step through the added probabilities
{
    tempDist += distribution[i]; //add this probability to a running sum

    //as a reminder
    //rand = 0.254557522132321
    //ratio = 5
    //rand / ratio = 0.0509115044264642
    //tempDist = 0.2
    //the if evaluates to true
    if (rand / ratio <= tempDist)
    {
        return i;
    }
}

If we didn't apply the ratio, the if would be false, and that would be wrong: since we only have a single value in our dictionary, the if statement should be true no matter what rand is. That is the nature of rand / ratio: it "fixes" the randomly generated number based on the sum of the probabilities we added.

The rand / ratio is only useful if you didn't provide probabilities that sum to exactly 1 (= 100%).

E.g. if your example were this

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(2, 0.3d);
drng.addNumber(3, 0.5d);

You can see that the provided probabilities add up to 1 (0.2 + 0.3 + 0.5); in this case the line

if (rand / ratio <= tempDist)

Would look like this

if (rand / 1 <= tempDist)

Dividing by 1 never changes the value (rand / 1 = rand), so the only use case for this division is when the probabilities you provided do not add up to exactly 100%, whether the sum is more or less.
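To see the scaling at work outside of C#, here is a minimal Python sketch of the same loop. The function name pick and the optional rand parameter (there only so the worked numbers can be replayed) are assumptions for illustration, and the fallback return after the loop is an addition that the original code does not have.

```python
import random


def pick(distribution, rand=None):
    """Return a key of `distribution`, chosen in proportion to its weight."""
    if rand is None:
        rand = random.random()  # uniform in [0.0, 1.0)

    dist_sum = sum(distribution.values())
    ratio = 1.0 / dist_sum  # 5.0 when the only weight is 0.2

    temp_dist = 0.0
    last = None
    for value, dist in distribution.items():
        temp_dist += dist
        last = value
        # rand / ratio rescales rand from [0, 1) to [0, dist_sum)
        if rand / ratio <= temp_dist:
            return value
    # Fallback for floating-point rounding in the running sum (not in the original)
    return last


print(pick({1: 0.2}, rand=0.254557522132321))                  # 1
print(pick({1: 0.2, 2: 0.3, 3: 0.5}, rand=0.254557522132321))  # 2
```

With weights that sum to 1 the ratio is 1 and the division has no effect, which matches the reasoning above.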

As a side note, I would suggest changing your code to this

//call the dictionary distributions (notice the plural)
//don't use .Keys
//var distribution will be a KeyValuePair
foreach (var distribution in distributions)
{
    //access the .Value member of the KeyValuePair
    tempDist += distribution.Value;

    if (rand / ratio <= tempDist)
    {
        //i no longer exists here, so return the .Key member instead
        return distribution.Key;
    }
}

Your test routine seems to be correctly translated.

Generate random numbers with a given (numerical) distribution

scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.

As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice(), e.g.

numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

If you are using Python 3.6 or above, you can use random.choices() from the standard library – see the answer by Mark Dickinson.
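For the standard-library route, a minimal sketch could look like the following. The fixed seed and the sample size of 10,000 are arbitrary choices made only so that the run is repeatable.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
faces = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

# choices() samples with replacement; the weights need not sum to 1
samples = rng.choices(faces, weights=weights, k=10_000)

counts = Counter(samples)
for face in faces:
    print(face, counts[face] / len(samples))  # observed frequency per face
```

The printed frequencies should land close to the weights, with 5 coming up most often.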

Generate random number with different probabilities for each line in R

The problem is that you are passing 4 vectors to the prob parameter of sample (the entire columns p00, p10, p01, and p11), but sample is not vectorised in this way, and only takes a single vector of probabilities.

You need to write a version of sample that is vectorized over prob. Something like this:

vec_sample <- function(A, B, C, D)
{
  do.call("c", lapply(seq_along(A), function(i)
  {
    sample(1:4, 1, replace = TRUE, prob=c(A[i], B[i], C[i], D[i]))
  }))
}

So your code would work like this:

data02 <- data01 %>% mutate(u = vec_sample(p00, p10, p01, p11))
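The row-by-row idea is not specific to R. As a hedged illustration in Python, where sample_per_row and the example rows are invented names and data, each tuple stands in for one row of (p00, p10, p01, p11):

```python
import random


def sample_per_row(rows, outcomes=(1, 2, 3, 4), rng=random):
    """Draw one outcome per row, using that row's own probabilities."""
    return [rng.choices(outcomes, weights=row, k=1)[0] for row in rows]


# Made-up probabilities, one tuple per data row
rows = [
    (0.7, 0.1, 0.1, 0.1),
    (0.0, 0.0, 1.0, 0.0),
    (0.25, 0.25, 0.25, 0.25),
]
print(sample_per_row(rows))  # one draw per row; the second is always 3
```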

Generate random number with given probability

You could just do a weighted random sample, without worrying about your cumsum method:

sample(c(1, 2, 3), size = 100, replace = TRUE, prob = c(0.5, 0.1, 0.4))

If you already have uniform random numbers, you could also bin them by cumulative probability:

x <- runif(10, 0, 1)
as.numeric(cut(x, breaks = c(0, 0.5, 0.6, 1)))
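The binning step can be sketched in Python with a binary search over the cumulative break points. The names upper_breaks and categorize are assumptions for this example; the intervals are closed on the right, as with cut's default.

```python
import random
from bisect import bisect_left

# Cumulative probabilities 0.5, 0.6, 1.0; the final break at 1.0 is implied
upper_breaks = [0.5, 0.6]


def categorize(x):
    """Map x from [0, 1) to category 1, 2 or 3 (intervals closed on the right)."""
    return bisect_left(upper_breaks, x) + 1


xs = [random.random() for _ in range(10)]
print([categorize(x) for x in xs])
```

One small difference: an input of exactly 0 lands in category 1 here, whereas cut with these breaks would return NA for it.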

How to get random numbers with different probabilities in different ranges in java?

You just need to have an extra random number determining the range it should generate:

int getRandomNumberWithRangeProbability() {
    double range = Math.random();

    if (range < 0.5) {
        return randomWithRange(0, 20);
    } else if (range < 0.9) {
        return randomWithRange(21, 80);
    } else {
        return randomWithRange(81, 100);
    }
}

int randomWithRange(int min, int max) {
    int range = (max - min) + 1;
    return (int) (Math.random() * range) + min;
}

A small test can be found here.

Credits to AusCBloke for the randomWithRange() method.
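For comparison, the same two-step idea in Python, as a sketch only: the function name is invented, the seed and sample size are arbitrary, and the ranges and probabilities are the ones from the Java code above.

```python
import random


def random_with_range_probability(rng=random):
    """Pick a range with probability 0.5 / 0.4 / 0.1, then a number inside it."""
    r = rng.random()
    if r < 0.5:
        return rng.randint(0, 20)    # randint includes both end points
    elif r < 0.9:
        return rng.randint(21, 80)
    else:
        return rng.randint(81, 100)


rng = random.Random(1)  # fixed seed only so the check is repeatable
draws = [random_with_range_probability(rng) for _ in range(10_000)]
share_low = sum(d <= 20 for d in draws) / len(draws)
print(round(share_low, 2))  # should be close to 0.5
```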

Generate random integers with probabilities

Here's a useful trick :-)

function randomWithProbability() {
  var notRandomNumbers = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4];
  var idx = Math.floor(Math.random() * notRandomNumbers.length);
  return notRandomNumbers[idx];
}