Generate 3 random numbers that sum to 1 in R
Just draw two random values a and b from the uniform distribution on (0, 1), then take:
rand1 = min(a, b)
rand2 = abs(a - b)
rand3 = 1 - max(a, b)
Since min(a, b) + abs(a - b) = max(a, b), the three values sum to max(a, b) + 1 - max(a, b) = 1.
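The same trick ports directly to Python; the function name here is mine, and this is just a sketch of the identity min(a, b) + |a - b| + (1 - max(a, b)) = 1:

```python
import random

def three_random_sum_to_one():
    """Split [0, 1] at two uniform random points; the three
    segment lengths are random and always sum to exactly 1."""
    a, b = random.random(), random.random()
    return min(a, b), abs(a - b), 1 - max(a, b)
```

Geometrically this is cutting the unit interval at two random points and returning the three segment lengths.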
Generate N random integers that sum to M in R
Normalize: sample from a normal distribution, scale so the values sum to M, round to integers, then repair the rounding error.
rand_vect <- function(N, M, sd = 1, pos.only = TRUE) {
  vec <- rnorm(N, M/N, sd)
  # Avoid dividing by a sum that is close to zero below
  if (abs(sum(vec)) < 0.01) vec <- vec + 1
  # Scale so the vector sums to M, then round to integers
  vec <- round(vec / sum(vec) * M)
  # Rounding may leave the sum off by a few units; fix one unit at a time,
  # each time bumping a randomly chosen entry towards the target
  deviation <- M - sum(vec)
  for (. in seq_len(abs(deviation))) {
    vec[i] <- vec[i <- sample(N, 1)] + sign(deviation)
  }
  # Optionally eliminate negatives by moving one unit at a time
  # from a random positive entry to a random negative entry
  if (pos.only) while (any(vec < 0)) {
    negs <- vec < 0
    pos  <- vec > 0
    vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
    vec[pos][i]  <- vec[pos ][i <- sample(sum(pos ), 1)] - 1
  }
  vec
}
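For readers outside R, here is a rough Python port of rand_vect (names and structure are mine; it follows the same normalize-round-repair steps, not any particular library):

```python
import random

def rand_vect(n, m, sd=1.0, pos_only=True):
    """Draw n normals with mean m/n, scale so the rounded
    values sum to m, then repair rounding error one unit at a time."""
    vec = [random.gauss(m / n, sd) for _ in range(n)]
    s = sum(vec)
    if abs(s) < 0.01:              # guard against dividing by ~0
        vec = [v + 1 for v in vec]
        s = sum(vec)
    vec = [round(v / s * m) for v in vec]
    deviation = m - sum(vec)
    for _ in range(abs(deviation)):
        i = random.randrange(n)    # nudge a random entry toward the target
        vec[i] += 1 if deviation > 0 else -1
    if pos_only:
        while any(v < 0 for v in vec):
            neg = [i for i, v in enumerate(vec) if v < 0]
            pos = [i for i, v in enumerate(vec) if v > 0]
            vec[random.choice(neg)] += 1   # move one unit from a positive
            vec[random.choice(pos)] -= 1   # entry to a negative one
    return vec
```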
For a continuous version, simply use:
rand_vect_cont <- function(N, M, sd = 1) {
  vec <- rnorm(N, M/N, sd)
  vec / sum(vec) * M
}
Examples
rand_vect(3, 50)
# [1] 17 16 17
rand_vect(10, 10, pos.only = FALSE)
# [1] 0 2 3 2 0 0 -1 2 1 1
rand_vect(10, 5, pos.only = TRUE)
# [1] 0 0 0 0 2 0 0 1 2 0
rand_vect_cont(3, 10)
# [1] 2.832636 3.722558 3.444806
rand_vect(10, -1, pos.only = FALSE)
# [1] -1 -1 1 -2 2 1 1 0 -1 -1
How to generate three random numbers, whose sum is 1?
Just generate 3 random numbers, compute the factor 1 / (sum of your numbers), and multiply each of the random numbers by that factor. The scaled numbers then sum to 1.
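A minimal Python sketch of that scale-by-the-reciprocal-of-the-sum idea (the function name is mine):

```python
import random

def normalized_randoms(n=3):
    """Draw n uniforms and divide each by their sum so they total 1."""
    xs = [random.random() for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]
```

Note that normalizing uniforms this way is not the same as sampling uniformly from all triples that sum to 1; the Dirichlet answer below gives finer control over that.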
Generating randomly the probabilities that sum to 1
You could do
diff(c(0, sort(sample(seq(0.05, 0.95, 0.05), 2)), 1))
#> [1] 0.05 0.75 0.2
This works by choosing 2 distinct cut points from the grid 0.05, 0.10, ..., 0.95 and sorting them. The first result is the smaller cut point, the second is the distance between the two cut points, and the third is the distance between the larger cut point and 1, so the three necessarily add up to 1.
Note that because the grid forces every number to be at least 0.05, and all three must add up to 1, no single number can exceed 0.90.
For example, if we want ten such samples, we can do:
t(replicate(10, diff(c(0, sort(sample(seq(0.05, 0.95, 0.05), 2)), 1))))
#> [,1] [,2] [,3]
#> [1,] 0.25 0.15 0.60
#> [2,] 0.25 0.25 0.50
#> [3,] 0.30 0.05 0.65
#> [4,] 0.25 0.50 0.25
#> [5,] 0.50 0.05 0.45
#> [6,] 0.45 0.20 0.35
#> [7,] 0.10 0.85 0.05
#> [8,] 0.45 0.50 0.05
#> [9,] 0.15 0.40 0.45
#> [10,] 0.40 0.30 0.30
Created on 2022-06-17 by the reprex package (v2.0.1)
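The same cut-points idea can be sketched in Python (the helper name and step parameter are my additions):

```python
import random

def grid_probs(step=0.05):
    """Pick 2 distinct interior grid points, sort them, and return the
    gaps between 0, the two cut points, and 1. The gaps sum to 1 and,
    because the cut points are distinct grid multiples, each gap >= step."""
    grid = [round(step * k, 10) for k in range(1, int(round(1 / step)))]
    cuts = sorted(random.sample(grid, 2))
    points = [0.0] + cuts + [1.0]
    return [round(b - a, 10) for a, b in zip(points, points[1:])]
```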
Generating a list of random numbers, summing to 1
The simplest solution is indeed to take N random values and divide by the sum.
A more generic solution is to use the Dirichlet distribution, which is available in numpy. By changing the concentration parameter of the distribution you can control how evenly the total is spread across the individual numbers:
>>> import numpy as np
>>> print(np.random.dirichlet(np.ones(10), size=1))
[[ 0.01779975 0.14165316 0.01029262 0.168136 0.03061161 0.09046587
0.19987289 0.13398581 0.03119906 0.17598322]]
>>> print(np.random.dirichlet(np.ones(10)/1000., size=1))
[[ 2.63435230e-115 4.31961290e-209 1.41369771e-212 1.42417285e-188
0.00000000e+000 5.79841280e-143 0.00000000e+000 9.85329725e-005
9.99901467e-001 8.37460207e-246]]
>>> print(np.random.dirichlet(np.ones(10)*1000., size=1))
[[ 0.09967689 0.10151585 0.10077575 0.09875282 0.09935606 0.10093678
0.09517132 0.09891358 0.10206595 0.10283501]]
Depending on the concentration parameter, the Dirichlet distribution will either give vectors where all the values are close to 1/N (where N is the length of the vector), or vectors where most of the values are ~0 and a single value is close to 1, or something in between those extremes.
EDIT (5 years after the original answer): Another useful fact about the Dirichlet distribution is that you get it naturally if you generate a set of Gamma-distributed random variables and then divide them by their sum.
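That Gamma fact gives a dependency-free way to sketch Dirichlet sampling in plain Python (the function name is mine; only the standard library is used):

```python
import random

def dirichlet_via_gamma(alpha):
    """Gamma(a_i, 1) draws normalized by their sum are Dirichlet(alpha):
    small alphas concentrate mass on few coordinates, large alphas push
    every coordinate toward 1/len(alpha)."""
    gs = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(gs)
    return [g / total for g in gs]
```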
Is there a way to generate data in R where the sum of the observations add up to a specific value?
You could make a vector of 20,000,000 samples of the numbers 1 through 15 and then tabulate them, but this is computationally expensive and results in an unrealistically even split of votes. Instead, you could normalise the cumulative sum of 15 numbers drawn from a uniform distribution and multiply by 20 million. This gives a more realistic spread of votes, with some parties having significantly more votes than others.
my_sample <- cumsum(runif(15))
my_sample <- c(0, my_sample/max(my_sample))
votes <- round(diff(my_sample) * 20000000)
votes
#> [1] 725623 2052337 1753844 61946 1173750 1984897
#> [7] 554969 1280220 1381259 1311762 766969 2055094
#> [13] 1779572 2293662 824096
These will add up to 20,000,000:
sum(votes)
#> [1] 2e+07
And we can see quite a "natural looking" spread of votes.
barplot(setNames(votes, letters[1:15]), xlab = "party")
I'm guessing that if you substitute rexp for runif in the above solution, the result would more closely match actual voting numbers in real life, with a small number of high-vote parties and a large number of low-vote parties.
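A Python sketch of that exponential-weights variant (names are mine; rounding error is pushed onto the largest party so the total stays exact):

```python
import random

def simulate_votes(parties=15, total=20_000_000):
    """Heavy-tailed vote split: normalize exponential draws, round to
    integers, then absorb any rounding error into the largest party."""
    ws = [random.expovariate(1.0) for _ in range(parties)]
    s = sum(ws)
    votes = [round(w / s * total) for w in ws]
    votes[votes.index(max(votes))] += total - sum(votes)
    return votes
```

Exponential weights produce a few dominant parties and a long tail of small ones, which is the "natural looking" spread the answer describes.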