Generate 3 random numbers that sum to 1 in R
Just draw two random values a and b from the uniform distribution on (0, 1), then take:
rand1 = min(a, b)
rand2 = abs(a - b)
rand3 = 1 - max(a, b)
Since min(a, b) + abs(a - b) = max(a, b), the three values sum to max(a, b) + 1 - max(a, b) = 1.
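The same trick ports directly to Python; the function name here is mine, and this is just a sketch of the identity min(a, b) + |a - b| + (1 - max(a, b)) = 1:

```python
import random

def three_random_sum_to_one():
    """Split [0, 1] at two uniform random points; the three
    segment lengths are random and always sum to exactly 1."""
    a, b = random.random(), random.random()
    return min(a, b), abs(a - b), 1 - max(a, b)
```

Geometrically this is cutting the unit interval at two random points and returning the three segment lengths.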
Generate N random integers that sum to M in R
Normalize: sample from a normal distribution, scale so the values sum to M, round to integers, then repair the rounding error.
rand_vect <- function(N, M, sd = 1, pos.only = TRUE) {
  vec <- rnorm(N, M/N, sd)
  # Avoid dividing by a sum that is close to zero below
  if (abs(sum(vec)) < 0.01) vec <- vec + 1
  # Scale so the vector sums to M, then round to integers
  vec <- round(vec / sum(vec) * M)
  # Rounding may leave the sum off by a few units; fix one unit at a time,
  # each time bumping a randomly chosen entry towards the target
  deviation <- M - sum(vec)
  for (. in seq_len(abs(deviation))) {
    vec[i] <- vec[i <- sample(N, 1)] + sign(deviation)
  }
  # Optionally eliminate negatives by moving one unit at a time
  # from a random positive entry to a random negative entry
  if (pos.only) while (any(vec < 0)) {
    negs <- vec < 0
    pos  <- vec > 0
    vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
    vec[pos][i]  <- vec[pos ][i <- sample(sum(pos ), 1)] - 1
  }
  vec
}
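For readers outside R, here is a rough Python port of rand_vect (names and structure are mine; it follows the same normalize-round-repair steps, not any particular library):

```python
import random

def rand_vect(n, m, sd=1.0, pos_only=True):
    """Draw n normals with mean m/n, scale so the rounded
    values sum to m, then repair rounding error one unit at a time."""
    vec = [random.gauss(m / n, sd) for _ in range(n)]
    s = sum(vec)
    if abs(s) < 0.01:              # guard against dividing by ~0
        vec = [v + 1 for v in vec]
        s = sum(vec)
    vec = [round(v / s * m) for v in vec]
    deviation = m - sum(vec)
    for _ in range(abs(deviation)):
        i = random.randrange(n)    # nudge a random entry toward the target
        vec[i] += 1 if deviation > 0 else -1
    if pos_only:
        while any(v < 0 for v in vec):
            neg = [i for i, v in enumerate(vec) if v < 0]
            pos = [i for i, v in enumerate(vec) if v > 0]
            vec[random.choice(neg)] += 1   # move one unit from a positive
            vec[random.choice(pos)] -= 1   # entry to a negative one
    return vec
```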
For a continuous version, simply use:
rand_vect_cont <- function(N, M, sd = 1) {
  vec <- rnorm(N, M/N, sd)
  vec / sum(vec) * M
}
Examples
rand_vect(3, 50)
# [1] 17 16 17
rand_vect(10, 10, pos.only = FALSE)
# [1] 0 2 3 2 0 0 -1 2 1 1
rand_vect(10, 5, pos.only = TRUE)
# [1] 0 0 0 0 2 0 0 1 2 0
rand_vect_cont(3, 10)
# [1] 2.832636 3.722558 3.444806
rand_vect(10, -1, pos.only = FALSE)
# [1] -1 -1 1 -2 2 1 1 0 -1 -1
How to generate three random numbers, whose sum is 1?
Just generate 3 random numbers, compute the factor 1 / (sum of your numbers), and multiply each of the random numbers by that factor. The scaled numbers then sum to 1.
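A minimal Python sketch of that scale-by-the-reciprocal-of-the-sum idea (the function name is mine):

```python
import random

def normalized_randoms(n=3):
    """Draw n uniforms and divide each by their sum so they total 1."""
    xs = [random.random() for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]
```

Note that normalizing uniforms this way is not the same as sampling uniformly from all triples that sum to 1; the Dirichlet answer below gives finer control over that.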
Generating randomly the probabilities that sum to 1
You could do
diff(c(0, sort(sample(seq(0.05, 0.95, 0.05), 2)), 1))
#> [1] 0.05 0.75 0.2
This works by choosing 2 distinct cut points from the grid 0.05, 0.10, ..., 0.95 and sorting them. The first result is the smaller cut point, the second is the distance between the two cut points, and the third is the distance between the larger cut point and 1, so the three necessarily add up to 1.
Note that because the grid forces every number to be at least 0.05, and all three must add up to 1, no single number can exceed 0.90.
For example, if we want ten such samples, we can do:
t(replicate(10, diff(c(0, sort(sample(seq(0.05, 0.95, 0.05), 2)), 1))))
#> [,1] [,2] [,3]
#> [1,] 0.25 0.15 0.60
#> [2,] 0.25 0.25 0.50
#> [3,] 0.30 0.05 0.65
#> [4,] 0.25 0.50 0.25
#> [5,] 0.50 0.05 0.45
#> [6,] 0.45 0.20 0.35
#> [7,] 0.10 0.85 0.05
#> [8,] 0.45 0.50 0.05
#> [9,] 0.15 0.40 0.45
#> [10,] 0.40 0.30 0.30
Created on 2022-06-17 by the reprex package (v2.0.1)
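The same cut-points idea can be sketched in Python (the helper name and step parameter are my additions):

```python
import random

def grid_probs(step=0.05):
    """Pick 2 distinct interior grid points, sort them, and return the
    gaps between 0, the two cut points, and 1. The gaps sum to 1 and,
    because the cut points are distinct grid multiples, each gap >= step."""
    grid = [round(step * k, 10) for k in range(1, int(round(1 / step)))]
    cuts = sorted(random.sample(grid, 2))
    points = [0.0] + cuts + [1.0]
    return [round(b - a, 10) for a, b in zip(points, points[1:])]
```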
Generating a list of random numbers, summing to 1
The simplest solution is indeed to take N random values and divide by the sum.
A more generic solution is to use the Dirichlet distribution, which is available in numpy. By changing the concentration parameter of the distribution you can control how evenly the total is spread across the individual numbers:
>>> import numpy as np
>>> print(np.random.dirichlet(np.ones(10), size=1))
[[ 0.01779975 0.14165316 0.01029262 0.168136 0.03061161 0.09046587
0.19987289 0.13398581 0.03119906 0.17598322]]
>>> print(np.random.dirichlet(np.ones(10)/1000., size=1))
[[ 2.63435230e-115 4.31961290e-209 1.41369771e-212 1.42417285e-188
0.00000000e+000 5.79841280e-143 0.00000000e+000 9.85329725e-005
9.99901467e-001 8.37460207e-246]]
>>> print(np.random.dirichlet(np.ones(10)*1000., size=1))
[[ 0.09967689 0.10151585 0.10077575 0.09875282 0.09935606 0.10093678
0.09517132 0.09891358 0.10206595 0.10283501]]
Depending on the concentration parameter, the Dirichlet distribution will either give vectors where all the values are close to 1/N (where N is the length of the vector), or vectors where most of the values are ~0 and a single value is close to 1, or something in between those extremes.
EDIT (5 years after the original answer): Another useful fact about the Dirichlet distribution is that you get it naturally if you generate a set of Gamma-distributed random variables and then divide them by their sum.
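That Gamma fact gives a dependency-free way to sketch Dirichlet sampling in plain Python (the function name is mine; only the standard library is used):

```python
import random

def dirichlet_via_gamma(alpha):
    """Gamma(a_i, 1) draws normalized by their sum are Dirichlet(alpha):
    small alphas concentrate mass on few coordinates, large alphas push
    every coordinate toward 1/len(alpha)."""
    gs = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(gs)
    return [g / total for g in gs]
```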
Is there a way to generate data in R where the sum of the observations add up to a specific value?
You could make a vector of 20,000,000 samples of the numbers 1 through 15 and then tabulate them, but this is computationally expensive and results in an unrealistically even split of votes. Instead, you could normalise the cumulative sum of 15 numbers drawn from a uniform distribution and multiply by 20 million. This gives a more realistic spread of votes, with some parties having significantly more votes than others.
my_sample <- cumsum(runif(15))
my_sample <- c(0, my_sample/max(my_sample))
votes <- round(diff(my_sample) * 20000000)
votes
#> [1] 725623 2052337 1753844 61946 1173750 1984897
#> [7] 554969 1280220 1381259 1311762 766969 2055094
#> [13] 1779572 2293662 824096
These will add up to 20,000,000:
sum(votes)
#> [1] 2e+07
And we can see quite a "natural looking" spread of votes.
barplot(setNames(votes, letters[1:15]), xlab = "party")
I'm guessing that if you substitute rexp for runif in the above solution, the result would more closely match actual voting numbers in real life, with a small number of high-vote parties and a large number of low-vote parties.
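A Python sketch of that exponential-weights variant (names are mine; rounding error is pushed onto the largest party so the total stays exact):

```python
import random

def simulate_votes(parties=15, total=20_000_000):
    """Heavy-tailed vote split: normalize exponential draws, round to
    integers, then absorb any rounding error into the largest party."""
    ws = [random.expovariate(1.0) for _ in range(parties)]
    s = sum(ws)
    votes = [round(w / s * total) for w in ws]
    votes[votes.index(max(votes))] += total - sum(votes)
    return votes
```

Exponential weights produce a few dominant parties and a long tail of small ones, which is the "natural looking" spread the answer describes.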