Generate Array of Numbers that fit to a Probability Distribution in Ruby?
You can generate UNIX timestamps which are really just integers. First figure out when you want to start, for example now:
start = DateTime::now().to_time.to_i
Find out when the end of your interval should be (say 1 week later; note that 1.week comes from ActiveSupport, not core Ruby):
finish = (DateTime::now()+1.week).to_time.to_i
Ruby's built-in generator (a Mersenne Twister) produces numbers that are very close to uniformly distributed. Generate a random number between the two:
r = Random.new.rand(start..finish)
Then convert that back to a date:
d = Time.at(r)
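The steps above can be sketched with core Ruby only, for readers who do not have ActiveSupport loaded. The names WEEK_IN_SECONDS and random_time_between are made up for this illustration, and the one-week window is just the example interval from above.

```ruby
# Hypothetical helper names; only core Ruby is used (no ActiveSupport).
WEEK_IN_SECONDS = 7 * 24 * 60 * 60

# Pick a uniformly distributed Time between start and finish (inclusive),
# with one-second resolution.
def random_time_between(start, finish, rng = Random.new)
  Time.at(rng.rand(start.to_i..finish.to_i))
end

start  = Time.now
finish = start + WEEK_IN_SECONDS   # Time + Integer adds seconds
d = random_time_between(start, finish)
puts d
```

Passing the generator in as an argument makes it possible to seed it (Random.new(42), say) when reproducible dates are needed.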
This looks promising as well:
http://rb-gsl.rubyforge.org/files/rdoc/randist_rdoc.html
And this too:
http://rb-gsl.rubyforge.org/files/rdoc/rng_rdoc.html
Ruby - Picking an element in an array with 50% chance for a[0], 25% chance for a[1]
First guess...pick a random number between 1 and 2**size, find the log base 2 of that, and pick the number that many elements from the end.
Forgive my horrible ruby skillz.
return a[-((Math.log(rand(2**size-1)+1) / Math.log(2)).floor) - 1]
If rand returns 0, the last element should be chosen. 1 or 2, the next to last. 3, 4, 5, or 6, third from the end. Etc. Assuming an even distribution of random numbers, each element has twice as much chance of being picked as the one after it.
Edit: Actually, it seems there's a log2 function, so we don't have to do the log/log(2) thing.
return a[-(Math.log2(rand(2**size - 1)+1).floor) - 1]
You may be able to get rid of those log calls altogether like
return a[-((rand(2**size-1)+1).to_s(2).length)]
But you're creating an extra String. Not sure whether that's better than complicated math. :)
Edit: Actually, if you're going to go the string route, you can get rid of the +1 and -1 altogether. It'd make the probabilities more accurate, as the last two elements should have an equal chance of being chosen. (If the next-to-last value isn't chosen, the last value would always be.)
Edit: We could also turn the ** into a bit shift, which should be a little faster (unless Ruby was smart enough to do that already).
return a[-(rand(1<<size).to_s(2).length)]
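To see what the final one-liner actually does, here is a small sketch that enumerates every possible draw instead of sampling. The helper names halving_index and pick_halving are invented for this illustration.

```ruby
# Map a raw draw r in 0...(1 << size) to a negative array index:
# the number of binary digits of r decides how far from the end we pick.
def halving_index(r)
  -r.to_s(2).length
end

# Pick an element so that a[0] gets 50%, a[1] gets 25%, and so on,
# with the last two elements sharing the remainder equally.
def pick_halving(a, rng = Random.new)
  a[halving_index(rng.rand(1 << a.size))]
end

a = %w[a b c d]
counts = Hash.new(0)
(0...(1 << a.size)).each { |r| counts[a[halving_index(r)]] += 1 }
# Out of the 16 possible draws: "a" gets 8, "b" gets 4, "c" and "d" get 2 each.
p counts
```

Enumerating all 2**size draws is only practical for small arrays, but it confirms the claim that the last two elements end up with equal probability.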
Generate Random Numbers with Probabilistic Distribution
Look at distributions used in reliability analysis - they tend to have these long tails. A relatively simple possibility is the Weibull distribution with P(X>x)=exp[-(x/b)^a].
Fitting your values as P(X>1)=0.1 and P(X>10)=0.005, I get a=0.36 and b=0.1. This would imply that P(X>40)*10000=1.6, which is a bit too low, but P(X>70)*10000=0.2 which is reasonable.
EDIT
Oh, and to generate a Weibull-distributed random variable from a uniform(0,1) value U, just calculate b*[-log(1-U)]^(1/a). This is the inverse function of 1-P(X>x), in case I miscalculated something.
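The inverse-transform formula above translates fairly directly into Ruby. This is only a sketch: the function names are invented here, and a=0.36, b=0.1 are the fitted values quoted above.

```ruby
# Inverse-CDF transform: maps a uniform value u in [0, 1) to a Weibull(a, b) value.
def weibull_from_uniform(u, a, b)
  b * (-Math.log(1.0 - u))**(1.0 / a)
end

# Survival function P(X > x) = exp(-(x/b)^a), used here as a sanity check.
def weibull_survival(x, a, b)
  Math.exp(-((x / b)**a))
end

# Draw one Weibull-distributed value.
def weibull_sample(a, b, rng = Random.new)
  weibull_from_uniform(rng.rand, a, b)
end

a = 0.36
b = 0.1
x = weibull_from_uniform(0.9, a, b)
# Feeding x back into the survival function should give 1 - 0.9 (up to rounding).
puts weibull_survival(x, a, b)
puts weibull_sample(a, b)
```

Round-tripping a value through weibull_from_uniform and weibull_survival is a cheap way to check that the inverse was derived correctly.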
How do I select a random key from a hash using the probability distribution stored within the corresponding values?
If I can assume that the hash values do indeed add up to exactly 1.0, then the solution is a little simpler. (Otherwise, this approach would still work, but requires a little extra effort to first sum all the values and use them as weightings rather than direct probabilities.)
First, let's choose a random value between 0 and 1, to represent a "fair selection". You may wish to use SecureRandom.random_number in your implementation.
Then, I loop through the possibilities, seeing when the cumulative sum reaches the chosen value.
possible_features = {
white_pin_fire_green: "0.00138",
white_pin_fire_blue: "0.00138",
# ...
}
r = rand
possible_features.find { |choice, probability| (r -= probability.to_f) <= 0 }.first
This effectively treats each possibility as covering a range: 0 .. 0.00138, 0.00138 .. 0.00276, 0.00276 .. 0.00420, ..., 0.76 .. 1.
Since the original random value (r) was chosen from an even distribution, its value will lie within one of those ranges with the desired weighted probability.
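For the case where the values do not sum to exactly 1.0, a sketch along the following lines scales the random draw by the total weight first. The method name weighted_pick and the sample hash are hypothetical; the final fallback line is there in case floating-point rounding leaves a tiny remainder after the last entry.

```ruby
# Pick a key with probability proportional to its (possibly string) weight.
# r is a uniform draw in [0, 1); it is a parameter so the logic can be tested.
def weighted_pick(weights, r = rand)
  total  = weights.values.sum { |w| w.to_f }
  target = r * total
  weights.each do |key, w|
    target -= w.to_f
    return key if target <= 0
  end
  weights.keys.last # fallback for floating-point shortfall
end

features = { common: "6", rare: "3", epic: "1" } # weights, not probabilities
puts weighted_pick(features)
```

With these weights, :common should come up about 60% of the time, :rare about 30%, and :epic about 10%.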
Ruby - Pick one element from array by possibility
Your code is fine but here are two other approaches.
Use a cumulative distribution function ("CDF")
CDF = [[0.05,0], [0.05+0.60,1], [0.05+0.60+0.35,2]]
#=> [[0.05,0], [0.65,1], [1.0,2]]
def get_num(arr)
  n = rand
  arr[CDF.find { |mx,_idx| n <= mx }.last]
end
arr = [{:num=>1, :diff=>-29}, {:num=>2, :diff=>5}, {:num=>3, :diff=>25}]
get_num(arr)
#=> {:num=>2, :diff=>5}
get_num(arr)
#=> {:num=>2, :diff=>5}
get_num(arr)
#=> {:num=>3, :diff=>25}
get_num(arr)
#=> {:num=>1, :diff=>-29}
get_num(arr)
#=> {:num=>2, :diff=>5}
Suppose:
n = rand
#=> 0.5385005480168696
then
a = CDF.find { |mx,_idx| n <= mx }
#=> [0.65,1]
i = a.last
#=> 1
arr[i]
#=> {:num=>2, :diff=>5}
Note that I've followed the convention of beginning the name of find's second block variable (_idx) with an underscore to signal to the reader that that block variable is not used in the block calculation. Often just an underscore (_) is used.
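If the probabilities are not known in advance, the cumulative bounds can be built at run time. The sketch below also swaps find for Array#bsearch_index, which is a different lookup technique from the one used above (binary rather than linear search); the names build_cdf and cdf_index are made up for this example.

```ruby
# Build cumulative upper bounds from probabilities (assumed to sum to 1).
def build_cdf(probabilities)
  running = 0.0
  probabilities.map { |p| running += p }
end

# Index of the first bucket whose upper bound is >= n.
# Falls back to the last bucket if rounding leaves the final bound below n.
def cdf_index(cdf, n = rand)
  cdf.bsearch_index { |upper| upper >= n } || cdf.size - 1
end

cdf = build_cdf([0.05, 0.60, 0.35])
# The draw from the worked example above lands in the second bucket (index 1).
puts cdf_index(cdf, 0.5385005480168696)
```

For three buckets the difference from a linear find is negligible; binary search only starts to matter when there are many outcomes.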
Now consider the fraction of times each element of arr will be randomly drawn if n draws are made:
def outcome_fractions(arr, n)
  n.times
   .with_object(Hash.new(0)) { |_,h| h[get_num(arr)] += 1 }
   .transform_values { |v| v.fdiv(n) }
end
outcome_fractions(arr, 1_000)
#=> {{:num=>2, :diff=>5} =>0.612,
# {:num=>3, :diff=>25} =>0.328,
# {:num=>1, :diff=>-29}=>0.06}
outcome_fractions(arr, 100_000)
#=> {{:num=>3, :diff=>25} =>0.34818,
# {:num=>1, :diff=>-29}=>0.04958,
# {:num=>2, :diff=>5} =>0.60224}
Notice that the fraction of each hash that is randomly drawn approaches its specified population probability as the sample size is increased (though the "pseudo-random" draws are not truly random).
Do not be concerned with how outcome_fractions works.
Randomly select from an array of indices
Here is another way that is more efficient (because it does not use find, which performs a linear search) but uses more memory.
CHOICE = [*[0]*5, *[1]*60, *[2]*35]
#=> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
# 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
# 2, 2, 2, 2, 2]
def get_num(arr)
  arr[CHOICE[rand(100)]]
end
outcome_fractions(arr, 100_000)
#=> {{:num=>2, :diff=>5} =>0.60029,
# {:num=>3, :diff=>25}=>0.35022,
# {:num=>1, :diff=>-29}=>0.04949}
Note that:
[*[0]*5, *[1]*60, *[2]*35]
produces the same array as
[[0]*5, [1]*60, [2]*35].flatten
The first * in *[0]*5 is the splat operator; the second is the method Array#*. [0]*5 #=> [0,0,0,0,0] is evaluated first.
CHOICE has 100 elements. If the three probabilities were, say, 0.048, 0.604 and 0.348, CHOICE would have 10**3 #=> 1_000 elements (48 zeros, 604 ones and 348 twos).
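The lookup table can also be generated from the probabilities rather than written by hand. The following is a sketch under the assumption that every probability has at most the stated number of decimal places; build_choice_table is a name invented for this example.

```ruby
# Build an index table in which index i appears (p_i * 10**decimals) times.
# Assumes each probability has at most `decimals` decimal places.
def build_choice_table(probabilities, decimals)
  scale = 10**decimals
  probabilities.each_with_index.flat_map do |p, idx|
    [idx] * (p * scale).round
  end
end

table = build_choice_table([0.048, 0.604, 0.348], 3)
puts table.size       # 1000 entries: 48 zeros, 604 ones, 348 twos
puts table.sample     # a random index drawn with the desired probabilities
```

The table grows tenfold with each extra decimal place, so this trades memory for constant-time lookups.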
Biasing random number generator to some integer n with deviation b
i think the simplest route is to sample from a normal (aka gaussian) distribution with the properties you want, and then transform the result:
- generate a normal value with given mean and sd
- round to nearest integer
- if outside given range (normal can generate values over the entire range from -infinity to +infinity), discard and repeat
if you need to generate a normal from a uniform the simplest transform is "box-muller".
there are some details you may need to worry about. in particular, box muller is limited in range (it doesn't generate extremely unlikely values, ever). so if you give a very narrow range then you will never get the full range of values. other transforms are not as limited - i'd suggest using whatever ruby provides (look for "normal" or "gaussian").
also, be careful to round the value. 2.6 to 3.4 should all become 3, for example. if you simply discard the decimal (so 3.0 to 3.999 become 3) you will be biased.
if you're really concerned with efficiency, and don't want to discard values, you can simply invent something. one way to cheat is to mix a uniform variate with the bias value (so 9/10 times generate the uniform, 1/10 times return 3, say). in some cases, where you only care about average of the sample, that can be sufficient.
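The generate, round, and discard steps described above might look roughly like this in Ruby. It is a sketch using a hand-written Box-Muller transform; standard_normal and biased_integer are hypothetical names, and the mean, standard deviation and range in the usage line are arbitrary example values.

```ruby
# One standard normal value via the Box-Muller transform.
def standard_normal(rng)
  u1 = 1.0 - rng.rand # in (0, 1], so Math.log never sees 0
  u2 = rng.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Integer near `mean` with spread `sd`, restricted to `range`.
# Rounds to the nearest integer and redraws anything outside the range.
def biased_integer(mean, sd, range, rng = Random.new)
  loop do
    value = (mean + sd * standard_normal(rng)).round
    return value if range.cover?(value)
  end
end

puts biased_integer(3, 2.0, 0..6)
```

Note that the rejection loop can take many iterations if the range lies far from the mean relative to the standard deviation.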
Fastest method to see if all elements in an array have a particular value
require 'benchmark'
n = 50000
Benchmark.bm do |x|
  x.report "uniq " do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.uniq == [9]
    end
  end
  x.report "delete" do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.delete 9
      input == []
    end
  end
  x.report "count " do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.count(9)==input.size
    end
  end
  x.report "select" do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.select{|x| x != 9}.empty?
    end
  end
  x.report "detect" do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.detect { |i| i != 9 }.nil?
    end
  end
  x.report "all? " do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.all?{|x| x == 9}
    end
  end
end
This is a benchmark of the answers above plus some of my own:
        user      system    total     real
uniq    0.313000  0.000000  0.313000  (0.312500)
delete  0.140000  0.000000  0.140000  (0.140625)
count   0.079000  0.000000  0.079000  (0.078125)
select  0.234000  0.000000  0.234000  (0.234375)
detect  0.234000  0.000000  0.234000  (0.234375)
all?    0.219000  0.000000  0.219000  (0.218750)
If input = [1]+[9]*9:
        user      system    total     real
uniq    0.328000  0.000000  0.328000  (0.328125)
delete  0.188000  0.000000  0.188000  (0.203125)
count   0.187000  0.000000  0.187000  (0.218750)
select  0.281000  0.016000  0.297000  (0.296875)
detect  0.203000  0.000000  0.203000  (0.203125)
all?    0.204000  0.000000  0.204000  (0.203125)
If input = [9]*9 + [1]:
        user      system    total     real
uniq    0.313000  0.000000  0.313000  (0.328125)
delete  0.187000  0.000000  0.187000  (0.187500)
count   0.172000  0.000000  0.172000  (0.187500)
select  0.297000  0.000000  0.297000  (0.312500)
detect  0.313000  0.000000  0.313000  (0.312500)
all?    0.281000  0.000000  0.281000  (0.281250)
If input = [1,2,3,4,5,6,7,8,9]:
        user      system    total     real
uniq    0.407000  0.000000  0.407000  (0.406250)
delete  0.125000  0.000000  0.125000  (0.125000)
count   0.125000  0.000000  0.125000  (0.125000)
select  0.218000  0.000000  0.218000  (0.234375)
detect  0.110000  0.000000  0.110000  (0.109375)
all?    0.109000  0.000000  0.109000  (0.109375)