Generate Array of Numbers That Fit to a Probability Distribution in Ruby

Generate Array of Numbers that fit to a Probability Distribution in Ruby?

You can generate UNIX timestamps which are really just integers. First figure out when you want to start, for example now:

start = DateTime::now().to_time.to_i

Find out when the end of your interval should be (say 1 week later):

finish = (DateTime::now()+1.week).to_time.to_i

Ruby uses this algorithm to generate random numbers. It is almost uniform. Then generate random numbers between the two:

r = Random.new.rand(start..finish)

Then convert that back to a date:

d = Time.at(r)

This looks promising as well:
http://rb-gsl.rubyforge.org/files/rdoc/randist_rdoc.html

And this too:
http://rb-gsl.rubyforge.org/files/rdoc/rng_rdoc.html

Ruby - Picking an element in an array with 50% chance for a[0], 25% chance for a[1]

First guess...pick a random number between 1 and 2**size, find the log base 2 of that, and pick the number that many elements from the end.

Forgive my horrible ruby skillz.

return a[-((Math.log(rand(2**size-1)+1) / Math.log(2)).floor) - 1]

if rand returns 0, the last element should be chosen. 1 or 2, the next to last. 3, 4, 5, or 6, third from the end. Etc. Assuming an even distribution of random numbers, each element has twice as much chance of being picked as the one after it.

Edit: Actually, it seems there's a log2 function, so we don't have to do the log/log(2) thing.

return a[-(Math.log2(rand(2**size - 1)+1).floor) - 1]

You may be able to get rid of those log calls altogether like

return a[-((rand(2**size-1)+1).to_s(2).length)]

But you're creating an extra String. Not sure whether that's better than complicated math. :)

Edit: Actually, if you're going to go the string route, you can get rid of the +1 and -1 altogether. It'd make the probabilities more accurate, as the last two elements should have an equal chance of being chosen. (If the next-to-last value isn't chosen, the last value would always be.)

Edit: We could also turn the ** into a bit shift, which should be a little faster (unless Ruby was smart enough to do that already).

return a[-(rand(1<<size).to_s(2).length)]

Generate Random Numbers with Probabilistic Distribution

Look at distributions used in reliability analysis - they tend to have these long tails. A relatively simply possibility is the Weibull distribution with P(X>x)=exp[-(x/b)^a].

Fitting your values as P(X>1)=0.1 and P(X>10)=0.005, I get a=0.36 and b=0.1. This would imply that P(X>40)*10000=1.6, which is a bit too low, but P(X>70)*10000=0.2 which is reasonable.

EDIT
Oh, and to generate a Weibull-distributed random variable from a uniform(0,1) value U, just calculate b*[-log(1-u)]^(1/a). This is the inverse function of 1-P(X>x) in case I miscalculated something.

How do I select a random key from a hash using the probability distribution stored within the corresponding values?

If I can assume that the hash values do indeed add up to exactly 1.0, then the solution is little simpler. (Otherwise, this approach would still work, but requires a little extra effort to first sum all the values - and use them as a weighting, but not a direct probability.)

First, let's choose a random value between 0 and 1, to represent a "fair selection". You may wish to use SecureRandom.random_number in your implementation.

Then, I loop through the possibilities, seeing when the cumulative sum reaches the chosen value.

possible_features = {
  white_pin_fire_green: "0.00138",
  white_pin_fire_blue: "0.00138",
  # ...
}

r = rand
possible_features.find { |choice, probability| (r -= probability.to_f) <= 0 }.first

This effectively treats each possibility as covering a range: 0 .. 0.00138, 0.00138 .. 0.00276, 0.00276 .. 0.00420, ..., 0.76 .. 1.

Since the original random value (r) is was chosen from an even distribution, its value will lie within one of those ranges with the desired weighted probability.

Ruby - Pick one element from array by possibility

Your code is fine but here are two other approaches.

Use a cumulative distribution function ("CDF")

CDF = [[0.05,0], [0.05+0.60,1], [0.5+0.60+0.35,2]]
  #=> [[0.05,0], [0.65,1], [1.0,2]]

def get_num(arr)
  n = rand
  arr[CDF.find { |mx,_idx| n <= mx }.last]
end

arr = [{:num=>1, :diff=>-29}, {:num=>2, :diff=>5}, {:num=>3, :diff=>25}]

get_num(arr)
  #=> {:num=>2, :diff=>5}
get_num(arr)
  #=> {:num=>2, :diff=>5}
get_num(arr)
  #=> {:num=>3, :diff=>25}
get_num(arr)
  #=> {:num=>1, :diff=>-29}
get_num(arr)
  #=> {:num=>2, :diff=>5}

Suppose:

n = rand
  #=> 0.5385005480168696

then

a = CDF.find { |mx,_idx| n <= mx }
  #=> [0.65,1]
i = a.last
  #=> 1
arr[i]
  #=> {:num=>2, :diff=>5}

Note that I've followed the convention of beginning the name of find's second block variable (_idx) with an underscore to signal to the reader that that block variable is not used in the block calculation. Often just an underscore (_) is used.

Now consider the fraction of times each element of arr will be randomly-drawn if n draws are made:

def outcome_fractions(arr, n)
  n.times
   .with_object(Hash.new(0)) { |_,h| h[get_num(arr)] += 1 }
   .transform_values { |v| v.fdiv(n) }
end

Randomly select from an array of indices

outcome_fractions(arr, 1_000)
  #=> {{:num=>2, :diff=>5}  =>0.612,
  #    {:num=>3, :diff=>25} =>0.328,
  #    {:num=>1, :diff=>-29}=>0.06}

outcome_fractions(arr, 100_000)
  #=> {{:num=>3, :diff=>25} =>0.34818,
  #    {:num=>1, :diff=>-29}=>0.04958,
  #    {:num=>2, :diff=>5}  =>0.60224}

Notice that the fraction of each hash that is randomly drawn approaches its specified population probability as the sample size is increased (though the "pseudo-random" draws are not truly random).

Do not be concerned with how outcome_fractions works.

Here is another way that is more efficient (because it does not use find, which performs a linear search) but uses more memory.

CHOICE = [*[0]*5, *[1]*60, *[2]*35]
  #=> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  #    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  #    2, 2, 2, 2, 2]

def get_num(arr)
  arr[CHOICE[rand(100)]]
end

  #=> {{:num=>2, :diff=>5} =>0.60029,
  #    {:num=>3, :diff=>25}=>0.35022,
  #    {:num=>1, :diff=>-29}=>0.04949}

Note that:

[*[0]*5, *[1]*60, *[2]*35]

produces the same array as

[[0]*5, [1]*60, [2]*35].flatten

The first * in *[0]*5 is the splat operator; the second is the method Array#*. [0]*5 #=> [0,0,0,0,0] is evaluated first.

CHOICE has 100 elements. If the three probabilities were, say, 0.048, 0.604 and 0.348, CHOICE would have 10**3 #=> 1_000 elements (48 zeros, 604 ones and 348 twos).

Biasing random number generator to some integer n with deviation b

i think the simplest route is to sample from a normal (aka gaussian) distribution with the properties you want, and then transform the result:

generate a normal value with given mean and sd
round to nearest integer
if outside given range (normal can generate values over the entire range from -infinity to -infinity), discard and repeat

if you need to generate a normal from a uniform the simplest transform is "box-muller".

there are some details you may need to worry about. in particular, box muller is limited in range (it doesn't generate extremely unlikely values, ever). so if you give a very narrow range then you will never get the full range of values. other transforms are not as limited - i'd suggest using whatever ruby provides (look for "normal" or "gaussian").

also, be careful to round the value. 2.6 to 3.4 should all become 3, for example. if you simply discard the decimal (so 3.0 to 3.999 become 3) you will be biased.

if you're really concerned with efficiency, and don't want to discard values, you can simply invent something. one way to cheat is to mix a uniform variate with the bias value (so 9/10 times generate the uniform, 1/10 times return 3, say). in some cases, where you only care about average of the sample, that can be sufficient.

Fastest method to see if all elements in an array have a particular value

require 'benchmark'

n = 50000
Benchmark.bm do |x|
  x.report "uniq  " do
    n.times do 
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.uniq == [9]
    end
  end
  x.report "delete" do
    n.times do 
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.delete 9
      input == []
    end  
  end
  x.report "count " do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.count(9)==input.size
    end
  end
  x.report "select" do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.select{|x| x != 9}.empty?
    end
  end  
  x.report "detect" do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.detect { |i| i != 9 }.nil?
    end
  end 

  x.report "all?  " do
    n.times do
      input = [9,9,9,9,9,9,9,9,9,9,9,9]
      input.all?{|x| x == 9} 
    end
  end 

end

it a benchmark for the answers above and some mine

        user       system      total        real
uniq    0.313000   0.000000   0.313000 (  0.312500)
delete  0.140000   0.000000   0.140000 (  0.140625)
count   0.079000   0.000000   0.079000 (  0.078125)
select  0.234000   0.000000   0.234000 (  0.234375)
detect  0.234000   0.000000   0.234000 (  0.234375)
all?    0.219000   0.000000   0.219000 (  0.218750)

if input = [1]+[9]*9:

        user     system      total        real
uniq    0.328000   0.000000   0.328000 (  0.328125)
delete  0.188000   0.000000   0.188000 (  0.203125)
count   0.187000   0.000000   0.187000 (  0.218750)
select  0.281000   0.016000   0.297000 (  0.296875)
detect  0.203000   0.000000   0.203000 (  0.203125)
all?    0.204000   0.000000   0.204000 (  0.203125)

if input = [9]*9 + [1]:

        user     system      total        real
uniq    0.313000   0.000000   0.313000 (  0.328125)
delete  0.187000   0.000000   0.187000 (  0.187500)
count   0.172000   0.000000   0.172000 (  0.187500)
select  0.297000   0.000000   0.297000 (  0.312500)
detect  0.313000   0.000000   0.313000 (  0.312500)
all?    0.281000   0.000000   0.281000 (  0.281250)

if input = [1,2,3,4,5,6,7,8,9]:

        user     system      total        real
uniq    0.407000   0.000000   0.407000 (  0.406250)
delete  0.125000   0.000000   0.125000 (  0.125000)
count   0.125000   0.000000   0.125000 (  0.125000)
select  0.218000   0.000000   0.218000 (  0.234375)
detect  0.110000   0.000000   0.110000 (  0.109375)
all?    0.109000   0.000000   0.109000 (  0.109375)

Generate Array of Numbers That Fit to a Probability Distribution in Ruby