How Does Ruby's Sort_By {Rand} Work

How does Ruby's sort_by {rand} work?

In Ruby 1.8/1.9 both sort and sort_by are implemented in C, this is a rough equivalent of how this works:

Say you start with [1,2,3,4] and call sort_by{rand}:

  1. (I invented some random numbers):

    An array of tuples is created: [[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]

    In roughly equivalent Ruby code this is: [1,2,3,4].map{|x| [rand, x]}

  2. Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)

    [[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]]

    In rough Ruby this step is: ary.sort{|x,y| x[0] <=> y[0]}

  3. Pointers are copied from the new sorted array, to the correct position in the original array.

    [1,3,2,4]

    In rough Ruby this step is: ary.map{|x,y| y}

This technique is sometimes referred to as a "Schwartzian Transform". Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.

Note: array.shuffle! will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of Fisher-Yates:

static VALUE
rb_ary_shuffle_bang(VALUE ary)
{
long i = RARRAY_LEN(ary);

rb_ary_modify(ary);
while (i) {
long j = rb_genrand_real()*i;
VALUE tmp = RARRAY_PTR(ary)[--i];
RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j];
RARRAY_PTR(ary)[j] = tmp;
}
return ary;
}

How to save random value of seed random sort_by {rand} into list in ruby?

You need to save the random numbers into an array first:

rows = ["hello", "nihao", "konnichiwa", "hallo"]
myrand = Random.new(100) # seed - 100
rows.map { |x| [myrand.rand, x] }.sort!
=> [[0.27836938509379616, "nihao"], [0.4245175907491331, "konnichiwa"], [0.5434049417909654, "hello"], [0.8447761323199037, "hallo"]]

The above shown result should always be the same, although this is Random, since the seed is known - for Random.new(100) these will always be the first four numbers generated.

You can, technically do exactly what you ask by doing:

rows = ["hello", "nihao", "konnichiwa", "hallo"]
myrand = Random.new(100) # seed - 100
rows = rows.map { |x| [myrand.rand, x] }
=> [[0.5434049417909654, "hello"], [0.27836938509379616, "nihao"], [0.4245175907491331, "konnichiwa"], [0.8447761323199037, "hallo"]]

myrand = Random.new(100) # resetting back to seed - 100

rows.sort_by!{ myrand.rand }
=> [[0.27836938509379616, "nihao"], [0.4245175907491331, "konnichiwa"], [0.5434049417909654, "hello"], [0.8447761323199037, "hallo"]]

How to randomly sort (scramble) an array in Ruby?

Built in now:

[1,2,3,4].shuffle => [2, 1, 3, 4]
[1,2,3,4].shuffle => [1, 3, 2, 4]

How does `sort_by` work with multiple fields?

The problem here is nothing particularly about multiple comparison. If you compare arrays, each of the elements will be compared to the corresponding one in another array. If those values sometimes take numerical values, then they always have to be numerals. Numerals cannot be compared to nil, and doing so will raise an error. Defaulting them to zero is to ensure they are numerals. As long as you default them to a numeral, it will not raise an error. The particular choice of zero was based on where you want to position the entries with the missing values; you could have chosen infinity, negative infinity, etc for different results.

Ruby - return an array in random order

If you don't have [].shuffle, [].sort_by{rand} works as pointed out by sepp2k. .sort_by temporarily replaces each element by something for the purpose of sorting, in this case, a random number.

[].sort{rand-0.5} however, won't properly shuffle. Some languages (e.g. some Javascript implementations) don't properly shuffle arrays if you do a random sort on the array, with sometimes rather public consequences.

JS Analysis (with graphs!): http://www.robweir.com/blog/2010/02/microsoft-random-browser-ballot.html

Ruby is no different! It has the same problem. :)

#sort a bunch of small arrays by rand-0.5
a=[]
100000.times{a << [0,1,2,3,4].sort{rand-0.5}}

#count how many times each number occurs in each position
b=[]
a.each do |x|
x.each_index do |i|
b[i] ||=[]
b[i][x[i]] ||= 0
b[i][x[i]] += 1
end
end
p b

=>

[[22336, 18872, 14814, 21645, 22333],
[17827, 25005, 20418, 18932, 17818],
[19665, 15726, 29575, 15522, 19512],
[18075, 18785, 20283, 24931, 17926],
[22097, 21612, 14910, 18970, 22411]]

Each element should occur in each position about 20000 times. [].sort_by(rand) gives much better results.

#sort with elements first mapped to random numbers
a=[]
100000.times{a << [0,1,2,3,4].sort_by{rand}}

#count how many times each number occurs in each position
...

=>

[[19913, 20074, 20148, 19974, 19891],
[19975, 19918, 20024, 20030, 20053],
[20028, 20061, 19914, 20088, 19909],
[20099, 19882, 19871, 19965, 20183],
[19985, 20065, 20043, 19943, 19964]]

Similarly for [].shuffle (which is probably fastest)

[[20011, 19881, 20222, 19961, 19925],
[19966, 20199, 20015, 19880, 19940],
[20062, 19894, 20065, 19965, 20014],
[19970, 20064, 19851, 20043, 20072],
[19991, 19962, 19847, 20151, 20049]]

Podium Style Sorting in Ruby

arr.sort_by{|a| a['created_at']}.inject([]){ |r, e| r.reverse << e }

Fun problem!

How to sort an array in descending order in Ruby

It's always enlightening to do a benchmark on the various suggested answers. Here's what I found out:


#!/usr/bin/ruby

require 'benchmark'

ary = []
1000.times {
ary << {:bar => rand(1000)}
}

n = 500
Benchmark.bm(20) do |x|
x.report("sort") { n.times { ary.sort{ |a,b| b[:bar] <=> a[:bar] } } }
x.report("sort reverse") { n.times { ary.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse } }
x.report("sort_by -a[:bar]") { n.times { ary.sort_by{ |a| -a[:bar] } } }
x.report("sort_by a[:bar]*-1") { n.times { ary.sort_by{ |a| a[:bar]*-1 } } }
x.report("sort_by.reverse!") { n.times { ary.sort_by{ |a| a[:bar] }.reverse } }
end

user system total real
sort 3.960000 0.010000 3.970000 ( 3.990886)
sort reverse 4.040000 0.000000 4.040000 ( 4.038849)
sort_by -a[:bar] 0.690000 0.000000 0.690000 ( 0.692080)
sort_by a[:bar]*-1 0.700000 0.000000 0.700000 ( 0.699735)
sort_by.reverse! 0.650000 0.000000 0.650000 ( 0.654447)

I think it's interesting that @Pablo's sort_by{...}.reverse! is fastest. Before running the test I thought it would be slower than "-a[:bar]" but negating the value turns out to take longer than it does to reverse the entire array in one pass. It's not much of a difference, but every little speed-up helps.


Please note that these results are different in Ruby 1.9

Here are results for Ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]:

                           user     system      total        real
sort 1.340000 0.010000 1.350000 ( 1.346331)
sort reverse 1.300000 0.000000 1.300000 ( 1.310446)
sort_by -a[:bar] 0.430000 0.000000 0.430000 ( 0.429606)
sort_by a[:bar]*-1 0.420000 0.000000 0.420000 ( 0.414383)
sort_by.reverse! 0.400000 0.000000 0.400000 ( 0.401275)

These are on an old MacBook Pro. Newer, or faster machines, will have lower values, but the relative differences will remain.


Here's a bit updated version on newer hardware and the 2.1.1 version of Ruby:

#!/usr/bin/ruby

require 'benchmark'

puts "Running Ruby #{RUBY_VERSION}"

ary = []
1000.times {
ary << {:bar => rand(1000)}
}

n = 500

puts "n=#{n}"
Benchmark.bm(20) do |x|
x.report("sort") { n.times { ary.dup.sort{ |a,b| b[:bar] <=> a[:bar] } } }
x.report("sort reverse") { n.times { ary.dup.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse } }
x.report("sort_by -a[:bar]") { n.times { ary.dup.sort_by{ |a| -a[:bar] } } }
x.report("sort_by a[:bar]*-1") { n.times { ary.dup.sort_by{ |a| a[:bar]*-1 } } }
x.report("sort_by.reverse") { n.times { ary.dup.sort_by{ |a| a[:bar] }.reverse } }
x.report("sort_by.reverse!") { n.times { ary.dup.sort_by{ |a| a[:bar] }.reverse! } }
end

# >> Running Ruby 2.1.1
# >> n=500
# >> user system total real
# >> sort 0.670000 0.000000 0.670000 ( 0.667754)
# >> sort reverse 0.650000 0.000000 0.650000 ( 0.655582)
# >> sort_by -a[:bar] 0.260000 0.010000 0.270000 ( 0.255919)
# >> sort_by a[:bar]*-1 0.250000 0.000000 0.250000 ( 0.258924)
# >> sort_by.reverse 0.250000 0.000000 0.250000 ( 0.245179)
# >> sort_by.reverse! 0.240000 0.000000 0.240000 ( 0.242340)

New results running the above code using Ruby 2.2.1 on a more recent Macbook Pro. Again, the exact numbers aren't important, it's their relationships:

Running Ruby 2.2.1
n=500
user system total real
sort 0.650000 0.000000 0.650000 ( 0.653191)
sort reverse 0.650000 0.000000 0.650000 ( 0.648761)
sort_by -a[:bar] 0.240000 0.010000 0.250000 ( 0.245193)
sort_by a[:bar]*-1 0.240000 0.000000 0.240000 ( 0.240541)
sort_by.reverse 0.230000 0.000000 0.230000 ( 0.228571)
sort_by.reverse! 0.230000 0.000000 0.230000 ( 0.230040)

Updated for Ruby 2.7.1 on a Mid-2015 MacBook Pro:

Running Ruby 2.7.1
n=500
user system total real
sort 0.494707 0.003662 0.498369 ( 0.501064)
sort reverse 0.480181 0.005186 0.485367 ( 0.487972)
sort_by -a[:bar] 0.121521 0.003781 0.125302 ( 0.126557)
sort_by a[:bar]*-1 0.115097 0.003931 0.119028 ( 0.122991)
sort_by.reverse 0.110459 0.003414 0.113873 ( 0.114443)
sort_by.reverse! 0.108997 0.001631 0.110628 ( 0.111532)

...the reverse method doesn't actually return a reversed array - it returns an enumerator that just starts at the end and works backwards.

The source for Array#reverse is:

               static VALUE
rb_ary_reverse_m(VALUE ary)
{
long len = RARRAY_LEN(ary);
VALUE dup = rb_ary_new2(len);

if (len > 0) {
const VALUE *p1 = RARRAY_CONST_PTR_TRANSIENT(ary);
VALUE *p2 = (VALUE *)RARRAY_CONST_PTR_TRANSIENT(dup) + len - 1;
do *p2-- = *p1++; while (--len > 0);
}
ARY_SET_LEN(dup, RARRAY_LEN(ary));
return dup;
}

do *p2-- = *p1++; while (--len > 0); is copying the pointers to the elements in reverse order if I remember my C correctly, so the array is reversed.

incorrect use of ruby's sort_by function

Use

ALL_VIEWS.replace ALL_VIEWS.sort_by {|view| view.permalink.length }

for the first line. sort_by does not sort in place - it returns the sorted array and does not modify the original one.



Related Topics



Leave a reply



Submit