Which Algorithm Does Ruby's Sort Method Use

Which algorithm does Ruby's sort method use?

Look here: http://www.igvita.com/2009/03/26/ruby-algorithms-sorting-trie-heaps/

It does natively use quicksort however, which is n log n complexity on average.

What does the Ruby sort function do, exactly?

The "spaceship" operator, <=> doesn't return something so English as "a comes before b". It returns what sort needs to know: where two elements are in relation to each other. Specifically, it returns the -1, 0, or 1 value you mentioned.

In a <=> b, if a is less than b (via whatever comparison method is used for the class of which a is an instance), the return is -1. If they're equal, return is 0; and if a is greater than b, the return is 1.

When you do b <=> a, the returned value is based on b rather than a, so if a is smaller, you'll get 1, whereas you got -1 when doing a <=> b.

So while the English meaning is the same, the devil is in the details: that -1, 0, or 1 return value. That value tells Ruby precisely how two elements fit into a sorted array.

The magic with those three numbers is in the quicksort algorithm used by Ruby. It's out of scope to try and explain precisely how that algorithm works, but you can basically look at it as a simple comparison on many values. For each item in an array, <=> is called with another item in the array in order to determine where the two items fall relative to each other. Once enough comparisons have been made, the positions of all those individual items is known and the sorting is done.

As a simple (and not really technically accurate, but close enough) example, consider the array [3, 2, 7, 1]. You can grab a value to compare others to in order to start the sorting. We'll pick 3. Running a comparison of 3 with all other numbers gives us:

  • 3 <=> 2 == 1: 3 is greater than 2, so 2 must be to the left of 3. Our array might look like this now: [2, 3, 7, 1]
  • 3 <=> 7 == -1: 3 is less than 7, so 7 must be the the right of 3. Our array continues to look as it did before, as the 7 was already on the right.
  • 3 <=> 1 == 1: 3 is greater than 1, so the 1 must be on the left of 3. Our array looks like this now: [2, 1, 3, 7]

We know the 7 must be correct since it's the only element on the "greater than 3" side. So we just need to figure out the sort order for everything before the 3: 1 and 2. Running a similar comparison as above, we obviously swap the 1 and 2 to get [1, 2, 3, 7].

I hope this helps!

What is the complexity of `sort_by`?

The sort method requires O(n log n). The sort_by method implements the Schwartzian Transform. It adds an overhead of O(2 n), while the actual sorting remains at O(n log n). This may pay out, because the iterations become faster than the original ones.

sort should be faster for small arrays, while sort_by performs better with large arrays. In theory.


I could not resist to benchmark and here is the result. Array size on x axis, and sorting time in seconds on y axis. The array elements were random strings created with SecureRandom.base64(50). Ruby version was 1.8.7.

Sample Image

The results show that sort_by did not significantly gain over sort with increasing array size.

How does work simple sorting in Ruby?

There are any number of sorting algorithms, and the one Ruby uses is not defined, but if you look at the source code for rb_ary_sort which calls rb_ary_sort_bang you'll see ruby_qsort. If you look inside that you'll find it's just a wrapper around the fairly standard qsort_r. That's quicksort, one of the most common sorting algorithms.

Your block defines the sort order be defining how to compare elements between each other. Good sorting algorithms try to limit the number of comparisons which much be made between the elements.

There's any number of tutorials about quicksort so I'm not going to explain it here, here's CS50 explaining it. But the most interesting illustration might be as Hungarian Folk Dance illustrating the use of a pivot and the divide and conquer approach. The hats show which elements are being compared.

How does the sort method work? Need to understand the source code

Any time I have questions like this I find that it is helpful to look at the same method definition in Rubinius. Rubinius is a port of ruby written entirely in ruby which makes it much easier to read(for me, at least).

In this case, Enumberable#sort just transforms the Enumerable object to an Array object then calls Array#sort.

In Rubinius, Array#sort just calls Array#sortinplace which in turn calls either Array#isort or Array#mergesort, depending on the array size.

Both isort(insertion sort) and mergesort are common sorting algorithims that will be nearly identical regardless of which language they are written in and can be easily googled to get a better understanding.

ruby's = operator and sort method

Before you can understand sorting objects. You need to understand the .sort method in Ruby. If you were to sort 5 cards with numbers on them, you could take a look at all of them, find the lowest one easily, and just pick that one as your first card (presuming you're sorting from lowest to highest, which Ruby does). When your brain sorts, it can look at everything and sort from there.

There are 2 main elements of confusion here that are rarely addressed:

1) Ruby can't sort in the way you think of the word "sort". Ruby can only 'swap' array elements, and it can 'compare' array elements.

2) Ruby uses a comparison operator, called the spaceship, to attribute numbers to help it 'sort'. Those numbers are -1,0,1. People erroneously think those 3 numbers are helping it 'sort' (eg. if there was an array with 3 numbers such a 10,20,30, then the 10 would be a -1, the 20 a 0, and the 30 a 1, and Ruby is just simplifying the sorting by reducing it to -1,0,1. This is wrong. Ruby can't "sort". It can't only compare).

Look at the spaceship operator. It's 3 individual operators lumped into one, the <, the =, and the >. When Ruby compares two variables, it results in one of these numbers.

Spaceship Operator

That said, what does "results" mean? It DOESN'T mean that one of the variables is assigned a 0,1,-1. It simply is a way what Ruby can take two variables and do something with them. Now, if you just run:

puts 4 <=> 5

You'll get the result of -1, since whatever 'part' (eg. <, =, or >) of the comparison operator (spaceship) is true, gets the number that's assigned to it (as seen in the above picture). When Ruby sees this <=> with an array though, it has 2 things it will do to the array only: Leave the array alone OR swap the elements of the array.

If Ruby uses the <=> and gets a 1, it will swap the 2 elements of the array. If Ruby gets a result of -1 or 0, it will leave the array alone.

An example is if Ruby sees the array [2,1]. The sort method would make it pull in these figures like 2<=>1. Since the part of the spaceship (if you want to think of it like that) that's true is the > (ie. 2>1 is true), the result is '1' from Ruby. When Ruby sees a 1 result from the spaceship, it swaps the 2 elements of the array. Now the array is [1,2].

Hopefully at this point, you see that Ruby only compares with the <=> operator, and then swaps (or leaves alone) the 2 elements in the array it compares.

Understand the .sort method is an iterative method, meaning that it's a method that runs a block of code many times. Most people are introduced to the .sort method only after they've seen a methods such as .each or .upto (you don't need to know what those do if you haven't heard of them), but those methods run through the array 1 time ONLY. The .sort method is different in that it will run through your array as many times as it needs to so that it's sorted (by sorted, we mean compared and swapped).

To make sure you understand the Ruby syntax:

foo = [4, 5, 6]
puts foo.sort {|a,b| a <=> b}

The block of code (surrounded by {}'s) is what Ruby would do any way when it sorts from lowest to highest. But suffice it to say that the first iteration of the .sort method will assign the variables between the pipes (a, b) the first two elements of the array. So for the first iteration a=4 and b=5, and since 4<5, that results in a -1, which Ruby takes it to mean to NOT swap the array. It does this for a second iteration, meaning a=5 and b=6, sees that 5<6, results in -1 and leaves the array alone. Since all the <=> results were -1, Ruby stops looping through and feels the array is sorted at [4,5,6].

We can sort from high to low by simply swapping the order of the variables.

bar = [5, 1, 9]
puts bar.sort {|a,b| b <=> a}

Here's what Ruby is doing:

Iteration 1: Array [5,1,9]. a=5, b=1. Ruby sees the b<=>a, and says is 1 < 5? Yes. That results in -1. Stay the same.

Iteration 2: Array [5,1,9]. a=1, b=9. Ruby sees the b<=>a, and says is 9 < 1? No. That results in 1. Swap the 2 array elements. The array is now [5,9,1]

Iteration 3: Array [5,9,1]. Starting over b/c there was a +1 result in the array before going through it all. a=5, b=9. Ruby sees the b<=>a, says is 9<5? No. That results in 1. Swap. [9, 5, 1]

Iteration 4: Array [9,5,1]. a=5, b=1. Ruby sees the b<=>a, says is 1<5? Yes. That results in -1. Therefore, no swapping is performed. Done. [9,5,1].

Imagine an array with the number 50 for the first 999 elements, and a 1 for element 1000. You fully understand the sort method if you realize Ruby has got to go through this array thousands of times doing the same simple compare and swap routine to shift that 1 all the way to the beginning of the array.

Now, we can finally look at .sort when comes to an object.

def <=>(other)
other.score <=> score
end

This should now make a little more sense. When the .sort method is called on an object, like when you ran the:

@players.sort

it pulls up the "def <=>" method with the parameter (eg. 'other') which has the current object from @players (eg. 'whatever the current instance object is of '@players', since it's the sort method, it's eventually going to go through all of the elements of the '@players' array). It's just like when you try to run the puts method on a class, it automatically calls the to_s method inside that class. Same thing for the .sort method automatically looking for the <=> method.

Looking at the code inside of the <=> method, there must be a .score instance variable (with an accessor method) or simply a .score method in that class. And the result of that .score method should (hopefully) be a String or number - the 2 things ruby can 'sort'. If it's a number, then Ruby uses it's <=> 'sort' operation to rearrange all of those objects, now that it knows what part of those objects to sort (in this case, it's the result of the .score method or instance variable).

As a final tidbit, Ruby sorts alphabetically by converting it to numerical values as well. It just considers any letter to be assigned the code from ASCII (meaning since upper case letters have lower numerical values on the ASCII code chart, upper case will be sorted by default to be first).

Hope this helps!

Is sort in Ruby stable?

Both MRI's sort and sort_by are unstable. Some time ago there was a request to make them stable, but it was rejected. The reason: Ruby uses an in-place quicksort algorithm, which performs better if stability is not required. Note that you can still implement stable methods from unstable ones:

module Enumerable
def stable_sort
sort_by.with_index { |x, idx| [x, idx] }
end

def stable_sort_by
sort_by.with_index { |x, idx| [yield(x), idx] }
end
end

Learning Insertion Sort in Ruby

You are overruning the bounds of your array. The example you were given was assuming 1-indexed arrays, but arrays in ruby are 0-indexed. The first line should be

for j in 1...num.length


Related Topics



Leave a reply



Submit