How the Usort() Sorting Algorithm Works

How the usort() sorting algorithm works?

  1. How does the comparator work?

I'm not sure wether this was part of the question, but to be clear about how the comparator works:
You have an order specified by ordered list $order and a special comparator callback list_cmp which (should) return wether argument

  • $a is smaller than $b (return -1 or a value < 0)
  • $a greater than $b (return 1 or a value > 0)
  • $a equal $b (return 0)

list_cmp does this by looking up its order table and rather checking if

  • $a has a smaller (or equal) order in which case the loop exits early with return 0 or if
  • $b has a smaller order in which case the loop exits early with return 1.

Be aware that this is wrong according to the PHP Documentation which states it wants positive/negative/0 as return values. This is only correct if you know that the internals only check for comparator($a,$b) > 0, meaning it only checks if $b is smaller than and not equal to $a, making this a comparison order of $a <= order of $b. It can easily break if the code starts to check for $a being smaller than and not equal to $b.


  1. How does the quicksort sorting work?

For starters I'm assuming you are using PHP 7 or higher. In that case you hit a special case with 6-15 Element sized arrays. PHP 7+ does not seem to use quicksort for short lists, instead it uses an Insertion Sort Variant (which is most probably faster due to Hardware related stuff like caching/code locality). You can check the sorting source code f.e. on the Github PHP Mirror (as an example: PHP 7.0 Zend/Zend_sort.c Line 177-198).

The code works in 3 steps:

  1. comparision: compare neighbor elements array[j] and array[j+1], if array[j] <= array[j+1] move on, else goto 2.
  2. find insertion point: now if array[j] > array[j+1], scan backwards to find the point where array[x] < array[j+1] <= array[x+1] for x < j (obviously only until x hits the start)
  3. insert: shift elements x+1 ... j one up such that it becomes x+2 ... j+1 and insert former element at position x+1

If you apply that code to the pairings (2-1, 2-3, 1-3, 2-4, 3-4, 2-2, 2-1, 2-1, 4-1, 3-1, 1-1, 2-2) it gets obvious what the code does.

-- [2,1],3,4,2,1,2 -> 1./2./3. compare [2,1], find and insert 1 before 2
-- 1,[2,3],4,2,1,2 -> 1./2. compare [2,3], find insert point for 3 (since order of 3 < order of 2)
-- [1,3],2,4,2,1,2 -> 3. compare [1,3], found insert point for 3 before 2
-- 1,3,[2,4],2,1,2 -> 1./2. compare [2,4], find insert point for 4 (since order of 4 < order of 2)
-- 1,[3,4],2,2,1,2 -> 3. compare [3,4], found insert point for 4 before 2
-- 1,3,4,[2,2],1,2 -> 1. compare [2,2], skip
-- 1,3,4,2,[2,1],2 -> 1./2. compare [2,1], find insert point for 1
-- 1,3,4,[2,1],2,2 -> 2. compare [2,1], find insert point for 1
-- 1,3,[4,1],2,2,2 -> 2. compare [4,1], find insert point for 1
-- 1,[3,1],4,2,2,2 -> 2. compare [3,1], find insert point for 1
-- [1,1],3,4,2,2,2 -> 3. compare [1,1], fond insert point for 1 before 3
-- 1,1,3,4,2,[2,2] -> 1. compare [2,2], skip
-- sorted: 1,1,3,4,2,2,2

PS:
Here you already see that it is quite complex to derive the work of even a simple sorting algorithm (22 lines of code) by its comparison patterns. The PHP 7 quicksort implementation is around 10 times as big in lines of code and has some freaky optimizations (besides normal madness due to pivot selection and recursions).

Most of the times it is better to ignore in depth implementation details and only reduce it to the stuff that is needed. Typical questions for a sorting algorithm would be if it is stable/unstable and performing in O(log n) with O(n) Memory consumption. There are easier ways to learn the core algorithms behind those optimized implementations like the Quicksort Dance or any other visualization or a good old (e)book or webpage with examples.

-- Edited

Added a (bad, unoptimized, unsafe) php implementation for insertion sort for another visualization of how it works:

<?php

function my_usort($A, $comparator) {
// Start .. End Positions
$current_pos = 0;
$last_pos = count($A)-1;
// Outer Loop: each step checks that A[0] up to A[current_pos] is sorted.
// When the algorithm finishes we know that A[0] ... A[last_pos] is sorted
while($current_pos < $last_pos) {
echo "Sorted Subarray from \$A is " . json_encode(array_slice($A, 0, $current_pos+1)) . "<br>\n";
echo "\$A looks like this now: " . json_encode($A) .
", comparing [" . $A[$current_pos] . "," . $A[$current_pos +1] . "] (verify step)<br>\n";
// "Verification Step"
// At this point A[0] ... A[current_pos] is sorted.
// Check A[current_pos] <= A[current_pos +1]
if($comparator($A[$current_pos], $A[$current_pos +1]) > 0) {
// nope, A[current_pos] > A[current_pos +1] (list_cmp/comparator returns value > 0)
// "Insertion Step" start, find the correct position for A[current_pos+1] in the already
// sorted list A[0] ... A[current_pos]
$insert_point = $current_pos;
// Swap the missmatching Neighbor pair
echo "swapping: \$A[" . $insert_point . "] and \$A[" . ($insert_point+1) . "]<br>\n";
$tmp = $A[$insert_point +1];
$A[$insert_point +1] = $A[$insert_point];
$A[$insert_point] = $tmp;
$sorted_up_to_current_pos = false;
// Inner Loop: find correct insertion point
while($insert_point > 0 && !$sorted_up_to_current_pos) {
echo "\$A looks like this now: " . json_encode($A) .
", comparing [" . $A[$insert_point-1] . "," . $A[$insert_point] . "] (insertion step)<br>\n";
// "Insertion Step", Swap the missmatching Neighbor pairs until A[0] ... A[current_pos] is sorted again
if($comparator($A[$insert_point-1], $A[$insert_point]) > 0) {
// Swap the missmatching Neighbor pair
echo "swapping: \$A[" . ($insert_point-1) . "] and \$A[" . $insert_point . "]<br>\n";
$tmp = $A[$insert_point];
$A[$insert_point] = $A[$insert_point-1];
$A[$insert_point-1] = $tmp;
// goto new pair
$insert_point = $insert_point -1;
} else {
// found correct spot, done
$sorted_up_to_current_pos = true;
}
}
$A[$insert_point] = $tmp;
echo "\$A looks like this now: " . json_encode($A) . ", insertion done<br>\n";
}
$current_pos = $current_pos + 1;
}
echo "Sorted Array \$A is " . json_encode(array_slice($A, 0, $current_pos+1)) . "<br>\n";
}

function list_cmp($a, $b) {
global $order;
//echo "\$a=$a, \$b=$b </br>\n";

foreach ($order as $key => $value) {
//echo "\$value=$value </br>\n";
if ($a == $value) {
echo "\$a=\$value, returing 0. </br>\n";
return 0;
}
if ($b == $value) {
echo "\$b=\$value, returing 1. </br>\n";
return 1;
}
}
}

$order[0] = 1;
$order[1] = 3;
$order[2] = 4;
$order[3] = 2;

$array[0] = 2;
$array[1] = 1;
$array[2] = 3;
$array[3] = 4;
$array[4] = 2;
$array[5] = 1;
$array[6] = 2;

my_usort($array, "list_cmp");

Output is complete now with currently sorted array, positions:

Sorted Subarray from $A is [2] 
$A looks like this now: [2,1,3,4,2,1,2], comparing [2,1] (verify step)
$b=$value, returing 1.
swapping: $A[0] and $A[1]
$A looks like this now: [1,2,3,4,2,1,2], insertion done
Sorted Subarray from $A is [1,2]
$A looks like this now: [1,2,3,4,2,1,2], comparing [2,3] (verify step)
$b=$value, returing 1.
swapping: $A[1] and $A[2]
$A looks like this now: [1,3,2,4,2,1,2], comparing [1,3] (insertion step)
$a=$value, returing 0.
$A looks like this now: [1,3,2,4,2,1,2], insertion done
Sorted Subarray from $A is [1,3,2]
$A looks like this now: [1,3,2,4,2,1,2], comparing [2,4] (verify step)
$b=$value, returing 1.
swapping: $A[2] and $A[3]
$A looks like this now: [1,3,4,2,2,1,2], comparing [3,4] (insertion step)
$a=$value, returing 0.
$A looks like this now: [1,3,4,2,2,1,2], insertion done
Sorted Subarray from $A is [1,3,4,2]
$A looks like this now: [1,3,4,2,2,1,2], comparing [2,2] (verify step)
$a=$value, returing 0.
Sorted Subarray from $A is [1,3,4,2,2]
$A looks like this now: [1,3,4,2,2,1,2], comparing [2,1] (verify step)
$b=$value, returing 1.
swapping: $A[4] and $A[5]
$A looks like this now: [1,3,4,2,1,2,2], comparing [2,1] (insertion step)
$b=$value, returing 1.
swapping: $A[3] and $A[4]
$A looks like this now: [1,3,4,1,2,2,2], comparing [4,1] (insertion step)
$b=$value, returing 1.
swapping: $A[2] and $A[3]
$A looks like this now: [1,3,1,4,2,2,2], comparing [3,1] (insertion step)
$b=$value, returing 1.
swapping: $A[1] and $A[2]
$A looks like this now: [1,1,3,4,2,2,2], comparing [1,1] (insertion step)
$a=$value, returing 0.
$A looks like this now: [1,1,3,4,2,2,2], insertion done
Sorted Subarray from $A is [1,1,3,4,2,2]
$A looks like this now: [1,1,3,4,2,2,2], comparing [2,2] (verify step)
$a=$value, returing 0.
Sorted Array $A is [1,1,3,4,2,2,2]

What is the mechanism of ordering this array with usort()?

http://php.net/manual/en/function.usort.php states:

value_compare_func
The comparison function must return an integer less
than, equal to, or greater than zero if the first argument is
considered to be respectively less than, equal to, or greater than the
second

When you call the function, it takes any two values and decides whether they're in the right order.

Really all it does is move $b variable up or down the array in respect to $a.

Adding some output to your function can show you what's going on:

<?php
function list_cmp($a, $b) {
global $order;

echo "comparing $a and $b...\n";
foreach ($order as $key => $value) {
echo " checking against $value\n";
if ($a == $value) {
echo " \$a ($a) == $value: returning 0\n";
return 0;
}

if ($b == $value) {
echo " \$b ($b) == $value: returning 1\n";
return 1;
}
}
}

Let's look at the first part of the output:

comparing 4 and 1...
checking against 1
$b (1) == 1: returning 1

Here you can see that it's checking the two values 4 and 1 (we don't know which 1 it is). It then checks each item in $order. The first value is 1, which is saying all the 1s must come before any other value.

$a doesn't match 1 but $b does. So these items are in the wrong order - the first item is greater than the second, so we return 1. i.e. 4 must come after 1.

Let's look at another bit of output:

comparing 4 and 2...
checking against 1
checking against 3
checking against 4
$a (4) == 4: returning 0

Here it has checked the values 1 and 3 - neither of them match the two numbers we're considering, so they're irrelevant. Then we get to $a (4). That equals the next wanted value. This means it's in the right place, so we return 0. i.e. 4 must come before 2 - and it does.

That continues for each pair the sort compares.

In php how does usort() function works

The function cmp itself doesn't do the sorting. It just tells usort if a value is smaller, equal or greater than another value. E.g. if $a = 5 and $b = 9 it will return 1 to indicate that the value in $b is greater than the one in $a.

Sorting is done by usort.

How to view the steps of the usort() in PHP?

Using some debug output in your comparison function you can only see the comparisons that PHP does, but can't see intermediate states of the array.

But the algorithm used in usort is well known - it's QuickSort ( Which sort algorithms does PHP's usort apply? ).

You can see its vizualisation at http://www.algomation.com/algorithm/quick-sort-visualization (or just google "quicksort algorithm visualization")

How does Javascript's sort() work?

Is the array sort callback function called many times during the course of the sort?

Yes

If so, I'd like to know which two numbers are passed into the function each time

You could find out your self with:

array.sort((a,b) => {
console.log(`comparing ${a},${b}`);
return a > b ? 1
: a === b ? 0
: -1;
});

EDIT

This is the output I've got:

25,8
25,7
8,7
25,41

Role of sorting algorithms in spite of having array.sort

Why do we need so many sorting algorithms to sort an array of integers when we have array.sort which already does the same?

Most programming languages ship with some sort of sorting algorithm in their standard libraries. Most of the time, you can just use this default sorting algorithm and you'll be fine.

That said, different sorting algorithms have different performance characteristics and different tradeoffs. Most library implementations of sorting algorithms use comparison sorts like quicksort, timsort, or introsort. These algorithms can be used to sort any objects that can be compared to one another, so they're good generic sorts. For certain cases - such as sorting integers - there are specialized algorithms that can take advantage of the fact that you're sorting integer data. If, for example, you're sorting an array with lots of small numbers in it, it might be faster to use counting sort or radix sort than quicksort, both in theory and in practice.

There are other considerations to take into account as well. Quicksort, for example, is fast on average but has degenerate cases. You might have applications where you absolutely cannot hit these cases, in which case using a custom sorting algorithm might be appropriate.

Is sorting and finding the largest number in an array different ? i mean if an array is sorted obviously the largest number will end up at the end

Yes, these are different problems. You can find the largest element of an array in one pass over the array by simply recording the largest value that you've seen so far. This takes time O(n), uses space O(1), and is extremely efficient. Sorting algorithms need to spend more time and effort than this because they have to also find the second-largest element, the third-largest element, etc. In that sense, sorting is a fundamentally "harder" problem than the problem of finding the largest element of an array.

which is the best sorting algorithm to sort an array ?

As I alluded to earlier, there is no one "best" algorithm to sort an array. It depends on many factors, such as what sorts of elements are stored in the array, what the distribution of those elements are, whether you want worst-case or average-case efficiency, how much memory you want to use, etc. As with most aspects of software engineering, there are lots of different tradeoffs to make, and it's up to you to choose which one you think is best.

Hope this helps!

Which sort algorithms does PHP's usort apply?

At php.net/sort I found this:

Note: Like most PHP sorting functions, sort() uses an implementation of » Quicksort.

I believe that it uses a randomized quicksort, so there is no need for shuffling the array.

I did some tests and PHP's quicksort is not randomized, so shuffle your input array!!

Should I use sorting algorithms like bubble sort, insertion sort or should I use the inbuilt sort() function in c++ to sort arrays?

Should I use sorting algorithms like bubble sort, insertion sort or should I use the inbuilt sort() function in c++ to sort arrays?

It is your wish. You can use anything. But you need to understand the importance of time and space complexity, your requirement before using any code.

And if I should use sort() function, why do I need to learn other sorting algorithms?

The answer to this is tricky. You need to learn stuff because each sorting algorithm is different on implementation, time and space complexities. For instance, bubble sort runs in O(n^2) time while merge sort takes O(n log n) time.
while space taken by bubble sort is O(1) and merge sort takes O(n) space.In terms of code complexity, you could write a bubble sort snippet in 5 minutes while merge sort could take 30 minutes.

I'm not understanding what is the use of such algorithms if there is an inbuilt sort function, do they have something special?

Yes they do. It is to do with requirements. Today , you might not need a space effecient algorithm. But when such a case arises, you may need to prefer a particular algorithm over the other. In general, you learn these concepts just to understand how the current algorithms have been developed.



Related Topics



Leave a reply



Submit