How to Do Stable Sort

What is stability in sorting algorithms and why is it important?

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.

Background: a "stable" sorting algorithm keeps the items with the same sorting key in order. Suppose we have a list of 5-letter words:

peach
straw
apple
spork

If we sort the list by just the first letter of each word then a stable-sort would produce:

apple
peach
straw
spork

In an unstable sort algorithm, straw or spork may be interchanged, but in a stable one, they stay in the same relative positions (that is, since straw appears before spork in the input, it also appears before spork in the output).

We could sort the list of words using this algorithm: stable sorting by column 5, then 4, then 3, then 2, then 1.
In the end, it will be correctly sorted. Convince yourself of that. (by the way, that algorithm is called radix sort)

Now to answer your question, suppose we have a list of first and last names. We are asked to sort "by last name, then by first". We could first sort (stable or unstable) by the first name, then stable sort by the last name. After these sorts, the list is primarily sorted by the last name. However, where last names are the same, the first names are sorted.

You can't stack unstable sorts in the same fashion.

How do I do stable sort?

Put the key that you originally wanted to sort by and the index into an array, and sort by that.

a.sort_by.with_index { |(x, y), i| [y, i] }
# => [[:a, 0], [:c, 0], [:d, 0], [:b, 1]]

How to perform stable sort in C++ when using a custom comparator?

The comparator for sort and stable_sort must induce a strict weak order. Your comparator does not satisfy this condition.

One of the properties of a strict weak order is that for any permitted values of i and j, at most one of mysort(i,j) and mysort(j,i) can return true. Your comparator returns true for both cases when, e.g., i=2 and j=4.

To fix that, you must modify your comparator. When is arg1 "less" than arg2? There is only one case: When arg1 is even and arg2 is odd. Hence, a usable definition would be:

bool mysort(int arg1,int arg2){
return arg1 % 2 == 0 && arg2 % 2 != 0;
}

BTW, when you define a comparator, you have at no time to think about in which order the algorithm will pass the parameters. All the (earlier, later) discussions are irrelevant. The algorithm will pass parameters in which ever order it likes. There are no guarantees.

How to perform stable sort over multiple columns?

It is a solvable problem. You don't have to throw your hands up and say computers cannot be used to solve problems.

Start with the data, and add a new surrogate "Rank" column.

Date            Id    Rank
-------------- ---- ----
null 7 null
null 1 null
null 9 null
null 2 null
null 8 null
11/1/2017 null null
11/4/2017 3 null
11/5/2017 null null
11/12/2017 10 null
11/13/2017 null null

Then seed Rank to the Id:

UPDATE SortDemo SET Rank = Id
WHERE Id IS NOT NULL

Date Id Rank
-------------- ---- ----
null 7 7
null 1 1
null 9 9
null 2 2
null 8 8
11/1/2017 null null
11/4/2017 3 3
11/5/2017 null null
11/12/2017 10 10
11/13/2017 null null

Then for the remaining items items with a Date, we need to assign it to the "next" rank:

Date            Id    Rank
-------------- ---- ----
null 7 7
null 1 1
null 9 9
null 2 2
null 8 8
11/1/2017 null null <-- 3
11/4/2017 3 3
11/5/2017 null null <-- 10
11/12/2017 10 10
11/13/2017 null null <-- ?

with

UPDATE SortDemo SET Rank = (
SELECT MIN(Rank) FROM SortDemo s2
WHERE s2.Date >= SortDemo.Date)
WHERE SortDemo.Date IS NOT NULL

Date Id Rank
-------------- ---- ----
null 7 7
null 1 1
null 9 9
null 2 2
null 8 8
11/1/2017 null 3 <--
11/4/2017 3 3
11/5/2017 null 10 <--
11/12/2017 10 10
11/13/2017 null null <-- ?

There's also the edge case where there are no items "after" us; which we can fix by going backwards to one the highest previous rank:

UPDATE SortDemo SET Rank = (
SELECT MAX(Rank) FROM SortDemo s2
WHERE s2.Date <= SortDemo.Date)
WHERE SortDemo.Date IS NOT NULL

Date Id Rank
-------------- ---- ----
null 7 7
null 1 1
null 9 9
null 2 2
null 8 8
11/1/2017 null 3
11/4/2017 3 3
11/5/2017 null 10
11/12/2017 10 10
11/13/2017 null 10 <--10

And we're done

We can now sort by surrogate Rank, and break ties by sorting by Date:

SELECT * FROM SortDemo
ORDER BY Rank, Date

Date Id Rank
-------------- ---- ----
null 1 1
null 2 2
11/1/2017 null 3
11/4/2017 3 3
null 7 7
null 8 8
null 9 9
11/5/2017 null 10
11/12/2017 10 10
11/13/2017 null 10

Solution complete. Sql Fiddle

The answer is in escrow until Monday, so i can give people the opportunity to earn reputation for solving a unique problem.



Related Topics



Leave a reply



Submit