Why Must Std::Sort Compare Function Return False When Arguments Are Equal

Will std::sort always compare equal values?

Will std::sort always compare equal values or sometimes it can skip comparing them and therefore duplicate values will not be found?

Yes, some equal value elements will always be compared if duplicates do exist.

Let us assume the opposite: initial array of elements {e} for sorting contains a subset of elements having the same value and a valid sorting algorithm does not call comparison operator < for any pair of the elements from the subset.

Then we construct same sized array of tuples {e,k}, with the first tuple value from the initial array and arbitrary selected second tuple value k, and apply the same sorting algorithm using the lexicographic comparison operator for the tuples. The order of tuples after sorting can deviate from the order of sorted elements {e} only for same value elements, where in the case of array of tuples it will depend on second tuple value k.

Since we assumed that the sorting algorithm does not compare any pair of same value elements, then it will not compare the tuples with the same first tuple value, so the algorithm will be unable to sort them properly. This contradicts our assumptions and proves that some equal value elements (if they exist in the array) will always be compared during sorting.

Why does std::sort work when the comparison function uses greater-than (), but not greater-than-or-equal (=)?

This is just how it is required to work. You need strict weak ordering.

For the rationale, I believe that the sufficient explanation is that this enables you to determine whether those elements are equal (useful for e.g. std::sets). <= or >= can't do that.

<= or >= can also do that, but it seems like it was just decided to use < instead of any other relation. With this decision in mind, standard library facilities are implemented and they heavily rely on it.

Explain the working of compare predicate of standard library sort function C++?

While refering to standard is fine, understanding it can sometimes be quite hard...

Trying in simpler words:

If you want to sort elements, you need some kind of order relation that defines which of two elements shall come first ("be the smaller one"). Some data types come with a natural order, such as integral numbers, for instance: -12 < -10 - 0 - 10 < 12. std::sort (without comparison parameter) uses this natural order to sort elements ascending.

The third parameter gives you the possibility to tell std::sort explicitly how this order shall be defined, imagine the following:

std::vector v({12, 7, 10});
std::sort(v.begin(), v.end(), [](int x, int y){return x > y;});

Looks unnatural, doesn't it? std::sort interprets the comparison as "less", but we implemented a "greater"! By doing so, the greater values then (as considered being "smaller") are sorted in front of smaller ones (considered being "greater"), so we have achieved sorting in descending order...

So the comparison parameter passed to std::sort is used to either sort differently to what the natural order would be or to define explicitly an order for data types where such order does not exist.

Side note: The order relation does not necessarily have to be a total order; however, equivalent elements then can be sorted in arbitrary order (you could, though, use std::stable_sort instead to at least preserve their relative order before sorting):

int a[12, 7, 10, 7];
std::vector<int*> v({&a[0], &a[1], &a[2], &a[3]});
std::sort(v.begin(), v.end(), [](int* x, int* y) { return *x < *y; });
// v[2] == &a[2]
// v[3] == &a[1]
// BUT:
// v[0] == &a[1] && v[1] == &a[3] || v[0] == &a[3] && v[1] == &a[1]

Should sorting algorithm pass same element in the comparison function

I can speak with some authority on this question as I'm the one who wrote this code.

Here is the comparison which is asserting in your example:

https://github.com/llvm-mirror/libcxx/blob/master/include/algorithm#L3994-L3995

As the link is likely to go stale (point to the wrong line) as time goes on, I'll also quote the code here:

            // __m still guards upward moving __i
while (__comp(*__i, *__m))
++__i;

This is known as an "unguarded" loop because there is no check for the iterator __i running off the end of the sequence as it is incremented. The reason this works is because an invariant of this algorithm is that at this point it is known that __i <= __m (which is also in a comment 3 lines above this quote).

If you look at the code above this quote, you will see these comments:

    // The search going up is known to be guarded but the search coming down isn't.
// Prime the downward search with a guard.

So before we get to this point, a guarded search down the sequence is done. That is, this test:

if (__i == --__j)

After this test finds the lower guard, the algorithm then jumps to the unguarded loops which have only one test per iteration, whereas otherwise there would be two tests per iteration (a test on the iterator and a test on the dereferenced value of the iterator).

The use of "unguarded loops" is the cause of an element being compared with itself. During development I measured that the cost of the one extra comparison in the loop was a better deal than including two comparisons per iteration in the loop.

Of course this is an engineering tradeoff. If the comparison function turned out to be horribly expensive compared to the cost of comparing the iterators themselves, one could come to a different conclusion.

This answer is completely consistent with rici's answer which is also correct (and I've upvoted it). I added my voice as I could drop the "presumably" and point to specific lines of code in the algorithm.

std::sort - is passing a faulty comparator undefined behavior?

Warning: extreme language-lawyering follows.

The wording of the most recent draft of the standard puts it this way in [alg.sorting]p3:

For all algorithms that take Compare, there is a version that uses operator< instead. That is, comp(*i, *j) != false defaults to *i < *j != false. For algorithms other than those described in 25.4.3, comp
shall induce a strict weak ordering on the values.

By using the word "shall", the standard is implicitly stating that violating it leads to undefined behavior.

Whether this requires that the given function impose a SWO for all possible values or just for the values given to the algorithm is not clear from the standard. However, since the restriction is stated in a paragraph that is talking about those specific algorithms, it's not unreasonable to assume that it is refering to the range of values provided to the algorithm.

Otherwise, the default operator< could not impose SWO for floats, thanks to NaN.

Does Comparison Function specified with lambda function in std::sort return bool type?

The std::sort implementation in libstdc++ has optimized for a short sequence, after checking the source, we know that for a small sequence short than 16 elements, it will fall back to insertion sort.

In c++, the non-zero number can be automatically converted into a bool value true, with insertion sort, if your sequence has no identical values, it will be reversed(In the insertion sort loop, all values will be moved to the first place, then once loop finished, the sequence is reverted).

It's not a deterministic behavior, since the STL implementation may differ from each other, and the _S_threshold number is unrevealed to the end user.

Enlarge the array to be more than 16 elements, we get totally different result:

The program crashed with buffer overflow! So now we proved it was wrong to use such kinds of lambda [](int n1, int n2) { return n2 - n1; }, the correct way should be sort(begin(number), end(number), [](int n1, int n2) { return n2 > n1; });

#include <iostream> 
#include <functional>
#include <algorithm>
using namespace std;

int main() {
int number[] = {3, 5, 1, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23};
auto print = [](int n) { cout << n << " "; };

sort(begin(number), end(number), [](int n1, int n2) { return n2 - n1; });

for_each(begin(number), end(number), print);
cout << endl;

sort(begin(number), end(number), [](int n1, int n2) { return n1 - n2; });

for_each(begin(number), end(number), print);
cout << endl;

return 0;
}

Will std::sort work correctly if I defined a custom compare function for floating number?

std::sort requires a comparer that defines a strict weak ordering. This means, among other things, that the following condition must be met:

  • We define two items, a and b, to be equivalent (a === b) if !cmp(a, b) && !cmp(b, a)
  • Equivalence is transitive: a === b && b === c => a === c

As you already say in your question, your function cmp() does not meet these conditions, so you cannot use your function in std::sort(). Not only the result of the algorithm will be unpredictable, which is bad unless you are actually looking for this unpredictability (cf. randomize): if you have a few values that are very close to each other, such that any of them compare true with some, but false with some others, the algorithm might enter an infinite loop.

So the answer is no, you cannot use your function cmp() in std::sort() unless you want to risk your program freezing.



Related Topics



Leave a reply



Submit