Is Python's Sorted() Function Guaranteed to Be Stable

Is python's sorted() function guaranteed to be stable?

Yes, the intention of the manual is indeed to guarantee that sorted is stable and indeed that it uses exactly the same algorithm as the sort method. I do realize that the docs aren't 100% clear about this identity; doc patches are always happily accepted!

Why is python's sorted called stable despite not preserving the original order?

Think about it - (1, 2) comes before (1, 3), does it not? Sorting a list by default does not automatically mean "just sort it based off the first element". Otherwise you could say that apple comes before aardvark in the alphabet. In other words, this has nothing to do with stability.

The docs also have a nice explanation about how data structures such as lists and tuples are sorted lexicographically:

In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.

sorted() order unstable under certain conditions. Is this intentional?

It is not a bug, you missed something. You manipulated the dictionary changing the order of iteration, and it is that order that is kept stable by sorted(). Or put differently, sorted() keeps the input order stable, and you changed that input order by altering the dictionary.

Note that sorted() doesn't 'see' a dictionary here. It is passed a sequence of keys, just as if you used list() on the dictionary first. In this case, both 8 and 16 hash to the same hash-table slot. Which one is listed first depends on the order of insertions:

>>> d1 = {}
>>> d1[8] = None
>>> d1[16] = None
>>> d1
{8: None, 16: None}
>>> list(d1)
[8, 16]
>>> d2 = {}
>>> d2[16] = None
>>> d2[8] = None
>>> d2
{16: None, 8: None}
>>> list(d2)
[16, 8]

Calling list() on the two dictionaries shows that the keys are listed in a different order. sorted() just maintained that order as it iterates over the dictionary, since they both are otherwise sorted in the same location, by value. This is exactly what the documentation tells you would happen.

Note that dictionary equality has no role to play here, at all. That's not what the compare equal part of the documentation is referring to. That only refers to the elements in the sequence; if the elements are equal (or in this case, if the key(element) values are equal), then those elements retain their relative order.

So to focus on possible things you missed here:

  • The signature of the method is sorted(iterable[, key][, reverse]); the key word there is iterable. The first line of the documentation:

    Return a new sorted list from the items in iterable.

    Emphasis mine; it is the items from the iterable that are sorted. The Python glossary defines iterable as:

    An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...] When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values.

    Anything that takes an iterable basically will call iter() on it to loop over the sequence produced.

  • Dictionaries happen to be iterable, and iterating gives you the keys. See the mapping types documentation:

    iter(d)
    Return an iterator over the keys of the dictionary. This is a shortcut for iter(d.keys()).

    The dict.keys() documentation then points to the dictionary views section, which states:

    iter(dictview)
    Return an iterator over the keys, values or items (represented as tuples of (key, value)) in the dictionary.

    Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()). Another way to create the same list is pairs = [(v, k) for (k, v) in d.items()].

    Again, emphasis mine. So sorted() iterates, taking the items to sort. Dictionaries, when iterated, produce keys whose order is not stable.

  • Finally, the section you quoted, never contradicts this:

    The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal.

    So the elements in the sequence iterated do not change order. Nowhere does this state that dictionaries cannot produce a different order when iterated however.

    In other words, it doesn't matter if iterable_a == iterable_b here, it is not about iterable equality, only element equality matters to the sort order stability. If the iteration order differs, that order is kept stable.

How does Python break tie when sorting an iterable

Also in wiki:

Starting with Python 2.2, sorts are guaranteed to be stable. That
means that when multiple records have the same key, their original
order is preserved.

sorted in njit numba

Your function tell to Numba/CPython to sort lines based on the second item which contains Inf values. Two infinity values are considered equal so a sorting algorithm can change the order of the lines having Inf values in the second item of the target lines. The different results is due to different algorithms being used. This is expected because the sorting algorithm is not guaranteed to be stable (ie. preserve the order of equal lines).

To solve the problem, you need to use a stable algorithm. np.argsort can be used to find the ordering of the value and the parameter kind can be tuned so to choose a stable algorithm (called "stable"). Additionally, np.argsort should also be faster than sorted on Numpy arrays because it does not cause many lambda calls nor compute Numpy lines as slow CPython objects. This should do the job:

arr[np.argsort(-arr[:,2], kind='stable')]

Update: @creanion is right about sorted being mandatory stable so Numba should fulfil the same constraints specified in the Python documentation. However, the documentation of Numba specify that the key argument is not yet supported. That being said, this argument clearly impact the result/behaviour of the sorted function. Thus, it looks like it has been at least partially implemented recently.

Whether this is a bug in Numba is disputable since the key item is supposed not to be supported. Indeed, without this parameter the stability is not a problem (two items can either be truly equal or said to be different though they may be equal -- eg. for NaN). Note such stability constraints has a strong implication on the Numba code since it prevent some algorithms to be used like the famous Quick-Sort for example.

ordering of sorted() when ties are present

See Sort Stability and Complex Sorts in Sorting HOWTO:

Sorts are guaranteed to be stable. That means that when multiple records have the same key, their original order is preserved.

Python sorted two keys TWO orders

If your first element in the tuple is an integer, you can sort by its negative value:

sorted(theList, key=lambda (num, letter): (-num, letter))

Is python's set stable?

There's no formal guarantee about the stability of sets. However, in the CPython implementation, as long as nothing changes the set, the items will be produced in the same order. Sets are implemented as open-addressing hashtables (with a prime probe), so inserting or removing items can completely change the order (in particular, when that triggers a resize, which reorganizes how the items are laid out in memory.) You can also have two identical sets that nonetheless produce the items in different order, for example:

>>> s1 = {-1, -2}
>>> s2 = {-2, -1}
>>> s1 == s2
True
>>> list(s1), list(s2)
([-1, -2], [-2, -1])

Unless you're very certain you have the same set and nothing touched it inbetween the two iterations, it's best not to rely on it staying the same. Making seemingly irrelevant changes to, say, functions you call inbetween could produce very hard to find bugs.

Is python's sorted() function guaranteed to be stable?

Yes, the intention of the manual is indeed to guarantee that sorted is stable and indeed that it uses exactly the same algorithm as the sort method. I do realize that the docs aren't 100% clear about this identity; doc patches are always happily accepted!

Stable Sort for Dictionary-Values in Python

Here is one way to do it:

>>> sorted_kv = sorted(d.items(), key=lambda (k,v):int(v), reverse=True)
>>> OrderedDict(sorted_kv)
OrderedDict([('3', '5'), ('26', '4'), ('22', '4'), ('16', '3'), ...

This takes the key/value pairs from the dictionary, sorts them, and creates a new ordered dictionary with the required ordering.

The key= argument to sorted() specifies that the pairs are to be sorted according to the numeric value of the second item.

The reason I needed to call int() is that your dictionary keeps both the keys and the values as strings. Sorting them as-is will work, but will produce the lexicographic ordering instead of the numeric one.



Related Topics



Leave a reply



Submit