How to Get 2.X-Like Sorting Behaviour in Python 3.X

How can I get 2.x-like sorting behaviour in Python 3.x?

Stupid idea: make a first pass to divide all the different items in groups that can be compared between each other, sort the individual groups and finally concatenate them. I assume that an item is comparable to all members of a group, if it is comparable with the first member of a group. Something like this (Python3):

import itertools

def python2sort(x):
it = iter(x)
groups = [[next(it)]]
for item in it:
for group in groups:
try:
item < group[0] # exception if not comparable
group.append(item)
break
except TypeError:
continue
else: # did not break, make new group
groups.append([item])
print(groups) # for debugging
return itertools.chain.from_iterable(sorted(group) for group in groups)

This will have quadratic running time in the pathetic case that none of the items are comparable, but I guess the only way to know that for sure is to check all possible combinations. See the quadratic behavior as a deserved punishment for anyone trying to sort a long list of unsortable items, like complex numbers. In a more common case of a mix of some strings and some integers, the speed should be similar to the speed of a normal sort. Quick test:

In [19]: x = [0, 'one', 2.3, 'four', -5, 1j, 2j,  -5.5, 13 , 15.3, 'aa', 'zz']

In [20]: list(python2sort(x))
[[0, 2.3, -5, -5.5, 13, 15.3], ['one', 'four', 'aa', 'zz'], [1j], [2j]]
Out[20]: [-5.5, -5, 0, 2.3, 13, 15.3, 'aa', 'four', 'one', 'zz', 1j, 2j]

It seems to be a 'stable sort' as well, since the groups are formed in the order the incomparable items are encountered.

How can I get 2.x-like sorting behaviour in Python 3.x?

Stupid idea: make a first pass to divide all the different items in groups that can be compared between each other, sort the individual groups and finally concatenate them. I assume that an item is comparable to all members of a group, if it is comparable with the first member of a group. Something like this (Python3):

import itertools

def python2sort(x):
it = iter(x)
groups = [[next(it)]]
for item in it:
for group in groups:
try:
item < group[0] # exception if not comparable
group.append(item)
break
except TypeError:
continue
else: # did not break, make new group
groups.append([item])
print(groups) # for debugging
return itertools.chain.from_iterable(sorted(group) for group in groups)

This will have quadratic running time in the pathetic case that none of the items are comparable, but I guess the only way to know that for sure is to check all possible combinations. See the quadratic behavior as a deserved punishment for anyone trying to sort a long list of unsortable items, like complex numbers. In a more common case of a mix of some strings and some integers, the speed should be similar to the speed of a normal sort. Quick test:

In [19]: x = [0, 'one', 2.3, 'four', -5, 1j, 2j,  -5.5, 13 , 15.3, 'aa', 'zz']

In [20]: list(python2sort(x))
[[0, 2.3, -5, -5.5, 13, 15.3], ['one', 'four', 'aa', 'zz'], [1j], [2j]]
Out[20]: [-5.5, -5, 0, 2.3, 13, 15.3, 'aa', 'four', 'one', 'zz', 1j, 2j]

It seems to be a 'stable sort' as well, since the groups are formed in the order the incomparable items are encountered.

Find an analogue for the Python 2.7 sorting behavior in Python 3.x

You need to supply an explicit key to sort or sorted, and the key should explain how to compare elements of different types.

A quick solution is to replace element x with a tuple (type(x).__name__, x) so that elements are grouped by type first, and only compared if they have the same type:

some_list = ['a', [4, 5, 6], 3, 2.0, 'b', 1, 1.5, ('c', 'd'), [2, 3, 4]]

some_list.sort(key=lambda x: (type(x).__name__, x))

print(some_list)
# [1.5, 2.0, 1, 3, [2, 3, 4], [4, 5, 6], 'a', 'b', ('c', 'd')]

Note how the integers were separated from the floats. If you don't want that, you need a more complex key. I give an example at the bottom of this post.

You could try other keys, which will result in different orders:

  • key=str or key=repr: Brutally convert everything to string before comparison. This has the downside that numbers are compared lexicographically, so that 1 < 10 < 2.
  • key=lambda x: tuple(more_itertools.collapse(x)) this flattens everything into tuples, so that a single number is equivalent to a list of one number, and a list of lists of lists is equivalent to a simple list. But it will still crash if trying to compare numbers with strings.
  • Some hybrid method of the two solutions above. You can declare a complex key like any function, using the def keyword.
  • Try comparing the elements with their usual comparison, then if it raises an exception, convert the elements into strings and compare again:
from functools import cmp_to_key

def cmp(a,b):
try:
return (a > b) - (a < b)
except TypeError:
a, b = str(a), str(b)
return (a > b) - (a < b)

k = cmp_to_key(cmp)

some_list = ['a', [4, 5, 6], 3, 2.0, 'b', 1, 1.5, ('c', 'd'), [2, 3, 4]]

some_list.sort(key=k)

print(some_list)
# [('c', 'd'), 1, 1.5, 2.0, 3, [2, 3, 4], [4, 5, 6], 'a', 'b']

Sorting a Python list by two fields

like this:

import operator
list1 = sorted(csv1, key=operator.itemgetter(1, 2))

Multiple sorting in Python

Convert x[1] to int(x[1]):

sorted(d,key=lambda x:(int(x[0]), int(x[1])))

Output:

[['1', '1', '3'], ['1', '4', '9'], ['1', '7', '14'], ['1', '12', '3'], ['2', '3', '1']]

Why does this key class for sorting heterogeneous sequences behave oddly?

You do not know what order the comparisons are done in, or even which items are compared, which means you can't really know what effect your __lt__ will have. Your defined __lt__ sometimes depends on the actual values, and sometimes on the string representations of the types, but both versions may be used for the same object in the course of the sort. This means that your ordering is not determined solely by the objects in the list, but also may depend on their initial order. This in turn means that just because objects are mutually comparable does not mean they will be sorted together; they may be "blocked" by an incomparable object between them.

You can get an inkling of what is going on by putting some debugging prints in to see what it's comparing:

class motley:

def __init__(self, value):
self.value = value

def __lt__(self, other):
fallback = False
try:
result = self.value < other.value
except TypeError:
fallback = True
result = repr(type(self.value)) < repr(type(other.value))
symbol = "<" if result else ">"
print(self.value, symbol, other.value, end="")
if fallback:
print(" -- because", repr(type(self.value)), symbol, repr(type(other.value)))
else:
print()
return result

Then:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
1 > 0.0
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 0.0 -- because <class 'complex'> < <class 'float'>
False > 0.0
False < 1
(2+3j) > False -- because <class 'complex'> > <class 'bool'>
(2+3j) < 1 -- because <class 'complex'> < <class 'int'>
[(1+0j), 0.0, False, (2+3j), 1]

You can see, for instance, that the type-based ordering is used for comparing the complex number to 1, but not for comparing 1 and 0. Likewise 0.0 < False for "normal" reasons, but 2+3j > False for type-name-based reasons.

The result is that it sorts 1+0j to the beginning, and then leaves 2+3j where it is above False. It never even attempts to compare the two complex numbers to each other, and the only item they are both compared to is 1.

More generally, your approach can lead to an intransitive ordering with appropriate choices for the names of the types involved. For instance, if you define classes A, B, and C, such that A and C can be compared, but they raise exceptions when comparing to B, then by creating objects a, b and c (from the respective classes) such that c < a, you can create a cycle a < b < c < a. a < b < c will be true because the classes will be compared based on their names, but c < a since these types can be directly compared. With an intransitive ordering, there is no hope of a "correct" sorted order.

You can even do this with builtin types, although it requires getting a little creative to think of objects whose type names are in the right alphabetical sequence:

>>> motley(1.0) < motley(lambda: 1) < motley(0) < motley(1.0)
True

(Because 'float' < 'function' :-)

Sorting list by an attribute that can be None

The ordering comparison operators are stricter about types in Python 3, as described here:

The ordering comparison operators (<, <=, >=, >) raise a TypeError
exception when the operands don’t have a meaningful natural ordering.

Python 2 sorts None before any string (even empty string):

>>> None < None
False

>>> None < "abc"
True

>>> None < ""
True

In Python 3 any attempts at ordering NoneType instances result in an exception:

>>> None < "abc"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()

The quickest fix I can think of is to explicitly map None instances into something orderable like "":

my_list_sortable = [(x or "") for x in my_list]

If you want to sort your data while keeping it intact, just give sort a customized key method:

def nonesorter(a):
if not a:
return ""
return a

my_list.sort(key=nonesorter)

Arbitrary but deterministic sort a list of incompatible types

Here is a suggestion:

list_1.sort(key=lambda x: (repr(type(x[1])), x[1]))

print(list_1)
# [['B', True], ['Z', 412.41], ['A', 1], ['A', 4], ['G', 12], ['G', 54], ['H', 'No'], ['D', 'Yes']]

This will group together the elements which have the same type for their second element. The actual values of x[1] will only be compared when the types differ, so you won't have a TypeError: '<' not supported between instances of something and something else.

Note how 412.41 was isolated from the other numbers, because it is a float, whereas the other numbers are int.

Sorting key function that uses custom comparison

Use the functools.cmp_to_key helper.

>>> import functools
>>> digits = ['3', '30', '34', '32', '9', '5']
>>> sorted(
... digits,
... key=functools.cmp_to_key(lambda a, b: cmp(a+b, b+a)),
... reverse=True)
['9', '5', '34', '3', '32', '30']

The Python 3 sorting functions take a ‘key’ parameter:

key specifies a function of one argument that is used to extract a
comparison key from each list element: key=str.lower. The default
value is None (compare the elements directly).

The functools.cmp_to_key helper is designed to help you transition to
that style:

functools.cmp_to_key(func)

Transform an old-style comparison function to a key function. […]
This function is primarily used as a transition tool for programs
being converted from Python 2 which supported the use of comparison
functions.

This works in the latest Python 2 and Python 3.

The trick is done by creating a key function that takes an item for
comparison, and returns a custom object, which knows how to compare
itself as specified by your comparison function.

>>> key_func = functools.cmp_to_key(lambda a, b: cmp(a+b, b+a))
>>> key_func("32")
<functools.K object at 0x7f6781ce0980>
>>> key_func("32") < key_func("5")
True

See the Sorting HOWTO for this and other tricks.



Related Topics



Leave a reply



Submit