How Does the Key Argument in Python's Sorted Function Work

How does the key argument in python's sorted function work?

The function you pass in to key is given each of the items that are being sorted, and returns a "key" that Python can sort by. So, if you want to sort a list of strings by the reverse of the string, you could do this:

list_of_strings.sort(key=lambda s: s[::-1])

This lets you specify the value each item is sorted by, without having to change the item. That way, you don't have to build a list of reversed strings, sort that, then reverse them back.

# DON'T do this

data = ['abc', 'def', 'ghi', 'jkl']
reversed_data = [s[::-1] for s in data]
reversed_data.sort()
data = [s[::-1] for s in reversed_data]

# Do this

data.sort(key=lambda s: s[::-1])

In your case, the code is sorting each item by the second item in the tuple, whereas normally it would initially sort by the first item in the tuple, then break ties with the second item.

How does the key argument to sorted work?

The key argument to sorted expects a function, which sorted then applies to each item of the thing to be sorted. The results of key(item) are compared to each other, instead of each original item, during the sorting process.

You can imagine it working a bit like this:

def sorted(thing_to_sort, key):
#
# ... lots of complicated stuff ...
#
if key(x) < key(y):
# do something
else:
# do something else
#
# ... lots more complicated stuff ...
#
return result

As you can see, the parentheses () are added to the function key inside sorted, applying it to x and y, which are items of thing_to_sort.

In your first example, str.lower is the function that gets applied to each x and y.

itemgetter is a bit different. It's a function which returns another function, and in your example, it's that other function which gets applied to x and y.

You can see how itemgetter works in the console:

>>> from operator import itemgetter
>>> item = ('john', 'A', 15)
>>> func = itemgetter(2)
>>> func(item)
15

It can be a little hard to get your head around "higher order" functions (ones which accept or return other functions) at first, but they're very useful for lots of different tasks, so it's worth experimenting with them until you feel comfortable.

What does the 'sorted' function's 'key' argument do?

That key sends each element to the str function for comparison purposes. When strings are sorted, they are compared alphabetically. Since '1' is before '2', '12' comes before '2', in the same way that 'and' would come before 'ball'. There is no such thing as place value in strings, as there is with numbers.

Why is it OK to pass a class to the key attribute of Python's sorted() function?

python uses duck typing, so it doesn't check if the key is a function, it just calls it. the key argument is used in sorted to determine what to compare. In this case, instead of comparing [3, 30, 7] it's comparing [K(3), K(30), K(7)]. That's also why there's a __lt__ method implemented, for the less than comparison

How key in python sorted method can use a list

The key argument to sorted means "Pretend the value is the result of this function instead of the actual value." So when you sort 'abc' with the lookup table you gave, it does this:

                # [1st, 2nd, 3rd] sort order
lookup.get('a') # [ -3, 0, 0]
lookup.get('b') # [ 0, -1, -2]
lookup.get('c') # [ 0, -2, -1]

Then it will figure out the sorted order of the above values. Lists are sorted lexicographically meaning the first element is compared first, just like in a dictionary ("aardvark" comes before "beaver" and also before "ant").

After looking at the first elements (-3, 0, 0) we know 'a' has the smallest value, but we don't know which of 'b' and 'c' is smaller. But as soon as we see the second elements (0, -1, -2), we know that 'c' is smaller, so the final order is 'acb' without ever consulting the third elements (0, -2, -1).

Syntax behind sorted(key=lambda: ...)

key is a function that will be called to transform the collection's items before they are compared. The parameter passed to key must be something that is callable.

The use of lambda creates an anonymous function (which is callable). In the case of sorted the callable only takes one parameters. Python's lambda is pretty simple. It can only do and return one thing really.

The syntax of lambda is the word lambda followed by the list of parameter names then a single block of code. The parameter list and code block are delineated by colon. This is similar to other constructs in python as well such as while, for, if and so on. They are all statements that typically have a code block. Lambda is just another instance of a statement with a code block.

We can compare the use of lambda with that of def to create a function.

adder_lambda = lambda parameter1,parameter2: parameter1+parameter2
def adder_regular(parameter1, parameter2): return parameter1+parameter2

lambda just gives us a way of doing this without assigning a name. Which makes it great for using as a parameter to a function.

variable is used twice here because on the left hand of the colon it is the name of a parameter and on the right hand side it is being used in the code block to compute something.

What arguments does Python sort() function have?

Arguments of sort and sorted

Both sort and sorted have three keyword arguments: cmp, key and reverse.

L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1

sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list

Using key and reverse is preferred, because they work much faster than an equivalent cmp.

key should be a function which takes an item and returns a value to compare and sort by. reverse allows to reverse sort order.

Using key argument

You can use operator.itemgetter as a key argument to sort by second, third etc. item in a tuple.

Example

>>> from operator import itemgetter

>>> a = range(5)
>>> b = a[::-1]
>>> c = map(lambda x: chr(((x+3)%5)+97), a)
>>> sequence = zip(a,b,c)

# sort by first item in a tuple
>>> sorted(sequence, key = itemgetter(0))
[(0, 4, 'd'), (1, 3, 'e'), (2, 2, 'a'), (3, 1, 'b'), (4, 0, 'c')]

# sort by second item in a tuple
>>> sorted(sequence, key = itemgetter(1))
[(4, 0, 'c'), (3, 1, 'b'), (2, 2, 'a'), (1, 3, 'e'), (0, 4, 'd')]

# sort by third item in a tuple
>>> sorted(sequence, key = itemgetter(2))
[(2, 2, 'a'), (3, 1, 'b'), (4, 0, 'c'), (0, 4, 'd'), (1, 3, 'e')]

Explanation

Sequences can contain any objects, not even comparable, but if we can define a function which produces something we can compare for each of the items, we can pass this function in key argument to sort or sorted.

itemgetter, in particular, creates such a function that fetches the given item from its operand. An example from its documentation:

After, f=itemgetter(2), the call f(r) returns r[2].

Mini-benchmark, key vs cmp

Just out of curiosity, key and cmp performance compared, smaller is better:

>>> from timeit import Timer
>>> Timer(stmt="sorted(xs,key=itemgetter(1))",setup="from operator import itemgetter;xs=range(100);xs=zip(xs,xs);").timeit(300000)
6.7079150676727295
>>> Timer(stmt="sorted(xs,key=lambda x:x[1])",setup="xs=range(100);xs=zip(xs,xs);").timeit(300000)
11.609490871429443
>>> Timer(stmt="sorted(xs,cmp=lambda a,b: cmp(a[1],b[1]))",setup="xs=range(100);xs=zip(xs,xs);").timeit(300000)
22.335839986801147

So, sorting with key seems to be at least twice as fast as sorting with cmp. Using itemgetter instead of lambda x: x[1] makes sort even faster.



Related Topics



Leave a reply



Submit