How does the key argument in python's sorted function work?
The function you pass in to key
is given each of the items that are being sorted, and returns a "key" that Python can sort by. So, if you want to sort a list of strings by the reverse of the string, you could do this:
list_of_strings.sort(key=lambda s: s[::-1])
This lets you specify the value each item is sorted by, without having to change the item. That way, you don't have to build a list of reversed strings, sort that, then reverse them back.
# DON'T do this
data = ['abc', 'def', 'ghi', 'jkl']
reversed_data = [s[::-1] for s in data]
reversed_data.sort()
data = [s[::-1] for s in reversed_data]
# Do this
data.sort(key=lambda s: s[::-1])
In your case, the code is sorting each item by the second item in the tuple, whereas normally it would initially sort by the first item in the tuple, then break ties with the second item.
How does the key argument to sorted work?
The key
argument to sorted
expects a function, which sorted
then applies to each item of the thing to be sorted. The results of key(item)
are compared to each other, instead of each original item
, during the sorting process.
You can imagine it working a bit like this:
def sorted(thing_to_sort, key):
#
# ... lots of complicated stuff ...
#
if key(x) < key(y):
# do something
else:
# do something else
#
# ... lots more complicated stuff ...
#
return result
As you can see, the parentheses ()
are added to the function key
inside sorted
, applying it to x
and y
, which are items of thing_to_sort
.
In your first example, str.lower
is the function that gets applied to each x
and y
.
itemgetter
is a bit different. It's a function which returns another function, and in your example, it's that other function which gets applied to x
and y
.
You can see how itemgetter
works in the console:
>>> from operator import itemgetter
>>> item = ('john', 'A', 15)
>>> func = itemgetter(2)
>>> func(item)
15
It can be a little hard to get your head around "higher order" functions (ones which accept or return other functions) at first, but they're very useful for lots of different tasks, so it's worth experimenting with them until you feel comfortable.
What does the 'sorted' function's 'key' argument do?
That key sends each element to the str
function for comparison purposes. When strings are sorted, they are compared alphabetically. Since '1'
is before '2'
, '12'
comes before '2'
, in the same way that 'and'
would come before 'ball'
. There is no such thing as place value in strings, as there is with numbers.
Why is it OK to pass a class to the key attribute of Python's sorted() function?
python uses duck typing, so it doesn't check if the key is a function, it just calls it. the key argument is used in sorted to determine what to compare. In this case, instead of comparing [3, 30, 7] it's comparing [K(3), K(30), K(7)]. That's also why there's a __lt__ method implemented, for the less than comparison
How key in python sorted method can use a list
The key
argument to sorted
means "Pretend the value is the result of this function instead of the actual value." So when you sort 'abc'
with the lookup table you gave, it does this:
# [1st, 2nd, 3rd] sort order
lookup.get('a') # [ -3, 0, 0]
lookup.get('b') # [ 0, -1, -2]
lookup.get('c') # [ 0, -2, -1]
Then it will figure out the sorted order of the above values. Lists are sorted lexicographically meaning the first element is compared first, just like in a dictionary ("aardvark" comes before "beaver" and also before "ant").
After looking at the first elements (-3, 0, 0) we know 'a' has the smallest value, but we don't know which of 'b' and 'c' is smaller. But as soon as we see the second elements (0, -1, -2), we know that 'c' is smaller, so the final order is 'acb' without ever consulting the third elements (0, -2, -1).
Syntax behind sorted(key=lambda: ...)
key
is a function that will be called to transform the collection's items before they are compared. The parameter passed to key
must be something that is callable.
The use of lambda
creates an anonymous function (which is callable). In the case of sorted
the callable only takes one parameters. Python's lambda
is pretty simple. It can only do and return one thing really.
The syntax of lambda
is the word lambda
followed by the list of parameter names then a single block of code. The parameter list and code block are delineated by colon. This is similar to other constructs in python as well such as while
, for
, if
and so on. They are all statements that typically have a code block. Lambda is just another instance of a statement with a code block.
We can compare the use of lambda with that of def to create a function.
adder_lambda = lambda parameter1,parameter2: parameter1+parameter2
def adder_regular(parameter1, parameter2): return parameter1+parameter2
lambda just gives us a way of doing this without assigning a name. Which makes it great for using as a parameter to a function.
variable
is used twice here because on the left hand of the colon it is the name of a parameter and on the right hand side it is being used in the code block to compute something.
What arguments does Python sort() function have?
Arguments of sort
and sorted
Both sort
and sorted
have three keyword arguments: cmp
, key
and reverse
.
L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
Using key
and reverse
is preferred, because they work much faster than an equivalent cmp
.
key
should be a function which takes an item and returns a value to compare and sort by. reverse
allows to reverse sort order.
Using key
argument
You can use operator.itemgetter
as a key argument to sort by second, third etc. item in a tuple.
Example
>>> from operator import itemgetter
>>> a = range(5)
>>> b = a[::-1]
>>> c = map(lambda x: chr(((x+3)%5)+97), a)
>>> sequence = zip(a,b,c)
# sort by first item in a tuple
>>> sorted(sequence, key = itemgetter(0))
[(0, 4, 'd'), (1, 3, 'e'), (2, 2, 'a'), (3, 1, 'b'), (4, 0, 'c')]
# sort by second item in a tuple
>>> sorted(sequence, key = itemgetter(1))
[(4, 0, 'c'), (3, 1, 'b'), (2, 2, 'a'), (1, 3, 'e'), (0, 4, 'd')]
# sort by third item in a tuple
>>> sorted(sequence, key = itemgetter(2))
[(2, 2, 'a'), (3, 1, 'b'), (4, 0, 'c'), (0, 4, 'd'), (1, 3, 'e')]
Explanation
Sequences can contain any objects, not even comparable, but if we can define a function which produces something we can compare for each of the items, we can pass this function in key
argument to sort
or sorted
.
itemgetter
, in particular, creates such a function that fetches the given item from its operand. An example from its documentation:
After,
f=itemgetter(2)
, the callf(r)
returnsr[2]
.
Mini-benchmark, key
vs cmp
Just out of curiosity, key
and cmp
performance compared, smaller is better:
>>> from timeit import Timer
>>> Timer(stmt="sorted(xs,key=itemgetter(1))",setup="from operator import itemgetter;xs=range(100);xs=zip(xs,xs);").timeit(300000)
6.7079150676727295
>>> Timer(stmt="sorted(xs,key=lambda x:x[1])",setup="xs=range(100);xs=zip(xs,xs);").timeit(300000)
11.609490871429443
>>> Timer(stmt="sorted(xs,cmp=lambda a,b: cmp(a[1],b[1]))",setup="xs=range(100);xs=zip(xs,xs);").timeit(300000)
22.335839986801147
So, sorting with key
seems to be at least twice as fast as sorting with cmp
. Using itemgetter
instead of lambda x: x[1]
makes sort even faster.
Related Topics
How to Convert List of Key-Value Tuples into Dictionary
How to Print a List in Python "Nicely"
The Zip() Function in Python 3
Python-Requests Close Http Connection
Dummy Variables When Not All Categories Are Present
How to Include Related Model Fields Using Django Rest Framework
How to Use a Multiprocessing.Manager()
Ssl.Sslerror: [Ssl: Certificate_Verify_Failed] Certificate Verify Failed (_Ssl.C:749)
Plotting a Fast Fourier Transform in Python
How to Extract Top-Level Domain Name (Tld) from Url
How to Install Pyqt4 on Windows Using Pip
Create a Day-Of-Week Column in a Pandas Dataframe Using Python
Multithreaded Blas in Python/Numpy
Explicitly Select Items from a List or Tuple
How to Use Virtualenv with Python