Operator.Itemgetter or Lambda

operator.itemgetter or lambda

According to my benchmark on a list of 1000 tuples, using itemgetter is almost twice as quick as the plain lambda method. The following is my code:

In [1]: a = list(range(1000))

In [2]: b = list(range(1000))

In [3]: import random

In [4]: random.shuffle(a)

In [5]: random.shuffle(b)

In [6]: c = list(zip(a, b))

In [7]: %timeit c.sort(key=lambda x: x[1])
81.4 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: random.shuffle(c)

In [9]: from operator import itemgetter

In [10]: %timeit c.sort(key=itemgetter(1))
47 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I have also tested the performance (run time in µs) of this two method for various list size.

+-----------+--------+------------+
| List size | lambda | itemgetter |
+-----------+--------+------------+
| 100 | 8.19 | 5.09 |
+-----------+--------+------------+
| 1000 | 81.4 | 47 |
+-----------+--------+------------+
| 10000 | 855 | 498 |
+-----------+--------+------------+
| 100000 | 14600 | 10100 |
+-----------+--------+------------+
| 1000000 | 172000 | 131000 |
+-----------+--------+------------+

Sample Image

(The code producing the above image can be found here)

Combined with the conciseness to select multiple elements from a list, itemgetter is clearly the winner to use in sort method.

How do operator.itemgetter() and sort() work?

Looks like you're a little bit confused about all that stuff.

operator is a built-in module providing a set of convenient operators. In two words operator.itemgetter(n) constructs a callable that assumes an iterable object (e.g. list, tuple, set) as input, and fetches the n-th element out of it.

So, you can't use key=a[x][1] there, because python has no idea what x is. Instead, you could use a lambda function (elem is just a variable name, no magic there):

a.sort(key=lambda elem: elem[1])

Or just an ordinary function:

def get_second_elem(iterable):
return iterable[1]

a.sort(key=get_second_elem)

So, here's an important note: in python functions are first-class citizens, so you can pass them to other functions as a parameter.

Other questions:

  1. Yes, you can reverse sort, just add reverse=True: a.sort(key=..., reverse=True)
  2. To sort by more than one column you can use itemgetter with multiple indices: operator.itemgetter(1,2), or with lambda: lambda elem: (elem[1], elem[2]). This way, iterables are constructed on the fly for each item in list, which are than compared against each other in lexicographic(?) order (first elements compared, if equal - second elements compared, etc)
  3. You can fetch value at [3,2] using a[2,1] (indices are zero-based). Using operator... It's possible, but not as clean as just indexing.

Refer to the documentation for details:

  1. operator.itemgetter explained
  2. Sorting list by custom key in Python

Using itemgetter instead of lambda

You could do:

from operator import itemgetter

res = [{"operation": 1, "path": 2}, {"operation": 1, "path": 1}]

res = sorted(res, key=itemgetter("operation", "path"))

print(res)

Output

[{'operation': 1, 'path': 1}, {'operation': 1, 'path': 2}]

How to use operator.itemgetter to check a condition in a filter function instead of a lambda?

Yes, you can use itemgetter(), but not without using a lambda (or the def functionname(): ... statement syntax to define a callable, or other compositions of objects).

The first argument to filter() must be a callable object. While operator.itemgetter(1), on its own, is such an object and can be used in filter(), the expression operator.itemgetter(1)%2==0 doesn't work because you can't just use the % operator on itemgetter() objects and so raises a TypeError. Even if it didn't do that, you'd be passing the outcome of the operator.itemgetter(1)%2==0 expression to filter(), so it'd be called once, before filter() runs, and that's not what you want either.

The lambda you passed in, lambda x: not x[1]%2, is also a callable object, namely a function. It works because the lambda <args>: <expression> syntax doesn't immediately call the <expression> part, but instead returns a function object, so a callable object. The expression contained in the function (not x[1]%2), is executed only when the function is called (so, by filter() for each element).

A different way of spelling the same thing would use a def statement to create a named function first:

def second_element_is_even(tup):
return not tup[1] % 2

list(filter(second_element_is_even, tupleSet))

That's exactly the same thing as the version with lambda, except now the function object is available under the name second_element_is_even too.

If you must use an itemgetter() object, then you would have to write a different callable that then called for each filtered item, and uses the itemgetter() object as a tool to produce the desired filtering decision. Like a lambda:

get_second = operator.itemgetter(1)
list(filter(lambda x: not get_second(x) % 2, tupleSet))

That's not very useful, as x[1] is really a lot more readable.

However, if you ever needed for the specific element to be variable (sometimes it has to be the first, sometimes the second element of the tuple, sometimes it should be the sum), then an itemgetter object could be used to codify this. You can produce a element is even function that accepts a key to figure out what it should test:

def even_tester(key=None):
if key is None:
# identity function, test the value directly
key = lambda v: v
def is_even(value):
return not key(value) % 2
return is_even

# test if the second element is even
list(filter(even_tester(operator.itemgetter(1)), tupleSet))

# test if the first element is even
list(filter(even_tester(operator.itemgetter(1)), tupleSet))

# test if the sum of the elements is even
list(filter(even_tester(sum), tupleSet))

even_tester() here works a lot like itemgetter(): calling it gives you a new object that itself can be called too. What it does when called is determined by the argument you pass to even_tester().

Inverse of `operator.itemgetter`

If I've understood what you want correctly, the following would do that:

import functools
import operator

L = list('abcd')

def foo(indexable):
return functools.partial(operator.__getitem__, indexable)

g = foo(L)

for i in xrange(len(L)):
print g(i),

Update:

I've experimented further and was surprised to discover a slightly faster solution, which is this nothing other than simply this:

def foo2(indexable):
return indexable.__getitem__

Which, when run using a little testbed I threw together, produced the following results:

fastest to slowest *_test() function timings:
10,000 elements, 1,000 timeit calls, best of 3

foo2_test() : 1.46 (0.00 times slower)
lambda_test() : 4.15 (1.84 times slower)
foo_test() : 4.28 (1.93 times slower)

Each test function used just access each element of a list in a tight loop using a different technique.

Curious about how this applied to your sorting answer to the linked question, I obtained these differing results using it for sorting a list rather than just accessing each of the list's elements once:

fastest to slowest *_test() function timings:
10,000 elements, 1,000 timeit calls, best of 3

foo2_test() : 13.03 (0.00 times slower)
foo_test() : 14.70 (0.13 times slower)
lambda_test() : 16.25 (0.25 times slower)

While foo2() was the fastest In both cases, in the sorting version it was only so by a very small amount.

Here's a listing of the full testbed used to get the first set of results for simple access:

import functools
import operator

import timeit
import types

N = 1000
R = 3
SZ = 10000
SUFFIX = '_test'
SUFFIX_LEN = len(SUFFIX)

def setup():
import random
global a_list
a_list = [random.randrange(100) for _ in xrange(SZ)]

def lambda_test():
global a_list
f = lambda i: a_list[i]
for i in xrange(len(a_list)): f(i)

def foo(indexable):
return functools.partial(operator.__getitem__, indexable)

def foo_test():
global a_list
g = foo(a_list)
for i in xrange(len(a_list)): g(i)

def foo2(indexable):
return indexable.__getitem__

def foo2_test():
global a_list
g = foo2(a_list)
for i in xrange(len(a_list)): g(i)

# find all the functions named *SUFFIX in the global namespace
funcs = tuple(value for id,value in globals().items()
if id.endswith(SUFFIX) and type(value) is types.FunctionType)

# run the timing tests and collect results
timings = [(f.func_name[:-SUFFIX_LEN],
min(timeit.repeat(f, setup=setup, repeat=R, number=N))
) for f in funcs]
timings.sort(key=lambda x: x[1]) # sort by speed (ironic use of lambda?)
fastest = timings[0][1] # time fastest one took to run
longest = max(len(t[0]) for t in timings) # len of longest func name (w/o suffix)

print 'fastest to slowest *_test() function timings:\n' \
' {:,d} elements, {:,d} timeit calls, best of {:d}\n'.format(SZ, N, R)

def times_slower(speed, fastest):
return speed/fastest - 1.0

for i in timings:
print "{0:>{width}}{suffix}() : {1:.2f} ({2:.2f} times slower)".format(
i[0], i[1], times_slower(i[1], fastest), width=longest, suffix=SUFFIX)

And here's the portion that was different when testing sort usage:

def setup():
import random
global a_list
a_list = [random.randrange(100) for _ in xrange(SZ)]

def lambda_test():
global a_list
sorted(range(len(a_list)), key=lambda i:a_list[i])

def foo(indexable):
return functools.partial(operator.__getitem__, indexable)

def foo_test():
global a_list
sorted(range(len(a_list)), key=foo(a_list))

def foo2(indexable):
return indexable.__getitem__

def foo2_test():
global a_list
sorted(range(len(a_list)), key=foo2(a_list))

for loop in a list which contains dictionary

Your list is already sorted from highest to lowest in terms of points. Lists are ordered, so you can rely on it to maintain order as you iterate it.

Therefore, you can print your names inside this for loop:

for team in WorldCup:
print(team['name'])

However, I'm assuming this is ordering is just luck, and you want to handle any possible ordering of teams. If so, we can sort the teams in descending order by points using sorted() and reverse=True.

For the sorting key, similar to other answers, we can use operator.itemgetter(). Using itemgetter instead of lambda is usually slightly faster, as shown in this answer.

from operator import itemgetter

WorldCup = [
{'name':"Spain", 'points':9},
{'name':"Portugal",'points':7},
{'name':"Iran",'points':6}
]

sorted_teams_desc = sorted(WorldCup, key=itemgetter('points'), reverse=True)

for team in sorted_teams_desc:
print(team['name'])

Output:

Spain
Portugal
Iran

Is there a way to cast values when using operator.itemgetter() as sort key?

There are ways, for example using iteration_utilities.chained 1 and functools.partial:

>>> import operator import itemgetter
>>> from iteration_utilities import chained
>>> from functools import partial

>>> itemgetter_int = chained(operator.itemgetter(0, 1), partial(map, int), tuple)
>>> sorted(nums, key=itemgetter_int)
[['0', '2'], ['1', '2'], ['1', '3'], ['2', '2'], ['11', '2']]

It works but it's definetly slower than using a lambda or custom defined function.

If you really need speed you could cythonize the lambda-function (or write it in C by hand) but if you just need it in one place just use a throw-away lambda. Especially if it's in sorted because it has O(nlog(n)) comparisons so the O(n) function calls probably don't contribute much to the overall execution time.


1: This is a function in a 3rd party extension module that I authored. It needs to be seperatly installed, for example via conda or pip.



Related Topics



Leave a reply



Submit