List Comprehension Vs. Lambda + Filter

List comprehension vs. lambda + filter

It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.

There are two things that may slow down your use of filter.

The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.
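
A rough sketch of how you could measure that overhead yourself with timeit (the data and sizes here are made up; absolute numbers will vary by machine):

import timeit

setup = "seq = list(range(1000)); value = 500"

# filter + lambda: one Python-level function call per element
print(timeit.timeit("list(filter(lambda x: x == value, seq))", setup=setup, number=10000))

# list comprehension: the comparison is inlined, no per-element call
print(timeit.timeit("[x for x in seq if x == value]", setup=setup, number=10000))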

The other overhead that might apply is that the lambda is forced to access a scoped variable (value). That is slower than accessing a local variable, and in Python 2.x the list comprehension accesses only local variables. If you are using Python 3.x, the list comprehension runs in a separate function, so it too will be accessing value through a closure and this difference won't apply.
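
A quick way to see the Python 3 behaviour described above: the comprehension runs in its own function scope, so its loop variable never leaks into the surrounding scope the way it did in Python 2:

x = 10
squares = [x * x for x in range(5)]
print(x)  # 10 on Python 3; on Python 2 this prints 4, because the comprehension reused x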

The other option to consider is to use a generator instead of a list comprehension:

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
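
A minimal usage sketch, assuming the elements carry an attribute field (the Item type and sample data below are made up for illustration):

from collections import namedtuple

Item = namedtuple('Item', 'attribute name')
items = [Item('red', 'apple'), Item('green', 'pear'), Item('red', 'cherry')]

# in the main code, the intent is carried by the function name
for item in filterbyvalue(items, 'red'):
    print(item.name)  # apple, cherry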

When would the filter function be used instead of a list comprehension?

There's no harm in using either. A similar comment can be made about map.

I tend to use whichever one reads more easily. In your case I would avoid the lambda, as it is a bit verbose, and use the comprehension instead.

I would use filter or map when I already have an existing function I can pass straight to them, which ends up more terse than the comprehension.

For example, say I write a program to find the length of the longest name:

# Using map
longest = max(map(len, names))

# Using generator expression
longest = max(len(name) for name in names)

In the above example I would choose map over the generator expression, but it's entirely personal preference.
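
For the same reason, filter reads well when the predicate already exists as a named function; a small sketch (the data here is made up):

words = ["alpha", "Beta", "gamma", "Delta"]

# Using filter with an existing predicate -- no lambda needed
titled = list(filter(str.istitle, words))

# Equivalent list comprehension
titled = [w for w in words if w.istitle()]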

lambda versus list comprehension performance

Your tests are doing very different things. With S being 1M elements and T being 300:

[x for x in S for y in T if x==y]  = 54.875

This option does 300M equality comparisons.

filter(lambda x: x in S, T)        = 0.391000032425

This option does 300 linear searches through S.

[val for val in S if val in T]     = 12.6089999676

This option does 1M linear searches through T.

list(set(S) & set(T))              = 0.125

This option does two set constructions and one set intersection.


The differences in performance between these options have much more to do with the algorithm each one uses than with any difference between list comprehensions and lambda.
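
To make that concrete, a small sketch (the sizes here are made up): converting T to a set changes the algorithm behind the third option from 1M linear searches into 1M hash lookups, while the comprehension itself stays exactly the same:

S = list(range(1000000))
T = list(range(300))

T_set = set(T)                               # one-off construction, O(len(T))
common = [val for val in S if val in T_set]  # hash lookups instead of linear searches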

Python: list comprehensions vs. lambda

When the list is so small there is no significant difference between the two. If the input list can grow large, there is a bigger problem: you're iterating over the whole list when you could stop at the first match. You could accomplish this with a for loop, but if you want a comprehension-like expression, generator expressions come to the rescue:

# like list comprehensions but with () instead of []
gen = (b for a, b in foo if a == 'b')
my_element = next(gen)

or simply:

my_element = next(b for a, b in foo if a == 'b')

If you want to learn more about generator expressions, have a look at PEP 289.
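
One detail worth knowing: next() raises StopIteration if nothing matches; passing a default value avoids that. A minimal sketch, reusing the same hypothetical foo:

# Returns None instead of raising StopIteration when no element matches
my_element = next((b for a, b in foo if a == 'b'), None)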


Note that even with generators and iterators you have more than one choice.

# Python 3:
my_element = next(filter(lambda x: x[0] == 'b', foo))

# Python 2:
from itertools import ifilter
my_element = next(ifilter(lambda (x, y): x == 'b', foo))

I personally don't like this and don't recommend it, because it is much less readable. It also turns out to be slower than my first snippet, though more generally using filter() instead of a generator expression can be faster in some special cases.

In any case, if you need to benchmark your code, I recommend the timeit module.
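
For these particular snippets, a minimal timeit sketch might look like this (foo here is made-up sample data; the match is placed near the end so the laziness matters):

import timeit

foo = [('a', 1), ('c', 3)] * 500 + [('b', 2)]

print(timeit.timeit(lambda: next(b for a, b in foo if a == 'b'), number=10000))
print(timeit.timeit(lambda: next(filter(lambda x: x[0] == 'b', foo)), number=10000))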

List comprehension vs map

map may be microscopically faster in some cases (when you're NOT making a lambda for the purpose, but using the same function in map and a listcomp). List comprehensions may be faster in other cases and most (not all) pythonistas consider them more direct and clearer.

An example of the tiny speed advantage of map when using exactly the same function:

$ python -m timeit -s'xs=range(10)' 'map(hex, xs)'
100000 loops, best of 3: 4.86 usec per loop
$ python -m timeit -s'xs=range(10)' '[hex(x) for x in xs]'
100000 loops, best of 3: 5.58 usec per loop

An example of how performance comparison gets completely reversed when map needs a lambda:

$ python -m timeit -s'xs=range(10)' 'map(lambda x: x+2, xs)'
100000 loops, best of 3: 4.24 usec per loop
$ python -m timeit -s'xs=range(10)' '[x+2 for x in xs]'
100000 loops, best of 3: 2.32 usec per loop
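
Note that these timings appear to be from Python 2, where map() returns a list; on Python 3 map() is lazy, so a fair comparison has to force the result, for example:

$ python3 -m timeit -s 'xs = range(10)' 'list(map(hex, xs))'
$ python3 -m timeit -s 'xs = range(10)' '[hex(x) for x in xs]'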

Is list comprehension implemented via map and lambda function?

No, list comprehensions are not implemented by map and lambda under the hood, neither in CPython nor in PyPy3.

CPython (3.9.13 here) compiles the list comprehension into a special code object that outputs a list and calls it as a function:

~ $ echo 'x = [a + 1 for a in [1, 2, 3, 4]]' | python3 -m dis
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x107446f50, file "<stdin>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 ((1, 2, 3, 4))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 STORE_NAME               0 (x)
             14 LOAD_CONST               3 (None)
             16 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x107446f50, file "<stdin>", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (a)
              8 LOAD_FAST                1 (a)
             10 LOAD_CONST               0 (1)
             12 BINARY_ADD
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE

Whereas the equivalent list(map(lambda: ...)) version is just function calls:

~ $ echo 'x = list(map(lambda a: a + 1, [1, 2, 3, 4]))' | python3 -m dis
  1           0 LOAD_NAME                0 (list)
              2 LOAD_NAME                1 (map)
              4 LOAD_CONST               0 (<code object <lambda> at 0x102701f50, file "<stdin>", line 1>)
              6 LOAD_CONST               1 ('<lambda>')
              8 MAKE_FUNCTION            0
             10 BUILD_LIST               0
             12 LOAD_CONST               2 ((1, 2, 3, 4))
             14 LIST_EXTEND              1
             16 CALL_FUNCTION            2
             18 CALL_FUNCTION            1
             20 STORE_NAME               2 (x)
             22 LOAD_CONST               3 (None)
             24 RETURN_VALUE

Disassembly of <code object <lambda> at 0x102701f50, file "<stdin>", line 1>:
  1           0 LOAD_FAST                0 (a)
              2 LOAD_CONST               1 (1)
              4 BINARY_ADD
              6 RETURN_VALUE

List comprehension instead of lambda in DataFrame.apply()?

What about

G['year'] = ["'{:02d}".format(x % 100) for x in G.year]

?
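
For comparison, the lambda-based .apply() version (assuming G is a pandas DataFrame with an integer year column) would look something like:

G['year'] = G['year'].apply(lambda x: "'{:02d}".format(x % 100))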

Filter Lambda Function

There are two issues with your code:

  1. The list variable name shadows the list() builtin -- pick a different name for your original list instead.
  2. Your lambda function isn't correct. Instead of lambda x: x == k, it should be lambda x: 'k' in x.

data = ["rabbit", "chuck", "Joe", "war", "rock", "docker"]
listfilter = list(filter(lambda x: 'k' in x, data))

# Prints ['chuck', 'rock', 'docker']
print(listfilter)
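
For comparison, the same filtering as a list comprehension over the same made-up data:

listfilter = [x for x in data if 'k' in x]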

