python equivalent of filter() getting two output lists (i.e. partition of a list)
Try this:
def partition(pred, iterable):
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses
Usage:
>>> trues, falses = partition(lambda x: x > 10, [1,4,12,7,42])
>>> trues
[12, 42]
>>> falses
[1, 4, 7]
There is also an implementation suggestion in itertools recipes:
from itertools import filterfalse, tee
def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
The recipe comes from the Python 3.x documentation. In Python 2.x, filterfalse is called ifilterfalse.
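Two details of the recipe are easy to miss: it returns lazy iterators with the false entries first, and the predicate runs once per item for each of the two outputs. Here is a minimal sketch of that behaviour, assuming Python 3; the call-recording is_odd predicate is made up for illustration.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    """Split iterable into (false entries, true entries), both lazy."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

calls = []

def is_odd(n):
    # Hypothetical predicate that records every call, for illustration only.
    calls.append(n)
    return n % 2 == 1

evens, odds = partition(is_odd, range(10))
evens, odds = list(evens), list(odds)  # materialize the lazy iterators

print(evens)       # [0, 2, 4, 6, 8]
print(odds)        # [1, 3, 5, 7, 9]
print(len(calls))  # 20: the predicate ran twice per item
```

If the predicate is expensive, the explicit loop above may be the better choice, since it calls the predicate once per item.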
python equivalent of scala partition
Using filter (requires two iterations):
>>> items = [1,2,3,4,5]
>>> inGroup = filter(is_even, items) # list(filter(is_even, items)) in Python 3.x
>>> outGroup = filter(lambda n: not is_even(n), items)
>>> inGroup
[2, 4]
>>> outGroup
[1, 3, 5]
Simple loop:
def partition(items, filter_):
    inGroup, outGroup = [], []
    for n in items:
        if filter_(n):
            inGroup.append(n)
        else:
            outGroup.append(n)
    return inGroup, outGroup
Example:
>>> items = [1,2,3,4,5]
>>> inGroup, outGroup = partition(items, is_even)
>>> inGroup
[2, 4]
>>> outGroup
[1, 3, 5]
List splitting by predicate
Maybe
for r in results:
    (okays if success_condition(r) else errors).append(r)
But that doesn't look/feel very Pythonic.
Not directly relevant, but if one is looking for efficiency, caching the method look-ups would be better:
okays_append = okays.append
errors_append = errors.append
for r in results:
    (okays_append if success_condition(r) else errors_append)(r)
Which is even less Pythonic.
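To check that the cached-lookup variant behaves the same as the plain one, both can be wrapped in functions and compared. This is only a sketch: the function names, the sample data and the is_ok condition are all made up for illustration. Whether the cached version is measurably faster is something to time yourself (for example with timeit) rather than assume.

```python
def split_plain(results, success_condition):
    okays, errors = [], []
    for r in results:
        (okays if success_condition(r) else errors).append(r)
    return okays, errors

def split_cached(results, success_condition):
    okays, errors = [], []
    okays_append = okays.append    # bound methods are looked up once
    errors_append = errors.append
    for r in results:
        (okays_append if success_condition(r) else errors_append)(r)
    return okays, errors

def is_ok(r):
    # Hypothetical success condition, for illustration only.
    return r % 3 == 0

results = list(range(10))
print(split_plain(results, is_ok))   # ([0, 3, 6, 9], [1, 2, 4, 5, 7, 8])
print(split_cached(results, is_ok) == split_plain(results, is_ok))  # True
```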
I'm getting a list of lists in my reducer output rather than a paired value and I am unsure of what to change in my code
The documentation for Counter.most_common explains why you get a list of lists (strictly, a list of (element, count) tuples):
most_common(n=None) method of collections.Counter instance
List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
Because sorting from highest to lowest frequency is like sorting in descending order, but sorting alphabetically is in ascending order, you can use a custom tuple where you take the negative of the frequency and sort everything in ascending order.
from collections import Counter
words = Counter(['coronavirus'] * 4 + ['economy'] * 2 + ['china'] * 2 + ['whatever'])
x = Counter(words)
most_common = x.most_common(3)
# After sorting you need to discard the frequency from each (word, freq) tuple
result = ' '.join(word for word, _ in sorted(most_common, key=lambda x: (-x[1], x[0])))
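One caveat with the snippet above: most_common(3) picks the top three before the alphabetical tie-break is applied, so when several words are tied at the cutoff, the one that survives need not be the alphabetically first. A possible variant, sketched here with the same sample words, sorts all (word, freq) pairs first and slices afterwards. The names counts, ranked and top_three are made up for this example.

```python
from collections import Counter

words = ['coronavirus'] * 4 + ['economy'] * 2 + ['china'] * 2 + ['whatever']
counts = Counter(words)

# Sort every (word, freq) pair: frequency descending, then word ascending.
ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

# Only now keep the top three and drop the frequencies.
top_three = [word for word, _ in ranked[:3]]
print(' '.join(top_three))  # coronavirus china economy
```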
How to split a list based on a condition?
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
Is there a more elegant way to do this?
That code is perfectly readable, and extremely clear!
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]
Again, this is fine!
There might be slight performance improvements using sets, but it's a trivial difference, and I find the list comprehension far easier to read. You also don't have to worry about the order being messed up, duplicates being removed, and so on.
In fact, I may go another step "backward", and just use a simple for loop:
images, anims = [], []
for f in files:
    # f is a (name, size, extension) tuple, so test the extension field
    if f[2].lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)
A list comprehension or set() is fine until you need to add some other check or another bit of logic. Say you want to remove all 0-byte JPEGs; in the loop you just add something like:
    if f[1] == 0:
        continue
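Put together, the loop with the extra check might look like the sketch below. The sample files list is made up, following the (name, size, extension) layout from the question. Note that the check as written skips every 0-byte file, not only JPEGs.

```python
IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')

# Hypothetical sample data in the (name, size_in_bytes, extension) layout.
files = [
    ('file1.jpg', 33, '.jpg'),
    ('file2.avi', 999, '.avi'),
    ('empty.JPG', 0, '.JPG'),
    ('file3.gif', 120, '.gif'),
]

images, anims = [], []
for f in files:
    if f[1] == 0:
        continue  # skip 0-byte files of any type
    if f[2].lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)

print([name for name, _, _ in images])  # ['file1.jpg', 'file3.gif']
print([name for name, _, _ in anims])   # ['file2.avi']
```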
Filter a list into two parts by a predicate
I don't think there is a built-in and your version is sub-optimal because it traverses the list twice and calls the predicate on each list element twice.
(defun filter-list-into-two-parts (predicate list)
  (loop for x in list
        if (funcall predicate x) collect x into yes
        else collect x into no
        finally (return (values yes no))))
I return two values instead of a list of them; this is more idiomatic. You will use multiple-value-bind to extract yes and no from the multiple values returned, instead of using destructuring-bind to parse the list; it conses less and is faster (see also the values function in Common Lisp).
A more general version would be
(defun split-list (key list &key (test 'eql))
  (let ((ht (make-hash-table :test test)))
    (dolist (x list ht)
      (push x (gethash (funcall key x) ht '())))))
(split-list (lambda (x) (mod x 3)) (loop for i from 0 to 9 collect i))
==> #S(HASH-TABLE :TEST FASTHASH-EQL (2 . (8 5 2)) (1 . (7 4 1)) (0 . (9 6 3 0)))
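For readers coming from the Python questions above, a rough Python analogue of split-list could look like this. It is a sketch rather than a translation: the name split_list is made up, and because it appends instead of pushing onto the front, each group keeps the original order, whereas the Lisp version returns each group reversed.

```python
from collections import defaultdict

def split_list(key, items):
    """Group items into a dict keyed by key(item), keeping input order."""
    groups = defaultdict(list)
    for x in items:
        groups[key(x)].append(x)
    return dict(groups)

print(split_list(lambda x: x % 3, range(10)))
# {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```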
List comprehension vs. lambda + filter
It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.
There are two things that may slow down your use of filter.
The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.
The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable, and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function, so it will also be accessing value through a closure and this difference won't apply.
The other option to consider is to use a generator instead of a list comprehension:
def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el
Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
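As a small usage sketch, the generator can be consumed lazily in a loop or materialized with list(). The Item namedtuple and the sample data below are made up for illustration; any objects with an attribute field would do.

```python
from collections import namedtuple

# Hypothetical record type with the 'attribute' field the generator expects.
Item = namedtuple('Item', ['name', 'attribute'])

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

items = [Item('a', 1), Item('b', 2), Item('c', 1)]

# The call site now reads as a named operation rather than a lambda.
matches = list(filterbyvalue(items, 1))
print([m.name for m in matches])  # ['a', 'c']
```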
Is it possible to compile results into unique lists from inside of a comprehension?
Not without using side effects and throwing away the result. You can do this though:
even = []
odd = []
for i in my_list:
    (odd if i % 2 else even).append(i)
In general, this problem is called partitioning a list. You can find some solutions by searching SO, but none are much cleaner (in Python).
if-else in python list comprehensions
You can only construct one list at a time with list comprehension. You'll want something like:
nums = [foo for foo in mixed_list if foo.isdigit()]
strings = [foo for foo in mixed_list if not foo.isdigit()]