Find the most common element in a list
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements): itertools.groupby. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:

    import itertools
    import operator

    def most_common(L):
        # sort (item, original index) pairs so that equal items are adjacent
        SL = sorted((x, i) for i, x in enumerate(L))
        # print('SL:', SL)
        # get an iterable of (item, iterable) pairs
        groups = itertools.groupby(SL, key=operator.itemgetter(0))
        # auxiliary function to get "quality" for an item
        def _auxfun(g):
            item, iterable = g
            count = 0
            min_index = len(L)
            for _, where in iterable:
                count += 1
                min_index = min(min_index, where)
            # print('item %r, count %r, minind %r' % (item, count, min_index))
            return count, -min_index
        # pick the highest-count/earliest item
        return max(groups, key=_auxfun)[0]

This could be written more concisely, of course, but I'm aiming for maximal clarity. The two commented-out print lines can be uncommented to better see the machinery in action; for example, with the prints uncommented:

    print(most_common(['goose', 'duck', 'duck', 'goose']))

emits:

    SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
    item 'duck', count 2, minind 1
    item 'goose', count 2, minind 0
    goose

As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if more than one item shares the highest count, the result must be the earliest-occurring one). groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per group during the max computation, receives and internally unpacks a group: a tuple with two items, (item, iterable), where the iterable's items are also two-item tuples, (item, original index) (the items of SL).

Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable and the minimum original index; it returns those as a combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
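
To make the grouping step concrete, here is a small sketch that materializes what groupby yields for the goose/duck example. The variable name groups is only illustrative; the real function never builds this list, it consumes each group's iterator lazily.

```python
import itertools
import operator

L = ['goose', 'duck', 'duck', 'goose']
# Pair each item with its original index, then sort so equal items are adjacent.
SL = sorted((x, i) for i, x in enumerate(L))

# Turn each group's iterator into a list so it can be inspected.
groups = [(item, list(pairs))
          for item, pairs in itertools.groupby(SL, key=operator.itemgetter(0))]
for item, pairs in groups:
    print(item, pairs)
# duck [('duck', 1), ('duck', 2)]
# goose [('goose', 0), ('goose', 3)]
```

Both groups have two entries, so the tie is broken by the smaller original index, which is why 'goose' wins.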
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g.:

    def most_common(L):
        groups = itertools.groupby(sorted(L))
        def _auxfun(g):
            # unpack explicitly: tuple parameters are not valid in Python 3
            item, iterable = g
            return len(list(iterable)), -L.index(item)
        return max(groups, key=_auxfun)[0]

This is the same basic idea, just expressed more simply and compactly... but, alas, it needs an extra O(N) auxiliary space (to materialize the groups' iterables as lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).

    from itertools import groupby as g
    def most_common_oneliner(L):
        # xv is an (item, group iterator) pair; lambda tuple unpacking is Python 2 only
        return max(g(sorted(L)), key=lambda xv: (len(list(xv[1])), -L.index(xv[0])))[0]

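
For comparison with the O(N log N) and O(N squared) versions above: when the elements happen to be hashable, a single pass with two dicts can keep the same "highest count, earliest occurrence" rule in roughly O(N) time. This is a different technique from the groupby approach, not part of the original answer, and the name most_common_hashable is made up for illustration.

```python
def most_common_hashable(L):
    # Hypothetical helper: works only when the items of L are hashable.
    counts = {}       # item -> number of occurrences
    first_seen = {}   # item -> index of its first occurrence
    for i, x in enumerate(L):
        counts[x] = counts.get(x, 0) + 1
        first_seen.setdefault(x, i)
    # Highest count wins; among ties, the smallest first index wins.
    return max(counts, key=lambda x: (counts[x], -first_seen[x]))

print(most_common_hashable(['goose', 'duck', 'duck', 'goose']))  # goose
```

Like the versions above, this raises ValueError on an empty list, because max receives an empty sequence.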
Finding the most common element in a list of lists
An alternative solution that does not use the Counter class from collections:

    def get_freq_tuple(data):
        counts = {}
        # count every tuple across all sublists
        for pairs in data:
            for pair in pairs:
                counts[pair] = counts.get(pair, 0) + 1
        # compute the maximum once instead of once per key
        highest = max(counts.values())
        return [pair for pair in counts if counts[pair] == highest]

    if __name__ == "__main__":
        pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1),
                  (3, 1), (3, 2), (3, 3), (3, 2), (2, 2)],
                 [(2, 2), (2, 1)],
                 [(1, 1), (1, 2), (2, 2), (2, 1)]]
        print(get_freq_tuple(pairs))

Output:

    [(2, 2)]

Explanation:
- Count the occurrences of each tuple and store them in a dictionary. Each key is a tuple and each value is the number of times that tuple occurs.
- Filter the tuples in the dictionary, keeping those with the maximum number of occurrences.
Disclaimer:
- Using the Counter class from collections is much more efficient.
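
As a rough sketch of the Counter-based route the disclaimer mentions, the nested lists can be flattened with itertools.chain.from_iterable and counted in one call. The function name get_freq_tuple_counter and the shortened sample data are made up for illustration.

```python
from collections import Counter
from itertools import chain

def get_freq_tuple_counter(data):
    # Hypothetical name; flatten the sublists and count each tuple.
    counts = Counter(chain.from_iterable(data))
    if not counts:
        return []  # no tuples at all, so nothing is "most common"
    top = max(counts.values())
    return [pair for pair, n in counts.items() if n == top]

pairs = [[(0, 0), (2, 2), (3, 2)], [(2, 2), (2, 1)], [(1, 1), (2, 2), (2, 1)]]
print(get_freq_tuple_counter(pairs))  # [(2, 2)]
```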
How to find most common elements of a list?
If you are using an earlier version of Python (one that predates collections.Counter, added in Python 2.7) or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.

    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
    >>> word_counter = {}
    >>> for word in word_list:
    ...     if word in word_counter:
    ...         word_counter[word] += 1
    ...     else:
    ...         word_counter[word] = 1
    ...
    >>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
    >>>
    >>> top_3 = popular_words[:3]
    >>>
    >>> top_3
    ['Jellicle', 'Cats', 'and']

Top Tip: The interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.
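
One thing to keep in mind with the dict approach: words with equal counts come out in whatever order the dict and the sort happen to produce, so the third entry of the top 3 can vary between Python versions. A minimal sketch, assuming Counter is available and using a shortened word list, that breaks ties alphabetically so the result is predictable:

```python
from collections import Counter

words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,',
         'Jellicle', 'Cats', 'are', 'rather', 'small;']
word_counter = Counter(words)

# Sort by descending count, then alphabetically, so ties resolve the same way every run.
ranked = sorted(word_counter, key=lambda w: (-word_counter[w], w))
print(ranked[:3])  # ['Cats', 'Jellicle', 'are']
```

Uppercase letters sort before lowercase ones in Python's default string ordering, which is why 'Cats' and 'Jellicle' come before 'are' here.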
How to find all of the most common elements in a python list (order alphabetically in case of tie)?
Here is a possible solution (as discussed in the comments):

    from collections import Counter
    lst = [...]  # placeholder: your list of characters
    counts = Counter(lst)  # O(n)
    largest = max(counts.values())  # O(n)
    largest_with_ties = [k for k, v in counts.items() if v == largest]  # O(n)
    result = sorted(largest_with_ties)

Now, what's the complexity of sorted(largest_with_ties)? One could say that it's O(n log n) (because there could be n/2 ties). However, largest_with_ties cannot grow as large as lst, because it holds only distinct characters, and there are far fewer possible characters than possible ints. In other words, lst could potentially contain 10^20 elements (just an example), but largest_with_ties can only contain different characters, and the number of characters that can be represented (for example) with UTF-8 is limited to roughly 10^6. Therefore, technically the complexity of this last operation is O(1). In general, we could say that it's O(n log n) but with an upper limit of O(10^6 log 10^6).
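
The steps above can be wrapped into a small function for experimenting. This is only a sketch: the name most_common_with_ties and the empty-input behavior (returning an empty list) are assumptions, not part of the original answer.

```python
from collections import Counter

def most_common_with_ties(lst):
    # Hypothetical helper name wrapping the count / max / filter / sort steps.
    counts = Counter(lst)                # O(n)
    if not counts:
        return []                        # assumption: empty input gives an empty result
    largest = max(counts.values())       # O(n)
    # Keep every key that reaches the maximum count, in alphabetical order.
    return sorted(k for k, v in counts.items() if v == largest)

print(most_common_with_ties(list("bbaacd")))  # ['a', 'b']
```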
How to get the most common element from a list in python
You can use collections.Counter for this:

    from collections import Counter
    a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
    c = Counter(a)
    print(c.most_common(1))  # the one most common element; most_common(2) would give the 2 most common
    [(9216, 2)]  # a list holding one (element, count) tuple: the element and its count in 'a'

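
If only the element itself is wanted rather than the list of tuples, the single (element, count) pair can be unpacked. A short sketch using the same data:

```python
from collections import Counter

a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)

# most_common(1) returns a one-element list of (element, count) tuples.
element, count = c.most_common(1)[0]
print(element, count)  # 9216 2
```

Note that most_common(1) returns an empty list for an empty Counter, so the [0] index would raise IndexError in that case.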
Find the most common element in list of lists
Apply Counter.update() to each sublist of your list (based on a suggestion from @BlueSheepToken):

    from collections import Counter
    words = [['a','b','a','a'],['c','c','c','d','d','d']]
    counter = Counter(words[0])
    for i in words[1:]:
        counter.update(i)
    counter.most_common()

output:

    [('a', 3), ('c', 3), ('d', 3), ('b', 1)]

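
An equivalent way to get the same counts, sketched here as an alternative to the explicit update loop, is to flatten the sublists with itertools.chain.from_iterable and hand the result to Counter in one call:

```python
from collections import Counter
from itertools import chain

words = [['a', 'b', 'a', 'a'], ['c', 'c', 'c', 'd', 'd', 'd']]

# chain.from_iterable lazily yields every element of every sublist.
counter = Counter(chain.from_iterable(words))
print(counter.most_common())
```

This also copes with an empty outer list, where words[0] in the loop version would raise IndexError.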
How to find most common element in a list of list?
There are many ways, but I wanted to let you know that there are some nice tools for that kind of thing in the standard modules, e.g. collections.Counter:

    In [1]: lst = [['1','2','3','4'],['1','1','1','1'],['1','2','3','4']]
    In [2]: from collections import Counter
    In [3]: from operator import itemgetter
    In [4]: max((Counter(l).most_common(1)[0] for l in lst), key=itemgetter(1))[0]
    Out[4]: '1'

Or, you could (kinda) employ your current solution for each of the sublists:

    In [5]: max(((max(set(l), key=l.count), l) for l in lst),
       ...:     key=lambda x: x[1].count(x[0]))[0]
    Out[5]: '1'

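
Both snippets above pick the element with the highest count inside any single sublist, which is not necessarily the element that is most common once all sublists are combined. Which reading is wanted depends on the question; the following sketch, with made-up data, shows a case where the two differ:

```python
from collections import Counter
from itertools import chain

lst = [['x', 'x', 'x'], ['y', 'y', 'z'], ['y', 'y', 'z']]

# Per-sublist approach: the winner is the element with the highest
# count within one sublist ('x' appears 3 times in the first one).
per_sublist = max((Counter(l).most_common(1)[0] for l in lst),
                  key=lambda pair: pair[1])[0]

# Flattened approach: counts are summed across all sublists
# ('y' appears 4 times in total).
overall = Counter(chain.from_iterable(lst)).most_common(1)[0][0]

print(per_sublist, overall)  # x y
```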