Find the most common element in a list
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby
][1]. itertools
offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print
statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL
is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby
groups by the item only (via operator.itemgetter
). The auxiliary function, called once per grouping during the max
computation, receives and internally unpacks a group - a tuple with two items (item, iterable)
where the iterable's items are also two-item tuples, (item, original index)
[[the items of SL
]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max
operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index
of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]
How to find most common elements of a list?
If you are using an earlier version of Python or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict
.
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
... if word in word_counter:
... word_counter[word] += 1
... else:
... word_counter[word] = 1
...
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>>
>>> top_3 = popular_words[:3]
>>>
>>> top_3
['Jellicle', 'Cats', 'and']
Top Tip: The interactive Python interpretor is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.
Finding the most common element in a list of lists
Alternative solution without using the Counter
method from collections
:
def get_freq_tuple(data):
counts = {}
for pairs in data:
for pair in pairs:
counts[pair] = counts.get(pair, 0) + 1
return [pair for pair in counts if counts[pair] == max(counts.values())]
if __name__ == "__main__":
pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1),
(3, 1), (3, 2), (3, 3), (3, 2), (2, 2)],
[(2, 2), (2, 1)],
[(1, 1), (1, 2), (2, 2), (2, 1)]]
print(get_freq_tuple(pairs))
Output:
[(2, 2)]
Explanation:
- Count the occurrences of each tuple and store them in a dictionary. The key of the dictionary is the tuple and the value is the occurrence.
- Filter the tuples in the dictionary by maximum occurrences of the tuples.
Disclaimer:
- Using
Counter
method fromcollections
is much more efficient.
References:
- Python dictionary
Find the most common element in list of lists
Apply Counter().update()
option on the elements of your list,
Based on suggestion from @BlueSheepToken
from collections import Counter
words = [['a','b','a','a'],['c','c','c','d','d','d']]
counter = Counter(words[0])
for i in words[1:]:
counter.update(i)
counter.most_common()
output:
[('a', 3), ('c', 3), ('d', 3), ('b', 1)]
How to find all of the most common elements in a python list (order alphabetically in case of tie)?
Here is a possible solution (as discussed in the comments):
from collections import Counter
lst = # your list of characters
c = Counter(lst) # O(n)
largest = max(counts.values()) # O(n)
largest_with_ties = [k for k, v in counts.items() if v == largest] # O(n)
result = sorted(largest_with_ties)
Now, what's the complexity of sorted(largest_with_ties)
? One could say that it's O(nlogn) (because there could be n/2 ties). However, the number of characters in largest_with_ties
cannot be as large as the number of elements in lst
. And that's because there is a much smaller number of characters compared to the possible number of ints. In other words, lst
could potentially contain 10^20 numbers (just an example). But largest_with_ties
can only contain different characters, and the number of characters that can be represented (for example) with UTF8 is limited to more or less 10^6. Therefore, technically the complexity of this last operation is O(1). In general, we could say that it's O(nlogn) but with an upper limit of O(10^6log10^6).
How to find most common element in a list of list?
There are many ways, but I wanted to let you know that there are some nice tools for that kind of things in the standard modules, e.g. collections.Counter
:
In [1]: lst = [['1','2','3','4'],['1','1','1','1'],['1','2','3','4']]
In [2]: from collections import Counter
In [3]: from operator import itemgetter
In [4]: max((Counter(l).most_common(1)[0] for l in lst), key=itemgetter(1))[0]
Out[4]: '1'
Or, you could (kinda) employ your current solution for each of the sublists:
In [5]: max(((max(set(l), key=l.count), l) for l in lst),
...: key=lambda x: x[1].count(x[0]))[0]
Out[5]: '1'
How to get the most common element from a list in python
You can use collections.Counter
for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'
From the docs:
How to find several most frequent elements in a list
Here is a solution that works for any kind of data, not only for positive integers in a range known beforehand.
We count using a collections.Counter
, extract the maximum count which is the count of the most_common number, then make a list of the numbers who have the same count:
from collections import Counter
numbers = [7, 1, 7, 9, 2, 9, 7, 3, 0, 9]
counts = Counter(numbers)
max_count = counts.most_common(1)[0][1]
out = [value for value, count in counts.most_common() if count == max_count]
print(out)
# [7, 9]
Finding the most frequent/common element in a collection?
I have to say that:
list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1
Or just:
list.groupBy(identity).maxBy(_._2.size)._1
Doesn't really seem like that much work to me.
If you're worried about the overhead of building up the lists for each value when you only need counts, you could do the following:
list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1
Or even keep track of the maximum as you go, to avoid the extra traversal at the end:
list.foldLeft(
Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
case ((m, (maxV, maxCount)), v) =>
val count = m(v) + 1
if (count > maxCount) (m.updated(v, count), v -> count)
else (m.updated(v, count), maxV -> maxCount)
}._2._1
This is obviously much less readable than the one-liners above, though, so I'd recommend sticking with them unless you can show (i.e., with benchmarking, not speculation) that they're a bottleneck in your application.
Related Topics
Python Subprocess.Call a Bash Alias
Download File Through Google Chrome in Headless Mode
Unsupported Operand Type(S) for +: 'Int' and 'Str'
What Is the Most Pythonic Way to Pop a Random Element from a List
Pandas Groupby and Select Rows with the Minimum Value in a Specific Column
Cleanest Way to Get Last Item from Python Iterator
How to Create a Custom Activation Function with Keras
How to Add an Integer to Each Element in a List
Absolute VS. Explicit Relative Import of Python Module
Python Os.Path.Join on Windows
Python: Count Repeated Elements in the List
Redirecting Stdout and Stderr to a Pyqt4 Qtextedit from a Secondary Thread
What Do the Python File Extensions, .Pyc .Pyd .Pyo Stand For
Simple Argparse Example Wanted: 1 Argument, 3 Results
Pandas Split Column into Multiple Columns by Comma
"Pythonic" Method to Parse a String of Comma-Separated Integers into a List of Integers