How to Find Most Common Elements of a List

Find the most common element in a list

With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:

import itertools
import operator

def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]

This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:

print most_common(['goose', 'duck', 'duck', 'goose'])

emits:

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose

As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).

groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].

Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.

This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:

def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]

same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)

Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).

from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

How to find most common elements of a list?

If you are using an earlier version of Python or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
... if word in word_counter:
... word_counter[word] += 1
... else:
... word_counter[word] = 1
...
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>>
>>> top_3 = popular_words[:3]
>>>
>>> top_3
['Jellicle', 'Cats', 'and']

Top Tip: The interactive Python interpretor is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.

Finding the most common element in a list of lists

Alternative solution without using the Counter method from collections:

def get_freq_tuple(data):
counts = {}
for pairs in data:
for pair in pairs:
counts[pair] = counts.get(pair, 0) + 1

return [pair for pair in counts if counts[pair] == max(counts.values())]

if __name__ == "__main__":
pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1),
(3, 1), (3, 2), (3, 3), (3, 2), (2, 2)],
[(2, 2), (2, 1)],
[(1, 1), (1, 2), (2, 2), (2, 1)]]
print(get_freq_tuple(pairs))

Output:

[(2, 2)]

Explanation:

  • Count the occurrences of each tuple and store them in a dictionary. The key of the dictionary is the tuple and the value is the occurrence.
  • Filter the tuples in the dictionary by maximum occurrences of the tuples.

Disclaimer:

  • Using Counter method from collections is much more efficient.

References:

  • Python dictionary

Find the most common element in list of lists

Apply Counter().update() option on the elements of your list,
Based on suggestion from @BlueSheepToken

from collections import Counter

words = [['a','b','a','a'],['c','c','c','d','d','d']]
counter = Counter(words[0])
for i in words[1:]:
counter.update(i)

counter.most_common()

output:

[('a', 3), ('c', 3), ('d', 3), ('b', 1)]

How to find all of the most common elements in a python list (order alphabetically in case of tie)?

Here is a possible solution (as discussed in the comments):

from collections import Counter

lst = # your list of characters
c = Counter(lst) # O(n)
largest = max(counts.values()) # O(n)
largest_with_ties = [k for k, v in counts.items() if v == largest] # O(n)
result = sorted(largest_with_ties)

Now, what's the complexity of sorted(largest_with_ties)? One could say that it's O(nlogn) (because there could be n/2 ties). However, the number of characters in largest_with_ties cannot be as large as the number of elements in lst. And that's because there is a much smaller number of characters compared to the possible number of ints. In other words, lst could potentially contain 10^20 numbers (just an example). But largest_with_ties can only contain different characters, and the number of characters that can be represented (for example) with UTF8 is limited to more or less 10^6. Therefore, technically the complexity of this last operation is O(1). In general, we could say that it's O(nlogn) but with an upper limit of O(10^6log10^6).

How to find most common element in a list of list?

There are many ways, but I wanted to let you know that there are some nice tools for that kind of things in the standard modules, e.g. collections.Counter:

In [1]: lst = [['1','2','3','4'],['1','1','1','1'],['1','2','3','4']]
In [2]: from collections import Counter
In [3]: from operator import itemgetter
In [4]: max((Counter(l).most_common(1)[0] for l in lst), key=itemgetter(1))[0]
Out[4]: '1'

Or, you could (kinda) employ your current solution for each of the sublists:

In [5]: max(((max(set(l), key=l.count), l) for l in lst),
...: key=lambda x: x[1].count(x[0]))[0]
Out[5]: '1'

How to get the most common element from a list in python

You can use collections.Counter for this:

from collections import Counter

a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]

c = Counter(a)

print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'

From the docs:

Sample Image
Sample Image

How to find several most frequent elements in a list

Here is a solution that works for any kind of data, not only for positive integers in a range known beforehand.

We count using a collections.Counter, extract the maximum count which is the count of the most_common number, then make a list of the numbers who have the same count:

from collections import Counter

numbers = [7, 1, 7, 9, 2, 9, 7, 3, 0, 9]
counts = Counter(numbers)
max_count = counts.most_common(1)[0][1]
out = [value for value, count in counts.most_common() if count == max_count]
print(out)
# [7, 9]

Finding the most frequent/common element in a collection?

I have to say that:

list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1

Or just:

list.groupBy(identity).maxBy(_._2.size)._1

Doesn't really seem like that much work to me.

If you're worried about the overhead of building up the lists for each value when you only need counts, you could do the following:

list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1

Or even keep track of the maximum as you go, to avoid the extra traversal at the end:

list.foldLeft(
Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
case ((m, (maxV, maxCount)), v) =>
val count = m(v) + 1
if (count > maxCount) (m.updated(v, count), v -> count)
else (m.updated(v, count), maxV -> maxCount)
}._2._1

This is obviously much less readable than the one-liners above, though, so I'd recommend sticking with them unless you can show (i.e., with benchmarking, not speculation) that they're a bottleneck in your application.



Related Topics



Leave a reply



Submit