Item Frequency Count in Python

How to count the frequency of the elements in an unordered list?

If the list is sorted, you can use groupby from the itertools standard library (if it isn't, you can just sort it first, although this takes O(n lg n) time):

from itertools import groupby

a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
[len(list(group)) for key, group in groupby(sorted(a))]

Output:

[4, 4, 2, 1, 2]
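
The comprehension above returns only the counts and drops the values they belong to. A minimal sketch that keeps each value next to its count, reusing the same list (the name value_counts is just illustrative):

```python
from itertools import groupby

a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]

# groupby only merges adjacent equal items, so sort first
value_counts = [(key, len(list(group))) for key, group in groupby(sorted(a))]
print(value_counts)
# [(1, 4), (2, 4), (3, 2), (4, 1), (5, 2)]
```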

Item frequency count in Python

The Counter class in the collections module is purpose-built to solve this type of problem:

from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
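
Once the Counter exists, it can be queried like a dictionary. A short sketch on the same sentence (the variable name counts is illustrative):

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

print(counts["apple"])        # 2
print(counts["kiwi"])         # 0 (missing keys count as zero, no KeyError)
print(counts.most_common(1))  # [('apple', 2)]
```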

Count frequency of combinations of elements in python

Interesting problem! I use itertools.combinations() to generate all possible combinations and collections.Counter() to count how often each combination appears:

import itertools
from collections import Counter

import numpy as np  # needed for np.nan below
import pandas as pd

# create sample data
df = pd.DataFrame([
    ['detergent', np.nan],
    ['bread', 'water', None],
    ['bread', 'umbrella', 'milk', 'diaper', 'beer'],
    ['umbrella', 'water'],
    ['water', 'umbrella'],
    ['umbrella', 'water']
])

def get_all_combinations_without_nan_or_None(row):
    # remove nan, None and double values
    set_without_nan = {value for value in row if isinstance(value, str)}

    # generate all possible combinations of the values in a row
    all_combinations = []
    for i in range(1, len(set_without_nan)+1):
        result = list(itertools.combinations(set_without_nan, i))
        all_combinations.extend(result)

    return all_combinations

# get all possible combinations of values in a row
all_rows = df.apply(get_all_combinations_without_nan_or_None, 1).values
all_rows_flatten = list(itertools.chain.from_iterable(all_rows))

# use Counter to count how many there are of each combination
count_combinations = Counter(all_rows_flatten)

Docs on collections.Counter():
https://docs.python.org/2/library/collections.html#collections.Counter

Docs on itertools.combinations():

https://docs.python.org/2/library/itertools.html#itertools.combinations
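
One caveat with the approach above: itertools.combinations() emits items in the order it receives them, and a set makes no promise about iteration order, so the same pair could in principle show up as two different tuples. A hedged sketch of the same counting idea on plain lists (no pandas), sorting each row's items first so every combination has one canonical form. The function name count_item_combinations and the rows variable are illustrative, not from the original answer:

```python
import itertools
from collections import Counter

rows = [
    ['detergent', float('nan')],
    ['bread', 'water', None],
    ['bread', 'umbrella', 'milk', 'diaper', 'beer'],
    ['umbrella', 'water'],
    ['water', 'umbrella'],
    ['umbrella', 'water'],
]

def count_item_combinations(rows):
    counts = Counter()
    for row in rows:
        # keep only strings (drops nan/None), dedupe, and sort for a stable order
        items = sorted({value for value in row if isinstance(value, str)})
        for size in range(1, len(items) + 1):
            counts.update(itertools.combinations(items, size))
    return counts

combination_counts = count_item_combinations(rows)
print(combination_counts[('umbrella', 'water')])  # 3
print(combination_counts[('bread',)])             # 2
```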

Counting the frequency of the first element of a list within a list

Just split your task into two parts:

  1. Retrieve first element from each inner list;
  2. Sort elements in descending order by number of occurrences.

You can easily find an answer to each of these problems on Stack Overflow:

  1. Extract first item of each sublist;
  2. Sorting a List by frequency of occurrence in a list.

Even the additional question you asked in the comments has already been answered: How to access the first element or number in counter using Python.

You should do some research before asking a question; otherwise you waste both your own time and the time of the users who answer it.

Anyway, let's go back to your problem.

  1. To get the first value of each inner list, iterate over the outer list and retrieve the first element of every item. With a simple for loop it looks like this:

    source_list = [[4, 2, 1, 3], [4, 3, 1, 2], [4, 3, 1, 2],
                   [1, 3, 4, 2], [2, 3, 4, 1], [2, 1, 3, 4]]
    first_items_list = []
    for inner_list in source_list:
        first_items_list.append(inner_list[0])

    You can also use a list comprehension:

    source_list = [[4, 2, 1, 3], [4, 3, 1, 2], [4, 3, 1, 2], 
    [1, 3, 4, 2], [2, 3, 4, 1], [2, 1, 3, 4]]
    first_items_list = [inner_list[0] for inner_list in source_list]

  2. Now we have to find the most frequent element in the list of first items. To avoid reinventing the wheel, you can use collections.Counter and specifically its Counter.most_common() method.

    To initialize the Counter, pass the first_items_list generated in the code above to the Counter() constructor. Then call Counter.most_common() with 1 as the argument, since you need only the most common element:

    from collections import Counter
    ...
    counter = Counter(first_items_list)
    item, number_of_occurrences = counter.most_common(1)[0]

To simplify the code, you can replace the list comprehension with a generator expression and pass it directly to the Counter constructor:

source = [[4, 2, 1, 3], [4, 3, 1, 2], [4, 3, 1, 2], 
[1, 3, 4, 2], [2, 3, 4, 1], [2, 1, 3, 4]]
print("Most common:", Counter(inner[0] for inner in source).most_common(1)[0][0])
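
If you need the full ranking from step 2 rather than only the top element, calling most_common() without an argument returns every item sorted by descending count. A small sketch on the same data (the name ranking is illustrative):

```python
from collections import Counter

source = [[4, 2, 1, 3], [4, 3, 1, 2], [4, 3, 1, 2],
          [1, 3, 4, 2], [2, 3, 4, 1], [2, 1, 3, 4]]

# count the first element of every inner list, then rank by frequency
ranking = Counter(inner[0] for inner in source).most_common()
print(ranking)
# [(4, 3), (2, 2), (1, 1)]
```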

How do I count the occurrences of a list item?

If you only want a single item's count, use the count method:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3


Important: this is very slow if you are counting multiple different items

Each count call goes over the entire list of n elements. Calling count in a loop n times means n * n total checks, which can be catastrophic for performance.

If you want to count multiple items, use Counter, which only does n total checks.
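
To make the difference concrete, here is a small sketch that builds the same frequency table both ways. The sample list and variable names are made up for illustration:

```python
from collections import Counter

items = [1, 2, 3, 4, 1, 4, 1]

# one full pass over the list per distinct value
slow_counts = {value: items.count(value) for value in set(items)}

# a single pass over the list in total
fast_counts = Counter(items)

print(fast_counts)  # Counter({1: 3, 4: 2, 2: 1, 3: 1})
print(slow_counts == dict(fast_counts))  # True
```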

Frequency or Count of a list of sets in Python

If you convert the sets into tuples, you can then use a Counter directly on the input data. You can then use a list comprehension to convert the Counter into the format you desire:

from collections import Counter

my_test_list = [{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'DOHA', 'JAKARTA'},{'DOHA', 'ROME'},{'MAURITIUS','ROME'},{'MAURITIUS', 'ROME'},{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'JAKARTA', 'ROME'}, {'DOHA', 'ROME'},{'NEW YORK NY', 'WASHINGTON, DC'},{'ACCRA', 'WASHINGTON, DC'}]

counts = Counter(tuple(s) for s in my_test_list)

result = [k + ({ 'frequency' : v },) for k, v in counts.items()]
print(result)

Output:

[
('DOHA', 'ROME', {'frequency': 4}),
('DOHA', 'JAKARTA', {'frequency': 3}),
('ROME', 'MAURITIUS', {'frequency': 2}),
('ROME', 'JAKARTA', {'frequency': 1}),
('WASHINGTON, DC', 'NEW YORK NY', {'frequency': 1}),
('WASHINGTON, DC', 'ACCRA', {'frequency': 1})
]
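
Note that tuple(s) depends on the iteration order of each set, which Python does not guarantee to be the same for two equal sets. A hedged alternative, not part of the original answer, is to key the Counter on frozenset, which is hashable and compares equal regardless of element order (shown on a shortened, illustrative version of the list):

```python
from collections import Counter

routes = [{'DOHA', 'ROME'}, {'DOHA', 'JAKARTA'}, {'ROME', 'DOHA'},
          {'MAURITIUS', 'ROME'}, {'JAKARTA', 'DOHA'}]

# frozenset is hashable, and equal sets always map to the same key
counts = Counter(frozenset(s) for s in routes)

# sort the members only when building the output, for a stable display order
result = [tuple(sorted(k)) + ({'frequency': v},) for k, v in counts.items()]
print(result)
# [('DOHA', 'ROME', {'frequency': 2}), ('DOHA', 'JAKARTA', {'frequency': 2}), ('MAURITIUS', 'ROME', {'frequency': 1})]
```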

Count frequency of itemsets in the given data frame

This function returns a dictionary that maps each tuple to the number of data frame rows containing all of its items.

from collections import defaultdict

def count(df, sequence):
    dict_data = defaultdict(int)
    shape = df.shape[0]
    for items in sequence:
        for row in range(shape):
            # True counts as 1 when every item of the tuple is in the row
            dict_data[items] += all([item in df.iloc[row, :].values for item in items])
    return dict_data

Pass the data frame and the itemsets to count() and it returns the number of rows in which each tuple occurs, e.g.:

>>> count(data, itemsets)
defaultdict(<class 'int'>, {(39, 205): 2})

You can easily convert the defaultdict to a plain dictionary with dict(), e.g.:

>>> dict(count(data, itemsets))
{(39, 205): 2}

But both of them still work the same.
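
The same idea can be sketched without pandas, assuming each row is available as a plain list. The rows and itemsets below are invented for illustration, and count_itemsets is a hypothetical helper name:

```python
from collections import defaultdict

def count_itemsets(rows, itemsets):
    counts = defaultdict(int)
    row_sets = [set(row) for row in rows]  # set membership avoids rescanning lists
    for items in itemsets:
        for row in row_sets:
            # True adds 1, False adds 0
            counts[items] += row.issuperset(items)
    return counts

rows = [[39, 205, 7], [39, 12], [205, 39], [7, 12]]
itemsets = [(39, 205), (7,)]
print(dict(count_itemsets(rows, itemsets)))
# {(39, 205): 2, (7,): 2}
```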

How to count frequency of an element in a NumPy array?

Use numpy.unique with the return_counts=True parameter, which returns the count of each element in the array.

# sample array
In [89]: np.random.seed(23)
In [90]: arr = np.random.randint(0, 10, 20)

In [92]: a, cnts = np.unique(arr, return_counts=True)
In [94]: high_freq, high_freq_element = cnts.max(), a[cnts.argmax()]

In [95]: high_freq, high_freq_element
Out[95]: (4, 9)

For selecting only the elements which appear above a certain frequency threshold, you can use:

In [96]: threshold = 2

# select elements which occur more than 2 times
In [97]: a[cnts > threshold]
Out[97]: array([3, 5, 6, 9])
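
For a result that does not depend on the random seed, here is a small sketch on a hand-written array. The array contents and the names values, counts and freq are illustrative:

```python
import numpy as np

arr = np.array([3, 9, 5, 9, 3, 9, 6, 9, 5, 3])
values, counts = np.unique(arr, return_counts=True)

# plain-Python mapping of value -> count
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)  # {3: 3, 5: 2, 6: 1, 9: 4}

# values that occur more than twice
print(values[counts > 2].tolist())  # [3, 9]
```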

Frequency counts for unique values in a NumPy array

Take a look at np.bincount:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]

And then:

list(zip(ii, y[ii]))  # zip returns an iterator in Python 3, so wrap it in list()
# [(1, 5), (2, 3), (5, 1), (25, 1)]

or:

np.vstack((ii,y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])

or however you want to combine the counts and the unique values.
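
Keep in mind that np.bincount only accepts non-negative integers and returns an array of length max(x) + 1, so it suits small integer ranges best. A sketch comparing it with np.unique on the same array (variable names are illustrative):

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

y = np.bincount(x)
print(len(y))  # 26: one slot for every integer from 0 to 25

ii = np.nonzero(y)[0]
pairs = list(zip(ii.tolist(), y[ii].tolist()))
print(pairs)  # [(1, 5), (2, 3), (5, 1), (25, 1)]

# np.unique yields the same pairs without the intermediate zero-filled array
values, counts = np.unique(x, return_counts=True)
print(pairs == list(zip(values.tolist(), counts.tolist())))  # True
```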

How do I count the frequency of an item in a list?

Here's a solution:

from collections import Counter

score = 0
hand = [1,2,4,3,5,6,1,1,1]

counts = Counter(hand)

for num, count in counts.items():
    if count >= 4:
        # drop every occurrence of num from the hand
        hand = list(filter((num).__ne__, hand))
        score += 1

print(hand)
print(score)

And the output is:

[2, 4, 3, 5, 6]
1
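
The filter((num).__ne__, hand) call can be harder to read than it needs to be. A possible variant with a list comprehension, assuming the same rule of scoring one point per value that appears at least four times (the names four_of_a_kind and remaining are illustrative):

```python
from collections import Counter

hand = [1, 2, 4, 3, 5, 6, 1, 1, 1]
counts = Counter(hand)

# values that appear at least four times
four_of_a_kind = {num for num, count in counts.items() if count >= 4}

# keep only the cards that are not part of such a group
remaining = [card for card in hand if card not in four_of_a_kind]
score = len(four_of_a_kind)

print(remaining)  # [2, 4, 3, 5, 6]
print(score)      # 1
```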

