Python Pandas: Count the Number of Occurrences Inside Lists in a Column

Count occurrences of list item in dataframe column, grouped by another column

We can use DataFrame.explode + crosstab:

# If Not Already a List
# df['words'] = df['words'].str.split(', ')

new_df = df.explode('words')
new_df = pd.crosstab(
    new_df['words'], new_df['category']
).reset_index().rename_axis(columns=None)

Or with groupby size + unstack after explode:

new_df = (
    df.explode('words')  # Explode List into Rows
    .groupby(['words', 'category']).size()  # Calculate Group Sizes
    .unstack(fill_value=0)  # Convert Category values to column names
    .reset_index().rename_axis(columns=None)  # Cleanup
)

Or DataFrame.value_counts + unstack after explode (DataFrame.value_counts counts unique rows across all columns, so this assumes df contains only the words and category columns):

new_df = (
    df.explode('words')  # Explode List into Rows
    .value_counts()  # Count Value Pairs
    .unstack(level='category',  # Convert Category values to column names
             fill_value=0)
    .reset_index().rename_axis(columns=None)  # Cleanup
)

new_df:

      words  1  2  3
0       cat  3  1  0
1       dog  2  1  0
2  elephant  0  1  2
3     mouse  1  1  0

Setup:

import pandas as pd

df = pd.DataFrame({
    'words': [['cat', 'dog', 'dog'], ['cat', 'cat', 'mouse'],
              ['mouse', 'cat', 'dog', 'elephant'], ['elephant', 'elephant']],
    'category': [1, 1, 2, 3]
})
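Putting the setup and the first two approaches together, a minimal end-to-end sketch might look like the following. The names exploded, via_crosstab and via_groupby are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'words': [['cat', 'dog', 'dog'], ['cat', 'cat', 'mouse'],
              ['mouse', 'cat', 'dog', 'elephant'], ['elephant', 'elephant']],
    'category': [1, 1, 2, 3],
})

# One row per (word, category) pair; list elements become separate rows.
exploded = df.explode('words')

# Approach 1: cross-tabulate words against categories.
via_crosstab = (
    pd.crosstab(exploded['words'], exploded['category'])
    .reset_index().rename_axis(columns=None)
)

# Approach 2: group sizes, then pivot the category level into columns.
via_groupby = (
    exploded.groupby(['words', 'category']).size()
    .unstack(fill_value=0)
    .reset_index().rename_axis(columns=None)
)

print(via_crosstab)
print(via_groupby)
```

Both frames should hold the same counts; crosstab is the shorter spelling, while the groupby chain makes each step explicit.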

How to count occurrences of values of a list in a column of a different dataframe?

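One quick fix is to reindex the value counts by the list of values to look up (named list here, which shadows the Python built-in; a different name is safer in real code):

df.Column.value_counts().reindex(list,fill_value=0)
HIGH      3
MEDIUM    0
LOW       4
Name: Column, dtype: int64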

Another way is pd.Categorical, which keeps categories that never occur and reports them with a count of 0:

pd.Categorical(df.Column,list).value_counts()
HIGH      3
MEDIUM    0
LOW       4
dtype: int64
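Since the question's data is not shown, here is a small self-contained sketch of both variants. The frame contents and the name levels are assumptions made up for illustration; levels replaces the list variable used above.

```python
import pandas as pd

# Hypothetical data: the original frame is not shown in the question.
df = pd.DataFrame({'Column': ['HIGH', 'LOW', 'HIGH', 'LOW', 'LOW', 'HIGH', 'LOW']})
levels = ['HIGH', 'MEDIUM', 'LOW']  # values to count, including absent ones

# Variant 1: count, then reindex so missing values appear with 0.
via_reindex = df['Column'].value_counts().reindex(levels, fill_value=0)

# Variant 2: declare the allowed categories up front; unused ones count as 0.
via_categorical = pd.Categorical(df['Column'], categories=levels).value_counts()

print(via_reindex)
print(via_categorical)
```

The Categorical variant silently ignores values that are not in levels (they become missing), whereas reindex simply drops their counts, so both report only the requested values.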

Count strings occurrence in a pandas column which is a list

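You can use Series.str.get_dummies() with reindex over the columns (axis=1), then sum() and Series.to_dict(). The counts come back as floats because reindex fills columns that are missing from the data with NaN:

df['names'].str.join('|').str.get_dummies().reindex(columns=lst).sum().to_dict()

{'credits received': 1.0, 'points': 2.0, 'rewards': 0.0}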
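A runnable sketch of this approach, using made-up data since the question's frame is not shown. The contents of df and lst are assumptions; passing fill_value=0 to reindex is a small variation on the line above that keeps the counts as integers.

```python
import pandas as pd

# Hypothetical data: each row holds a list of names.
df = pd.DataFrame({'names': [['points', 'credits received'], ['points'], ['bonus']]})
lst = ['credits received', 'points', 'rewards']  # names to count

counts = (
    df['names'].str.join('|')          # 'points|credits received', ...
    .str.get_dummies()                 # one 0/1 indicator column per name
    .reindex(columns=lst, fill_value=0)  # keep only requested names, 0 if absent
    .sum()
    .to_dict()
)
print(counts)
```

Note that get_dummies produces presence indicators, so a name repeated within a single list is counted once for that row, and names containing the '|' separator would be split incorrectly.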

Faster way to count total occurrences of values in a column of lists in pandas?

Use a list comprehension to flatten the lists instead of sum:

test = pd.Series([x for item in data.SPLIT for x in item]).value_counts()

Or flatten by chain.from_iterable:

from itertools import chain

test = pd.Series(list(chain.from_iterable(data.SPLIT))).value_counts()

Or use collections.Counter:

from itertools import chain
from collections import Counter

test = pd.Series(Counter(chain.from_iterable(data.SPLIT)))

Or:

import functools, operator

test = pd.Series(functools.reduce(operator.iconcat, data.SPLIT, [])).value_counts()

Pure pandas solution:

test = pd.DataFrame(data.SPLIT.values.tolist()).stack().value_counts()
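To check that these variants agree, here is a small sketch on made-up data. The contents of data are an assumption; Series.explode is included as one more option, borrowed from the other answers on this page rather than from this one.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical data: a column of lists named SPLIT, as in the question.
data = pd.DataFrame({'SPLIT': [['a', 'b'], ['b', 'c', 'b'], ['a']]})

# Flatten with a nested list comprehension, then count.
via_comprehension = pd.Series(
    [x for item in data['SPLIT'] for x in item]
).value_counts()

# Count while iterating, without building an intermediate list.
via_counter = pd.Series(Counter(chain.from_iterable(data['SPLIT'])))

# Let pandas flatten the lists into rows, then count.
via_explode = data['SPLIT'].explode().value_counts()

print(via_comprehension.to_dict())
print(via_counter.to_dict())
print(via_explode.to_dict())
```

The Counter result keeps first-seen order, while value_counts sorts by frequency, so compare the results as dictionaries or after sort_index(). Which variant is fastest depends on the data, so it is worth timing them on a real column.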

python, count unique list values of a list inside a data frame

Set the index of the dataframe to age, use Series.explode on the column 'what colours do you like?', then group by level=0 and aggregate the series with value_counts:

df1 = (
    df.set_index('age')['what colours do you like?'].explode()
    .rename('color').groupby(level=0).value_counts().reset_index(name='count')
)

Result:

print(df1)
     age   color  count
0  18-25  yellow      2
1  18-25   green      1
2  18-25  orange      1
3  26-30    blue      2
4  26-30     red      2
5  26-30   green      1
6  26-30  orange      1
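The question's input frame is not shown, so the sketch below uses made-up survey rows chosen to give the same counts as the result above. The data and the name lookup are assumptions for illustration.

```python
import pandas as pd

# Hypothetical survey data: one list of colours per respondent.
df = pd.DataFrame({
    'age': ['18-25', '18-25', '26-30', '26-30'],
    'what colours do you like?': [
        ['yellow', 'green'],
        ['yellow', 'orange'],
        ['blue', 'red', 'green'],
        ['blue', 'red', 'orange'],
    ],
})

df1 = (
    df.set_index('age')['what colours do you like?'].explode()  # one colour per row
    .rename('color')                 # names the second index level
    .groupby(level=0).value_counts()  # counts per age group
    .reset_index(name='count')
)

# Look up counts by (age, color) pair.
lookup = dict(zip(zip(df1['age'], df1['color']), df1['count']))
print(df1)
```

The order of colours with equal counts within an age group may differ between pandas versions, so looking counts up by (age, color) is safer than relying on row position.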

