Count occurrences of list item in dataframe column, grouped by another column
We can use DataFrame.explode + crosstab:
# If 'words' holds comma-separated strings rather than lists, split them first:
# df['words'] = df['words'].str.split(', ')
new_df = df.explode('words')
new_df = pd.crosstab(
new_df['words'], new_df['category']
).reset_index().rename_axis(columns=None)
Or with groupby size + unstack after explode:
new_df = (
df.explode('words') # Explode List into Rows
.groupby(['words', 'category']).size() # Calculate Group Sizes
.unstack(fill_value=0) # Convert Category values to column names
.reset_index().rename_axis(columns=None) # Cleanup
)
Or with DataFrame.value_counts + unstack after explode (DataFrame.value_counts requires pandas 1.1 or later):
new_df = (
df.explode('words') # Explode List into Rows
.value_counts() # Count Value Pairs
.unstack(level='category', # Convert Category values to column names
fill_value=0)
.reset_index().rename_axis(columns=None) # Cleanup
)
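As a quick sanity check, the sketch below runs all three approaches on the same sample data as the Setup section and confirms they produce the same counts. The helper normalize is a hypothetical name introduced here; it sorts rows and columns so that only the counts are compared.

```python
import pandas as pd

# Same sample data as the Setup section
df = pd.DataFrame({
    'words': [['cat', 'dog', 'dog'], ['cat', 'cat', 'mouse'],
              ['mouse', 'cat', 'dog', 'elephant'], ['elephant', 'elephant']],
    'category': [1, 1, 2, 3]
})

exploded = df.explode('words')  # one row per list element

# 1) crosstab
via_crosstab = pd.crosstab(exploded['words'], exploded['category'])

# 2) groupby + size + unstack
via_groupby = (exploded.groupby(['words', 'category']).size()
               .unstack(fill_value=0))

# 3) DataFrame.value_counts + unstack
via_value_counts = (exploded.value_counts()
                    .unstack(level='category', fill_value=0))


def normalize(frame):
    # Hypothetical helper: sort rows/columns and fix the dtype so only counts matter
    return frame.sort_index().sort_index(axis=1).astype('int64')


pd.testing.assert_frame_equal(normalize(via_crosstab), normalize(via_groupby),
                              check_names=False)
pd.testing.assert_frame_equal(normalize(via_crosstab), normalize(via_value_counts),
                              check_names=False)
print(normalize(via_crosstab))
```

Any of the three is fine for small data; crosstab is the shortest to write, while the groupby version makes each reshaping step explicit.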
new_df:
      words  1  2  3
0       cat  3  1  0
1       dog  2  1  0
2  elephant  0  1  2
3     mouse  1  1  0
Setup:
import pandas as pd
df = pd.DataFrame({
'words': [['cat', 'dog', 'dog'], ['cat', 'cat', 'mouse'],
['mouse', 'cat', 'dog', 'elephant'], ['elephant', 'elephant']],
'category': [1, 1, 2, 3]
})
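If the 'words' column holds comma-separated strings instead of lists (the case covered by the commented-out line in the first snippet), the sketch below splits them before exploding. The string data here is a hypothetical rewrite of the Setup frame, and it assumes the separator is exactly ', '; with inconsistent spacing a regex split would be needed instead.

```python
import pandas as pd

# Hypothetical input: 'words' holds comma-separated strings instead of lists
df = pd.DataFrame({
    'words': ['cat, dog, dog', 'cat, cat, mouse',
              'mouse, cat, dog, elephant', 'elephant, elephant'],
    'category': [1, 1, 2, 3]
})

# Split each string into a list, then explode to one word per row
df['words'] = df['words'].str.split(', ')
exploded = df.explode('words')

# Count words per category and flatten the result into a regular frame
new_df = (pd.crosstab(exploded['words'], exploded['category'])
          .reset_index().rename_axis(columns=None))
print(new_df)
```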
How to count occurrences of values of a list in a column of a different dataframe?
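One quick fix: reindex the value_counts result with your list, filling values that never occur with 0 (here list is the variable holding your list of values):
df.Column.value_counts().reindex(list, fill_value=0)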
HIGH 3
MEDIUM 0
LOW 4
Name: Column, dtype: int64
Another way: pd.Categorical
pd.Categorical(df.Column, list).value_counts()
HIGH 3
MEDIUM 0
LOW 4
dtype: int64
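Since the question's DataFrame isn't shown, the sketch below uses a small hypothetical 'Column' whose counts match the output above, and runs both approaches side by side. The list is named levels (a name chosen here) to avoid shadowing Python's built-in list.

```python
import pandas as pd

# Hypothetical data: the original question's DataFrame is not shown
df = pd.DataFrame({'Column': ['HIGH', 'LOW', 'HIGH', 'LOW', 'LOW', 'HIGH', 'LOW']})
levels = ['HIGH', 'MEDIUM', 'LOW']  # avoids shadowing the built-in list

# reindex: count what is present, then add missing labels with a count of 0
via_reindex = df['Column'].value_counts().reindex(levels, fill_value=0)

# Categorical: declare the allowed values up front; unseen categories count as 0
via_categorical = pd.Categorical(df['Column'], categories=levels).value_counts()

print(via_reindex)
print(via_categorical)
```

Both results come out in the order of levels rather than sorted by count, and values not in levels are left out of both.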
Count strings occurrence in a pandas column which is a list
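You can use str.get_dummies() with reindex over axis=1 (the columns), then sum() and Series.to_dict(). Names that never occur come back from reindex as NaN columns, which is why the sums are floats: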
df['names'].str.join('|').str.get_dummies().reindex(columns=lst).sum().to_dict()
{'credits received': 1.0, 'points': 2.0, 'rewards': 0.0}
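Note that str.get_dummies produces 0/1 indicators, so the approach above counts how many rows contain each name, not how many times it occurs. The sketch below, on hypothetical data where 'points' repeats within one row, shows the difference and swaps in explode + value_counts for a true occurrence count; fill_value=0 keeps the results as integers. The join also assumes no name itself contains '|'.

```python
import pandas as pd

# Hypothetical data: 'points' appears twice in the first row
df = pd.DataFrame({'names': [['points', 'points', 'credits received'], ['points']]})
lst = ['credits received', 'points', 'rewards']

# str.get_dummies yields 0/1 indicators, so this counts rows containing each name
rows_containing = (df['names'].str.join('|').str.get_dummies()
                   .reindex(columns=lst, fill_value=0).sum().to_dict())

# explode + value_counts counts every occurrence instead
occurrences = (df['names'].explode().value_counts()
               .reindex(lst, fill_value=0).to_dict())

print(rows_containing)
print(occurrences)
```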
Faster way to count total occurrences of values in a column of lists in pandas?
Use a list comprehension to flatten the lists instead of sum:
test = pd.Series([x for item in data.SPLIT for x in item]).value_counts()
Or flatten with chain.from_iterable:
from itertools import chain
test = pd.Series(list(chain.from_iterable(data.SPLIT))).value_counts()
Or use collections.Counter:
from itertools import chain
from collections import Counter
test = pd.Series(Counter(chain.from_iterable(data.SPLIT)))
Or with functools.reduce and operator.iconcat:
import functools, operator
test = pd.Series(functools.reduce(operator.iconcat, data.SPLIT, [])).value_counts()
Pure pandas solution:
test = pd.DataFrame(data.SPLIT.values.tolist()).stack().value_counts()
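The sketch below runs all five variants on a small hypothetical data frame (the original data isn't shown) and checks that they agree. Counter keeps first-seen order while value_counts sorts by count, so results are compared as plain dicts. Which variant is fastest depends on your data, so timing them with timeit on the real column is the reliable way to choose.

```python
import functools
import operator
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical data: the original 'data' frame is not shown
data = pd.DataFrame({'SPLIT': [['a', 'b'], ['b', 'c', 'b'], ['a']]})

results = {
    'comprehension': pd.Series([x for item in data.SPLIT for x in item]).value_counts(),
    'chain': pd.Series(list(chain.from_iterable(data.SPLIT))).value_counts(),
    'counter': pd.Series(Counter(chain.from_iterable(data.SPLIT))),
    'reduce': pd.Series(functools.reduce(operator.iconcat, data.SPLIT, [])).value_counts(),
    # Unequal list lengths are padded with None; value_counts drops those
    'stack': pd.DataFrame(data.SPLIT.values.tolist()).stack().value_counts(),
}

# Order differs between methods, so compare plain {value: count} dicts
as_dicts = {name: {k: int(v) for k, v in s.items()} for name, s in results.items()}

expected = as_dicts['comprehension']
for name, counts in as_dicts.items():
    print(name, counts == expected)
```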
python, count unique list values of a list inside a data frame
Set the index of the dataframe to age, then use Series.explode on the column 'what colours do you like?', then use groupby on level=0 and aggregate the series using value_counts:
df1 = (
df.set_index('age')['what colours do you like?'].explode()
.rename('color').groupby(level=0).value_counts().reset_index(name='count')
)
Result:
print(df1)
age color count
0 18-25 yellow 2
1 18-25 green 1
2 18-25 orange 1
3 26-30 blue 2
4 26-30 red 2
5 26-30 green 1
6 26-30 orange 1
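The question's DataFrame isn't shown, so the sketch below builds a small hypothetical survey frame whose counts reproduce the result above and runs the same chain on it. When counts tie, it is safer to look them up by (age, color) than to rely on row order.

```python
import pandas as pd

# Hypothetical survey data: one list of colours per respondent
df = pd.DataFrame({
    'age': ['18-25', '18-25', '26-30', '26-30', '26-30'],
    'what colours do you like?': [
        ['yellow', 'green'],
        ['yellow', 'orange'],
        ['blue', 'red'],
        ['blue', 'red', 'green'],
        ['orange'],
    ],
})

df1 = (
    df.set_index('age')['what colours do you like?'].explode()  # one colour per row
    .rename('color')                                             # name the exploded values
    .groupby(level=0).value_counts()                             # count colours per age group
    .reset_index(name='count')                                   # back to a flat frame
)
print(df1)
```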