How to get value counts for multiple columns at once in Pandas DataFrame?
Just call apply and pass pd.Series.value_counts:
In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
a b c d
0 4 6 4 3
1 6 4 6 7
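One caveat: if a value appears in some columns but not others, apply(pd.Series.value_counts) leaves NaN in the missing cells (which also upcasts the counts to float). A minimal sketch, assuming you want integer zeros instead:

```python
import numpy as np
import pandas as pd

# columns that lack a value would show NaN; fill and cast back to int
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)
```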
Python get value counts from multiple columns and average from another column
You can .melt the dataframe, then group the melted frame on Genre and aggregate using a dictionary that specifies the columns and their corresponding aggregation functions:
# filter and melt the dataframe
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')
# group and aggregate
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)
>>> df_out
Genre Value_Count Average_Rating
0 Action 2 8.30
1 Adventure 3 7.20
2 Comedy 3 7.60
3 Family 2 6.65
4 Horror 3 8.40
Convert value counts of multiple columns to pandas dataframe
You can melt the data, then use pd.crosstab:
melt = df.melt('Name')
pd.crosstab(melt['value'], melt['variable'], normalize='columns')
Or, a bit faster (yet more verbose), with melt and groupby().value_counts():
(df.melt('Name')
.groupby('variable')['value'].value_counts(normalize=True)
.unstack('variable', fill_value=0)
)
Output:
variable Batch CN DXYR Emp Lateral GDX MMT
value
0 0.50 0.5 0.25 0.25 0.25 0.50
1 0.25 0.0 0.75 0.25 0.25 0.25
2 0.25 0.5 0.00 0.50 0.50 0.25
Update: apply also works:
df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True)
count occurrence of a value in multiple columns of a dataframe Pandas
df.stack().value_counts()
C 3
A11 2
A12 2
D11 2
B11 1
E12 1
B 1
A 1
D12 1
E 1
If you need the value names as a column:
df.stack().value_counts().reset_index(name='count').rename({'index': 'value'}, axis=1)
value count
0 C 3
1 A11 2
2 A12 2
3 D11 2
4 B11 1
5 E12 1
6 B 1
7 A 1
8 D12 1
9 E 1
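If you only need the count of one specific value rather than the full tally, an alternative to stacking is an elementwise comparison; a small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'C', 'C'], 'y': ['C', 'B', 'A']})
# elementwise equality gives a boolean frame; summing twice counts the matches
total = int(df.eq('C').sum().sum())
```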
Python: get a frequency count based on two columns (variables) in a pandas DataFrame (how many times each row appears)
You can use groupby's size
:
In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group Size
Moderate Medium 1
Small 1
Short Small 2
Tall Large 1
dtype: int64
In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
Group Size Time
0 Moderate Medium 1
1 Moderate Small 1
2 Short Small 2
3 Tall Large 1
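Since pandas 1.1 there is also DataFrame.value_counts, which takes a column subset directly and gives the same counts in one call; a sketch under that version assumption, with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Group': ['Short', 'Short', 'Moderate', 'Moderate', 'Tall'],
                   'Size': ['Small', 'Small', 'Medium', 'Small', 'Large']})
# count unique (Group, Size) row combinations, highest count first
out = df.value_counts(['Group', 'Size']).reset_index(name='Time')
```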
Value counts of 2 columns in a pandas dataframe
Let's try with SeriesGroupBy.value_counts and set normalize=True to get the values as proportions:
out = df.groupby('year')['operation'].value_counts(normalize=True)
out
:
year operation
2014 yes 0.666667
no 0.333333
2015 yes 0.666667
0.333333
Name: operation, dtype: float64
You can also set sort=False to avoid sorting by highest count within each level-0 group:
out = df.groupby('year')['operation'].value_counts(normalize=True, sort=False)
out
:
year operation
2014 no 0.333333
yes 0.666667
2015 0.333333
yes 0.666667
Name: operation, dtype: float64
Series.reset_index can be used with name= set to create a DataFrame instead of a Series and to give a name to the unnamed values column:
new_df = (
df.groupby('year')['operation'].value_counts(normalize=True)
.reset_index(name='freq')
)
year operation freq
0 2014 yes 0.666667
1 2014 no 0.333333
2 2015 yes 0.666667
3 2015 0.333333
DataFrame Used:
df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})
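Note the blank operation label in the 2015 rows above: it is the empty string present in the source data, not missing output. If empty strings should not count as a category, one sketch is to replace them with NaN first (value_counts drops NaN by default):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})
# empty strings become NaN and are excluded from the counts
out = (df.replace('', np.nan)
         .groupby('year')['operation']
         .value_counts(normalize=True))
```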
pandas value_counts applied to each column
For the dataframe,
df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])
the following code
for c in df.columns:
    print(f"---- {c} ---")
    print(df[c].value_counts())
will produce the following result:
---- id ---
34 2
22 1
dtype: int64
---- temp ---
null 3
dtype: int64
---- name ---
mark 3
dtype: int64
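The same loop can be collected into a dictionary keyed by column name, which is handy when you want to inspect the counts programmatically rather than print them; a small sketch:

```python
import pandas as pd

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
                  columns=['id', 'temp', 'name'], index=[1, 2, 3])
# one value_counts Series per column, keyed by column name
counts = {c: df[c].value_counts() for c in df.columns}
```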
Value counts by multi-column groupby
When you use value_counts, you have the option to normalize the results. You can use this parameter, then index the resulting DataFrame to include only the U rows:
out = (df.groupby(['ID', 'Item'])
.Direction.value_counts(normalize=True)
.rename('ratio').reset_index())
out.loc[out.Direction.eq('U')]
ID Item Direction ratio
1 1 ball U 0.500000
2 1 box U 0.666667
6 2 box U 0.333333
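The same chain can also end in .query to filter the U rows, which keeps everything in one pipeline; a sketch with made-up data (the ratios below come from this hypothetical df, not the one above):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 2],
                   'Item': ['ball', 'ball', 'box', 'box', 'box', 'box', 'box'],
                   'Direction': ['U', 'D', 'U', 'U', 'U', 'D', 'D']})
# per (ID, Item) direction ratios, filtered to U in the same chain
out = (df.groupby(['ID', 'Item'])
         .Direction.value_counts(normalize=True)
         .rename('ratio').reset_index()
         .query("Direction == 'U'"))
```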
Counting total number of occurrences in selected (multiple) columns in Pandas
Use DataFrame.melt with GroupBy.size:
cols = ['position_1', 'position_2', 'position_3']
df = df[cols].melt().groupby('value').size().reset_index(name='count')
print(df)
value count
0 abc 3
1 bbc 2
2 ccd 2
3 jbp 3
4 jkp 1
5 klp 1
6 kpd 1
7 mne 2
8 ppt 2
9 ytz 1
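An equivalent route for the same selected columns is stacking followed by value_counts; a sketch with made-up data (the counts below come from this hypothetical df):

```python
import pandas as pd

df = pd.DataFrame({'position_1': ['abc', 'bbc'],
                   'position_2': ['abc', 'jbp'],
                   'position_3': ['jbp', 'abc']})
cols = ['position_1', 'position_2', 'position_3']
# stack flattens the selected columns into one Series before counting
out = (df[cols].stack().value_counts()
       .rename_axis('value').reset_index(name='count'))
```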