How to Get Value Counts for Multiple Columns at Once in a Pandas DataFrame

How to get value counts for multiple columns at once in Pandas DataFrame?

Just call apply and pass pd.Series.value_counts:

In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
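As a self-contained sketch of the one-liner above (the seed is only there to make the example reproducible, it is not part of the answer):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seed only for reproducibility
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

# One value_counts per column, assembled into a single DataFrame
counts = df.apply(pd.Series.value_counts)
print(counts)
```

If a value is missing from some column, that cell comes back as NaN; chain .fillna(0) if you prefer zeros.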

Python: get value counts from multiple columns and the average from another column

You can melt the dataframe, then group the melted frame on Genre and aggregate using a dictionary that maps each output column to its source column and aggregation function:

# filter and melt the dataframe
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# group and aggregate
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)


>>> df_out

       Genre  Value_Count  Average_Rating
0     Action            2            8.30
1  Adventure            3            7.20
2     Comedy            3            7.60
3     Family            2            6.65
4     Horror            3            8.40
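The same melt-then-aggregate recipe can be run end to end on a small hypothetical dataset (the Rating, Genre_1 and Genre_2 columns and their values below are assumptions for illustration):

```python
import pandas as pd

# Hypothetical movie data with two genre columns
df = pd.DataFrame({
    'Rating': [8.1, 7.5, 6.9, 8.7],
    'Genre_1': ['Action', 'Comedy', 'Family', 'Horror'],
    'Genre_2': ['Adventure', 'Family', 'Comedy', 'Action'],
})

# Melt the genre columns into one, keeping Rating as the id variable
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# Count each genre and average the ratings of the rows it appears in
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)
print(df_out)
```

Here 'Action' appears twice (ratings 8.1 and 8.7), so its Value_Count is 2 and its Average_Rating is 8.4.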

Convert value counts of multiple columns to pandas dataframe

You can melt the data, then use pd.crosstab:

melt = df.melt('Name')
pd.crosstab(melt['value'], melt['variable'], normalize='columns')

Or a bit faster (yet more verbose) with melt and groupby().value_counts():

(df.melt('Name')
   .groupby('variable')['value'].value_counts(normalize=True)
   .unstack('variable', fill_value=0)
)

Output:

variable  Batch   CN  DXYR  Emp Lateral   GDX   MMT
value
0          0.50  0.5  0.25         0.25  0.25  0.50
1          0.25  0.0  0.75         0.25  0.25  0.25
2          0.25  0.5  0.00         0.50  0.50  0.25

Update: apply also works:

df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True)
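As a runnable sketch of both routes (the Name column and the Q1/Q2 data below are assumptions), the crosstab approach and the apply approach produce the same column-wise proportions:

```python
import pandas as pd

# Hypothetical data; Name is the id column to exclude from the counts
df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Q1': [0, 1, 1, 2],
    'Q2': [0, 0, 2, 2],
})

# Route 1: melt to long format, then cross-tabulate with column-wise normalization
melt = df.melt('Name')
out1 = pd.crosstab(melt['value'], melt['variable'], normalize='columns')

# Route 2: drop the id column and apply a normalized value_counts per column
out2 = df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True).fillna(0)

print(out1)
print(out2)
```

Note that crosstab fills absent combinations with 0 automatically, while the apply route needs the trailing .fillna(0) to match.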

Count occurrences of a value in multiple columns of a pandas DataFrame

df.stack().value_counts()

C 3
A11 2
A12 2
D11 2
B11 1
E12 1
B 1
A 1
D12 1
E 1

If you need the names as columns:

df.stack().value_counts().reset_index(name='count').rename({'index': 'value'}, axis=1)

value count
0 C 3
1 A11 2
2 A12 2
3 D11 2
4 B11 1
5 E12 1
6 B 1
7 A 1
8 D12 1
9 E 1
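A minimal runnable version of the stack-and-count idea (the two columns and their codes below are hypothetical):

```python
import pandas as pd

# Hypothetical codes spread over several columns
df = pd.DataFrame({
    'c1': ['C', 'A11', 'C'],
    'c2': ['A11', 'C', 'B'],
})

# Flatten all columns into one Series, then count every value
counts = df.stack().value_counts()
print(counts)

# Same counts as a two-column DataFrame
named = counts.reset_index(name='count').rename({'index': 'value'}, axis=1)
print(named)
```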

Python: get a frequency count based on two columns (variables) in a pandas DataFrame

You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1
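A self-contained sketch reproducing those counts (the Group/Size rows below are assumed to match the example output):

```python
import pandas as pd

# Hypothetical data matching the Group/Size output above
df = pd.DataFrame({
    'Group': ['Short', 'Short', 'Moderate', 'Moderate', 'Tall'],
    'Size': ['Small', 'Small', 'Medium', 'Small', 'Large'],
})

# Count how many rows share each (Group, Size) combination
counts = df.groupby(['Group', 'Size']).size().reset_index(name='Time')
print(counts)
```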

Value counts of 2 columns in a pandas dataframe

Let's try SeriesGroupBy.value_counts and set normalize=True to get the values as proportions:

out = df.groupby('year')['operation'].value_counts(normalize=True)

out:

year  operation
2014  yes          0.666667
      no           0.333333
2015  yes          0.666667
                   0.333333
Name: operation, dtype: float64

You can also set sort=False so the counts are not sorted from highest to lowest within each year:

out = df.groupby('year')['operation'].value_counts(normalize=True, sort=False)

out:

year  operation
2014  no           0.333333
      yes          0.666667
2015               0.333333
      yes          0.666667
Name: operation, dtype: float64

Series.reset_index can be used with name= to create a DataFrame instead of a Series and to name the otherwise unnamed values column:

new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
      .reset_index(name='freq')
)
   year operation      freq
0  2014       yes  0.666667
1  2014        no  0.333333
2  2015       yes  0.666667
3  2015            0.333333

DataFrame Used:

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})
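Putting the pieces together, the example runs end to end as:

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})

# Share of each operation value within each year
new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
      .reset_index(name='freq')
)
print(new_df)
```

Note that the blank operation label in the 2015 output is not a missing value: it is the empty string '' present in the 2015 data, which prints as a blank cell.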

pandas value_counts applied to each column

For the dataframe,

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
                  columns=['id', 'temp', 'name'], index=[1, 2, 3])

the following code

for c in df.columns:
    print(f"---- {c} ---")
    print(df[c].value_counts())

will produce the following result:

---- id ---
34 2
22 1
dtype: int64
---- temp ---
null 3
dtype: int64
---- name ---
mark 3
dtype: int64
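A loop-free variant of the same idea (a sketch, using the DataFrame above) collects the per-column counts into a dict keyed by column name:

```python
import pandas as pd

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
                  columns=['id', 'temp', 'name'], index=[1, 2, 3])

# One value_counts Series per column, keyed by column name
counts = {c: df[c].value_counts() for c in df.columns}
print(counts['id'])
```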

Value counts by multi-column groupby

When you use value_counts, you have the option to normalize the results. You can use this parameter, and then index the resulting DataFrame to only include the U rows:

out = (df.groupby(['ID', 'Item'])
         .Direction.value_counts(normalize=True)
         .rename('ratio').reset_index())

out.loc[out.Direction.eq('U')]

   ID  Item Direction     ratio
1   1  ball         U  0.500000
2   1   box         U  0.666667
6   2   box         U  0.333333
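A runnable sketch of the grouped ratio computation (the ID/Item/Direction rows below are hypothetical and chosen to produce readable ratios):

```python
import pandas as pd

# Hypothetical up/down movements per (ID, Item)
df = pd.DataFrame({
    'ID':        [1, 1, 1, 1, 1, 2, 2, 2],
    'Item':      ['ball', 'ball', 'box', 'box', 'box', 'box', 'box', 'box'],
    'Direction': ['U', 'D', 'U', 'U', 'D', 'U', 'D', 'D'],
})

# Ratio of each direction within each (ID, Item) group
out = (df.groupby(['ID', 'Item'])
         .Direction.value_counts(normalize=True)
         .rename('ratio').reset_index())

# Keep only the upward ratios
print(out.loc[out.Direction.eq('U')])
```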

Counting total number of occurrences in selected (multiple) columns in Pandas

Use DataFrame.melt with GroupBy.size:

cols = ['position_1', 'position_2', 'position_3']
df = df[cols].melt().groupby('value').size().reset_index(name='count')
print(df)
  value  count
0   abc      3
1   bbc      2
2   ccd      2
3   jbp      3
4   jkp      1
5   klp      1
6   kpd      1
7   mne      2
8   ppt      2
9   ytz      1
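A smaller self-contained sketch of the same melt-and-size pattern (the position columns and their values below are assumptions):

```python
import pandas as pd

# Hypothetical position columns
df = pd.DataFrame({
    'position_1': ['abc', 'bbc', 'abc'],
    'position_2': ['jbp', 'abc', 'jbp'],
    'position_3': ['bbc', 'jbp', 'ccd'],
})

cols = ['position_1', 'position_2', 'position_3']

# Stack the selected columns into one and count each value
out = df[cols].melt().groupby('value').size().reset_index(name='count')
print(out)
```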

