Pandas Get Frequency of Item Occurrences in a Column as Percentage

Pandas get frequency of item occurrences in a column as percentage

Use value_counts with normalize=True:

df['gender'].value_counts(normalize=True) * 100

With normalize=True, each value is a fraction in the range (0, 1]. Multiplying by 100 converts it to a percentage.
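As a minimal sketch of the above, here is the same call on a small made-up frame. Only the column name gender comes from the answer; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the column name 'gender' comes from the answer above.
df = pd.DataFrame({"gender": ["F", "M", "F", "F", "M", "F", "M", "F"]})

# normalize=True returns proportions; multiplying by 100 gives percentages.
pct = df["gender"].value_counts(normalize=True) * 100

# Round for display only and keep the unrounded values for further arithmetic.
pct_rounded = pct.round(1)
print(pct_rounded)
```

Rounding before further calculations can make the percentages stop summing to exactly 100, so it is usually safer to round only when printing.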

How to get the frequency of occurrence of a column as a percent

You can count the unique values with value_counts and use the normalize parameter:

df['houses'].value_counts(normalize=True) * 100

ranch       50.000000
condo       33.333333
townhome    16.666667
Name: houses, dtype: float64

Edit: to convert to a DataFrame:

(df['houses'].value_counts(normalize=True) * 100).to_frame()
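If you want a tidy two-column DataFrame rather than a one-column frame indexed by category, one possible approach is to name the Series and its index explicitly before resetting the index. The column name percent and the sample rows are assumptions made for this sketch; naming things explicitly also avoids depending on the default names, which differ between pandas versions.

```python
import pandas as pd

# Hypothetical rows chosen to reproduce the 50 / 33.3 / 16.7 split shown above.
df = pd.DataFrame(
    {"houses": ["ranch", "condo", "ranch", "townhome", "condo", "ranch"]}
)

pct = df["houses"].value_counts(normalize=True) * 100

# Name the values and the index explicitly, then move the index into a column.
# 'percent' is an illustrative column name, not something pandas produces.
table = pct.rename("percent").rename_axis("houses").reset_index()
print(table)
```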

Given a pandas Series that represents frequencies of a value, how can I turn those frequencies into percentages?

This is already built into pandas, in value_counts() itself, so there is no need to calculate it by hand :)

Just type:

df.sex.value_counts(normalize=True)

which gives exactly the desired output.

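Please note that value_counts() excludes NA values by default, so the fractions are relative to the non-missing rows only (they still add up to 1). Pass dropna=False to count NA as its own category.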
See here: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.Series.value_counts.html
(A column of a DataFrame is a Series)
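To see how missing values interact with normalize=True, here is a small sketch on an invented Series. With the default dropna=True the shares are computed over non-missing rows; with dropna=False the missing values get their own share.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
sex = pd.Series(["M", "F", "F", np.nan], name="sex")

# Default: NaN is dropped first, so shares are out of 3 non-missing rows.
shares = sex.value_counts(normalize=True)

# dropna=False: NaN is counted as a category, so shares are out of all 4 rows.
shares_with_na = sex.value_counts(normalize=True, dropna=False)

print(shares)
print(shares_with_na)
```

Which denominator is right depends on the question being asked: "share among known answers" versus "share among all rows".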

Get count of values in a column and show their percentage in a plot

Assume this is your DataFrame:

import pandas as pd
data = {'labels': ["A-F", "G-L", "M-R", "S-Z"], 'count': [1882, 3096, 3830, 1017]}
df = pd.DataFrame.from_dict(data)

print(df)

  labels  count
0    A-F   1882
1    G-L   3096
2    M-R   3830
3    S-Z   1017

Now you have to calculate the percentage of each row:

df['percentage'] = (df['count'] / df['count'].sum()) * 100

print(df)

  labels  count  percentage
0    A-F   1882   19.155216
1    G-L   3096   31.511450
2    M-R   3830   38.982188
3    S-Z   1017   10.351145

Then plot the labels against the percentage with df.plot(), specifying the kind of plot (a bar plot, I assume):

df.plot(kind='bar', x='labels', y='percentage')

This produces a bar plot with one bar per label, showing its percentage.

Edit:
The value_counts() method returns a pd.Series, so you can plot it directly:

df.range.value_counts(normalize=True).plot(kind='bar')

How to get the percentage of two values of a column, given every row has either of those two values

You can do this for all unique values in a column at once: use value_counts to get the count of each unique value, then divide by the total number of non-missing rows to get the percentage for each one.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Carol'],
        'Id': [1, 2, 3],
        'Sex': ['F', 'M', 'F'],
        'Age': [30, 32, 34],
        'City': ['NY', 'LA', 'Chicago']
        }

df = pd.DataFrame(data)

pcts = df['Sex'].value_counts() / df['Sex'].count() * 100

print('% M', pcts['M'])
print('% F', pcts['F'])
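Two variations on the snippet above, sketched under the assumption that the column looks like the Sex column in that example: normalize=True does the division for you, and for a single value the mean of a boolean mask gives its share directly.

```python
import pandas as pd

# Same 'Sex' values as the example above; other columns omitted for brevity.
df = pd.DataFrame({"Sex": ["F", "M", "F"]})

# Equivalent to value_counts() / count() * 100.
pcts = df["Sex"].value_counts(normalize=True) * 100

# Share of one value: True counts as 1 and False as 0, so the mean is the fraction.
pct_f = (df["Sex"] == "F").mean() * 100

# .get avoids a KeyError if a value never occurs in the column.
pct_m = pcts.get("M", 0.0)

print("% M", pct_m)
print("% F", pct_f)
```

One difference to keep in mind: the boolean-mask version divides by all rows, including rows where the value is missing, whereas value_counts divides by non-missing rows only. The two agree when the column has no NA values.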

Calculating the percentage of a category in Pandas

Try:

promoted.department.value_counts()/train.department.value_counts()*100

It should give you the desired output:

Sales & Marketing     7.2030
Operations            9.0148
Technology           10.7593
.....                    ...
Name: department, dtype: float64
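Dividing two value_counts results aligns them on the category labels, so a department that appears in train but never in promoted comes out as NaN rather than 0. A possible way to handle that is to reindex the numerator first. The frames below are small invented stand-ins for train and promoted, and the department names are shortened for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the `train` and `promoted` frames in the answer.
train = pd.DataFrame(
    {"department": ["Sales", "Sales", "Sales", "Sales", "Ops", "Ops", "Tech"]}
)
promoted = pd.DataFrame({"department": ["Sales", "Ops"]})

total = train["department"].value_counts()

# Align promoted counts to every department in train; missing ones become 0.
hits = promoted["department"].value_counts().reindex(total.index, fill_value=0)

rate = hits / total * 100
print(rate)
```

Without the reindex step, Tech would show NaN here, which can be mistaken for missing data when it really means no promotions.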

