How to Get Value Counts for Multiple Columns at Once in a Pandas DataFrame

How to get value counts for multiple columns at once in Pandas DataFrame?

Just call apply and pass pd.Series.value_counts:

In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
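As a self-contained sketch of the one-liner above (the seed is only there to make the example reproducible, it is not part of the answer):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seed only for reproducibility
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

# One value_counts per column, assembled into a single DataFrame
counts = df.apply(pd.Series.value_counts)
print(counts)
```

If a value is missing from some column, that cell comes back as NaN; chain .fillna(0) if you prefer zeros.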

Python: get value counts from multiple columns and the average from another column

You can melt the dataframe, then group the melted frame on Genre and aggregate using a dictionary that maps each output column to its source column and aggregation function:

# filter and melt the dataframe
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# group and aggregate
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)


>>> df_out

       Genre  Value_Count  Average_Rating
0     Action            2            8.30
1  Adventure            3            7.20
2     Comedy            3            7.60
3     Family            2            6.65
4     Horror            3            8.40
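The same melt-then-aggregate recipe can be run end to end on a small hypothetical dataset (the Rating, Genre_1 and Genre_2 columns and their values below are assumptions for illustration):

```python
import pandas as pd

# Hypothetical movie data with two genre columns
df = pd.DataFrame({
    'Rating': [8.1, 7.5, 6.9, 8.7],
    'Genre_1': ['Action', 'Comedy', 'Family', 'Horror'],
    'Genre_2': ['Adventure', 'Family', 'Comedy', 'Action'],
})

# Melt the genre columns into one, keeping Rating as the id variable
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# Count each genre and average the ratings of the rows it appears in
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)
print(df_out)
```

Here 'Action' appears twice (ratings 8.1 and 8.7), so its Value_Count is 2 and its Average_Rating is 8.4.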

Convert value counts of multiple columns to pandas dataframe

You can melt the data, then use pd.crosstab:

melt = df.melt('Name')
pd.crosstab(melt['value'], melt['variable'], normalize='columns')

Or a bit faster (yet more verbose) with melt and groupby().value_counts():

(df.melt('Name')
   .groupby('variable')['value'].value_counts(normalize=True)
   .unstack('variable', fill_value=0)
)

Output:

variable  Batch   CN  DXYR  Emp Lateral   GDX   MMT
value
0          0.50  0.5  0.25         0.25  0.25  0.50
1          0.25  0.0  0.75         0.25  0.25  0.25
2          0.25  0.5  0.00         0.50  0.50  0.25

Update: apply also works:

df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True)
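As a runnable sketch of both routes (the Name column and the Q1/Q2 data below are assumptions), the crosstab approach and the apply approach produce the same column-wise proportions:

```python
import pandas as pd

# Hypothetical data; Name is the id column to exclude from the counts
df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Q1': [0, 1, 1, 2],
    'Q2': [0, 0, 2, 2],
})

# Route 1: melt to long format, then cross-tabulate with column-wise normalization
melt = df.melt('Name')
out1 = pd.crosstab(melt['value'], melt['variable'], normalize='columns')

# Route 2: drop the id column and apply a normalized value_counts per column
out2 = df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True).fillna(0)

print(out1)
print(out2)
```

Note that crosstab fills absent combinations with 0 automatically, while the apply route needs the trailing .fillna(0) to match.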

Count occurrences of a value in multiple columns of a pandas DataFrame

df.stack().value_counts()

C 3
A11 2
A12 2
D11 2
B11 1
E12 1
B 1
A 1
D12 1
E 1

If you need the names as columns:

df.stack().value_counts().reset_index(name='count').rename({'index': 'value'}, axis=1)

value count
0 C 3
1 A11 2
2 A12 2
3 D11 2
4 B11 1
5 E12 1
6 B 1
7 A 1
8 D12 1
9 E 1
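A minimal runnable version of the stack-and-count idea (the two columns and their codes below are hypothetical):

```python
import pandas as pd

# Hypothetical codes spread over several columns
df = pd.DataFrame({
    'c1': ['C', 'A11', 'C'],
    'c2': ['A11', 'C', 'B'],
})

# Flatten all columns into one Series, then count every value
counts = df.stack().value_counts()
print(counts)

# Same counts as a two-column DataFrame
named = counts.reset_index(name='count').rename({'index': 'value'}, axis=1)
print(named)
```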

Python: get a frequency count based on two columns (variables) in a pandas DataFrame

You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1
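A self-contained sketch reproducing those counts (the Group/Size rows below are assumed to match the example output):

```python
import pandas as pd

# Hypothetical data matching the Group/Size output above
df = pd.DataFrame({
    'Group': ['Short', 'Short', 'Moderate', 'Moderate', 'Tall'],
    'Size': ['Small', 'Small', 'Medium', 'Small', 'Large'],
})

# Count how many rows share each (Group, Size) combination
counts = df.groupby(['Group', 'Size']).size().reset_index(name='Time')
print(counts)
```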

Value counts of 2 columns in a pandas dataframe

Let's try SeriesGroupBy.value_counts and set normalize=True to get the values as proportions:

out = df.groupby('year')['operation'].value_counts(normalize=True)

out:

year  operation
2014  yes          0.666667
      no           0.333333
2015  yes          0.666667
                   0.333333
Name: operation, dtype: float64

You can also set sort=False so the counts are not sorted from highest to lowest within each year:

out = df.groupby('year')['operation'].value_counts(normalize=True, sort=False)

out:

year  operation
2014  no           0.333333
      yes          0.666667
2015               0.333333
      yes          0.666667
Name: operation, dtype: float64

Series.reset_index can be used with name= to create a DataFrame instead of a Series and to name the otherwise unnamed values column:

new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
      .reset_index(name='freq')
)
   year operation      freq
0  2014       yes  0.666667
1  2014        no  0.333333
2  2015       yes  0.666667
3  2015            0.333333

DataFrame Used:

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})
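Putting the pieces together, the example runs end to end as:

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})

# Share of each operation value within each year
new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
      .reset_index(name='freq')
)
print(new_df)
```

Note that the blank operation label in the 2015 output is not a missing value: it is the empty string '' present in the 2015 data, which prints as a blank cell.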

pandas value_counts applied to each column

For the dataframe,

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
                  columns=['id', 'temp', 'name'], index=[1, 2, 3])

the following code

for c in df.columns:
    print(f"---- {c} ---")
    print(df[c].value_counts())

will produce the following result:

---- id ---
34 2
22 1
dtype: int64
---- temp ---
null 3
dtype: int64
---- name ---
mark 3
dtype: int64
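A loop-free variant of the same idea (a sketch, using the DataFrame above) collects the per-column counts into a dict keyed by column name:

```python
import pandas as pd

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
                  columns=['id', 'temp', 'name'], index=[1, 2, 3])

# One value_counts Series per column, keyed by column name
counts = {c: df[c].value_counts() for c in df.columns}
print(counts['id'])
```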

Value counts by multi-column groupby

When you use value_counts, you have the option to normalize the results. You can use this parameter, and then index the resulting DataFrame to only include the U rows:

out = (df.groupby(['ID', 'Item'])
         .Direction.value_counts(normalize=True)
         .rename('ratio').reset_index())

out.loc[out.Direction.eq('U')]

   ID  Item Direction     ratio
1   1  ball         U  0.500000
2   1   box         U  0.666667
6   2   box         U  0.333333
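A runnable sketch of the grouped ratio computation (the ID/Item/Direction rows below are hypothetical and chosen to produce readable ratios):

```python
import pandas as pd

# Hypothetical up/down movements per (ID, Item)
df = pd.DataFrame({
    'ID':        [1, 1, 1, 1, 1, 2, 2, 2],
    'Item':      ['ball', 'ball', 'box', 'box', 'box', 'box', 'box', 'box'],
    'Direction': ['U', 'D', 'U', 'U', 'D', 'U', 'D', 'D'],
})

# Ratio of each direction within each (ID, Item) group
out = (df.groupby(['ID', 'Item'])
         .Direction.value_counts(normalize=True)
         .rename('ratio').reset_index())

# Keep only the upward ratios
print(out.loc[out.Direction.eq('U')])
```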

Counting total number of occurrences in selected (multiple) columns in Pandas

Use DataFrame.melt with GroupBy.size:

cols = ['position_1', 'position_2', 'position_3']
df = df[cols].melt().groupby('value').size().reset_index(name='count')
print(df)
  value  count
0   abc      3
1   bbc      2
2   ccd      2
3   jbp      3
4   jkp      1
5   klp      1
6   kpd      1
7   mne      2
8   ppt      2
9   ytz      1
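A smaller self-contained sketch of the same melt-and-size pattern (the position columns and their values below are assumptions):

```python
import pandas as pd

# Hypothetical position columns
df = pd.DataFrame({
    'position_1': ['abc', 'bbc', 'abc'],
    'position_2': ['jbp', 'abc', 'jbp'],
    'position_3': ['bbc', 'jbp', 'ccd'],
})

cols = ['position_1', 'position_2', 'position_3']

# Stack the selected columns into one and count each value
out = df[cols].melt().groupby('value').size().reset_index(name='count')
print(out)
```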

