Find Most Frequent Combination of Values in a Data.Frame

Finding the most frequent combination in DataFrame

You can group by the two columns together and count the number of occurrences of each pair, then sort the pairs by this count.

The following code does the job:

df.groupby(["From", "To"]).size().sort_values(ascending=False)

and, for the example of the question, it returns:

From        To
Home        Office    3
Restaurant  Office    1
Airport     Home      1
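
If you only need the single most frequent pair rather than the whole ranking, idxmax on the grouped counts is enough. The sketch below uses a small made-up DataFrame (the rows are an assumption, chosen to match the counts above, not the asker's real data):

```python
import pandas as pd

# Hypothetical data chosen so the counts match the output shown above.
df = pd.DataFrame({
    "From": ["Home", "Home", "Restaurant", "Home", "Airport"],
    "To":   ["Office", "Office", "Office", "Office", "Home"],
})

# One row per (From, To) pair, holding the number of occurrences.
counts = df.groupby(["From", "To"]).size()

top_pair = counts.idxmax()   # index label of the largest count, a (From, To) tuple
top_count = counts.max()     # the largest count itself

print(top_pair, top_count)
```

Note that idxmax returns only the first maximum, so ties are not reported.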

Python: How to find most frequent combination of elements?

Use a custom function all_subsets, then flatten the values with Series.explode, and finally count them with Series.value_counts:

from itertools import chain, combinations

#https://stackoverflow.com/a/5898031
#only converted to list and removed empty tuples by range(1,...
def all_subsets(ss):
    return list(chain(*map(lambda x: combinations(ss, x), range(1, len(ss)+1))))

s = df.groupby('id')['code'].apply(all_subsets).explode().value_counts()
print (s)
(2,) 3
(2, 5) 3
(5,) 3
(1, 2) 2
(3, 6) 2
..
(1, 5, 8) 1
(9,) 1
(1, 3, 4, 6) 1
(5, 8, 9) 1
(4, 6) 1
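
The same counting can also be done without explode, by feeding every group's subsets into a collections.Counter. This is only a sketch of an alternative, not the method above; the toy DataFrame is made up, and the values are sorted inside all_subsets so that each subset always appears in the same order:

```python
from collections import Counter
from itertools import chain, combinations

import pandas as pd


def all_subsets(values):
    """Yield every non-empty subset of values as a sorted tuple."""
    values = sorted(values)
    return chain.from_iterable(
        combinations(values, r) for r in range(1, len(values) + 1)
    )


# Hypothetical data: three ids, each with a few codes.
df = pd.DataFrame({"id": [1, 1, 2, 2, 3], "code": [2, 5, 2, 5, 2]})

counter = Counter()
for _, codes in df.groupby("id")["code"]:
    # tolist() converts NumPy integers to plain Python ints.
    counter.update(all_subsets(codes.tolist()))

print(counter.most_common())
```

Keep in mind that the number of subsets grows as 2**n per group, so this only makes sense when each id has a handful of codes.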

How to list the most frequent combination of column that contain data

I think this is what you want. Instead of returning a single list of columns, this returns a list of column tuples, to account for ties for the 'best' number of non-NA rows, together with that number of rows.

import pandas as pd
from itertools import combinations
from math import nan

def best_combinations(df, n_cols):
    best_cols = []
    best_length = 0
    for cols in combinations(df.columns, n_cols):
        subdf = df.loc[:, list(cols)].dropna()
        if len(subdf) > best_length:
            best_length = len(subdf)
            best_cols = [cols]
        elif (len(subdf) == best_length) and (best_length > 0):
            best_cols.append(cols)
    return best_cols, best_length

On your dataframe:

df = pd.DataFrame({
    'A': {0: '2', 1: '4', 2: '3', 3: '4', 4: '6', 5: nan, 6: nan},
    'B': {0: '6', 1: '5', 2: '4', 3: '5', 4: '7', 5: nan, 6: nan},
    'C': {0: '3', 1: '6', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan},
    'D': {0: '7', 1: '7', 2: nan, 3: nan, 4: nan, 5: '5', 6: '7'},
    'E': {0: '7', 1: '5', 2: nan, 3: nan, 4: nan, 5: '6', 6: '5'},
    'F': {0: '3', 1: '4', 2: nan, 3: nan, 4: nan, 5: '7', 6: '8'}}
)

best_combinations(df, 2)
# returns:
# ([('A', 'B')], 5)

best_combinations(df, 3)
# returns:
# ([('D', 'E', 'F')], 4)
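
If you want to see the count for every combination rather than only the best one, the complete-row count can be computed from a boolean notna() mask. A minimal sketch on a made-up three-column frame (the names toy and complete_rows are illustrative only):

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical frame with missing values.
toy = pd.DataFrame({
    "A": [1, 2, np.nan],
    "B": [1, np.nan, np.nan],
    "C": [1, 2, 3],
})

present = toy.notna()  # True where a cell holds data

# For each pair of columns, count the rows where both cells hold data.
complete_rows = {
    cols: int(present[list(cols)].all(axis=1).sum())
    for cols in combinations(toy.columns, 2)
}

best = max(complete_rows, key=complete_rows.get)
print(complete_rows, best)
```

Computing the mask once avoids building a new sub-DataFrame with dropna for each combination.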

Find most frequent combination of values in a data.frame

Here's an approach with data.table:

dt <- data.table(dat)
setkeyv(dt, names(dt))
dt[, .N, by = key(dt)]
dt[, .N, by = key(dt)][N == max(N)]
# age sex bmi N
# 1: 55 1 25 2

And an approach with base R:

x <- data.frame(table(dat))
x[x$Freq == max(x$Freq), ]
# age sex bmi Freq
# 11 55 1 25 2

I don't know how well either of these scales, though, particularly if the number of combinations is going to be large. So test and report back!


Replace x$Freq == max(x$Freq) with which.max(x$Freq) and N == max(N) with which.max(N) if you are really just interested in one row of results.

Pull most frequent combination of 2 columns from panda dataframe by count

Your error is being caused by the [0] at the end of your line where you do the groupby. You didn't post the full error message, but I would bet you have a KeyError: 0. That is due to you no longer having a 0 in your index. If you take a look at the DataFrame created after the groupby you'll see that you now have a hierarchical index, created from the unique value combinations of column1 and column2.

The quick solution? Replace [0] with .iloc[0] to get the row in the zero-index-location.

output = df.groupby(['column1','column2']).count().sort_values(by=['column1','column2'], axis = 0).iloc[0]

Or use .head(1), to get the top row of the DataFrame.
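
To make the hierarchical index concrete, here is a small sketch on made-up data. It counts with size() and sorts by the count itself, which is an assumption about what you ultimately want (the most frequent pair), not a copy of the line above:

```python
import pandas as pd

# Hypothetical data with the same column names as in the question.
df = pd.DataFrame({
    "column1": ["a", "a", "b", "a"],
    "column2": ["x", "x", "y", "z"],
})

counts = df.groupby(["column1", "column2"]).size()

# The index is built from the (column1, column2) combinations.
print(type(counts.index).__name__)

ranked = counts.sort_values(ascending=False)
top_count = ranked.iloc[0]   # positional access: the first row after sorting
top_row = ranked.head(1)     # same row, but keeps the index labels

print(top_row)
```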

Counting most common combination of values in dataframe column

Use itertools.combinations, explode and value_counts

import itertools

(df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x,2)))
   .explode().str.join('-').value_counts())

Out[611]:
A-B 2
C-D 1
A-D 1
A-C 1
Name: Product, dtype: int64

Or:

import itertools

(df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x,2))))
   .explode().value_counts())

Out[597]:
A-B 2
C-D 1
A-D 1
A-C 1
Name: Product, dtype: int64
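
One thing to watch: itertools.combinations keeps the order in which the products appear, so an ID listing B before A yields 'B-A' rather than 'A-B'. A hedged sketch that sorts and de-duplicates each group first, using collections.Counter and made-up data:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical data: ID 1 lists B before A on purpose.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Product": ["B", "A", "A", "B", "C"],
})

pair_counts = Counter()
for _, products in df.groupby("ID")["Product"]:
    # sorted(set(...)) makes each pair order-independent and drops repeats.
    unique_sorted = sorted(set(products))
    pair_counts.update("-".join(pair) for pair in combinations(unique_sorted, 2))

print(pair_counts.most_common())
```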

Python - pandas - find most frequent combination with tie-resolution - performance

Let us try groupby with transform to get the count of each combination per id, then sort_values with drop_duplicates to keep the most common one:

df['help'] = df.groupby(['id','string_col_A','string_col_B'])['string_col_A'].transform('count')
out = df.sort_values(['help','creation_date'],na_position='first').drop_duplicates('id',keep='last').drop(columns=['help','creation_date'])
out
Out[122]:
      id string_col_A string_col_B
3  x21ab       STR_X4       STR_Y4
5  x11aa       STR_X3       STR_Y3
0  x12ga       STR_X1       STR_Y1
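
Here is the same idea as a self-contained sketch on made-up rows, so you can see how the tie is resolved: within an id, the combination with the highest count wins, and when counts are equal the row with the latest creation_date wins. The data is hypothetical and not taken from the question:

```python
import pandas as pd

# Hypothetical data: u1 has a clear winner, u2 has a 1-1 tie.
df = pd.DataFrame({
    "id": ["u1", "u1", "u1", "u2", "u2"],
    "string_col_A": ["X1", "X1", "X2", "X3", "X4"],
    "string_col_B": ["Y1", "Y1", "Y2", "Y3", "Y4"],
    "creation_date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-05"]
    ),
})

# How often each (A, B) combination occurs within its id.
df["help"] = (
    df.groupby(["id", "string_col_A", "string_col_B"])["string_col_A"]
      .transform("count")
)

out = (
    df.sort_values(["help", "creation_date"], na_position="first")
      .drop_duplicates("id", keep="last")   # last row per id = highest count, latest date
      .drop(columns=["help", "creation_date"])
)

print(out)
```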

Pandas get most frequent values used together in the same column

You can use groupby.apply(set) and then count the values with .value_counts:

df.groupby('user_id')['Channel'].apply(set).value_counts()\
    .rename_axis('Channels_together')\
    .reset_index(name='n')

Output

  Channels_together  n
0            {a, b}  2
1         {a, c, b}  1

If you want the values as strings, a lambda function can sort each set and join it into a string:

df.groupby('user_id')['Channel'].apply(lambda x: ', '.join(sorted(set(x)))).value_counts()\
    .rename_axis('Channels_together')\
    .reset_index(name='n')

Output

  Channels_together  n
0              a, b  2
1           a, b, c  1
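
Plain sets are not hashable, so if you need the groups as set-like objects that can be counted or used as dictionary keys, frozenset is a safe choice. A minimal sketch with collections.Counter on made-up data (the rows are an assumption, picked to give the same counts as above):

```python
from collections import Counter

import pandas as pd

# Hypothetical data: users 1 and 2 share the same channels, in different order.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "Channel": ["a", "b", "b", "a", "a", "b", "c"],
})

# One frozenset of channels per user; order and repeats are ignored.
together = Counter(
    frozenset(channels) for _, channels in df.groupby("user_id")["Channel"]
)

for channels, n in together.most_common():
    print(sorted(channels), n)
```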

