Unique Combinations of Values in Selected Columns in Pandas Data Frame and Count


You can group by columns 'A' and 'B', call size, then call reset_index and rename the generated column:

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
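Below is a minimal self-contained sketch of the same idea. It assumes a small hypothetical df1 built to reproduce the counts above. Passing name='count' to reset_index names the counts column directly, so the separate rename step can be skipped:

```python
import pandas as pd

# Hypothetical data chosen to reproduce the counts shown above
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no', 'yes', 'yes'] + ['no'] * 4 + ['yes'] * 3,
})

# Count rows per (A, B) combination; name= labels the counts column
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(counts)
```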

update

A little explanation: grouping on the two columns puts rows with the same A and B values into the same group, and calling size returns the number of rows in each group:

In[202]:
df1.groupby(['A','B']).size()

Out[202]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

To turn the grouped index levels back into columns, we call reset_index:

In[203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]: 
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3

This moves the index levels back into columns, but the size result becomes a column named 0, so we rename it:

In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]: 
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

groupby also accepts the as_index argument, which we could set to False so the grouped columns don't become the index. In pandas versions before 1.1, however, size() ignored it and still returned a Series, so you would still have to reset the index and rename the column:

In[205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
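In pandas 1.1 and later, size() respects as_index=False and returns a DataFrame with a size column. DataFrame.value_counts also counts the combinations directly, sorted by count in descending order. The sketch below assumes a small hypothetical df1 with the same counts as the outputs above:

```python
import pandas as pd

# Hypothetical data with the same counts as the outputs above
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no', 'yes', 'yes'] + ['no'] * 4 + ['yes'] * 3,
})

# pandas >= 1.1: as_index=False keeps A and B as columns and adds 'size'
by_group = df1.groupby(['A', 'B'], as_index=False).size()
by_group = by_group.rename(columns={'size': 'count'})

# pandas >= 1.1: value_counts on the selected columns, sorted by count
by_counts = df1[['A', 'B']].value_counts().reset_index(name='count')

print(by_group)
print(by_counts)
```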

How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?

Use Series.reindex with MultiIndex.from_product:

s = df.groupby(['Colour', 'TOY_ID']).size()

s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print (s)
Colour  TOY_ID    
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
Name: a, dtype: int64
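Below is a small self-contained sketch of the reindex step. It assumes hypothetical Colour/TOY_ID data in which Green never appears with TOY_ID 2. from_product builds every pair of the observed level values, and reindex fills the pairs that never occurred with 0:

```python
import pandas as pd

# Hypothetical data: Green never appears with TOY_ID 2
df = pd.DataFrame({
    'Colour': ['Blue', 'Blue', 'Green', 'Green', 'Blue'],
    'TOY_ID': [1, 2, 1, 1, 2],
})

s = df.groupby(['Colour', 'TOY_ID']).size()

# Build every Colour x TOY_ID pair from the observed levels,
# then fill the combinations that never occurred with 0
full = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)
s = s.reindex(full, fill_value=0)
print(s)
```

If a wide layout is more convenient, s.unstack(fill_value=0) gives the same counts as a Colour by TOY_ID table.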

How to obtain all unique combinations of values of particular columns

You can use pandas.DataFrame.drop_duplicates:

>>> df.drop_duplicates()
   Col1 Col2  Col3
0    12   AB    13
1    11   AB    13
3    12   AC    14

You can do it in place as well:

>>> df.drop_duplicates(inplace=True)
>>> df
   Col1 Col2  Col3
0    12   AB    13
1    11   AB    13
3    12   AC    14

If you need the unique combinations of certain columns only:

>>> df[['Col2','Col3']].drop_duplicates()
  Col2  Col3
0   AB    13
3   AC    14

As @jezrael suggests, you can also use the subset parameter of drop_duplicates(). It checks only the listed columns for duplicates but keeps all columns in the result:

>>> df.drop_duplicates(subset=['Col2','Col3'])
   Col1 Col2  Col3
0    12   AB    13
3    12   AC    14
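Below is a self-contained sketch contrasting the three calls above. It assumes a hypothetical df whose row 2 repeats row 0, so the results match the outputs shown. Selecting columns first returns only those columns. subset keeps every column and uses only the listed ones to detect duplicates. In both cases the first occurrence is kept by default:

```python
import pandas as pd

# Hypothetical frame: row 2 repeats row 0, so the outputs match those above
df = pd.DataFrame({
    'Col1': [12, 11, 12, 12],
    'Col2': ['AB', 'AB', 'AB', 'AC'],
    'Col3': [13, 13, 13, 14],
})

all_cols = df.drop_duplicates()                       # all columns compared; rows 0, 1, 3
two_cols = df[['Col2', 'Col3']].drop_duplicates()     # only Col2 and Col3 returned; rows 0, 3
subset = df.drop_duplicates(subset=['Col2', 'Col3'])  # all columns returned; rows 0, 3

print(all_cols, two_cols, subset, sep='\n\n')
```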

Counting all combinations of values in multiple columns

This seems like a nice problem for pd.get_dummies:

new_df = (
    pd.concat([df, pd.get_dummies(df['star'])], axis=1)
    .groupby(['month', 'item'], as_index=False)
    [df['star'].unique()]
    .sum()
)

Output:

>>> new_df
   month  item  1  2  3
0      1    10  2  1  1
1      1    20  0  0  1
2      2    20  0  2  1

With the columns renamed, too:

u = df['star'].unique()
new_df = (
    pd.concat([df, pd.get_dummies(df['star'])], axis=1)
    .groupby(['month', 'item'], as_index=False)
    [u]
    .sum()
    .rename({k: f'star_{k}_cnt' for k in u}, axis=1)
)

Output:

>>> new_df
   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1

Obligatory one- (or two-) liners:

# Renames the columns
u = df['star'].unique()
new_df = pd.concat([df, pd.get_dummies(df['star'])], axis=1).groupby(['month', 'item'], as_index=False)[u].sum().rename({k: f'star_{k}_cnt' for k in u}, axis=1)
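pd.crosstab is another way to get the same table. It counts every (month, item) by star combination and fills missing ones with 0. The sketch below assumes hypothetical data chosen to reproduce the output above. add_prefix and add_suffix build the star_k_cnt names:

```python
import pandas as pd

# Hypothetical data reproducing the counts shown above
df = pd.DataFrame({
    'month': [1, 1, 1, 1, 1, 2, 2, 2],
    'item':  [10, 10, 10, 10, 20, 20, 20, 20],
    'star':  [1, 1, 2, 3, 3, 2, 2, 3],
})

# crosstab counts each (month, item) x star combination, filling gaps with 0
new_df = (
    pd.crosstab([df['month'], df['item']], df['star'])
    .add_prefix('star_')
    .add_suffix('_cnt')
    .rename_axis(columns=None)  # drop the leftover 'star' columns label
    .reset_index()
)
print(new_df)
```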

Finding unique combinations of columns from a dataframe

First select only the columns needed for the output and call drop_duplicates, then add a new column numbered with range:

df = df[['age','maritalstatus']].drop_duplicates()
df['no'] = range(len(df.index))
print (df)
     age maritalstatus  no
0  Young       married   0
1  young       married   1
2  young        Single   2
3    old        single   3
4    old       married   4
5   teen       married   5
7  adult        single   6

If you want to convert all values to lowercase first:

df = df[['age','maritalstatus']].apply(lambda x: x.str.lower()).drop_duplicates()
df['no'] = range(len(df.index))
print (df)
     age maritalstatus  no
0  young       married   0
2  young        single   1
3    old        single   2
4    old       married   3
5   teen       married   4
7  adult        single   5
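You may want the combination number on every original row instead of on a deduplicated frame. GroupBy.ngroup is one option. The sketch below assumes hypothetical data with the lowercase values shown above. With sort=False, the combinations are numbered in order of first appearance, which matches the numbering above:

```python
import pandas as pd

# Hypothetical data with the lowercase values shown above
df = pd.DataFrame({
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
})

# ngroup gives each row the number of its combination;
# sort=False numbers combinations in order of first appearance
df['no'] = df.groupby(['age', 'maritalstatus'], sort=False).ngroup()
print(df)
```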

EDIT:

First convert to lowercase:

df[['age','maritalstatus']] = df[['age','maritalstatus']].apply(lambda x: x.str.lower())
print (df)
  user    age maritalstatus  product
0    A  young       married      111
1    B  young       married      222
2    C  young        single      111
3    D    old        single      222
4    E    old       married      111
5    F   teen       married      222
6    G   teen       married      555
7    H  adult        single      444
8    I  adult        single      333

Then use merge and convert the unique product values to a list:

df2 = pd.DataFrame([{'user':'X', 'age':'young', 'maritalstatus':'married'}])
print (df2)
     age maritalstatus user
0  young       married    X

a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[111, 222]

df2 = pd.DataFrame([{'user':'X', 'age':'adult', 'maritalstatus':'married'}])
print (df2)
     age maritalstatus user
0  adult       married    X

a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[]
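For a single lookup, a boolean mask gives the same lists without building df2. The sketch below assumes the data shown above. products_for is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

# The lowercase data shown above
df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})


def products_for(frame, age, status):
    """Hypothetical helper: unique products for one (age, maritalstatus) pair."""
    mask = (frame['age'] == age) & (frame['maritalstatus'] == status)
    return frame.loc[mask, 'product'].unique().tolist()


print(products_for(df, 'young', 'married'))
print(products_for(df, 'adult', 'married'))
```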

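But if you need the result as a new column, take the unique products per group and join them back on the grouping columns:

# recent pandas rejects transform('unique'), so join the per-group unique arrays instead
prod = df.groupby(['age', 'maritalstatus'])['product'].unique().rename('prod')
df = df.join(prod, on=['age', 'maritalstatus'])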
print (df)
  user    age maritalstatus  product        prod
0    A  young       married      111  [111, 222]
1    B  young       married      222  [111, 222]
2    C  young        single      111       [111]
3    D    old        single      222       [222]
4    E    old       married      111       [111]
5    F   teen       married      222  [222, 555]
6    G   teen       married      555  [222, 555]
7    H  adult        single      444  [444, 333]
8    I  adult        single      333  [444, 333]

EDIT1:

a = (pd.merge(df, df2, on=['age','maritalstatus'])
       .groupby('user_y')['product']
       .apply(lambda x: x.unique().tolist())
       .to_dict())
print (a)
{'X': [111, 222]}

Detail:

print (pd.merge(df, df2, on=['age','maritalstatus']))
  user_x    age maritalstatus  product user_y
0      A  young       married      111      X
1      B  young       married      222      X
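The merged frame is grouped by user_y, so the same code handles a df2 with several users and returns one list per user. The sketch below assumes the data shown above plus a hypothetical second user Y who looks up old/single:

```python
import pandas as pd

# The lowercase data shown above
df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

# Hypothetical lookup frame with two users
df2 = pd.DataFrame({'user': ['X', 'Y'],
                    'age': ['young', 'old'],
                    'maritalstatus': ['married', 'single']})

# 'user' exists in both frames, so merge suffixes it to user_x / user_y
a = (pd.merge(df, df2, on=['age', 'maritalstatus'])
       .groupby('user_y')['product']
       .apply(lambda x: x.unique().tolist())
       .to_dict())
print(a)
```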

Add a count unique combinations across rows in pandas

Yes, you can group by all columns except id and count the rows for each combination:

# Index.difference returns the remaining column names sorted
cols = df.columns.difference(['id']).tolist()
# should work like
# cols = ['cat_1','cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print (df)
   cat_1    cat_2   cat_3 cat_4  count
0  Chips     Null    Null  Null      1
1  Chips  Avocado    Null  Null      1
2  Chips    Pasta    Null  Null      2
3  Chips    Pasta  Cheese  Null      1
4  Chips    Sauce  Cheese  Null      1
5  Pasta     Null    Null  Null      2
6  Pasta    Bread    Null  Null      2
7  Pasta   Cheese    Null  Null      1
