Unique Combinations of Values in Selected Columns in Pandas Data Frame and Count

You can group by columns 'A' and 'B', call size, then call reset_index and rename the generated column:

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3

update

A little explanation: grouping on the two columns puts rows with the same A and B values into the same group, and size returns the number of rows in each group:

In[202]:
df1.groupby(['A','B']).size()

Out[202]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

So now to restore the grouped columns, we call reset_index:

In[203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]:
A B 0
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3

This turns the index levels back into columns, but the size values end up in a column named 0, so we rename it:

In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
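A self-contained version of the steps above, using a small hypothetical df1 built only to reproduce the counts shown. Series.reset_index accepts a name argument, so the separate rename step can be folded into it:

```python
import pandas as pd

# Hypothetical data chosen to reproduce the counts shown above
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no', 'yes', 'yes'] + ['no'] * 4 + ['yes'] * 3,
})

# size() counts the rows in each (A, B) group; reset_index(name=...) moves
# the group keys back into columns and names the counts column in one step
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(counts)
```

This is the same pattern used further down for counting combinations across many columns.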

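groupby also accepts as_index=False, which keeps the grouped columns out of the index. In older pandas versions, however, calling size on such a groupby still returned a Series with the grouped columns in the index, so you'd still have to restore them yourself (newer versions return a DataFrame with a size column instead):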

In[205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
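In recent pandas releases (1.1 and later, as far as I know), as_index=False with size() returns a DataFrame with a column named size, and DataFrame.value_counts gives the same counts ordered by frequency. A sketch, assuming a recent pandas and the same hypothetical df1 as before:

```python
import pandas as pd

# Same hypothetical df1 as in the earlier sketch
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no', 'yes', 'yes'] + ['no'] * 4 + ['yes'] * 3,
})

# Recent pandas: as_index=False keeps A and B as columns and adds 'size'
by_group = df1.groupby(['A', 'B'], as_index=False).size()
by_group = by_group.rename(columns={'size': 'count'})

# DataFrame.value_counts counts each unique combination of the listed
# columns, sorted from most to least frequent
by_freq = df1.value_counts(['A', 'B']).reset_index(name='count')

print(by_group)
print(by_freq)
```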

How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?

Use Series.reindex with MultiIndex.from_product to add every missing combination with a count of 0:

s = df.groupby(['Colour', 'TOY_ID']).size()

s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print (s)
Colour  TOY_ID
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
Name: a, dtype: int64
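A self-contained sketch with a small hypothetical df (the column names follow the answer, the values are made up). MultiIndex.from_product builds every Colour/TOY_ID pair from the level values that actually occur, and reindex fills the pairs that never occurred with 0:

```python
import pandas as pd

# Hypothetical data: 'Green' never occurs together with TOY_ID 2
df = pd.DataFrame({
    'Colour': ['Blue', 'Blue', 'Green', 'Blue'],
    'TOY_ID': [1, 2, 1, 1],
})

# Counts only for the combinations that occur
s = df.groupby(['Colour', 'TOY_ID']).size()

# Cartesian product of the observed level values (level names passed
# explicitly); combinations missing from s get a count of 0
full_index = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)
s = s.reindex(full_index, fill_value=0)
print(s)
```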

How to obtain all unique combinations of values of particular columns

There is a method for this - pandas.DataFrame.drop_duplicates:

>>> df.drop_duplicates()
Col1 Col2 Col3
0 12 AB 13
1 11 AB 13
3 12 AC 14

You can do it inplace as well:

>>> df.drop_duplicates(inplace=True)
>>> df
Col1 Col2 Col3
0 12 AB 13
1 11 AB 13
3 12 AC 14

If you need the unique combinations of certain columns only:

>>> df[['Col2','Col3']].drop_duplicates()
Col2 Col3
0 AB 13
3 AC 14

As @jezrael suggests, you can also use the subset parameter of drop_duplicates(), which compares only the listed columns but keeps all columns of the first matching row:

>>> df.drop_duplicates(subset=['Col2','Col3'])
Col1 Col2 Col3
0 12 AB 13
3 12 AC 14

Counting all combinations of values in multiple columns

This seems like a nice problem for pd.get_dummies:

new_df = (
    pd.concat([df, pd.get_dummies(df['star'])], axis=1)
    .groupby(['month', 'item'], as_index=False)
    [df['star'].unique()]
    .sum()
)

Output:

>>> new_df
month item 1 2 3
0 1 10 2 1 1
1 1 20 0 0 1
2 2 20 0 2 1

Renaming, too:

u = df['star'].unique()
new_df = (
    pd.concat([df, pd.get_dummies(df['star'])], axis=1)
    .groupby(['month', 'item'], as_index=False)
    [u]
    .sum()
    .rename({k: f'star_{k}_cnt' for k in u}, axis=1)
)

Output:

>>> new_df
month item star_1_cnt star_2_cnt star_3_cnt
0 1 10 2 1 1
1 1 20 0 0 1
2 2 20 0 2 1

Obligatory one- (or two-) liners:

# Renames the columns
u = df['star'].unique()
new_df = pd.concat([df, pd.get_dummies(df['star'])], axis=1).groupby(['month', 'item'], as_index=False)[u].sum().rename({k: f'star_{k}_cnt' for k in u}, axis=1)
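An alternative (not from the answer above) that avoids the get_dummies/concat step is pd.crosstab, which counts every month/item/star combination and fills missing ones with 0. A sketch with hypothetical data chosen to match the output above:

```python
import pandas as pd

# Hypothetical data that reproduces the counts in the outputs above
df = pd.DataFrame({
    'month': [1, 1, 1, 1, 1, 2, 2, 2],
    'item':  [10, 10, 10, 10, 20, 20, 20, 20],
    'star':  [1, 1, 2, 3, 3, 2, 2, 3],
})

# Rows: each (month, item) pair; columns: each star value; cells: counts
new_df = (
    pd.crosstab([df['month'], df['item']], df['star'])
    .rename(columns=lambda k: f'star_{k}_cnt')  # 1 -> 'star_1_cnt', ...
    .rename_axis(columns=None)                  # drop the 'star' axis label
    .reset_index()                              # month and item back to columns
)
print(new_df)
```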

Finding unique combinations of columns from a dataframe

First select only the columns needed in the output and call drop_duplicates, then add a new numbering column with range:

df = df[['age','maritalstatus']].drop_duplicates()
df['no'] = range(len(df.index))
print (df)
age maritalstatus no
0 Young married 0
1 young married 1
2 young Single 2
3 old single 3
4 old married 4
5 teen married 5
7 adult single 6

If you want to convert all values to lowercase first:

df = df[['age','maritalstatus']].apply(lambda x: x.str.lower()).drop_duplicates()
df['no'] = range(len(df.index))
print (df)
age maritalstatus no
0 young married 0
2 young single 1
3 old single 2
4 old married 3
5 teen married 4
7 adult single 5
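If you would rather label every original row with the number of its combination (instead of keeping one row per combination), GroupBy.ngroup does that directly; with sort=False the numbers follow the order of first appearance, like the range numbering above. A sketch with hypothetical, already lowercased data:

```python
import pandas as pd

# Hypothetical data (already lowercased)
df = pd.DataFrame({
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
})

# One combination id per row; sort=False numbers groups by first appearance
df['no'] = df.groupby(['age', 'maritalstatus'], sort=False).ngroup()
print(df)
```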

EDIT:

First convert to lowercase:

df[['age','maritalstatus']] = df[['age','maritalstatus']].apply(lambda x: x.str.lower())
print (df)
user age maritalstatus product
0 A young married 111
1 B young married 222
2 C young single 111
3 D old single 222
4 E old married 111
5 F teen married 222
6 G teen married 555
7 H adult single 444
8 I adult single 333

Then use merge to find the matching rows and convert the unique product values to a list:

df2 = pd.DataFrame([{'user':'X', 'age':'young', 'maritalstatus':'married'}])
print (df2)
age maritalstatus user
0 young married X

a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[111, 222]

df2 = pd.DataFrame([{'user':'X', 'age':'adult', 'maritalstatus':'married'}])
print (df2)
age maritalstatus user
0 adult married X

a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[]
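Wrapped into a small helper, the merge lookup above could look like this. products_for is a hypothetical name (not a pandas function), and the data is the lowercased table from above:

```python
import pandas as pd

df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

def products_for(df, age, maritalstatus):
    """Hypothetical helper: unique products of all users with the same
    age and maritalstatus, in order of first appearance."""
    query = pd.DataFrame([{'age': age, 'maritalstatus': maritalstatus}])
    # Inner merge keeps only the rows of df that match the query
    matches = pd.merge(df, query, on=['age', 'maritalstatus'])
    return matches['product'].unique().tolist()

print(products_for(df, 'young', 'married'))
print(products_for(df, 'adult', 'married'))
```

A combination with no matching rows yields an empty list, as in the second lookup above.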

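If you need the result as a new column, collect the unique products per combination and look them up for every row (older pandas accepted transform('unique') here, but current versions reject 'unique' as a transform name):

uniq = df.groupby(['age', 'maritalstatus'])['product'].unique()
df['prod'] = [uniq[key] for key in zip(df['age'], df['maritalstatus'])]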
print (df)
user age maritalstatus product prod
0 A young married 111 [111, 222]
1 B young married 222 [111, 222]
2 C young single 111 [111]
3 D old single 222 [222]
4 E old married 111 [111]
5 F teen married 222 [222, 555]
6 G teen married 555 [222, 555]
7 H adult single 444 [444, 333]
8 I adult single 333 [444, 333]

EDIT1:

a = (pd.merge(df, df2, on=['age','maritalstatus'])
.groupby('user_y')['product']
.apply(lambda x: x.unique().tolist())
.to_dict())
print (a)
{'X': [111, 222]}

Detail:

print (pd.merge(df, df2, on=['age','maritalstatus']))
user_x age maritalstatus product user_y
0 A young married 111 X
1 B young married 222 X
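The same merge and groupby works for several query users at once. A sketch with hypothetical users X, Y and Z in df2; a user whose combination matches no row (Z here) simply does not appear in the resulting dict:

```python
import pandas as pd

df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

# Hypothetical query users
df2 = pd.DataFrame([
    {'user': 'X', 'age': 'young', 'maritalstatus': 'married'},
    {'user': 'Y', 'age': 'adult', 'maritalstatus': 'single'},
    {'user': 'Z', 'age': 'adult', 'maritalstatus': 'married'},
])

# Both frames have 'user', so merge suffixes them as user_x and user_y
a = (pd.merge(df, df2, on=['age', 'maritalstatus'])
       .groupby('user_y')['product']
       .apply(lambda x: x.unique().tolist())
       .to_dict())
print(a)
```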

Add a count of unique combinations across rows in pandas

Yes, you can group by every column except id and count the rows in each group:

cols = df.columns.difference(['id']).tolist()
# should work like
#cols = ['cat_1','cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print (df)
cat_1 cat_2 cat_3 cat_4 count
0 Chips Null Null Null 1
1 Chips Avocado Null Null 1
2 Chips Pasta Null Null 2
3 Chips Pasta Cheese Null 1
4 Chips Sauce Cheese Null 1
5 Pasta Null Null Null 2
6 Pasta Bread Null Null 2
7 Pasta Cheese Null Null 1
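One caution, assuming the Null cells are real missing values (NaN/None) rather than the string 'Null': groupby drops every row whose key contains a missing value by default, so the counts could come out empty. dropna=False (available in pandas 1.1 and later) keeps those rows. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data with real missing values instead of the string 'Null'
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'cat_1': ['Chips', 'Pasta', 'Pasta', 'Chips', 'Chips'],
    'cat_2': [None, None, None, 'Pasta', 'Pasta'],
    'cat_3': [None, None, None, None, None],
})

cols = df.columns.difference(['id']).tolist()

# Default dropna=True: every row has a missing key, so nothing is counted
dropped = df.groupby(cols, sort=False).size()

# dropna=False treats a missing value as a group key of its own
counts = df.groupby(cols, sort=False, dropna=False).size().reset_index(name='count')
print(counts)
```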

