Unique combinations of values in selected columns in pandas data frame and count
You can groupby on columns 'A' and 'B', call size, and then reset_index and rename the generated column:
In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
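As a sketch, Series.reset_index also accepts a name argument, which folds the rename step into the reset. The df1 below is made-up data chosen only to reproduce the counts above; it is an assumption, not the asker's actual frame.

```python
import pandas as pd

# Hypothetical df1 built so each (A, B) pair occurs as often as in the output above
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no'] + ['yes'] * 2 + ['no'] * 4 + ['yes'] * 3,
})

# name= labels the size column directly, so no separate rename is needed
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(counts)
```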
Update:
A little explanation: grouping on the 2 columns puts rows whose A and B values are the same into one group. We then call size, which returns the number of rows in each group:
In[202]:
df1.groupby(['A','B']).size()
Out[202]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
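To make the size step concrete: size counts every row in each group, while count on a column counts only that column's non-missing values. A small sketch with a hypothetical extra column C containing a NaN (C is not part of the original df1):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column C has one missing value in the ('no', 'yes') group
df1 = pd.DataFrame({
    'A': ['no', 'no', 'yes'],
    'B': ['yes', 'yes', 'no'],
    'C': [1.0, np.nan, 2.0],
})

g = df1.groupby(['A', 'B'])
sizes = g.size()         # rows per group, whether or not C is NaN
counts = g['C'].count()  # non-null C values per group
print(sizes)
print(counts)
```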
So now, to restore the grouped columns, we call reset_index:
In[203]:
df1.groupby(['A','B']).size().reset_index()
Out[203]:
A B 0
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
This restores the grouped columns, but the size aggregation becomes a generated column named 0, so we have to rename it:
In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[204]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
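Another route, sketched here under the assumption of pandas 1.1 or later, is DataFrame.value_counts, which counts unique rows over the columns given in subset. The df1 is again hypothetical data matching the counts above.

```python
import pandas as pd

# Hypothetical df1 with the same (A, B) frequencies as above
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no'] + ['yes'] * 2 + ['no'] * 4 + ['yes'] * 3,
})

# value_counts sorts by frequency by default; sort=False plus sort_index
# restores the A/B ordering used by groupby
counts = (df1.value_counts(subset=['A', 'B'], sort=False)
             .sort_index()
             .reset_index(name='count'))
print(counts)
```

Note that without sort=False the most frequent combination comes first, which is often what you want for a quick frequency table.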
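groupby does accept the as_index argument, which we could have set to False so that it doesn't make the grouped columns the index. In older pandas versions, however, size still generates a Series with the grouped columns as its index, so you'd still have to restore the indices and so on (since pandas 1.1 it returns a DataFrame with a size column instead):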
In[205]:
df1.groupby(['A','B'], as_index=False).size()
Out[205]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
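For comparison, a sketch of the same call on a recent pandas (assumed 1.1 or later), where as_index=False yields a DataFrame whose counts sit in a column named size. The df1 is hypothetical data matching the counts above.

```python
import pandas as pd

# Hypothetical df1 with the same (A, B) frequencies as above
df1 = pd.DataFrame({
    'A': ['no'] * 3 + ['yes'] * 7,
    'B': ['no'] + ['yes'] * 2 + ['no'] * 4 + ['yes'] * 3,
})

# On pandas >= 1.1 this is already a flat DataFrame; only the column name needs changing
counts = (df1.groupby(['A', 'B'], as_index=False)
             .size()
             .rename(columns={'size': 'count'}))
print(counts)
```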
How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?
Use Series.reindex with MultiIndex.from_product:
import pandas as pd
s = df.groupby(['Colour', 'TOY_ID']).size()
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)
Colour TOY_ID
Blue 31490.0 50
31569.0 50
50360636.0 20
50366678.0 0
Green 31490.0 17
31569.0 0
50360636.0 0
50366678.0 10
Yellow 31490.0 0
31569.0 0
50360636.0 25
50366678.0 9
Name: a, dtype: int64
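A minimal sketch on hypothetical toy data, showing the reindex approach next to an alternative: with categorical keys and observed=False, groupby itself emits every category combination, giving size 0 to the unseen ones. The toy frame is an assumption, not the asker's data.

```python
import pandas as pd

# Hypothetical toy data: 'Green' never appears with TOY_ID 2
df = pd.DataFrame({
    'Colour': ['Blue', 'Blue', 'Green', 'Blue'],
    'TOY_ID': [1, 2, 1, 1],
})

# Reindex against the Cartesian product of the observed levels
s = df.groupby(['Colour', 'TOY_ID']).size()
full = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)

# Alternative: categorical keys + observed=False also yield every combination
cat = df.astype({'Colour': 'category', 'TOY_ID': 'category'})
alt = cat.groupby(['Colour', 'TOY_ID'], observed=False).size()
print(full)
print(alt)
```

The categorical route has one extra benefit: if you set the categories explicitly, combinations involving values that never occur in the data at all also show up with 0.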
How to obtain all unique combinations of values of particular columns
There is a method for this, pandas.DataFrame.drop_duplicates:
>>> df.drop_duplicates()
Col1 Col2 Col3
0 12 AB 13
1 11 AB 13
3 12 AC 14
You can do it inplace as well:
>>> df.drop_duplicates(inplace=True)
>>> df
Col1 Col2 Col3
0 12 AB 13
1 11 AB 13
3 12 AC 14
If you need the unique combinations of values in certain columns only:
>>> df[['Col2','Col3']].drop_duplicates()
Col2 Col3
0 AB 13
3 AC 14
As @jezrael suggests, you can also consider using the subset parameter of drop_duplicates():
>>> df.drop_duplicates(subset=['Col2','Col3'])
Col1 Col2 Col3
0 12 AB 13
3 12 AC 14
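A short sketch contrasting the two variants, on a hypothetical frame in which row 2 is assumed to repeat row 0 exactly (consistent with row 2 disappearing above). It also shows keep='last', which retains the last row of each duplicate group instead of the first.

```python
import pandas as pd

# Hypothetical frame: row 2 is assumed to be an exact copy of row 0
df = pd.DataFrame({
    'Col1': [12, 11, 12, 12],
    'Col2': ['AB', 'AB', 'AB', 'AC'],
    'Col3': [13, 13, 13, 14],
})

# Selecting columns first returns only those columns
combos = df[['Col2', 'Col3']].drop_duplicates()

# subset= judges duplicates on Col2/Col3 but keeps every column;
# keep='last' keeps the last occurrence of each combination
last_rows = df.drop_duplicates(subset=['Col2', 'Col3'], keep='last')
print(combos)
print(last_rows)
```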
Counting all combinations of values in multiple columns
This seems like a nice problem for pd.get_dummies:
new_df = (
pd.concat([df, pd.get_dummies(df['star'])], axis=1)
.groupby(['month', 'item'], as_index=False)
[df['star'].unique()]
.sum()
)
Output:
>>> new_df
month item 1 2 3
0 1 10 2 1 1
1 1 20 0 0 1
2 2 20 0 2 1
Renaming, too:
u = df['star'].unique()
new_df = (
pd.concat([df, pd.get_dummies(df['star'])], axis=1)
.groupby(['month', 'item'], as_index=False)
[u]
.sum()
.rename({k: f'star_{k}_cnt' for k in u}, axis=1)
)
Output:
>>> new_df
month item star_1_cnt star_2_cnt star_3_cnt
0 1 10 2 1 1
1 1 20 0 0 1
2 2 20 0 2 1
Obligatory one- (or two-) liners:
# Renames the columns
u = df['star'].unique()
new_df = pd.concat([df, pd.get_dummies(df['star'])], axis=1).groupby(['month', 'item'], as_index=False)[u].sum().rename({k: f'star_{k}_cnt' for k in u}, axis=1)
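pd.crosstab is a swapped-in alternative (not the approach above) that counts every (month, item) by star combination directly, filling absent ones with 0. A sketch on hypothetical data chosen to reproduce the output shown:

```python
import pandas as pd

# Hypothetical data chosen to reproduce the counts shown above
df = pd.DataFrame({
    'month': [1, 1, 1, 1, 1, 2, 2, 2],
    'item':  [10, 10, 10, 10, 20, 20, 20, 20],
    'star':  [1, 1, 2, 3, 3, 2, 2, 3],
})

# crosstab: rows are (month, item) pairs, columns are star values, cells are counts
new_df = (
    pd.crosstab([df['month'], df['item']], df['star'])
    .add_prefix('star_')
    .add_suffix('_cnt')
    .rename_axis(columns=None)  # drop the leftover 'star' columns name
    .reset_index()
)
print(new_df)
```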
Finding unique combinations of columns from a dataframe
First select only the columns needed for the output and call drop_duplicates, then add a new column built with range:
df = df[['age','maritalstatus']].drop_duplicates()
df['no'] = range(len(df.index))
print (df)
age maritalstatus no
0 Young married 0
1 young married 1
2 young Single 2
3 old single 3
4 old married 4
5 teen married 5
7 adult single 6
If you want to convert all values to lowercase first:
df = df[['age','maritalstatus']].apply(lambda x: x.str.lower()).drop_duplicates()
df['no'] = range(len(df.index))
print (df)
age maritalstatus no
0 young married 0
2 young single 1
3 old single 2
4 old married 3
5 teen married 4
7 adult single 5
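If you want the combination number on every original row rather than on a de-duplicated frame, GroupBy.ngroup does that. A sketch on hypothetical lowercased data modelled on this answer's frame; with sort=False, combinations are numbered in order of first appearance.

```python
import pandas as pd

# Hypothetical, already-lowercased data modelled on the frame in this answer
df = pd.DataFrame({
    'age': ['young', 'young', 'young', 'old', 'old', 'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
})

# ngroup labels each row with the id of its (age, maritalstatus) combination
df['no'] = df.groupby(['age', 'maritalstatus'], sort=False).ngroup()
print(df)
```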
EDIT:
First convert to lowercase:
df[['age','maritalstatus']] = df[['age','maritalstatus']].apply(lambda x: x.str.lower())
print (df)
user age maritalstatus product
0 A young married 111
1 B young married 222
2 C young single 111
3 D old single 222
4 E old married 111
5 F teen married 222
6 G teen married 555
7 H adult single 444
8 I adult single 333
Then use merge to get the unique products, converted to a list:
df2 = pd.DataFrame([{'user':'X', 'age':'young', 'maritalstatus':'married'}])
print (df2)
age maritalstatus user
0 young married X
a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[111, 222]
df2 = pd.DataFrame([{'user':'X', 'age':'adult', 'maritalstatus':'married'}])
print (df2)
age maritalstatus user
0 adult married X
a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[]
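For a single lookup, a boolean mask avoids building df2 and merging. A sketch using the lowercased data shown above; products_for is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old', 'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

# Hypothetical helper: unique products for one (age, maritalstatus) pair
def products_for(frame, age, status):
    mask = (frame['age'] == age) & (frame['maritalstatus'] == status)
    return frame.loc[mask, 'product'].unique().tolist()

print(products_for(df, 'young', 'married'))
print(products_for(df, 'adult', 'married'))
```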
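But if you need a new column, note that recent pandas versions reject transform('unique') with a ValueError (unique is not an allowed transform name), so aggregate the unique products per group and join them back:
uniq = df.groupby(['age', 'maritalstatus'])['product'].unique().rename('prod')
df = df.join(uniq, on=['age', 'maritalstatus'])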
print (df)
user age maritalstatus product prod
0 A young married 111 [111, 222]
1 B young married 222 [111, 222]
2 C young single 111 [111]
3 D old single 222 [222]
4 E old married 111 [111]
5 F teen married 222 [222, 555]
6 G teen married 555 [222, 555]
7 H adult single 444 [444, 333]
8 I adult single 333 [444, 333]
EDIT1:
a = (pd.merge(df, df2, on=['age','maritalstatus'])
.groupby('user_y')['product']
.apply(lambda x: x.unique().tolist())
.to_dict())
print (a)
{'X': [111, 222]}
Detail:
print (pd.merge(df, df2, on=['age','maritalstatus']))
user_x age maritalstatus product user_y
0 A young married 111 X
1 B young married 222 X
Add a count unique combinations across rows in pandas
I think so; you can use:
cols = df.columns.difference(['id']).tolist()
# should work like
#cols = ['cat_1','cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print (df)
cat_1 cat_2 cat_3 cat_4 count
0 Chips Null Null Null 1
1 Chips Avocado Null Null 1
2 Chips Pasta Null Null 2
3 Chips Pasta Cheese Null 1
4 Chips Sauce Cheese Null 1
5 Pasta Null Null Null 2
6 Pasta Bread Null Null 2
7 Pasta Cheese Null Null 1
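If the empty cells are real NaN values rather than the string 'Null', groupby drops rows with NaN keys by default. A sketch with hypothetical data showing dropna=False (available since pandas 1.1) keeping those combinations:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the Pasta rows have a real NaN in cat_2
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'cat_1': ['Chips', 'Chips', 'Pasta', 'Pasta'],
    'cat_2': ['Pasta', 'Pasta', np.nan, np.nan],
})
cols = df.columns.difference(['id']).tolist()

# Default: rows whose key contains NaN are silently dropped
default = df.groupby(cols, sort=False).size()

# dropna=False keeps NaN as its own key value
with_na = df.groupby(cols, sort=False, dropna=False).size().reset_index(name='count')
print(with_na)
```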