Count Unique Values Per Groups with Pandas

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print (df)
domain
'facebook.com' 1
'google.com' 1
'twitter.com' 2
'vk.com' 3
Name: ID, dtype: int64

If you need to strip ' characters:

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3

The difference is that nunique() returns a Series and agg() returns a DataFrame.

Count unique values using pandas groupby

I think you can use SeriesGroupBy.nunique:

print (df.groupby('param')['group'].nunique())
param
a 2
b 1
Name: group, dtype: int64

Another solution with unique, then create new df by DataFrame.from_records, reshape to Series by stack and last value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a 2
b 1
dtype: int64

Python group by and count distinct values in a column and create delimited list

You can use str.len in your code:

df3 = (df.groupby('company')['product']
.apply(lambda x: list(x.unique()))
.reset_index()
.assign(count=lambda d: d['product'].str.len()) ## added line
)

output:

     company            product  count
0 Amazon [E-comm] 1
1 Facebook [Social Media] 1
2 Google [Search, Android] 2
3 Microsoft [OS, X-box] 2

Count of unique values per group as new column with pandas

GroupBy.transform('nunique')

On v0.23.4, your solution works for me.

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1


GroupBy.nunique + pd.Series.map

Additionally, with your existing solution, you could map the series back to mID:

df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1

Count the number of unique values per group

Looks like you want transform + nunique;

df['a_b_3'] = df.groupby('_a')['_b'].transform('nunique')        
df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2

This is effectively groupby + nunique + map:

v = df.groupby('_a')['_b'].nunique()
df['a_b_3'] = df['_a'].map(v)

df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2

Pandas groupby and count unique value of column

We can drop all lines with start=='P1', then groupby id and count unique finish:

(df[df['start'].ne('P1')]       # drop rows with `start` == 'P1'
.groupby('id') # group by `id`
['finish'].nunique() # count unique `finish`
.reset_index(name='result') # match the output
)

Output:

  id  result
0 A 3
1 B 1

Pandas count average number of unique numbers across groups

IIUC, first you want to aggregate over each age and household:

agg = (df.groupby(['age_group', 'household_key'])
.agg({'DAY': 'nunique'})
)

and then groupby again for the mean, e.g.,

agg.groupby('age_group').mean()

will give you the mean for each age_group across the household_key.



Related Topics



Leave a reply



Submit