Count unique values per groups with Pandas
You need nunique
:
df = df.groupby('domain')['ID'].nunique()
print (df)
domain
'facebook.com' 1
'google.com' 1
'twitter.com' 2
'vk.com' 3
Name: ID, dtype: int64
If you need to strip
'
characters:
df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64
Or as Jon Clements commented:
df.groupby(df.domain.str.strip("'"))['ID'].nunique()
You can retain the column name like this:
df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3
The difference is that nunique()
returns a Series and agg()
returns a DataFrame.
Count unique values using pandas groupby
I think you can use SeriesGroupBy.nunique
:
print (df.groupby('param')['group'].nunique())
param
a 2
b 1
Name: group, dtype: int64
Another solution with unique
, then create new df
by DataFrame.from_records
, reshape to Series
by stack
and last value_counts
:
a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a 2
b 1
dtype: int64
Python group by and count distinct values in a column and create delimited list
You can use str.len
in your code:
df3 = (df.groupby('company')['product']
.apply(lambda x: list(x.unique()))
.reset_index()
.assign(count=lambda d: d['product'].str.len()) ## added line
)
output:
company product count
0 Amazon [E-comm] 1
1 Facebook [Social Media] 1
2 Google [Search, Android] 2
3 Microsoft [OS, X-box] 2
Count of unique values per group as new column with pandas
GroupBy.transform('nunique')
On v0.23.4
, your solution works for me.
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
GroupBy.nunique
+ pd.Series.map
Additionally, with your existing solution, you could map
the series back to mID
:
df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
Count the number of unique values per group
Looks like you want transform
+ nunique
;
df['a_b_3'] = df.groupby('_a')['_b'].transform('nunique')
df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2
This is effectively groupby
+ nunique
+ map
:
v = df.groupby('_a')['_b'].nunique()
df['a_b_3'] = df['_a'].map(v)
df
_a _b a_b_3
0 1 3 3
1 1 4 3
2 1 5 3
3 2 3 1
4 2 3 1
5 3 3 2
6 3 9 2
Pandas groupby and count unique value of column
We can drop all lines with start=='P1'
, then groupby id
and count unique finish
:
(df[df['start'].ne('P1')] # drop rows with `start` == 'P1'
.groupby('id') # group by `id`
['finish'].nunique() # count unique `finish`
.reset_index(name='result') # match the output
)
Output:
id result
0 A 3
1 B 1
Pandas count average number of unique numbers across groups
IIUC, first you want to aggregate over each age
and household
:
agg = (df.groupby(['age_group', 'household_key'])
.agg({'DAY': 'nunique'})
)
and then groupby again for the mean, e.g.,
agg.groupby('age_group').mean()
will give you the mean for each age_group
across the household_key
.
Related Topics
Parsing Date/Time String with Timezone Abbreviated Name in Python
Pygame Already Installed; However, Python Terminal Says "No Module Named 'Pygame' " (Ubuntu 20.04.1)
How to Mock Requests and the Response
Converting Integer to Binary in Python
Difference Between Variables Inside and Outside of _Init_()
Access an Arbitrary Element in a Dictionary in Python
Python Functions Call by Reference
Send File Using Post from a Python Script
Replace All Elements of Python Numpy Array That Are Greater Than Some Value
What Does the Caret (^) Operator Do
Concatenate Rows of Two Dataframes in Pandas