Count Unique Values Using Pandas Groupby

I think you can use SeriesGroupBy.nunique:

print(df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64

Another solution: drop the nulls, collect the unique values per group with unique, build a new DataFrame with DataFrame.from_records, reshape it to a Series with stack, and finally apply value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a    2
b    1
dtype: int64
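As a self-contained sketch of both approaches (the column names are from the question, but the sample data here is assumed, since the original df isn't shown):

```python
import pandas as pd

# Hypothetical data: each `param` value is observed in one or more `group`s,
# with some null params
df = pd.DataFrame({
    'group': ['g1', 'g2', 'g2', 'g3'],
    'param': ['a', 'a', 'b', None],
})

# Number of distinct groups each param appears in
counts = df.groupby('param')['group'].nunique()
print(counts)  # a -> 2, b -> 1

# The unique/stack/value_counts variant gives the same tallies
a = df[df.param.notnull()].groupby('group')['param'].unique()
counts2 = pd.DataFrame.from_records(a.values.tolist()).stack().value_counts()
print(counts2)  # a -> 2, b -> 1
```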

Pandas 'count(distinct)' equivalent

I believe this is what you want:

table.groupby('YEARMONTH').CLIENTCODE.nunique()

Example:

In [2]: table
Out[2]:
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]:
YEARMONTH
201301    2
201302    3
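A runnable version of the example above (the data is reconstructed from the printed table):

```python
import pandas as pd

table = pd.DataFrame({
    'CLIENTCODE': [1, 1, 2, 1, 2, 2, 3],
    'YEARMONTH': [201301, 201301, 201301, 201302, 201302, 201302, 201302],
})

# Distinct clients per month -- the pandas equivalent of COUNT(DISTINCT ...)
distinct_clients = table.groupby('YEARMONTH')['CLIENTCODE'].nunique()
print(distinct_clients)  # 201301 -> 2, 201302 -> 3
```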

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print(df)
domain
'facebook.com'    1
'google.com'      1
'twitter.com'     2
'vk.com'          3
Name: ID, dtype: int64

If you need to strip ' characters:

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print(df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

The difference is that nunique() returns a Series indexed by the group keys, while agg() with as_index=False returns a DataFrame that keeps domain as a regular column.
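A minimal sketch of that difference; the shortened domain labels reuse the second table above, and the ID values themselves are made up:

```python
import pandas as pd

df = pd.DataFrame({
    'domain': ['fb', 'ggl', 'twitter', 'twitter', 'vk', 'vk', 'vk'],
    'ID': [1, 2, 3, 4, 5, 6, 7],
})

# Series: the group key `domain` becomes the index
s = df.groupby('domain')['ID'].nunique()

# DataFrame: `domain` stays a regular column
frame = df.groupby('domain', as_index=False).agg({'ID': pd.Series.nunique})

print(type(s).__name__, type(frame).__name__)  # Series DataFrame
```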

Counting unique values in a column in pandas dataframe like in Qlik?

To count distinct values, use nunique:

df['hID'].nunique()
5

To count only non-null values, use count:

df['hID'].count()
8

To count all values including nulls, use the size attribute:

df['hID'].size
8

To add a condition, use boolean indexing:

df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])

Or using query:

df.query('mID == "A"')['hID'].agg(['nunique','count','size'])

Output:

nunique    5
count      5
size       5
Name: hID, dtype: int64
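A runnable sketch of all three counts; the data here is assumed, shaped so the totals match the numbers above (8 rows, 5 distinct hIDs, 5 rows with mID == 'A'):

```python
import pandas as pd

df = pd.DataFrame({
    'hID': [101, 102, 103, 104, 105, 101, 102, 103],
    'mID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

print(df['hID'].nunique())  # distinct values -> 5
print(df['hID'].count())    # non-null values -> 8
print(df['hID'].size)       # all values, nulls included -> 8

# Restrict to mID == 'A' and compute all three at once
stats = df.loc[df['mID'] == 'A', 'hID'].agg(['nunique', 'count', 'size'])
print(stats)  # nunique 5, count 5, size 5
```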

Python group by and count distinct values in a column and create delimited list

You can use str.len in your code:

df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len())  # added line
      )

output:

     company            product  count
0     Amazon           [E-comm]      1
1   Facebook     [Social Media]      1
2     Google  [Search, Android]      2
3  Microsoft        [OS, X-box]      2
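The same pipeline as a self-contained example (the data is reconstructed from the output table):

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['Amazon', 'Facebook', 'Google', 'Google', 'Microsoft', 'Microsoft'],
    'product': ['E-comm', 'Social Media', 'Search', 'Android', 'OS', 'X-box'],
})

# Collect the unique products per company as a list, then use .str.len()
# to count the elements of each list
df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len()))
print(df3)
```

Note that `.str.len()` works element-wise on a Series of lists, not just on strings, which is what makes this trick possible.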

Pandas groupby and count unique value of column

We can drop all rows where start == 'P1', then group by id and count the unique finish values:

(df[df['start'].ne('P1')]        # drop rows with `start` == 'P1'
   .groupby('id')                # group by `id`
   ['finish'].nunique()          # count unique `finish`
   .reset_index(name='result')   # match the output
)

Output:

  id  result
0  A       3
1  B       1
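As a runnable sketch (the column names are from the question; the sample rows here are assumed, chosen to reproduce the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'id':     ['A', 'A', 'A', 'A', 'B', 'B'],
    'start':  ['P2', 'P2', 'P3', 'P1', 'P2', 'P1'],
    'finish': ['F1', 'F2', 'F3', 'F4', 'F1', 'F2'],
})

result = (df[df['start'].ne('P1')]        # drop rows with start == 'P1'
            .groupby('id')['finish'].nunique()
            .reset_index(name='result'))
print(result)  # A -> 3, B -> 1
```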

counting unique values using .groupby in pandas dataframe

I believe you want the count of rows for each (location, Species) pair. To assign a groupby result back to the original dataframe, we usually use transform:

df['Abundance'] = df.groupby(['location','Species']).Species.transform('size')

Output:

   ID location Species  Count  Abundance
0   1        A     Cat      2          2
1   2        A     Cat      2          2
2   3        C     Dog      2          1
3   4        C     Cat      2          1
4   5        E     Cat      4          2
5   6        E     Cat      4          2
6   7        E     Dog      4          1
7   8        E    Bird      4          1
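A self-contained version, with the data taken from the output table above:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 9),
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species':  ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count':    [2, 2, 2, 2, 4, 4, 4, 4],
})

# Size of each (location, Species) group, broadcast back onto every row
df['Abundance'] = df.groupby(['location', 'Species'])['Species'].transform('size')
print(df)
```

Unlike a plain groupby aggregation, transform returns a result with the same index as `df`, so it can be assigned directly as a new column.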

Pandas GroupBy and add count of unique values as a new column

Use transform to broadcast the result:

df['timestamp_count'] = (
    df.groupby(['source', 'day'])['timestamp'].transform('nunique'))
df

   day    source                timestamp  timestamp_count
0    1  facebook  2018-08-04 11:16:32.416                2
1    1  facebook  2019-01-03 10:25:38.216                2
2    1   twitter  2018-10-14 13:26:22.123                1
3    2  facebook  2019-01-30 12:16:32.416                1
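A runnable version, with the data reconstructed from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    'day': [1, 1, 1, 2],
    'source': ['facebook', 'facebook', 'twitter', 'facebook'],
    'timestamp': pd.to_datetime([
        '2018-08-04 11:16:32.416',
        '2019-01-03 10:25:38.216',
        '2018-10-14 13:26:22.123',
        '2019-01-30 12:16:32.416',
    ]),
})

# Distinct timestamps per (source, day), broadcast to every row of that group
df['timestamp_count'] = (
    df.groupby(['source', 'day'])['timestamp'].transform('nunique'))
print(df)
```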

Group by column in Pandas and count Unique values in each group

Use pd.crosstab:

print(pd.crosstab(df["Period"], df["Result"]))

Prints:

Result  False  True
Period
1           2     2
2           1     3
3           4     0
4           1     3
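As a self-contained sketch (the Period/Result rows are assumed, chosen to reproduce the crosstab above):

```python
import pandas as pd

df = pd.DataFrame({
    'Period': [1]*4 + [2]*4 + [3]*4 + [4]*4,
    'Result': [False, False, True, True,
               False, True, True, True,
               False, False, False, False,
               False, True, True, True],
})

# Frequency table: one row per Period, one column per Result value
ct = pd.crosstab(df['Period'], df['Result'])
print(ct)
```

crosstab is a convenient shorthand here; `df.groupby(['Period', 'Result']).size().unstack(fill_value=0)` produces the same table.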
