Count Unique Values Using Pandas Groupby

I think you can use SeriesGroupBy.nunique:

print(df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64

Another solution: drop the nulls, collect the unique values per group with unique, build a new DataFrame with DataFrame.from_records, reshape it to a Series with stack, and finally apply value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a    2
b    1
dtype: int64
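As a self-contained sketch of both approaches (the column names are from the question, but the sample data here is assumed, since the original df isn't shown):

```python
import pandas as pd

# Hypothetical data: each `param` value is observed in one or more `group`s,
# with some null params
df = pd.DataFrame({
    'group': ['g1', 'g2', 'g2', 'g3'],
    'param': ['a', 'a', 'b', None],
})

# Number of distinct groups each param appears in
counts = df.groupby('param')['group'].nunique()
print(counts)  # a -> 2, b -> 1

# The unique/stack/value_counts variant gives the same tallies
a = df[df.param.notnull()].groupby('group')['param'].unique()
counts2 = pd.DataFrame.from_records(a.values.tolist()).stack().value_counts()
print(counts2)  # a -> 2, b -> 1
```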

Pandas 'count(distinct)' equivalent

I believe this is what you want:

table.groupby('YEARMONTH').CLIENTCODE.nunique()

Example:

In [2]: table
Out[2]:
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]:
YEARMONTH
201301    2
201302    3
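A runnable version of the example above (the data is reconstructed from the printed table):

```python
import pandas as pd

table = pd.DataFrame({
    'CLIENTCODE': [1, 1, 2, 1, 2, 2, 3],
    'YEARMONTH': [201301, 201301, 201301, 201302, 201302, 201302, 201302],
})

# Distinct clients per month -- the pandas equivalent of COUNT(DISTINCT ...)
distinct_clients = table.groupby('YEARMONTH')['CLIENTCODE'].nunique()
print(distinct_clients)  # 201301 -> 2, 201302 -> 3
```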

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print(df)
domain
'facebook.com'    1
'google.com'      1
'twitter.com'     2
'vk.com'          3
Name: ID, dtype: int64

If you need to strip ' characters:

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print(df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

The difference is that nunique() returns a Series indexed by the group keys, while agg() with as_index=False returns a DataFrame that keeps domain as a regular column.
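A minimal sketch of that difference; the shortened domain labels reuse the second table above, and the ID values themselves are made up:

```python
import pandas as pd

df = pd.DataFrame({
    'domain': ['fb', 'ggl', 'twitter', 'twitter', 'vk', 'vk', 'vk'],
    'ID': [1, 2, 3, 4, 5, 6, 7],
})

# Series: the group key `domain` becomes the index
s = df.groupby('domain')['ID'].nunique()

# DataFrame: `domain` stays a regular column
frame = df.groupby('domain', as_index=False).agg({'ID': pd.Series.nunique})

print(type(s).__name__, type(frame).__name__)  # Series DataFrame
```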

Counting unique values in a column in pandas dataframe like in Qlik?

To count distinct values, use nunique:

df['hID'].nunique()
5

To count only non-null values, use count:

df['hID'].count()
8

To count all values including nulls, use the size attribute:

df['hID'].size
8

To add a condition, use boolean indexing:

df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])

Or using query:

df.query('mID == "A"')['hID'].agg(['nunique','count','size'])

Output:

nunique    5
count      5
size       5
Name: hID, dtype: int64
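A runnable sketch of all three counts; the data here is assumed, shaped so the totals match the numbers above (8 rows, 5 distinct hIDs, 5 rows with mID == 'A'):

```python
import pandas as pd

df = pd.DataFrame({
    'hID': [101, 102, 103, 104, 105, 101, 102, 103],
    'mID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

print(df['hID'].nunique())  # distinct values -> 5
print(df['hID'].count())    # non-null values -> 8
print(df['hID'].size)       # all values, nulls included -> 8

# Restrict to mID == 'A' and compute all three at once
stats = df.loc[df['mID'] == 'A', 'hID'].agg(['nunique', 'count', 'size'])
print(stats)  # nunique 5, count 5, size 5
```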

Python group by and count distinct values in a column and create delimited list

You can use str.len in your code:

df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len())  # added line
      )

output:

     company            product  count
0     Amazon           [E-comm]      1
1   Facebook     [Social Media]      1
2     Google  [Search, Android]      2
3  Microsoft        [OS, X-box]      2
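The same pipeline as a self-contained example (the data is reconstructed from the output table):

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['Amazon', 'Facebook', 'Google', 'Google', 'Microsoft', 'Microsoft'],
    'product': ['E-comm', 'Social Media', 'Search', 'Android', 'OS', 'X-box'],
})

# Collect the unique products per company as a list, then use .str.len()
# to count the elements of each list
df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len()))
print(df3)
```

Note that `.str.len()` works element-wise on a Series of lists, not just on strings, which is what makes this trick possible.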

Pandas groupby and count unique value of column

We can drop all rows where start == 'P1', then group by id and count the unique finish values:

(df[df['start'].ne('P1')]        # drop rows with `start` == 'P1'
   .groupby('id')                # group by `id`
   ['finish'].nunique()          # count unique `finish`
   .reset_index(name='result')   # match the output
)

Output:

  id  result
0  A       3
1  B       1
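As a runnable sketch (the column names are from the question; the sample rows here are assumed, chosen to reproduce the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'id':     ['A', 'A', 'A', 'A', 'B', 'B'],
    'start':  ['P2', 'P2', 'P3', 'P1', 'P2', 'P1'],
    'finish': ['F1', 'F2', 'F3', 'F4', 'F1', 'F2'],
})

result = (df[df['start'].ne('P1')]        # drop rows with start == 'P1'
            .groupby('id')['finish'].nunique()
            .reset_index(name='result'))
print(result)  # A -> 3, B -> 1
```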

counting unique values using .groupby in pandas dataframe

I believe you want the count of rows for each (location, Species) pair. To assign a groupby result back to the original dataframe, we usually use transform:

df['Abundance'] = df.groupby(['location','Species']).Species.transform('size')

Output:

   ID location Species  Count  Abundance
0   1        A     Cat      2          2
1   2        A     Cat      2          2
2   3        C     Dog      2          1
3   4        C     Cat      2          1
4   5        E     Cat      4          2
5   6        E     Cat      4          2
6   7        E     Dog      4          1
7   8        E    Bird      4          1
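A self-contained version, with the data taken from the output table above:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 9),
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species':  ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count':    [2, 2, 2, 2, 4, 4, 4, 4],
})

# Size of each (location, Species) group, broadcast back onto every row
df['Abundance'] = df.groupby(['location', 'Species'])['Species'].transform('size')
print(df)
```

Unlike a plain groupby aggregation, transform returns a result with the same index as `df`, so it can be assigned directly as a new column.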

Pandas GroupBy and add count of unique values as a new column

Use transform to broadcast the result:

df['timestamp_count'] = (
    df.groupby(['source', 'day'])['timestamp'].transform('nunique'))
df

   day    source                timestamp  timestamp_count
0    1  facebook  2018-08-04 11:16:32.416                2
1    1  facebook  2019-01-03 10:25:38.216                2
2    1   twitter  2018-10-14 13:26:22.123                1
3    2  facebook  2019-01-30 12:16:32.416                1
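A runnable version, with the data reconstructed from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    'day': [1, 1, 1, 2],
    'source': ['facebook', 'facebook', 'twitter', 'facebook'],
    'timestamp': pd.to_datetime([
        '2018-08-04 11:16:32.416',
        '2019-01-03 10:25:38.216',
        '2018-10-14 13:26:22.123',
        '2019-01-30 12:16:32.416',
    ]),
})

# Distinct timestamps per (source, day), broadcast to every row of that group
df['timestamp_count'] = (
    df.groupby(['source', 'day'])['timestamp'].transform('nunique'))
print(df)
```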

Group by column in Pandas and count Unique values in each group

Use pd.crosstab:

print(pd.crosstab(df["Period"], df["Result"]))

Prints:

Result  False  True
Period
1           2     2
2           1     3
3           4     0
4           1     3
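As a self-contained sketch (the Period/Result rows are assumed, chosen to reproduce the crosstab above):

```python
import pandas as pd

df = pd.DataFrame({
    'Period': [1]*4 + [2]*4 + [3]*4 + [4]*4,
    'Result': [False, False, True, True,
               False, True, True, True,
               False, False, False, False,
               False, True, True, True],
})

# Frequency table: one row per Period, one column per Result value
ct = pd.crosstab(df['Period'], df['Result'])
print(ct)
```

crosstab is a convenient shorthand here; `df.groupby(['Period', 'Result']).size().unstack(fill_value=0)` produces the same table.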
