Pandas 'Count(Distinct)' Equivalent

Pandas 'count(distinct)' equivalent

I believe this is what you want:

table.groupby('YEARMONTH').CLIENTCODE.nunique()

Example:

In [2]: table
Out[2]:
CLIENTCODE YEARMONTH
0 1 201301
1 1 201301
2 2 201301
3 1 201302
4 2 201302
5 2 201302
6 3 201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]:
YEARMONTH
201301 2
201302 3

Count unique values using pandas groupby

I think you can use SeriesGroupBy.nunique:

print (df.groupby('param')['group'].nunique())
param
a 2
b 1
Name: group, dtype: int64

Another solution with unique, then create new df by DataFrame.from_records, reshape to Series by stack and last value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a 2
b 1
dtype: int64

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print (df)
domain
'facebook.com' 1
'google.com' 1
'twitter.com' 2
'vk.com' 3
Name: ID, dtype: int64

If you need to strip ' characters:

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3

The difference is that nunique() returns a Series and agg() returns a DataFrame.

Counting unique values in a column in pandas dataframe like in Qlik?

Count distinct values, use nunique:

df['hID'].nunique()
5

Count only non-null values, use count:

df['hID'].count()
8

Count total values including null values, use the size attribute:

df['hID'].size
8

Edit to add condition

Use boolean indexing:

df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])

OR using query:

df.query('mID == "A"')['hID'].agg(['nunique','count','size'])

Output:

nunique    5
count 5
size 5
Name: hID, dtype: int64

Laravel Eloquent - distinct() and count() not working properly together

The following should work

$ad->getcodes()->distinct()->count('pid');

Pandas aggregate count distinct

How about either of:

>>> df
date duration user_id
0 2013-04-01 30 0001
1 2013-04-01 15 0001
2 2013-04-01 20 0002
3 2013-04-02 15 0002
4 2013-04-02 30 0002
>>> df.groupby("date").agg({"duration": np.sum, "user_id": pd.Series.nunique})
duration user_id
date
2013-04-01 65 2
2013-04-02 45 1
>>> df.groupby("date").agg({"duration": np.sum, "user_id": lambda x: x.nunique()})
duration user_id
date
2013-04-01 65 2
2013-04-02 45 1

count distinct and window functions

You can use a correlated subquery for this:

SELECT id, trxn_dt, trxn_amt, trxn_category,
(SELECT COUNT(DISTINCT trxn_category)
FROM mytable AS t2
WHERE t2.id = t1.id) AS cnt
FROM mytable AS t1

Demo here

Distinct Count with two tables

You should join the customer table with an already distinct-ed table (using inner query)

SELECT Customer.Dogs, COUNT(distinctTransactions.TransactionID) AS TotTrans
FROM (select distinct TransactionID,CustomerID from Transaction) as
distinctTransactions, Customer
WHERE distinctTransactions.CustomerID = Customer.CustomerID
GROUP BY Dogs


Related Topics



Leave a reply



Submit