Pandas 'count(distinct)' equivalent
I believe this is what you want:
table.groupby('YEARMONTH').CLIENTCODE.nunique()
Example:
In [2]: table
Out[2]:
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302
In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]:
YEARMONTH
201301    2
201302    3
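For anyone who wants to run this without the IPython session, here is a minimal sketch that rebuilds the same seven rows by hand. The variable name `distinct_clients` is only illustrative.

```python
import pandas as pd

# Same rows as the example table above.
table = pd.DataFrame({
    "CLIENTCODE": [1, 1, 2, 1, 2, 2, 3],
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302, 201302, 201302],
})

# Equivalent of SELECT YEARMONTH, COUNT(DISTINCT CLIENTCODE) ... GROUP BY YEARMONTH
distinct_clients = table.groupby("YEARMONTH")["CLIENTCODE"].nunique()
print(distinct_clients)
```

Bracket selection (`["CLIENTCODE"]`) behaves the same as the attribute access used above and also works for column names that are not valid Python identifiers.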
Count unique values using pandas groupby
I think you can use SeriesGroupBy.nunique:
print (df.groupby('param')['group'].nunique())
param
a 2
b 1
Name: group, dtype: int64
Another solution uses unique, then creates a new df with DataFrame.from_records, reshapes it to a Series with stack, and finally applies value_counts:
a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a 2
b 1
dtype: int64
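The question's data is not shown here, so the following is only a sketch on made-up rows (the values in `df` are hypothetical). It runs the nunique approach and, next to it, a shorter variant of the second idea that is not the answerer's code: drop repeated (group, param) pairs, then count how often each param remains.

```python
import pandas as pd

# Hypothetical data; one row has a missing param.
df = pd.DataFrame({
    "group": [1, 1, 2, 3, 3],
    "param": ["a", "a", "b", None, "a"],
})

# Approach 1: number of distinct groups per param (NaN keys are skipped).
per_param = df.groupby("param")["group"].nunique()

# Approach 2 (simplified variant): keep one row per (group, param) pair,
# then count the remaining occurrences of each param.
alt = (
    df.dropna(subset=["param"])
      .drop_duplicates(["group", "param"])["param"]
      .value_counts()
)

print(per_param)
print(alt)
```

Both give 2 for a and 1 for b on this data.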
Count unique values per groups with Pandas
You need nunique:
df = df.groupby('domain')['ID'].nunique()
print (df)
domain
'facebook.com' 1
'google.com' 1
'twitter.com' 2
'vk.com' 3
Name: ID, dtype: int64
If you need to strip the ' characters:
df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64
Or as Jon Clements commented:
df.groupby(df.domain.str.strip("'"))['ID'].nunique()
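As a rough, self-contained illustration of grouping by a cleaned-up key, the sketch below uses invented IDs chosen so the counts match the output above. Only the column names `domain` and `ID` come from the answer.

```python
import pandas as pd

# Hypothetical rows; the domains carry literal single quotes, as in the question.
df = pd.DataFrame({
    "domain": ["'vk.com'", "'vk.com'", "'vk.com'", "'vk.com'",
               "'twitter.com'", "'twitter.com'",
               "'google.com'", "'facebook.com'"],
    "ID": [1, 2, 2, 7, 3, 4, 5, 6],
})

# Group by a derived Series (quotes stripped) instead of a column name.
clean_domain = df["domain"].str.strip("'")
result = df.groupby(clean_domain)["ID"].nunique()
print(result)
```

Passing a Series as the grouping key leaves the original `domain` column untouched, which is handy when the quoted values are still needed elsewhere.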
You can retain the column name like this:
df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3
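The difference is that nunique() on the selected column returns a Series indexed by domain, whereas agg() with as_index=False returns a DataFrame that keeps domain as a regular column.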
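A small sketch of the two return types, using hypothetical rows that reproduce the counts shown above. The string alias 'nunique' is used here in place of pd.Series.nunique; both are accepted by agg().

```python
import pandas as pd

# Hypothetical rows matching the counts in the output above.
df = pd.DataFrame({
    "domain": ["fb", "ggl", "twitter", "twitter", "vk", "vk", "vk"],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

# Series: domain becomes the index, the values are the distinct counts.
as_series = df.groupby("domain")["ID"].nunique()

# DataFrame: domain stays a regular column next to ID.
as_frame = df.groupby(by="domain", as_index=False).agg({"ID": "nunique"})

print(type(as_series).__name__)
print(type(as_frame).__name__, list(as_frame.columns))
```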
Counting unique values in a column in pandas dataframe like in Qlik?
To count distinct values, use nunique:
df['hID'].nunique()
5
To count only non-null values, use count:
df['hID'].count()
8
To count all values, including nulls, use the size attribute:
df['hID'].size
8
To add a condition, use boolean indexing:
df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])
Or use query:
df.query('mID == "A"')['hID'].agg(['nunique','count','size'])
Output:
nunique 5
count 5
size 5
Name: hID, dtype: int64
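The example data above apparently has no nulls, so count and size agree there. The sketch below uses a tiny made-up frame with one missing hID to show where the three measures differ; the values are hypothetical, only the column names come from the answer.

```python
import pandas as pd

# Hypothetical data with one missing hID.
df = pd.DataFrame({
    "mID": ["A", "A", "A", "B", "B"],
    "hID": [10, 10, None, 20, 30],
})

col = df["hID"]
n_distinct = col.nunique()   # distinct non-null values: 10, 20, 30
n_non_null = col.count()     # non-null values only
n_total = col.size           # every row, nulls included

# Same three measures restricted to mID == "A" via boolean indexing.
col_a = df.loc[df["mID"] == "A", "hID"]
stats_a = (col_a.nunique(), col_a.count(), col_a.size)

print(n_distinct, n_non_null, n_total)
print(stats_a)
```

Note that nunique ignores nulls by default; passing dropna=False counts the null as one more distinct value.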
Laravel Eloquent - distinct() and count() not working properly together
The following should work:
$ad->getcodes()->distinct()->count('pid');
Pandas aggregate count distinct
How about either of:
>>> df
         date  duration user_id
0  2013-04-01        30    0001
1  2013-04-01        15    0001
2  2013-04-01        20    0002
3  2013-04-02        15    0002
4  2013-04-02        30    0002
>>> df.groupby("date").agg({"duration": "sum", "user_id": pd.Series.nunique})
            duration  user_id
date
2013-04-01        65        2
2013-04-02        45        1
>>> df.groupby("date").agg({"duration": "sum", "user_id": lambda x: x.nunique()})
            duration  user_id
date
2013-04-01        65        2
2013-04-02        45        1
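A third spelling, not part of the original answer, is named aggregation (available in pandas 0.25 and later), which also lets you rename the output columns. The sketch below rebuilds the same five rows; the output names `total_duration` and `distinct_users` are made up for illustration.

```python
import pandas as pd

# Same rows as the df shown above.
df = pd.DataFrame({
    "date": ["2013-04-01", "2013-04-01", "2013-04-01", "2013-04-02", "2013-04-02"],
    "duration": [30, 15, 20, 15, 30],
    "user_id": ["0001", "0001", "0002", "0002", "0002"],
})

# Each keyword is an output column: (input column, aggregation).
result = df.groupby("date").agg(
    total_duration=("duration", "sum"),
    distinct_users=("user_id", "nunique"),
)
print(result)
```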
count distinct and window functions
You can use a correlated subquery for this:
SELECT id, trxn_dt, trxn_amt, trxn_category,
(SELECT COUNT(DISTINCT trxn_category)
FROM mytable AS t2
WHERE t2.id = t1.id) AS cnt
FROM mytable AS t1
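To try the correlated subquery without a database server, here is a sketch that runs it through Python's built-in sqlite3 module. The rows are invented, and the ORDER BY is added only to make the printed order predictable; other engines may need small syntax adjustments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, trxn_dt TEXT, trxn_amt REAL, trxn_category TEXT)"
)
# Hypothetical transactions: id 1 has two categories, id 2 has one.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 10.0, "food"),
        (1, "2020-01-02", 20.0, "fuel"),
        (1, "2020-01-03", 5.0, "food"),
        (2, "2020-01-01", 7.0, "food"),
    ],
)

rows = conn.execute(
    """
    SELECT id, trxn_dt, trxn_amt, trxn_category,
           (SELECT COUNT(DISTINCT trxn_category)
            FROM mytable AS t2
            WHERE t2.id = t1.id) AS cnt
    FROM mytable AS t1
    ORDER BY id, trxn_dt
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Every row keeps its own detail columns while cnt repeats the per-id distinct count, which is the effect a COUNT(DISTINCT ...) OVER (PARTITION BY id) would have on engines that support it.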
Distinct Count with two tables
Join the Customer table to a derived table that already contains only the distinct transactions (an inner query), then count:
SELECT Customer.Dogs, COUNT(distinctTransactions.TransactionID) AS TotTrans
FROM (SELECT DISTINCT TransactionID, CustomerID FROM Transaction) AS distinctTransactions
JOIN Customer ON distinctTransactions.CustomerID = Customer.CustomerID
GROUP BY Customer.Dogs
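A runnable sketch of the same idea using Python's sqlite3 module follows. All rows are invented, and the meaning of the Dogs column is assumed to be a simple yes/no flag. The table name is double-quoted because TRANSACTION is a keyword in SQLite, and the join is written with explicit JOIN ... ON syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Customer (CustomerID INTEGER, Dogs TEXT)')
conn.execute('CREATE TABLE "Transaction" (TransactionID INTEGER, CustomerID INTEGER)')

# Hypothetical customers and transactions; some transaction rows are repeated.
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "yes"), (2, "no"), (3, "yes")])
conn.executemany('INSERT INTO "Transaction" VALUES (?, ?)',
                 [(10, 1), (10, 1), (11, 1), (12, 2), (13, 3), (13, 3)])

totals = conn.execute(
    """
    SELECT Customer.Dogs, COUNT(dt.TransactionID) AS TotTrans
    FROM (SELECT DISTINCT TransactionID, CustomerID FROM "Transaction") AS dt
    JOIN Customer ON dt.CustomerID = Customer.CustomerID
    GROUP BY Customer.Dogs
    ORDER BY Customer.Dogs
    """
).fetchall()

print(totals)
conn.close()
```

Deduplicating inside the derived table first means the repeated transaction rows are counted once per customer group.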