Count unique values using pandas groupby
I think you can use SeriesGroupBy.nunique:
print (df.groupby('param')['group'].nunique())
param
a 2
b 1
Name: group, dtype: int64
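For a version you can run, here is a minimal sketch. The DataFrame contents are hypothetical (the question's data is not reproduced here) and were chosen so the result matches the output above.

```python
import pandas as pd

# Hypothetical sample data; not the original question's DataFrame.
df = pd.DataFrame({
    "param": ["a", "a", "b", None, "a"],
    "group": [1, 2, 1, 3, 1],
})

# Number of distinct `group` values seen for each `param`.
# Rows whose grouping key is missing are dropped by groupby by default.
unique_groups = df.groupby("param")["group"].nunique()
print(unique_groups)
```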
Another solution uses unique, then builds a new DataFrame with DataFrame.from_records, reshapes it to a Series with stack, and finally calls value_counts:
a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a 2
b 1
dtype: int64
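If the intermediate DataFrame feels heavy, one possible shortcut is to drop duplicate (group, param) pairs and count what is left. This is not part of the answer above, only a sketch of an equivalent on hypothetical data.

```python
import pandas as pd

# Hypothetical sample data; not the original question's DataFrame.
df = pd.DataFrame({
    "param": ["a", "a", "b", None, "a"],
    "group": [1, 2, 1, 3, 1],
})

# Keep one row per distinct (group, param) pair, then count rows per param.
# value_counts() skips missing values by default.
pairs = df.drop_duplicates(["group", "param"])
counts = pairs["param"].value_counts()
print(counts)
```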
Pandas 'count(distinct)' equivalent
I believe this is what you want:
table.groupby('YEARMONTH').CLIENTCODE.nunique()
Example:
In [2]: table
Out[2]:
CLIENTCODE YEARMONTH
0 1 201301
1 1 201301
2 2 201301
3 1 201302
4 2 201302
5 2 201302
6 3 201302
In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]:
YEARMONTH
201301 2
201302 3
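The example table can be rebuilt as follows if you want to try it yourself. The values are copied from the listing above; only the variable name distinct_clients is made up.

```python
import pandas as pd

# Same rows as the `table` shown above.
table = pd.DataFrame({
    "CLIENTCODE": [1, 1, 2, 1, 2, 2, 3],
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302, 201302, 201302],
})

# Equivalent of SQL's COUNT(DISTINCT CLIENTCODE) ... GROUP BY YEARMONTH.
distinct_clients = table.groupby("YEARMONTH")["CLIENTCODE"].nunique()
print(distinct_clients)
```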
Count unique values per groups with Pandas
You need nunique:
df = df.groupby('domain')['ID'].nunique()
print (df)
domain
'facebook.com' 1
'google.com' 1
'twitter.com' 2
'vk.com' 3
Name: ID, dtype: int64
If you need to strip the ' characters:
df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64
Or as Jon Clements commented:
df.groupby(df.domain.str.strip("'"))['ID'].nunique()
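A small end-to-end sketch of the quote-stripping variant. The rows are hypothetical, picked to reproduce the counts shown above, and the result is stored under a new name so that df stays a DataFrame.

```python
import pandas as pd

# Hypothetical rows; domains carry literal single quotes, as in the question.
df = pd.DataFrame({
    "domain": ["'vk.com'", "'vk.com'", "'vk.com'", "'vk.com'",
               "'twitter.com'", "'twitter.com'",
               "'facebook.com'", "'google.com'"],
    "ID": [1, 2, 3, 1, 4, 5, 6, 7],
})

# Group by the cleaned key without modifying the original column.
clean_domain = df["domain"].str.strip("'")
unique_ids = df.groupby(clean_domain)["ID"].nunique()
print(unique_ids)
```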
You can retain the column name like this:
df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3
The difference is that nunique() on the selected column returns a Series, whereas agg() with a dict returns a DataFrame.
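To see the two return types side by side, here is a sketch on hypothetical data that matches the output above. It also shows reset_index as another way to get a DataFrame from the Series result.

```python
import pandas as pd

# Hypothetical data chosen to match the printed result.
df = pd.DataFrame({
    "domain": ["fb", "fb", "ggl", "twitter", "twitter", "vk", "vk", "vk"],
    "ID": [1, 1, 2, 3, 4, 5, 6, 7],
})

as_series = df.groupby("domain")["ID"].nunique()                         # Series
as_frame = df.groupby("domain", as_index=False).agg({"ID": "nunique"})  # DataFrame
from_series = as_series.reset_index()                                    # DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```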
Counting unique values in a column in pandas dataframe like in Qlik?
To count distinct values, use nunique:
df['hID'].nunique()
5
To count only non-null values, use count:
df['hID'].count()
8
To count all values, including nulls, use the size attribute:
df['hID'].size
8
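The three counts only differ when the column contains missing or repeated values. A minimal sketch with a hypothetical Series holding one NaN and one duplicate:

```python
import numpy as np
import pandas as pd

# Hypothetical values: one duplicate (101) and one missing entry.
s = pd.Series([101, 102, 103, 101, np.nan, 104], name="hID")

n_distinct = s.nunique()                       # distinct non-null values
n_distinct_with_nan = s.nunique(dropna=False)  # NaN counted as its own value
n_non_null = s.count()                         # non-null entries
n_total = s.size                               # all entries, nulls included

print(n_distinct, n_distinct_with_nan, n_non_null, n_total)
```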
To add a condition, use boolean indexing:
df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])
Or use query:
df.query('mID == "A"')['hID'].agg(['nunique','count','size'])
Output:
nunique 5
count 5
size 5
Name: hID, dtype: int64
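Both filters can be checked against each other on a small frame. The data below is hypothetical, built so that the unfiltered column gives 5 / 8 / 8 and the mID == "A" subset gives 5 / 5 / 5, as in the outputs above.

```python
import pandas as pd

# Hypothetical data: five rows for A, three for B.
df = pd.DataFrame({
    "mID": ["A"] * 5 + ["B"] * 3,
    "hID": [101, 102, 103, 104, 105, 101, 102, 102],
})

# Boolean indexing: select hID where mID is "A".
via_loc = df.loc[df["mID"] == "A", "hID"].agg(["nunique", "count", "size"])

# The same filter expressed as a query string.
via_query = df.query('mID == "A"')["hID"].agg(["nunique", "count", "size"])

print(via_loc)
```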
Python group by and count distinct values in a column and create delimited list
You can use str.len in your code:
df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len())  ## added line
      )
output:
company product count
0 Amazon [E-comm] 1
1 Facebook [Social Media] 1
2 Google [Search, Android] 2
3 Microsoft [OS, X-box] 2
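If a delimited string is wanted instead of a Python list, one option is named aggregation. This is a sketch on hypothetical rows: the column names products and count and the ", " delimiter are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical rows, including a repeated (Google, Search) pair.
df = pd.DataFrame({
    "company": ["Google", "Google", "Google", "Microsoft",
                "Microsoft", "Amazon", "Facebook"],
    "product": ["Search", "Android", "Search", "OS",
                "X-box", "E-comm", "Social Media"],
})

df3 = (
    df.groupby("company")["product"]
      .agg(
          products=lambda s: ", ".join(s.unique()),  # delimited list, first-seen order
          count="nunique",                           # number of distinct products
      )
      .reset_index()
)
print(df3)
```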
Pandas groupby and count unique value of column
We can drop all rows where start == 'P1', then group by id and count the unique finish values:
(df[df['start'].ne('P1')]           # drop rows with `start` == 'P1'
   .groupby('id')                   # group by `id`
   ['finish'].nunique()             # count unique `finish`
   .reset_index(name='result')      # match the output
)
Output:
id result
0 A 3
1 B 1
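One thing to watch with filter-then-group: an id whose rows are all filtered out disappears from the result. The sketch below uses hypothetical data (including an extra id C that only has P1 rows) and reindexes to report 0 for such ids.

```python
import pandas as pd

# Hypothetical data; `C` only has a row with start == 'P1'.
df = pd.DataFrame({
    "id":     ["A", "A", "A", "A", "A", "B", "B", "B", "C"],
    "start":  ["P1", "P2", "P3", "P2", "P3", "P1", "P2", "P4", "P1"],
    "finish": ["P2", "P3", "P4", "P5", "P4", "P2", "P3", "P3", "P2"],
})

# Same steps as above: filter, group, count distinct finishes.
counts = df[df["start"].ne("P1")].groupby("id")["finish"].nunique()

# Bring back ids that were filtered out entirely, with a count of 0.
all_ids = pd.Index(df["id"].unique(), name="id")
result = counts.reindex(all_ids, fill_value=0).reset_index(name="result")
print(result)
```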
counting unique values using .groupby in pandas dataframe
I believe you want the count of each (location, Species) pair. To assign groupby output back to the original DataFrame, we usually use transform:
df['Abundance'] = df.groupby(['location','Species']).Species.transform('size')
Output:
ID location Species Count Abundance
0 1 A Cat 2 2
1 2 A Cat 2 2
2 3 C Dog 2 1
3 4 C Cat 2 1
4 5 E Cat 4 2
5 6 E Cat 4 2
6 7 E Dog 4 1
7 8 E Bird 4 1
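Note that 'size' counts rows per group, not distinct values. The sketch below rebuilds the frame from the output above and adds a DistinctSpecies column (a hypothetical name) to contrast 'size' with 'nunique'.

```python
import pandas as pd

# Rows taken from the output above (without the derived columns).
df = pd.DataFrame({
    "ID": range(1, 9),
    "location": list("AACCEEEE"),
    "Species": ["Cat", "Cat", "Dog", "Cat", "Cat", "Cat", "Dog", "Bird"],
})

# Rows per (location, Species) pair, broadcast back to every row.
df["Abundance"] = df.groupby(["location", "Species"])["Species"].transform("size")

# Distinct species per location, also broadcast back to every row.
df["DistinctSpecies"] = df.groupby("location")["Species"].transform("nunique")
print(df)
```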
Pandas GroupBy and add count of unique values as a new column
Use transform to broadcast the result:
df['timestamp_count'] = (
    df.groupby(["source", "day"])['timestamp'].transform('nunique'))
df
day source timestamp timestamp_count
0 1 facebook 2018-08-04 11:16:32.416 2
1 1 facebook 2019-01-03 10:25:38.216 2
2 1 twitter 2018-10-14 13:26:22.123 1
3 2 facebook 2019-01-30 12:16:32.416 1
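"Broadcast" here means the per-group value is repeated for every row of that group, so the result lines up with the original index. A short sketch using the rows printed above, comparing the aggregated and the broadcast shapes:

```python
import pandas as pd

# Rows taken from the output above.
df = pd.DataFrame({
    "day": [1, 1, 1, 2],
    "source": ["facebook", "facebook", "twitter", "facebook"],
    "timestamp": pd.to_datetime([
        "2018-08-04 11:16:32.416",
        "2019-01-03 10:25:38.216",
        "2018-10-14 13:26:22.123",
        "2019-01-30 12:16:32.416",
    ]),
})

grouped = df.groupby(["source", "day"])["timestamp"]
per_group = grouped.nunique()           # one value per (source, day) group
per_row = grouped.transform("nunique")  # one value per original row

print(len(per_group), len(per_row))
```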
Group by column in Pandas and count Unique values in each group
Use pd.crosstab:
print(pd.crosstab(df["Period"], df["Result"]))
Prints:
Result False True
Period
1 2 2
2 1 3
3 4 0
4 1 3
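Keep in mind that crosstab counts occurrences of each value per group, while nunique counts how many different values a group has. The sketch below rebuilds rows that produce the table above (the row order is an assumption) and shows both.

```python
import pandas as pd

# (False count, True count) per period, as in the table above.
cell_counts = {1: (2, 2), 2: (1, 3), 3: (4, 0), 4: (1, 3)}
rows = []
for period, (n_false, n_true) in cell_counts.items():
    rows += [(period, False)] * n_false + [(period, True)] * n_true
df = pd.DataFrame(rows, columns=["Period", "Result"])

# Occurrences of each Result value per Period.
ct = pd.crosstab(df["Period"], df["Result"])

# Number of different Result values per Period.
distinct = df.groupby("Period")["Result"].nunique()
print(ct)
print(distinct)
```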