Counting unique values in a column in pandas dataframe like in Qlik?
Count distinct values, use nunique:
df['hID'].nunique()
5
Count only non-null values, use count:
df['hID'].count()
8
Count total values including null values, use the size attribute:
df['hID'].size
8
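A minimal sketch of how the three measures differ, using a hypothetical Series (the original data isn't shown); a NaN is included deliberately so count and size diverge:

```python
import pandas as pd
import numpy as np

# Hypothetical 'hID' Series: 8 entries, 5 distinct non-null values, one NaN
s = pd.Series([1, 2, 3, 4, 5, 1, 2, np.nan], name='hID')

print(s.nunique())  # 5 -> distinct non-null values (NaN excluded by default)
print(s.count())    # 7 -> non-null values only
print(s.size)       # 8 -> all entries, NaN included
```

In the answer above count and size agree because that column has no nulls.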
Edit: to add a condition, use boolean indexing:
df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])
Or using query:
df.query('mID == "A"')['hID'].agg(['nunique','count','size'])
Output:
nunique 5
count 5
size 5
Name: hID, dtype: int64
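Both routes are equivalent, as a runnable sketch with a hypothetical mID/hID frame shows:

```python
import pandas as pd

# Hypothetical frame with the assumed 'mID'/'hID' columns
df = pd.DataFrame({
    'mID': ['A', 'A', 'A', 'B', 'B', 'A', 'A'],
    'hID': [101, 102, 103, 101, 104, 101, 105],
})

via_loc = df.loc[df['mID'] == 'A', 'hID'].agg(['nunique', 'count', 'size'])
via_query = df.query('mID == "A"')['hID'].agg(['nunique', 'count', 'size'])

# both selections produce the same aggregated Series
print(via_loc.equals(via_query))  # True
```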
Pandas Counting Unique Rows
You can use size with reset_index:
print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
ColA ColB Count
0 1 1 3
1 1 2 2
2 2 1 1
3 3 2 1
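A self-contained version, with hypothetical data chosen to reproduce the counts shown above:

```python
import pandas as pd

# Hypothetical ColA/ColB frame: pair (1,1) appears 3x, (1,2) 2x, (2,1) and (3,2) once
df = pd.DataFrame({
    'ColA': [1, 1, 1, 1, 1, 2, 3],
    'ColB': [1, 1, 1, 2, 2, 1, 2],
})

# size() counts rows per group; reset_index(name=...) names the count column
out = df.groupby(['ColA', 'ColB']).size().reset_index(name='Count')
print(out)
```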
Count of unique values that occur more than 100 in a data frame
This would solve the problem.
import pandas as pd
# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()
# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()
# filter to values with count > 2
df_filtered = df_counted[df_counted>2]
print(df_filtered)
This is the sample dataframe:
  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye
These are the unique values counted:
hello 4
bye 2
These are the unique values with count > 2:
hello 4
Count unique values per groups with Pandas
You need nunique:
df = df.groupby('domain')['ID'].nunique()
print (df)
domain
'facebook.com' 1
'google.com' 1
'twitter.com' 2
'vk.com' 3
Name: ID, dtype: int64
If you need to strip the ' characters:
df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64
Or as Jon Clements commented:
df.groupby(df.domain.str.strip("'"))['ID'].nunique()
You can retain the column name like this:
df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3
The difference is that nunique() returns a Series and agg() returns a DataFrame.
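A quick sketch of that type difference, on a hypothetical domain/ID frame:

```python
import pandas as pd

# Hypothetical 'domain'/'ID' frame
df = pd.DataFrame({
    'domain': ['fb', 'fb', 'ggl', 'vk', 'vk', 'vk'],
    'ID': [1, 2, 3, 4, 5, 4],
})

# plain groupby + nunique -> Series, with 'domain' in the index
s = df.groupby('domain')['ID'].nunique()

# agg with as_index=False -> DataFrame, 'domain' kept as a column
t = df.groupby('domain', as_index=False).agg({'ID': pd.Series.nunique})

print(type(s).__name__, type(t).__name__)  # Series DataFrame
```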
python, count unique list values of a list inside a data frame
Set the index of the dataframe to age, then use Series.explode on the column 'what colours do you like?', then use groupby on level=0 and aggregate the series using value_counts:
df1 = (
    df.set_index('age')['what colours do you like?'].explode()
      .rename('color').groupby(level=0).value_counts().reset_index(name='count')
)
Result:
print(df1)
age color count
0 18-25 yellow 2
1 18-25 green 1
2 18-25 orange 1
3 26-30 blue 2
4 26-30 red 2
5 26-30 green 1
6 26-30 orange 1
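A self-contained sketch, with hypothetical survey rows chosen to reproduce the counts above (the real frame isn't shown):

```python
import pandas as pd

# Hypothetical survey frame: the colours column holds lists of answers
df = pd.DataFrame({
    'age': ['18-25', '18-25', '26-30', '26-30'],
    'what colours do you like?': [
        ['yellow', 'green'], ['yellow', 'orange'],
        ['blue', 'red', 'green'], ['blue', 'red', 'orange'],
    ],
})

# explode turns each list element into its own row, keeping the 'age' index;
# value_counts then counts colours within each age group
df1 = (
    df.set_index('age')['what colours do you like?'].explode()
      .rename('color').groupby(level=0).value_counts().reset_index(name='count')
)
print(df1)
```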
Count unique occurrences within data frame
One option could be:
sapply(df, function(x) table(factor(x, levels = unique(unlist(df)))))
V1 v2 v3
A 1 1 2
B 1 2 0
D 1 0 1
C 0 1 1
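The same idea in pandas (a sketch with hypothetical data, since the original frame isn't shown): collect the full set of values seen anywhere, then count each value's occurrences per column over that common set:

```python
import pandas as pd

# Hypothetical frame mirroring the R example's column names
df = pd.DataFrame({'V1': ['A', 'B', 'D'],
                   'v2': ['A', 'B', 'B'],
                   'v3': ['A', 'A', 'D']})

# all distinct values across every column (the 'levels' in R terms)
levels = pd.unique(df.values.ravel())

# per-column value counts, reindexed so absent values show as 0
out = df.apply(lambda col: col.value_counts().reindex(levels, fill_value=0))
print(out)
```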
Counting Unique Values in a Column
Do it with get_dummies:
df.category.str.get_dummies(',').replace(0, np.nan).stack().groupby(level=1).sum()
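A runnable sketch with a hypothetical comma-separated category column (note `sum(level=1)` was removed in pandas 2.0, so the grouping is spelled as `groupby(level=1).sum()` here):

```python
import pandas as pd
import numpy as np

# Hypothetical frame with a comma-separated 'category' column
df = pd.DataFrame({'category': ['a,b', 'a', 'b,c', 'a,c']})

# get_dummies(',') splits each cell into 0/1 indicator columns;
# replace/stack keeps only the 1s, then we sum per category (index level 1)
dummies = df.category.str.get_dummies(',')
out = dummies.replace(0, np.nan).stack().groupby(level=1).sum()
print(out)
```

The plain column sums `dummies.sum()` give the same counts more directly.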
New data frame with unique values and counts
The expected output is not clear; here are some possible interpretations:
- Sum of 'N' by 'date'
library(data.table)
dt[, .(N = sum(N, na.rm = TRUE)), by = date]
- Count of unique 'article_id' for each date
dt1[, .(N = uniqueN(article_id)), by = date]
- Get the first count by 'date'
dt1[, .(N = first(N)), by = date]
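For comparison, the three data.table idioms map to pandas groupby calls roughly as follows (a sketch on a hypothetical date/article_id/N frame):

```python
import pandas as pd

# Hypothetical frame with the assumed columns
dt = pd.DataFrame({
    'date': ['d1', 'd1', 'd2', 'd2', 'd2'],
    'article_id': [1, 1, 2, 3, 3],
    'N': [10, 5, 7, None, 3],
})

# sum of 'N' by 'date' (pandas skips NaN by default, like na.rm = TRUE)
print(dt.groupby('date', as_index=False)['N'].sum())

# count of unique 'article_id' per date (uniqueN ~ nunique)
print(dt.groupby('date', as_index=False)['article_id'].nunique())

# first 'N' per date (note: pandas' first() takes the first non-null value)
print(dt.groupby('date', as_index=False)['N'].first())
```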
How to count unique values in pandas column based on dictionary values
From my previous answer, I slightly modified the code:
data = []
for g, d in dic.items():
    for k, l in d.items():
        data.extend([(g, v, k) for v in l])
df1 = pd.DataFrame(data, columns=['ID', 'id1', 'id2'])

out = dff.merge(df1, on=['ID', 'id1']) \
         .drop_duplicates(['ID', 'id1']) \
         .value_counts('id2')
print(out)
# Output:
id2
aasd2 3
aasd 2
gsd3 2
vaasd 1
dtype: int64
Update
Is it possible to get which id1 values are in each group? For example, aasd2 with count 3 and ['85649', '85655', '56731'].
out = (
    dff.merge(df1, on=['ID', 'id1']).drop_duplicates(['ID', 'id1'])
       .groupby(['ID', 'id2'])
       .agg(**{'Value Count': ('id2', 'size'), 'List id1': ('id1', list)})
       .reset_index()
)
print(out)
# Output:
ID id2 Value Count List id1
0 G-00001 aasd 2 [85646, 85647]
1 G-00001 vaasd 1 [8564316]
2 G-00002 aasd2 3 [85649, 85655, 56731]
3 G-00003 gsd3 2 [34566, 78931]
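The whole pattern end-to-end, with a hypothetical dic and dff (the originals aren't shown; shapes are inferred from the outputs above): flatten the nested dict into (ID, id1, id2) rows, merge against the observed pairs, deduplicate, then count by id2:

```python
import pandas as pd

# Hypothetical nested dict: ID -> id2 -> list of id1 values
dic = {
    'G-00001': {'aasd': ['85646', '85647'], 'vaasd': ['8564316']},
    'G-00002': {'aasd2': ['85649', '85655', '56731']},
}
# Hypothetical frame of observed (ID, id1) pairs
dff = pd.DataFrame({
    'ID': ['G-00001', 'G-00001', 'G-00001', 'G-00002', 'G-00002', 'G-00002'],
    'id1': ['85646', '85647', '8564316', '85649', '85655', '56731'],
})

# flatten the nested dict into one row per (ID, id1, id2) triple
data = []
for g, d in dic.items():
    for k, l in d.items():
        data.extend([(g, v, k) for v in l])
df1 = pd.DataFrame(data, columns=['ID', 'id1', 'id2'])

out = (dff.merge(df1, on=['ID', 'id1'])
          .drop_duplicates(['ID', 'id1'])
          .value_counts('id2'))
print(out)
```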
Finding count of distinct elements in DataFrame in each column
As of pandas 0.20 we can use nunique directly on DataFrames, i.e.:
df.nunique()
a 4
b 5
c 1
dtype: int64
Other legacy options:
You could transpose the df and then use apply to call nunique row-wise:
In [205]:
df = pd.DataFrame({'a':[0,1,1,2,3],'b':[1,2,3,4,5],'c':[1,1,1,1,1]})
df
Out[205]:
a b c
0 0 1 1
1 1 2 1
2 1 3 1
3 2 4 1
4 3 5 1
In [206]:
df.T.apply(lambda x: x.nunique(), axis=1)
Out[206]:
a 4
b 5
c 1
dtype: int64
EDIT
As pointed out by @ajcr, the transpose is unnecessary:
In [208]:
df.apply(pd.Series.nunique)
Out[208]:
a 4
b 5
c 1
dtype: int64