Counting Unique Items in Data Frame

Counting unique values in a column in pandas dataframe like in Qlik?

To count distinct values, use nunique:

df['hID'].nunique()
5

To count only non-null values, use count:

df['hID'].count()
8

To count all values, including nulls, use the size attribute:

df['hID'].size
8
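To see how the three differ, here is a minimal sketch on made-up data. The hID values are an assumption, chosen only so that one entry is missing; they are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 8 rows, one of them missing
df = pd.DataFrame({'hID': [101, 102, 103, 101, 102, 104, np.nan, 105]})

n_distinct = df['hID'].nunique()                        # distinct non-null values
n_distinct_with_nan = df['hID'].nunique(dropna=False)   # NaN counted as a value
n_non_null = df['hID'].count()                          # non-null rows
n_total = df['hID'].size                                # all rows, nulls included

print(n_distinct, n_distinct_with_nan, n_non_null, n_total)
```

Note that nunique skips NaN by default, so it can be smaller than the number of distinct entries you see when printing the column.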

Edit to add condition

Use boolean indexing:

df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])

OR using query:

df.query('mID == "A"')['hID'].agg(['nunique','count','size'])

Output:

nunique    5
count      5
size       5
Name: hID, dtype: int64
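A small self-contained sketch of the conditional version, assuming invented mID and hID values (only the column names come from the answer above):

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above
df = pd.DataFrame({
    'mID': ['A', 'A', 'A', 'B', 'B', 'A'],
    'hID': [1, 2, 2, 3, 3, None],
})

# Keep the hID values of rows where mID is 'A', then compute all three counts
subset = df.loc[df['mID'] == 'A', 'hID']
stats = subset.agg(['nunique', 'count', 'size'])
print(stats)
```

Here the three numbers differ because the filtered rows contain one duplicate and one missing value.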

Pandas Counting Unique Rows

You can use size with reset_index:

print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
   ColA  ColB  Count
0     1     1      3
1     1     2      2
2     2     1      1
3     3     2      1
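For a runnable version, the sketch below builds a hypothetical frame whose rows were chosen to reproduce the counts shown above. It also shows DataFrame.value_counts (available since pandas 1.1) as a possible alternative.

```python
import pandas as pd

# Hypothetical data chosen to reproduce the counts shown above
df = pd.DataFrame({
    'ColA': [1, 1, 1, 1, 1, 2, 3],
    'ColB': [1, 1, 1, 2, 2, 1, 2],
})

# Number of rows for each unique (ColA, ColB) combination
counts = df.groupby(['ColA', 'ColB']).size().reset_index(name='Count')
print(counts)

# Alternative: value_counts sorts by frequency, so sort by key to match
alt = df.value_counts(['ColA', 'ColB']).sort_index().reset_index(name='Count')
print(alt)
```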

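Count of unique values that occur more than 100 times in a data frame

Use value_counts to get the frequency of each value, then filter on that frequency. The example below uses a small sample and a threshold of 2 instead of 100.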


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()

# keep only the values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)

This is the sample dataframe:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values with their counts:

hello    4
bye      2

These are the values that occur more than 2 times:

hello    4
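If what you need is the number of such values rather than the values themselves, one possible approach is to sum the boolean mask. The threshold variable below is a hypothetical name, set to 2 to match the sample.

```python
import pandas as pd

df = pd.DataFrame({'drug_name': ['hello', 'hello', 'hello', 'hello', 'bye', 'bye']})
threshold = 2  # hypothetical cutoff; the question used 100

counts = df['drug_name'].value_counts()

# True counts as 1, so summing the mask gives the number of qualifying values
n_frequent = int((counts > threshold).sum())
frequent_names = counts[counts > threshold].index.tolist()
print(n_frequent, frequent_names)
```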

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print(df)
domain
'facebook.com'    1
'google.com'      1
'twitter.com'     2
'vk.com'          3
Name: ID, dtype: int64

If you need to strip ' characters:

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com 1
google.com 1
twitter.com 2
vk.com 3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
domain ID
0 fb 1
1 ggl 1
2 twitter 2
3 vk 3

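The difference is that groupby('domain')['ID'].nunique() returns a Series indexed by domain, while agg() with as_index=False returns a DataFrame that keeps domain as a regular column.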
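The two return types can be checked side by side. This is a sketch on hypothetical rows, picked so the per-domain counts match the first output in this section:

```python
import pandas as pd

# Hypothetical data matching the per-domain counts shown earlier
df = pd.DataFrame({
    'domain': ['vk.com', 'vk.com', 'vk.com', 'twitter.com', 'twitter.com',
               'facebook.com', 'google.com', 'vk.com'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 1],
})

as_series = df.groupby('domain')['ID'].nunique()                         # Series
as_frame = df.groupby('domain', as_index=False).agg({'ID': 'nunique'})   # DataFrame

# Another way to get a DataFrame: move the index back into a column
as_frame2 = as_series.reset_index()

print(type(as_series).__name__, type(as_frame).__name__)
```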

python, count unique list values of a list inside a data frame

Set age as the index of the dataframe, call Series.explode on the 'what colours do you like?' column, then group by level=0 and aggregate with value_counts:

df1 = (
    df.set_index('age')['what colours do you like?'].explode()
      .rename('color').groupby(level=0).value_counts().reset_index(name='count')
)

Result:

print(df1)
     age   color  count
0  18-25  yellow      2
1  18-25   green      1
2  18-25  orange      1
3  26-30    blue      2
4  26-30     red      2
5  26-30   green      1
6  26-30  orange      1
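A self-contained sketch of the same chain follows. The survey rows are invented for illustration, and Series.explode requires pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical survey data: each cell holds a list of colours
df = pd.DataFrame({
    'age': ['18-25', '18-25', '26-30'],
    'what colours do you like?': [
        ['yellow', 'green'], ['yellow'], ['blue', 'red', 'blue'],
    ],
})

df1 = (
    df.set_index('age')['what colours do you like?']
      .explode()                  # one row per list element, index repeated
      .rename('color')
      .groupby(level=0)           # group by the age index
      .value_counts()
      .reset_index(name='count')
)
print(df1)
```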

Count unique occurrences within data frame

One option in R could be:

sapply(df, function(x) table(factor(x, levels = unique(unlist(df)))))

V1 v2 v3
A 1 1 2
B 1 2 0
D 1 0 1
C 0 1 1

Counting Unique Values in a Column

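Use str.get_dummies. The level argument of sum was deprecated in pandas 1.3 and is no longer accepted in pandas 2.0, so group by the column level instead:

df.category.str.get_dummies(',').replace(0, np.nan).stack().groupby(level=1).sum()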
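As a simpler sketch, summing the indicator columns directly should give the same counts. The category values below are hypothetical, and the explode-based alternative is an addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical column of comma-separated tags
df = pd.DataFrame({'category': ['a,b', 'b', 'a,c,b']})

# One 0/1 indicator column per tag; summing each column counts its occurrences
tag_counts = df['category'].str.get_dummies(',').sum()

# Alternative that avoids building the wide indicator frame
alt_counts = df['category'].str.split(',').explode().value_counts()

print(tag_counts)
```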

New data frame with unique values and counts

The expected output is not clear, so here are some options based on different assumptions:

  1. Sum of 'N' by 'date'
library(data.table)
dt[, .(N = sum(N, na.rm = TRUE)), by = date]

  2. Count of unique 'article_id' for each date
dt1[, .(N = uniqueN(article_id)), by = date]

  3. Get the first count by 'date'
dt1[, .(N = first(N)), by = date]

How to count unique values in pandas column base on dictionary values

From my previous answer, I slightly modified the code:

data = []
for g, d in dic.items():
    for k, l in d.items():
        data.extend([(g, v, k) for v in l])
df1 = pd.DataFrame(data, columns=['ID', 'id1', 'id2'])

out = dff.merge(df1, on=['ID', 'id1']) \
         .drop_duplicates(['ID', 'id1']) \
         .value_counts('id2')
print(out)
print(out)

# Output:
id2
aasd2 3
aasd 2
gsd3 2
vaasd 1
dtype: int64

Update

Is it possible to get which id1 values are in each list? For example, aasd2 with a value count of 3 and ['85649','85655','56731'].

out = (
    dff.merge(df1, on=['ID', 'id1']).drop_duplicates(['ID', 'id1'])
       .groupby(['ID', 'id2'])
       .agg(**{'Value Count': ('id2', 'size'), 'List id1': ('id1', list)})
       .reset_index()
)
print(out)

# Output:
        ID    id2  Value Count               List id1
0  G-00001   aasd            2         [85646, 85647]
1  G-00001  vaasd            1              [8564316]
2  G-00002  aasd2            3  [85649, 85655, 56731]
3  G-00003   gsd3            2         [34566, 78931]
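Since dic and dff are not shown in the answer, here is a minimal end-to-end sketch with assumed inputs. The shape of dic (ID mapped to {id2: [id1, ...]}) is inferred from the loop above, and the specific rows in dff are hypothetical.

```python
import pandas as pd

# Assumed shape: ID -> {id2: [id1, ...]}
dic = {
    'G-00001': {'aasd': ['85646', '85647'], 'vaasd': ['8564316']},
    'G-00002': {'aasd2': ['85649', '85655', '56731']},
}
# Hypothetical observations; note the repeated (G-00001, 85646) pair
dff = pd.DataFrame({
    'ID':  ['G-00001', 'G-00001', 'G-00001', 'G-00002', 'G-00002'],
    'id1': ['85646',   '85646',   '8564316', '85649',   '56731'],
})

# Flatten the nested dictionary into (ID, id1, id2) rows
rows = [(g, v, k) for g, d in dic.items() for k, l in d.items() for v in l]
df1 = pd.DataFrame(rows, columns=['ID', 'id1', 'id2'])

# Attach id2 to each observation, drop repeated pairs, then count per id2
merged = dff.merge(df1, on=['ID', 'id1']).drop_duplicates(['ID', 'id1'])
out = merged['id2'].value_counts()
print(out)
```

The duplicate pair is counted once because drop_duplicates runs before value_counts.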

Finding count of distinct elements in DataFrame in each column

As of pandas 0.20 we can use nunique directly on DataFrames, i.e.:

df.nunique()
a 4
b 5
c 1
dtype: int64

Other legacy options:

You could transpose the df and then use apply to call nunique row-wise:

In [205]:
df = pd.DataFrame({'a':[0,1,1,2,3],'b':[1,2,3,4,5],'c':[1,1,1,1,1]})
df

Out[205]:
a b c
0 0 1 1
1 1 2 1
2 1 3 1
3 2 4 1
4 3 5 1

In [206]:
df.T.apply(lambda x: x.nunique(), axis=1)

Out[206]:
a 4
b 5
c 1
dtype: int64

EDIT

As pointed out by @ajcr the transpose is unnecessary:

In [208]:
df.apply(pd.Series.nunique)

Out[208]:
a 4
b 5
c 1
dtype: int64
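DataFrame.nunique also takes axis and dropna arguments. A short sketch using the same sample frame, plus a hypothetical one-column frame with a NaN to show the dropna behaviour:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 1, 2, 3], 'b': [1, 2, 3, 4, 5], 'c': [1, 1, 1, 1, 1]})

per_column = df.nunique()       # distinct values in each column
per_row = df.nunique(axis=1)    # distinct values in each row

# Hypothetical frame with a missing value: NaN is ignored unless dropna=False
with_nan = pd.DataFrame({'x': [1.0, np.nan, 1.0]})
skip_nan = with_nan.nunique()['x']
keep_nan = with_nan.nunique(dropna=False)['x']

print(per_column.to_dict(), per_row.tolist(), skip_nan, keep_nan)
```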

