Counting Unique Items in Data Frame

Counting unique values in a column in pandas dataframe like in Qlik?

To count distinct values, use nunique:

df['hID'].nunique()
5

To count only non-null values, use count:

df['hID'].count()
8

To count all values, including nulls, use the size attribute:

df['hID'].size
8
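
The three counts only differ when the column contains duplicates or nulls. Here is a minimal sketch, assuming a hypothetical hID series with one missing value (this sample data is not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one duplicate (102) and one missing value.
s = pd.Series([101, 102, 102, np.nan, 103], name="hID")

n_distinct = s.nunique()                     # distinct non-null values: 3
n_non_null = s.count()                       # non-null values: 4
n_total = s.size                             # all values, nulls included: 5
n_distinct_with_nan = s.nunique(dropna=False)  # NaN counted as a value: 4

print(n_distinct, n_non_null, n_total, n_distinct_with_nan)
```

Passing dropna=False to nunique is the usual way to treat a missing value as its own category.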

Edit: to add a condition

Use boolean indexing:

df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])

Or use query:

df.query('mID == "A"')['hID'].agg(['nunique','count','size'])

Output:

nunique    5
count      5
size       5
Name: hID, dtype: int64
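
To check that both filters give the same result, here is a small self-contained sketch. The mID and hID values are made up for illustration, so the numbers differ from the output above:

```python
import pandas as pd

# Hypothetical data: group "A" has three rows and two distinct hID values.
df = pd.DataFrame({
    "mID": ["A", "A", "A", "B", "B"],
    "hID": [1, 2, 2, 3, 4],
})

stats = ["nunique", "count", "size"]

# Boolean indexing: select hID where mID equals "A".
via_mask = df.loc[df["mID"] == "A", "hID"].agg(stats)

# The same filter written as a query string.
via_query = df.query('mID == "A"')["hID"].agg(stats)

print(via_mask)
print(via_mask.equals(via_query))
```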

Pandas Counting Unique Rows

You can use size with reset_index:

print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
   ColA  ColB  Count
0     1     1      3
1     1     2      2
2     2     1      1
3     3     2      1
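
For a runnable version, the sketch below rebuilds an input frame that would produce the counts shown above. The input rows are an assumption, since the original data is not shown:

```python
import pandas as pd

# Assumed input: (1, 1) three times, (1, 2) twice, (2, 1) and (3, 2) once each.
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})

# size() counts the rows in each (ColA, ColB) group;
# reset_index(name=...) turns the result into a flat DataFrame.
counts = df.groupby(["ColA", "ColB"]).size().reset_index(name="Count")
print(counts)
```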

Count of unique values that occur more than 100 times in a data frame

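Use value_counts to get the frequency of each value, then filter the result by a threshold. The sample below uses a threshold of 2; replace it with 100 for the real data.
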
import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()

# keep only the values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)

This is the sample dataframe:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values counted:

hello    4
bye      2

These are the unique values that occur more than n times (here n = 2):

hello    4

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print (df)
domain
'facebook.com'    1
'google.com'      1
'twitter.com'     2
'vk.com'          3
Name: ID, dtype: int64

If you need to strip the ' characters from the domain values (starting again from the original df):

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

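The difference is that groupby('domain')['ID'].nunique() returns a Series indexed by domain, while agg() with as_index=False returns a DataFrame that keeps domain as a regular column.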
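
The two return types can be compared side by side. This is a sketch on made-up ID and domain values, chosen only so that the counts match the output above:

```python
import pandas as pd

# Hypothetical data: three distinct IDs visit vk.com, two visit twitter.com.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Series: domain becomes the index.
as_series = df.groupby("domain")["ID"].nunique()

# DataFrame: domain stays a regular column.
as_frame = df.groupby("domain", as_index=False).agg({"ID": "nunique"})

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame)
```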

python, count unique list values of a list inside a data frame

Set the index of the dataframe to age, then use Series.explode on the column 'what colours do you like?', then group by level=0 and aggregate the series using value_counts:

df1 = (
    df.set_index('age')['what colours do you like?'].explode()
    .rename('color').groupby(level=0).value_counts().reset_index(name='count')
)

Result:

print(df1)
     age   color  count
0  18-25  yellow      2
1  18-25   green      1
2  18-25  orange      1
3  26-30    blue      2
4  26-30     red      2
5  26-30   green      1
6  26-30  orange      1
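
To try the approach end to end, here is a sketch with a small made-up survey frame. The answers are assumed to be stored as Python lists, and the values are not the ones from the question:

```python
import pandas as pd

# Hypothetical survey data: each cell holds a list of colours.
df = pd.DataFrame({
    "age": ["18-25", "18-25", "26-30"],
    "what colours do you like?": [["yellow", "green"], ["yellow"], ["blue", "red"]],
})

df1 = (
    df.set_index("age")["what colours do you like?"]
    .explode()                 # one row per (age, colour) pair
    .rename("color")
    .groupby(level=0)          # group by the age index
    .value_counts()            # count each colour within its age group
    .reset_index(name="count")
)
print(df1)
```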

Count unique occurrences within data frame

One option in base R could be:

sapply(df, function(x) table(factor(x, levels = unique(unlist(df)))))

  V1 v2 v3
A  1  1  2
B  1  2  0
D  1  0  1
C  0  1  1

Counting Unique Values in a Column

You can do this with str.get_dummies:

df.category.str.get_dummies(',').replace(0,np.nan).stack().groupby(level=1).sum()
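
As a sketch of the same idea on concrete data, assume a hypothetical category column holding comma-separated labels. Summing the indicator columns directly gives the per-label counts, and splitting then exploding is an alternative that avoids building the indicator matrix:

```python
import pandas as pd

# Hypothetical data: comma-separated labels in one column.
df = pd.DataFrame({"category": ["a,b", "a", "b,c", "a"]})

# One 0/1 column per label; the column sums are the label counts.
counts_dummies = df["category"].str.get_dummies(",").sum()

# Alternative: split into lists, explode to one label per row, then count.
counts_explode = df["category"].str.split(",").explode().value_counts()

print(counts_dummies)
print(counts_explode)
```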

New data frame with unique values and counts

The expected output is not clear, so here are a few options (using data.table in R), each based on a different assumption about it:

Sum of 'N' by 'date'

library(data.table)
dt[, .(N = sum(N, na.rm = TRUE)), by = date]

Count of unique 'article_id' for each date

dt1[, .(N = uniqueN(article_id)), by = date]

Get the first count by 'date'

dt1[, .(N = first(N)), by = date]

How to count unique values in pandas column based on dictionary values

From my previous answer, I slightly modified the code:

data = []
for g, d in dic.items():
    for k, l in d.items():
        data.extend([(g, v, k) for v in l])
df1 = pd.DataFrame(data, columns=['ID', 'id1', 'id2'])

out = dff.merge(df1, on=['ID', 'id1']) \
         .drop_duplicates(['ID', 'id1']) \
         .value_counts('id2')
print(out)

# Output:
id2
aasd2    3
aasd     2
gsd3     2
vaasd    1
dtype: int64

Update

Is it possible to get the id1 values in each list? For example, aasd2 has a value count of 3 with ['85649','85655','56731'].

out = (
  dff.merge(df1, on=['ID', 'id1']).drop_duplicates(['ID', 'id1'])
     .groupby(['ID', 'id2'])
     .agg(**{'Value Count': ('id2', 'size'), 'List id1': ('id1', list)})
     .reset_index()
)
print(out)

# Output:
        ID    id2  Value Count               List id1
0  G-00001   aasd            2         [85646, 85647]
1  G-00001  vaasd            1              [8564316]
2  G-00002  aasd2            3  [85649, 85655, 56731]
3  G-00003   gsd3            2         [34566, 78931]
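
Since dic and dff are not shown in full, the sketch below uses small stand-ins for both (hypothetical values) to illustrate the flatten, merge, and count steps. It counts with Series.value_counts and uses the hypothetical column names value_count and id1_list, which differ slightly from the calls and names above:

```python
import pandas as pd

# Hypothetical stand-ins for the original `dic` and `dff`.
dic = {
    "G-00001": {"aasd": ["85646", "85647"], "vaasd": ["8564316"]},
    "G-00002": {"aasd2": ["85649", "85655", "56731"]},
}
dff = pd.DataFrame({
    "ID": ["G-00001", "G-00001", "G-00001", "G-00001",
           "G-00002", "G-00002", "G-00002"],
    "id1": ["85646", "85646", "85647", "8564316",
            "85649", "85655", "56731"],
})

# Flatten the nested dict into (ID, id1, id2) rows.
rows = [(g, v, k) for g, d in dic.items() for k, vals in d.items() for v in vals]
df1 = pd.DataFrame(rows, columns=["ID", "id1", "id2"])

# Attach id2 to each row, then keep one row per (ID, id1) pair.
merged = dff.merge(df1, on=["ID", "id1"]).drop_duplicates(["ID", "id1"])

# Count of distinct (ID, id1) pairs per id2.
counts = merged["id2"].value_counts()

# Same count per (ID, id2), plus the id1 values behind it.
detail = (
    merged.groupby(["ID", "id2"])
    .agg(value_count=("id1", "size"), id1_list=("id1", list))
    .reset_index()
)
print(counts)
print(detail)
```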

Finding count of distinct elements in DataFrame in each column

As of pandas 0.20 we can use nunique directly on DataFrames, i.e.:

df.nunique()
a    4
b    5
c    1
dtype: int64
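
As a quick self-contained check, the sketch below builds the same three-column frame used in the legacy examples and also shows the axis=1 variant, which counts distinct values per row instead of per column:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 1, 2, 3], "b": [1, 2, 3, 4, 5], "c": [1, 1, 1, 1, 1]})

per_column = df.nunique()        # distinct values in each column
per_row = df.nunique(axis=1)     # distinct values in each row

# The apply-based form gives the same per-column result.
via_apply = df.apply(pd.Series.nunique)

print(per_column)
print(per_row)
```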

Other legacy options:

You could transpose the df and then use apply to call nunique row-wise:

In [205]:
df = pd.DataFrame({'a':[0,1,1,2,3],'b':[1,2,3,4,5],'c':[1,1,1,1,1]})
df

Out[205]:
   a  b  c
0  0  1  1
1  1  2  1
2  1  3  1
3  2  4  1
4  3  5  1

In [206]:
df.T.apply(lambda x: x.nunique(), axis=1)

Out[206]:
a    4
b    5
c    1
dtype: int64

EDIT

As pointed out by @ajcr the transpose is unnecessary:

In [208]:
df.apply(pd.Series.nunique)

Out[208]:
a    4
b    5
c    1
dtype: int64