Counting Unique/Distinct Values by Group in a Data Frame

Count unique values using pandas groupby

I think you can use SeriesGroupBy.nunique:

print (df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64
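As a minimal, self-contained sketch, here is the same call on a small hypothetical df (the values are invented so that the result matches the output above). A NaN in param is ignored because groupby drops missing keys by default:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: param "a" occurs in groups 1 and 2, "b" only in group 1
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 3],
    "param": ["a", "b", "a", "a", np.nan],
})

# For each param, count how many distinct groups it occurs in.
# The row whose param is NaN is dropped from the grouping keys by default.
result = df.groupby("param")["group"].nunique()
print(result)
```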

Another solution: get the unique values per group with unique, build a new DataFrame from them with DataFrame.from_records, reshape it to a Series with stack, and finally count with value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a    2
b    1
dtype: int64
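A possibly simpler alternative, not taken from the original answer: drop duplicate (group, param) pairs and count what is left with value_counts. This sketch reuses the same hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 3],
    "param": ["a", "b", "a", "a", np.nan],
})

# Keep one row per (group, param) pair, drop missing params, then count how
# often each param remains: that is the number of distinct groups it appears in.
counts = (
    df.dropna(subset=["param"])
      .drop_duplicates(subset=["group", "param"])["param"]
      .value_counts()
)
print(counts)
```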

Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print (df)
domain
'facebook.com'    1
'google.com'      1
'twitter.com'     2
'vk.com'          3
Name: ID, dtype: int64

If you need to strip the ' characters from the domain names (starting again from the original df):

df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print (df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()
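A runnable sketch of the stripped version, using hypothetical data with quoted domain names chosen to reproduce the counts above:

```python
import pandas as pd

# Hypothetical sample data: domain names stored with literal ' quotes
df = pd.DataFrame({
    "domain": ["'vk.com'", "'vk.com'", "'twitter.com'", "'vk.com'",
               "'twitter.com'", "'google.com'", "'facebook.com'", "'vk.com'"],
    "ID": [1, 2, 1, 3, 2, 1, 4, 1],
})

# Strip the quotes from the grouping key, then count distinct IDs per domain
counts = df.groupby(df.domain.str.strip("'"))["ID"].nunique()
print(counts)
```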

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

The difference is that groupby(...)['ID'].nunique() returns a Series indexed by domain, whereas agg() with as_index=False returns a DataFrame with domain as an ordinary column.
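To see the difference concretely, this sketch computes both forms on hypothetical data. It uses the string alias 'nunique', which should behave like pd.Series.nunique here, and shows that reset_index() turns the Series into the same two-column layout:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "domain": ["fb", "ggl", "twitter", "twitter", "vk", "vk", "vk", "vk"],
    "ID": [1, 1, 1, 2, 1, 2, 3, 3],
})

# nunique on the selected column returns a Series indexed by domain
as_series = df.groupby("domain")["ID"].nunique()

# agg with as_index=False returns a DataFrame with domain as a regular column
as_frame = df.groupby("domain", as_index=False).agg({"ID": "nunique"})

# reset_index moves domain out of the index, giving the same layout as as_frame
print(as_series.reset_index())
print(as_frame)
```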

Add count of unique / distinct values by group to the original data

Using ave (since you ask for it specifically):

within(df, { count <- ave(type, color, FUN=function(x) length(unique(x)))})

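Make sure that type is a character vector and not a factor. Note that ave returns a vector of the same type as its first argument, so count is stored as character here; wrap the ave() call in as.integer() if you need numbers.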
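For comparison, a pandas sketch of the same idea, where the color and type columns are hypothetical stand-ins. groupby(...).transform('nunique') broadcasts each group's distinct count back to every row, much as ave does, and the result stays an integer column:

```python
import pandas as pd

# Hypothetical sample data using the column names from the R answer
df = pd.DataFrame({
    "color": ["black", "black", "black", "black", "green", "green", "red"],
    "type":  ["chair", "chair", "sofa", "plate", "sofa", "sofa", "plate"],
})

# transform returns one value per original row, aligned with df,
# so the distinct count per color can be stored as a new column
df["count"] = df.groupby("color")["type"].transform("nunique")
print(df)
```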

Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.

library(data.table)
setDT(df)[, count := uniqueN(type), by = color] # v1.9.6+
# if you don't want df to be modified by reference
ans = as.data.table(df)[, count := uniqueN(type), by = color]

uniqueN was added in v1.9.6 and is a faster equivalent of length(unique(.)). It also works on data.frames/data.tables, where it counts the unique rows.

Python group by and count distinct values in a column and create delimited list

You can add the count with str.len, which also returns the length of each list in a Series of lists:

df3 = (df.groupby('company')['product']
         .apply(lambda x: list(x.unique()))
         .reset_index()
         .assign(count=lambda d: d['product'].str.len())  ## added line
      )

output:

     company            product  count
0     Amazon           [E-comm]      1
1   Facebook     [Social Media]      1
2     Google  [Search, Android]      2
3  Microsoft        [OS, X-box]      2
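If you also want the delimited list mentioned in the title, one possible sketch uses named aggregation on the SeriesGroupBy, joining the unique products with ", " and counting them in the same call. The data below is hypothetical, reconstructed to match the output above:

```python
import pandas as pd

# Hypothetical sample data reconstructed to match the output above
df = pd.DataFrame({
    "company": ["Google", "Microsoft", "Google", "Amazon",
                "Facebook", "Microsoft", "Google"],
    "product": ["Search", "OS", "Android", "E-comm",
                "Social Media", "X-box", "Search"],
})

# Named aggregation: each keyword becomes one output column
df3 = (
    df.groupby("company")["product"]
      .agg(product=lambda s: ", ".join(s.unique()),  # delimited string of unique products
           count="nunique")                            # number of distinct products
      .reset_index()
)
print(df3)
```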

Counting unique / distinct values by group in a data frame

This should do the trick:

library(plyr)
ddply(myvec, ~name, summarise, number_of_distinct_orders = length(unique(order_no)))

This requires the plyr package.
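A pandas counterpart, assuming a hypothetical myvec with the same name and order_no columns: nunique counts the distinct orders per name, and reset_index(name=...) gives the result column its name.

```python
import pandas as pd

# Hypothetical stand-in for the question's myvec data frame
myvec = pd.DataFrame({
    "name": ["Amy", "Amy", "Amy", "Bob", "Bob"],
    "order_no": [12, 12, 15, 20, 20],
})

# Count distinct order numbers per name and name the resulting column
result = (
    myvec.groupby("name")["order_no"]
         .nunique()
         .reset_index(name="number_of_distinct_orders")
)
print(result)
```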

R group_by and count distinct values in dataframe column with condition, using mutate

Since the values in c are unique, you can approach it from the other direction: count how many of the c values show up in val.

df %>% 
  group_by(id) %>% 
  mutate(distinctValues = sum(c %in% val))
# # A tibble: 14 x 3
# # Groups:   id [6]
#       id   val distinctValues
#    <dbl> <dbl>          <int>
#  1     1   100              0
#  2     1   100              0
#  3     2   200              1
#  4     2   300              1
#  5     3   400              0
#  6     4   500              1
#  7     4   500              1
#  8     5   500              1
#  9     5   600              1
# 10     5   600              1
# 11     6   200              2
# 12     6   200              2
# 13     6   300              2
# 14     6   500              2

You could also use distinctValues = sum(unique(val) %in% c) if that seems clearer - it might be a tad less efficient, but not enough to matter unless your data is massive.
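A pandas sketch of the same approach, for comparison. The id/val pairs come from the output above, but c_vals = [300, 500] is a hypothetical choice of target values that reproduces the distinctValues column:

```python
import pandas as pd

# id/val pairs from the output above
df = pd.DataFrame({
    "id":  [1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6],
    "val": [100, 100, 200, 300, 400, 500, 500, 500, 600, 600, 200, 200, 300, 500],
})
# Hypothetical target values (playing the role of c in the R answer)
c_vals = pd.Series([300, 500])

# For each id, count how many target values appear among that id's vals;
# transform broadcasts the per-group count back to every row, like mutate
df["distinctValues"] = df.groupby("id")["val"].transform(
    lambda v: int(c_vals.isin(v).sum())
)
print(df)
```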

How to count the number of unique values by group?

I think you've got it all wrong here. There is no need for either plyr or <- when using data.table.

data.table v1.9.6 and later provide a function, uniqueN(), just for that.

library(data.table) ## >= v1.9.6
setDT(d)[, .(count = uniqueN(color)), by = ID]
#    ID count
# 1:  A     3
# 2:  B     2

If you want to create a new column with the counts, use the := operator:

setDT(d)[, count := uniqueN(color), by = ID]

Or, with dplyr, use the n_distinct function:

library(dplyr)
d %>%
  group_by(ID) %>%
  summarise(count = n_distinct(color))
# Source: local data table [2 x 2]
# 
#   ID count
# 1  A     3
# 2  B     2

Or, if you want a new column, use mutate instead of summarise:

d %>%
  group_by(ID) %>%
  mutate(count = n_distinct(color))

R - Count unique/distinct values in two columns together per group

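You can select the two columns from cur_data() and unlist them into a single vector, then use n_distinct to count the unique values; na.rm = TRUE keeps NA from being counted as a value. (In dplyr 1.1.0 and later, cur_data() is deprecated; pick(Party, Party2013) is its replacement.)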

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(), 
                   Party, Party2013)), na.rm = TRUE)) %>%
  ungroup


#     ID  Wave Party Party2013 Count
#  <int> <int> <chr> <chr>     <int>
#1     1     1 A     A             2
#2     1     2 A     NA            2
#3     1     3 B     NA            2
#4     1     4 B     NA            2
#5     2     1 A     C             3
#6     2     2 B     NA            3
#7     2     3 B     NA            3
#8     2     4 B     NA            3

data

It is easier to help if you provide data in a reproducible format, such as the output of dput():

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A", 
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
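For readers working in pandas, a sketch of the same computation on this data: melt stacks Party and Party2013 into one column, nunique (which skips NaN by default) counts the distinct values per ID, and map writes the counts back onto the original rows:

```python
import numpy as np
import pandas as pd

# The same data as the dput() output above, rebuilt in pandas
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Wave": [1, 2, 3, 4, 1, 2, 3, 4],
    "Party": ["A", "A", "B", "B", "A", "B", "B", "B"],
    "Party2013": ["A", np.nan, np.nan, np.nan, "C", np.nan, np.nan, np.nan],
})

# Long format: one 'value' column holding both Party and Party2013 per ID
counts = (
    df.melt(id_vars="ID", value_vars=["Party", "Party2013"])
      .groupby("ID")["value"]
      .nunique()  # NaN is not counted as a distinct value
)

# Map the per-ID count back onto every original row
df["Count"] = df["ID"].map(counts)
print(df)
```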

pandas dataframe group by multiple columns and count distinct values

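Combine value_counts with apply to count the values in each column (the top-level pd.value_counts is deprecated as of pandas 2.1, so use the Series method):

df.apply(pd.Series.value_counts)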
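A small sketch with hypothetical data: each column's counts are aligned on the union of all values, so a value absent from a column shows up as NaN (filled with 0 here):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": ["x", "y", "x", "z"],
    "col2": ["y", "y", "z", "z"],
})

# Apply Series.value_counts to every column; the per-column results are
# aligned on the union of their values, leaving NaN where a value is missing
counts = df.apply(pd.Series.value_counts)
print(counts.fillna(0).astype(int))
```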