Select Groups Based on Number of Unique/Distinct Values

Select groups based on number of unique / distinct values

You can make a selector for sample using ave many different ways.

sample[ ave( sample$Value, sample$Group, FUN = function(x) length(unique(x)) ) == 1,]

sample[ ave( sample$Value, sample$Group, FUN = function(x) sum(x - x[1]) ) == 0,]

sample[ ave( sample$Value, sample$Group, FUN = function(x) diff(range(x)) ) == 0,]

Count distinct values depending on group

You would use count(distinct):

select "group", count(distinct id)
from t
group by "group";

Note that group is a very poor name for a column because it is a SQL keyword. Hopefully the real column name is something more reasonable.

Filter based on number of distinct values per group

We can group by 'names' and filter the 'sex' having unique number of elements greater than 1

library(dplyr)
df %>% 
   group_by(names) %>%
   filter(n_distinct(sex) > 1)

Or another option is to group by 'names' and filter the groups having both the 'M' and 'F'

df %>%
  group_by(names) %>%
  filter(all(c("M", "F") %in% sex))

Select groups with more than one distinct value

Several possibilities, here's my favorite

library(data.table)
setDT(df)[, if(+var(number)) .SD, by = from]
#    from number
# 1:    2      1
# 2:    2      2

Basically, per each group we are checking if there is any variance, if TRUE, then return the group values

With base R, I would go with

df[as.logical(with(df, ave(number, from, FUN = var))), ]
#   from number
# 3    2      1
# 4    2      2

Edit: for a non numerical data you could try the new uniqueN function for the devel version of data.table (or use length(unique(number)) > 1 instead

setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]

Counting unique / distinct values by group in a data frame

This should do the trick:

ddply(myvec,~name,summarise,number_of_distinct_orders=length(unique(order_no)))

This requires package plyr.

Select groups with more than one distinct value per group

Using data.table

library(data.table) #see: https://github.com/Rdatatable/data.table/wiki for more
setDT(data) #convert to native 'data.table' type by reference
data[ , if(uniqueN(category) > 1) .SD, by = ID]

uniqueN is data.table's (fast) native mask for length(unique()), and .SD is just the whole data.table (in more general cases, it can represent a subset of columns, e.g. when the .SDcols argument is activated). So basically the middle statement (j, the column selection argument) says to return all columns and rows associated with an ID for which there are at least two distinct values of category.

Use the by argument to extend to a case involving counts ok multiple columns.

R - Count unique/distinct values in two columns together per group

You can subset the data from cur_data() and unlist the data to get a vector. Use n_distinct to count number of unique values.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(), 
                   Party, Party2013)), na.rm = TRUE)) %>%
  ungroup


#     ID  Wave Party Party2013 Count
#  <int> <int> <chr> <chr>     <int>
#1     1     1 A     A             2
#2     1     2 A     NA            2
#3     1     3 B     NA            2
#4     1     4 B     NA            2
#5     2     1 A     C             3
#6     2     2 B     NA            3
#7     2     3 B     NA            3
#8     2     4 B     NA            3

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A", 
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

R group by | count distinct values grouping by another column

One way

test_df |>
  distinct() |>
  count(post_pagename)

#   post_pagename     n
#   <fct>         <int>
# 1 A                 3
# 2 B                 2
# 3 C                 1
# 4 D                 1

Or another

test_df |>
  group_by(post_pagename) |>
  summarise(distinct_visit_ids = n_distinct(visit_id))

# A tibble: 4 x 2
#  post_pagename distinct_visit_ids
#  <fct>                      <int>
#1 A                              3
#2 B                              2
#3 C                              1
#4 D                              1

*D has one visit, so it must be counted*

Select Groups Based on Number of Unique/Distinct Values