Filter Based on Number of Distinct Values Per Group

We can group by 'names' and keep the groups where the number of distinct 'sex' values is greater than 1:

library(dplyr)
df %>%
  group_by(names) %>%
  filter(n_distinct(sex) > 1)

Another option is to group by 'names' and keep only the groups containing both 'M' and 'F':

df %>%
  group_by(names) %>%
  filter(all(c("M", "F") %in% sex))
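As a quick check, both filters can be run on a small hypothetical `df` (the column names `names` and `sex` are assumed from the question):

```r
library(dplyr)

# hypothetical example data
df <- data.frame(
  names = c("ana", "ana", "ben", "ben", "cal"),
  sex   = c("F", "M", "M", "M", "F")
)

df %>%
  group_by(names) %>%
  filter(n_distinct(sex) > 1)
# keeps only the 'ana' rows, the one name with more than one distinct sex
```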

Filter in SQL on distinct values after grouping

If all you need is the column col1, you can group by col1 and put the condition in the HAVING clause:

SELECT col1
FROM tablename
GROUP BY col1
HAVING COUNT(DISTINCT col2) = 1;

If you want all the rows from the table, use the above query as a subquery with the IN operator:

SELECT *
FROM tablename
WHERE col1 IN (
    SELECT col1
    FROM tablename
    GROUP BY col1
    HAVING COUNT(DISTINCT col2) = 1
);

Select groups based on number of unique / distinct values

You can build a row selector for `sample` with `ave` in many different ways.

sample[ave(sample$Value, sample$Group, FUN = function(x) length(unique(x))) == 1, ]

or (note that this sum-based test can give false positives when deviations cancel out, e.g. for a group with values `c(0, 1, -1)`)

sample[ave(sample$Value, sample$Group, FUN = function(x) sum(x - x[1])) == 0, ]

or

sample[ave(sample$Value, sample$Group, FUN = function(x) diff(range(x))) == 0, ]
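To illustrate, here is a hypothetical `sample` data frame; each selector keeps only the group whose values are constant:

```r
# hypothetical example data
sample <- data.frame(
  Group = c("A", "A", "B", "B"),
  Value = c(1, 1, 2, 3)
)

# ave() replicates the per-group statistic across each group's rows,
# so the comparison yields one logical per row
sample[ave(sample$Value, sample$Group, FUN = function(x) length(unique(x))) == 1, ]
# returns only the 'A' rows, whose Value is constant within the group
```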

Filter list of distinct values from one column of grouped data in the same order as it shows

I couldn't recreate your data exactly, so my output differs, but here are a few quick methods that should achieve your desired outcome.

A few edits to your original code give us a data frame with the proper cities in the proper order:

library(dplyr)

set.seed(42)

id <- seq_len(10)
city <- sample(c('Miami', 'Seattle', 'Houston', 'Toronto', 'Tokyo', 'Mumbai', 'Austin'), 10, replace = TRUE)
state <- sample(c('ON', 'WA', 'TX', 'MA'), 10, replace = TRUE)
rent <- sample(800:1900, 10)

data <- data.frame(id, city, state, rent)

data %>%
  group_by(id, city, state) %>%
  summarise(total_rent = sum(rent)) %>%
  group_by(city) %>%
  slice_max(total_rent, n = 1) %>%
  arrange(desc(total_rent)) %>%
  ungroup()
#> # A tibble: 5 x 4
#>      id city    state total_rent
#>   <int> <chr>   <chr>      <int>
#> 1     1 Miami   MA          1698
#> 2     6 Toronto WA          1659
#> 3     5 Seattle ON          1420
#> 4     2 Tokyo   TX          1400
#> 5    10 Austin  TX          1098

For just the values, the pull() / unique() combo is quite nice:

data %>%
  group_by(id, city, state) %>%
  summarise(total_rent = sum(rent)) %>%
  arrange(desc(total_rent)) %>%
  pull(city) %>%
  unique()
#> [1] "Miami"   "Toronto" "Seattle" "Tokyo"   "Austin"

Another possible solution is to turn the cities into a factor, in order, after you've arranged them. This is achieved with library(forcats):

library(forcats)
library(magrittr)

data %>%
  group_by(id, city, state) %>%
  summarise(total_rent = sum(rent)) %>%
  arrange(desc(total_rent)) %>%
  ungroup() %>%
  mutate(city = fct_inorder(city)) %$%
  levels(city)
#> [1] "Miami"   "Toronto" "Seattle" "Tokyo"   "Austin"

Created on 2021-03-04 by the reprex package (v0.3.0)

How to chain group_by, filter, distinct, count in data.table?

The distinct in dplyr corresponds to unique in data.table with the by option:

unique(setDT(test_df)[!is.na(date)], by = c("id", "date"))[, .N, by = id][N > 1]
#      id N
# 1: 5678 2

The steps are as follows:

  1. Convert to data.table (setDT)
  2. Remove the rows with NA in 'date' (!is.na(date))
  3. Get the unique rows by the 'id' and 'date' columns
  4. Group by 'id' to get the count (.N)
  5. Finally, filter the rows where the count is greater than 1
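For comparison, a rough dplyr equivalent of the same chain (assuming the same hypothetical `test_df` with 'id' and 'date' columns) might look like:

```r
library(dplyr)

test_df %>%
  filter(!is.na(date)) %>%   # drop rows with NA dates
  distinct(id, date) %>%     # unique id/date combinations
  count(id) %>%              # number of distinct dates per id
  filter(n > 1)              # keep ids appearing more than once
```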

R dplyr - Filter unique row in each group with dplyr

dat %>%
  mutate(rn = row_number()) %>%
  arrange(flag) %>%
  group_by(id, col2, col3) %>%
  slice(1) %>%
  ungroup() %>%
  arrange(rn) %>%
  select(-rn)
# # A tibble: 4 x 5
#      id col2  col3   flag   val
#   <int> <chr> <chr> <int> <int>
# 1     1 a     q        NA    NA
# 2     1 a     w         1    NA
# 3     1 b     r        NA    NA
# 4     2 c     q         1     5

If your data instead contains strings with empty strings (it's not clear in the question), then:

dat %>%
  # this is just to transform the number-based 'flag'/'val' to strings; you don't need this
  mutate(across(c(flag, val), ~ if_else(is.na(.), "", as.character(.)))) %>%
  # pick up here
  mutate(rn = row_number()) %>%
  arrange(!nzchar(flag)) %>% # this is the only difference from above
  group_by(id, col2, col3) %>%
  slice(1) %>%
  ungroup() %>%
  arrange(rn) %>%
  select(-rn)
# # A tibble: 4 x 5
#      id col2  col3  flag  val
#   <int> <chr> <chr> <chr> <chr>
# 1     1 a     q     ""    ""
# 2     1 a     w     "1"   ""
# 3     1 b     r     ""    ""
# 4     2 c     q     "1"   "5"

The use of rn is merely to ensure that the original row order is preserved across the filtering. If order is not an issue (perhaps it's inferred some other way), then you can remove the first mutate as well as the trailing arrange(rn) %>% select(-rn).


Data

dat <- structure(list(
  id = c(1L, 1L, 1L, 2L, 2L, 2L),
  col2 = c("a", "a", "b", "c", "c", "c"),
  col3 = c("q", "w", "r", "q", "q", "q"),
  flag = c(NA, 1L, NA, 1L, NA, 1L),
  val = c(NA, NA, NA, 5L, NA, 6L)
), class = "data.frame", row.names = c(NA, -6L))

Filter column by count of distinct values

You can add another column in the summarise to count the number of records per group and then filter based on it:

my_tibble %>%
  group_by(A) %>%
  summarise(percentage = mean(B), n = n()) %>%
  filter(percentage > 0, n > 1)

# A tibble: 2 x 3
#   A     percentage     n
#   <chr>      <dbl> <int>
# 1 a           0.75     4
# 2 b           0.50     2

SQL Filter rows based on multiple distinct values of a column

You shouldn't GROUP BY the description if you are doing a COUNT(DISTINCT ...) on it (then it will always be just 1). Try something like this:

SELECT P2.PLU, P2.Description
FROM @YourTable P2
WHERE P2.PLU IN (
    SELECT P.PLU
    FROM @YourTable P
    GROUP BY P.PLU
    HAVING COUNT(DISTINCT P.DESCRIPTION) > 1
);

