Select Groups With More Than One Distinct Value

Select groups with more than one distinct value per group

Using data.table

library(data.table) #see: https://github.com/Rdatatable/data.table/wiki for more
setDT(data) #convert to native 'data.table' type by reference
data[ , if(uniqueN(category) > 1) .SD, by = ID]

uniqueN is data.table's (fast) native equivalent of length(unique()), and .SD ("Subset of Data") holds the rows of the current group, with all columns except the grouping column(s). The .SDcols argument can restrict it to a subset of columns. So the middle argument (j, which selects or computes columns) says to return all columns and rows associated with an ID for which there are at least two distinct values of category.

Use the by argument to extend this to cases involving counts over multiple columns.
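For readers outside R, here is a minimal plain-Python sketch of the same rule: for each group, count the distinct values and keep the whole group only if that count is above 1. The function name filter_groups and the sample rows are hypothetical. Counting several columns together is modelled by treating a tuple of their values as one value, which is an assumption about what "counts over multiple columns" means here.

```python
from collections import defaultdict


def filter_groups(rows, by, cols):
    """Hypothetical helper: keep the rows whose group (same values in the
    `by` columns) contains more than one distinct combination of `cols`."""
    def group_key(row):
        return tuple(row[c] for c in by)

    distinct = defaultdict(set)
    for row in rows:
        # A tuple of the selected columns counts as one value, so several
        # columns are treated as a single combination
        distinct[group_key(row)].add(tuple(row[c] for c in cols))
    # Same rule as if(uniqueN(category) > 1) .SD: keep the whole group or none of it
    return [row for row in rows if len(distinct[group_key(row)]) > 1]


# Hypothetical sample data using the column names from the answer above
data = [
    {"ID": 1, "category": "a"},
    {"ID": 1, "category": "a"},
    {"ID": 2, "category": "a"},
    {"ID": 2, "category": "b"},
]

print(filter_groups(data, by=["ID"], cols=["category"]))
# [{'ID': 2, 'category': 'a'}, {'ID': 2, 'category': 'b'}]
```

Unlike data.table, this keeps the rows in their original order rather than gathering them by group.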

Select groups with more than one distinct value

There are several possibilities; here's my favorite:

library(data.table)
setDT(df)[, if(+var(number)) .SD, by = from]
#    from number
# 1:    2      1
# 2:    2      2

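For each group, we check whether there is any variance. If there is (a nonzero value counts as TRUE), that group's rows are returned. Note that var() returns NA for a one-row group, and if(NA) is an error, so this trick assumes every group has at least two rows.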
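A minimal Python sketch of the variance test, with hypothetical data and a hypothetical function name. It uses statistics.pvariance (population variance) instead of R's var (sample variance). For two or more values, both are zero exactly when all the values are equal. pvariance also returns 0 for a single value instead of NA, so one-row groups are dropped rather than raising an error.

```python
from collections import defaultdict
from statistics import pvariance


def groups_with_variance(rows, by, col):
    """Hypothetical helper: keep the rows of every group whose `col` values
    are not all equal."""
    values = defaultdict(list)
    for row in rows:
        values[row[by]].append(row[col])
    # pvariance() of a single value is 0, so one-row groups are simply dropped
    # (R's var() returns NA there instead)
    varying = {g for g, vs in values.items() if pvariance(vs) > 0}
    return [row for row in rows if row[by] in varying]


# Hypothetical data; only the 'from' == 2 rows differ within their group
df = [
    {"from": 1, "number": 1},
    {"from": 1, "number": 1},
    {"from": 2, "number": 1},
    {"from": 2, "number": 2},
    {"from": 3, "number": 5},
]

print(groups_with_variance(df, "from", "number"))
# [{'from': 2, 'number': 1}, {'from': 2, 'number': 2}]
```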


With base R, I would go with

df[as.logical(with(df, ave(number, from, FUN = var))), ]
#   from number
# 3    2      1
# 4    2      2
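The base R version depends on ave(). It applies FUN to each group and writes the result back to every row of that group, so the output lines up with df's rows and can be used directly as a row selector. The sketch below is a hypothetical Python stand-in for that behaviour, not a port of R's implementation. As in the previous sketch, pvariance is swapped in for var.

```python
from collections import defaultdict
from statistics import pvariance


def ave(values, groups, fun):
    """Hypothetical stand-in for R's ave(): apply `fun` to each group's values
    and broadcast that group's result back to every one of its rows."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    per_group = {g: fun(vs) for g, vs in buckets.items()}
    return [per_group[g] for g in groups]


# Hypothetical columns standing in for df$from and df$number
frm = [1, 1, 2, 2]
number = [1, 1, 1, 2]

# Like as.logical(ave(...)): a nonzero variance becomes True
mask = [bool(x) for x in ave(number, frm, pvariance)]
selected = [(f, n) for f, n, keep in zip(frm, number, mask) if keep]
print(selected)  # [(2, 1), (2, 2)]
```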

Edit: for non-numeric data, you could try the uniqueN function (new in the development version of data.table at the time of writing), or use length(unique(number)) > 1 instead:

setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]

Grouping together results of multiple GROUP_CONCAT() with distinct values only

One option is to unpivot the columns to rows before grouping. In MySQL, you can do this with union all:

select company, group_concat(distinct typex order by typex) res
from (
select company, type1 typex from mytable
union all select company, type2 from mytable
union all select company, type3 from mytable
) t
group by company

Demo on DB Fiddle (output):

company  | res
:------- | :----
Generic  | 1,2,3
Generic2 | 1,2
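The unpivot-then-aggregate idea can be sketched in Python as follows. Every typeN column contributes one (company, value) pair, which is what the UNION ALL does. Then each company's distinct values are sorted and joined, similar to GROUP_CONCAT(DISTINCT ... ORDER BY ...). The function name and the table contents are hypothetical, since the original mytable is not shown. NULLs (None) are skipped because GROUP_CONCAT only concatenates non-NULL values.

```python
from collections import defaultdict


def concat_distinct(rows, key, cols, sep=","):
    """Hypothetical helper: unpivot `cols` into one value column, then join
    each key's distinct values in ascending order."""
    values = defaultdict(set)
    for row in rows:
        for c in cols:  # the UNION ALL step: one (key, value) pair per column
            if row[c] is not None:  # GROUP_CONCAT ignores NULLs
                values[row[key]].add(row[c])
    return {k: sep.join(str(v) for v in sorted(vs)) for k, vs in values.items()}


# Hypothetical table; the real mytable is not shown in the answer
mytable = [
    {"company": "Generic", "type1": 1, "type2": 2, "type3": 3},
    {"company": "Generic", "type1": 2, "type2": 3, "type3": None},
    {"company": "Generic2", "type1": 1, "type2": 2, "type3": 2},
]

print(concat_distinct(mytable, "company", ["type1", "type2", "type3"]))
# {'Generic': '1,2,3', 'Generic2': '1,2'}
```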

Select groups based on number of unique / distinct values

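You can build a row selector for sample with ave() in many different ways. Each of the following keeps only the groups whose Value is constant (exactly one distinct value):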

sample[ ave( sample$Value, sample$Group, FUN = function(x) length(unique(x)) ) == 1,]

or

sample[ ave( sample$Value, sample$Group, FUN = function(x) sum(abs(x - x[1])) ) == 0,]

or

sample[ ave( sample$Value, sample$Group, FUN = function(x) diff(range(x)) ) == 0,]
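All of these selectors test the same property: every value in the group is equal. Here is a minimal Python sketch of the predicates, with hypothetical names and sample data. It also shows why deviations from the first element must be absolute. With a signed sum, deviations of opposite sign can cancel, so a non-constant group such as 2, 1, 3 would pass.

```python
from collections import defaultdict


def one_distinct(xs):
    return len(set(xs)) == 1  # length(unique(x)) == 1


def range_zero(xs):
    return max(xs) - min(xs) == 0  # diff(range(x)) == 0


def abs_deviation_zero(xs):
    return sum(abs(x - xs[0]) for x in xs) == 0  # sum(abs(x - x[1])) == 0


def signed_deviation_zero(xs):
    # Without abs(), deviations can cancel: [2, 1, 3] gives 0 + -1 + 1 == 0
    return sum(x - xs[0] for x in xs) == 0


def keep_constant_groups(rows, predicate):
    """rows are hypothetical (group, value) pairs; keep those whose group
    satisfies `predicate`."""
    by_group = defaultdict(list)
    for g, v in rows:
        by_group[g].append(v)
    keep = {g for g, vs in by_group.items() if predicate(vs)}
    return [(g, v) for g, v in rows if g in keep]


# Hypothetical sample: group "A" is constant, group "B" is not
sample = [("A", 1), ("A", 1), ("B", 2), ("B", 1), ("B", 3)]

for pred in (one_distinct, range_zero, abs_deviation_zero, signed_deviation_zero):
    print(pred.__name__, keep_constant_groups(sample, pred))
```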

How to GROUP BY on multiple DISTINCT columns? (Google BigQuery)

If you need to count distinct (a, b) combinations per date, concatenating the two columns could help:

SELECT
date,
COUNT(DISTINCT(CONCAT(a, '_', b))) distinct_a_b
FROM process
GROUP BY date
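Concatenating with a separator only works if the separator never occurs in a or b. Otherwise two different pairs can produce the same string and be counted once. The Python sketch below uses hypothetical rows to compare counting concatenated strings against counting the pairs themselves. The pair ("x_", "y") and the pair ("x", "_y") both become "x__y".

```python
from collections import defaultdict


def distinct_pairs_per_date(rows, sep="_"):
    """Hypothetical helper: count distinct (a, b) pairs per date two ways,
    via a concatenated string (like CONCAT(a, '_', b)) and via the pair itself."""
    by_concat = defaultdict(set)
    by_tuple = defaultdict(set)
    for date, a, b in rows:
        by_concat[date].add(f"{a}{sep}{b}")
        by_tuple[date].add((a, b))
    return ({d: len(s) for d, s in by_concat.items()},
            {d: len(s) for d, s in by_tuple.items()})


# Hypothetical rows (date, a, b); on the second date the values contain the separator
rows = [
    ("2020-01-01", "x", "y"),
    ("2020-01-01", "x", "y"),
    ("2020-01-02", "x_", "y"),
    ("2020-01-02", "x", "_y"),
]

concat_counts, tuple_counts = distinct_pairs_per_date(rows)
print(concat_counts)  # {'2020-01-01': 1, '2020-01-02': 1}
print(tuple_counts)   # {'2020-01-01': 1, '2020-01-02': 2}
```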

Pandas GroupBy - Show only groups with more than one unique feature-value

You can use groupby to keep only the groups that have more than one unique value of B:

v = df_things.groupby('CLASS').B.value_counts()  # one row per (CLASS, B) pair
v[v.groupby(level=0).transform('size').gt(1)]    # keep classes with more than one distinct B

CLASS  B   
Car    bal3    2
       bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
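The filter needs to measure the number of distinct B values per class, which is the number of entries value_counts() produces for that class. Counting distinct *counts* instead would wrongly drop a class whose values each appear the same number of times. Here is a plain-Python sketch of the intended logic using collections.Counter. The rows are hypothetical, and "Plane" is included to show the equal-frequency case.

```python
from collections import Counter, defaultdict


def classes_with_multiple_values(rows, group_col, value_col):
    """Hypothetical helper: per-group value counts (like value_counts() per
    group), keeping only groups that contain more than one distinct value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_col]][row[value_col]] += 1
    # len(counter) is the number of distinct values; the counts themselves may repeat
    return {g: dict(c) for g, c in counts.items() if len(c) > 1}


# Hypothetical rows; "Plane" has two values that appear once each
df_things = [
    {"CLASS": "Car", "B": "bal3"}, {"CLASS": "Car", "B": "bal3"},
    {"CLASS": "Car", "B": "bal2"},
    {"CLASS": "Ship", "B": "bal1"}, {"CLASS": "Ship", "B": "bal1"},
    {"CLASS": "Ship", "B": "bal2"},
    {"CLASS": "Plane", "B": "bal1"}, {"CLASS": "Plane", "B": "bal2"},
    {"CLASS": "Bike", "B": "bal1"}, {"CLASS": "Bike", "B": "bal1"},
]

print(classes_with_multiple_values(df_things, "CLASS", "B"))
```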

Group and have more than one value in SQL

This is pretty simple:

select id
from t
group by id
having min(type_id) <> max(type_id);

This is pretty much a direct translation of your description. (Note: you can use count(distinct) as well, but that usually incurs more overhead, because the database has to track every distinct value in each group.)
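The min/max trick works because a group has at least two distinct non-NULL values exactly when its minimum differs from its maximum. Here is a minimal Python sketch comparing both formulations on hypothetical (id, type_id) rows. NULLs are modelled as None and skipped, since both MIN/MAX and COUNT(DISTINCT ...) ignore them.

```python
from collections import defaultdict


def ids_with_multiple_types(rows):
    """Mirror of: SELECT id FROM t GROUP BY id HAVING MIN(type_id) <> MAX(type_id)."""
    types = defaultdict(list)
    for id_, type_id in rows:
        if type_id is not None:  # MIN/MAX ignore NULLs
            types[id_].append(type_id)
    return sorted(i for i, ts in types.items() if min(ts) != max(ts))


def ids_with_multiple_types_distinct(rows):
    """Same question answered with COUNT(DISTINCT type_id) > 1."""
    types = defaultdict(set)
    for id_, type_id in rows:
        if type_id is not None:  # COUNT(DISTINCT ...) also ignores NULLs
            types[id_].add(type_id)
    return sorted(i for i, ts in types.items() if len(ts) > 1)


# Hypothetical rows of t as (id, type_id)
t = [(1, 10), (1, 10), (2, 10), (2, 20), (3, None), (3, 30)]

print(ids_with_multiple_types(t))           # [2]
print(ids_with_multiple_types_distinct(t))  # [2]
```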


