Select groups with more than one distinct value per group
Using data.table
library(data.table) #see: https://github.com/Rdatatable/data.table/wiki for more
setDT(data) #convert to native 'data.table' type by reference
data[ , if(uniqueN(category) > 1) .SD, by = ID]
uniqueN is data.table's (fast) native equivalent of length(unique()), and .SD is the subset of the data.table that belongs to the current group (by default all columns except the grouping column; in more general cases it can represent a subset of columns, e.g. when the .SDcols argument is used). So the middle statement (j, the column selection argument) says to return all columns and rows associated with an ID for which there are at least two distinct values of category.
Use the by argument to extend this to cases involving counts over multiple columns.
Select groups with more than one distinct value
There are several possibilities; here's my favorite:
library(data.table)
setDT(df)[, if(+var(number)) .SD, by = from]
# from number
# 1: 2 1
# 2: 2 2
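Basically, for each group we check whether there is any variance; if there is, we return that group's rows. (Note that var() returns NA for a group with a single row, so this assumes every group has at least two rows.)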
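Why a variance check works: mathematically, the population variance of a set of numbers is zero exactly when all the values are equal, so a nonzero variance means the group has at least two distinct values. A minimal Python sketch of that reasoning, assuming numeric values; has_variance and the groups dict are hypothetical. It uses statistics.pvariance because the sample variance, like R's var(), is undefined for a single value (R returns NA, and Python's statistics.variance raises StatisticsError).

```python
from statistics import pvariance


def has_variance(values):
    """Hypothetical helper: True if a non-empty group of numbers contains
    more than one distinct value. The population variance is 0 when all
    values are equal and, unlike the sample variance, is defined for a
    single value."""
    return pvariance(values) > 0


# made-up groups keyed like the `from` column
groups = {1: [1, 1], 2: [1, 2]}
kept = {key: vals for key, vals in groups.items() if has_variance(vals)}
print(kept)  # {2: [1, 2]}
```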
With base R, I would go with
df[as.logical(with(df, ave(number, from, FUN = var))), ]
# from number
# 3 2 1
# 4 2 2
Edit: for non-numeric data, you could try the uniqueN function, which was then new in the development version of data.table (or use length(unique(number)) > 1 instead):
setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]
Grouping together results of multiple GROUP_CONCAT() with distinct values only
One option is to unpivot the columns to rows before grouping. In MySQL, you can do this with union all:
select company, group_concat(distinct typex order by typex) res
from (
select company, type1 typex from mytable
union all select company, type2 from mytable
union all select company, type3 from mytable
) t
group by company
Demo on DB Fiddle:
company | res
:------- | :----
Generic | 1,2,3
Generic2 | 1,2
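The unpivot-then-aggregate idea can be sketched in Python: each row contributes one value per type column (the union all step), and each company's distinct values are then joined in sorted order (group_concat(distinct ... order by ...)). The rows below are made up, and the sketch assumes the type columns contain no NULLs.

```python
from collections import defaultdict


def group_concat_distinct(rows, key, columns):
    """Hypothetical helper: unpivot `columns` into one set of values per
    `key`, then join the distinct values in sorted order."""
    values = defaultdict(set)
    for row in rows:
        for col in columns:  # union all: one value per type column
            values[row[key]].add(row[col])
    return {k: ",".join(str(v) for v in sorted(vals)) for k, vals in values.items()}


# made-up rows with three type columns
rows = [
    {"company": "Generic", "type1": 1, "type2": 2, "type3": 3},
    {"company": "Generic", "type1": 2, "type2": 3, "type3": 1},
    {"company": "Generic2", "type1": 1, "type2": 1, "type3": 2},
]
print(group_concat_distinct(rows, "company", ["type1", "type2", "type3"]))
```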
Select groups based on number of unique / distinct values
You can build a row selector for sample using ave in many different ways:
sample[ ave( sample$Value, sample$Group, FUN = function(x) length(unique(x)) ) == 1,]
or
sample[ ave( sample$Value, sample$Group, FUN = function(x) sum(abs(x - x[1])) ) == 0,]
or
sample[ ave( sample$Value, sample$Group, FUN = function(x) diff(range(x)) ) == 0,]
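A quick Python check of these criteria on a single group, using hypothetical function names. It also shows why a plain sum of deviations from the first element is not a reliable test: for [2, 1, 3] the deviations 0, -1 and 1 cancel to 0 even though the values differ, whereas a sum of absolute deviations does not cancel.

```python
def single_valued(x):
    # counterpart of length(unique(x)) == 1
    return len(set(x)) == 1


def no_spread(x):
    # counterpart of diff(range(x)) == 0
    return max(x) - min(x) == 0


def no_abs_deviation(x):
    # counterpart of sum(abs(x - x[1])) == 0; abs() prevents cancellation
    return sum(abs(v - x[0]) for v in x) == 0


counterexample = [2, 1, 3]
plain_sum = sum(v - counterexample[0] for v in counterexample)
print(plain_sum)  # 0, although the values differ
print(single_valued(counterexample), no_spread(counterexample), no_abs_deviation(counterexample))
```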
How to GROUP BY on multiple DISTINCT columns? ( Google Big Query )
If you need to count distinct combinations of a and b, this could help:
SELECT
Date,
COUNT(DISTINCT(CONCAT(a,'_',b))) distinct_a_b
FROM process
GROUP BY date
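One caveat with concatenating through a separator: if a or b can itself contain '_', two different pairs can produce the same string and be counted once. A Python sketch of the difference, with made-up values; counting tuples avoids the collision. (In BigQuery, CONCAT returns NULL if any argument is NULL, and COUNT(DISTINCT ...) ignores NULLs, so pairs containing a NULL are not counted either.)

```python
def count_distinct_concat(pairs, sep="_"):
    # mimics COUNT(DISTINCT CONCAT(a, sep, b))
    return len({f"{a}{sep}{b}" for a, b in pairs})


def count_distinct_pairs(pairs):
    # counts distinct (a, b) combinations exactly
    return len(set(pairs))


pairs = [("x_", "y"), ("x", "_y")]
print(count_distinct_concat(pairs))  # 1: both pairs become "x__y"
print(count_distinct_pairs(pairs))   # 2
```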
Pandas GroupBy - Show only groups with more than one unique feature-value
You can use groupby to filter groups that have an nunique count over 1.
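v = df_things.groupby('CLASS').B.value_counts()
# each row of v is one distinct (CLASS, B) pair, so the size of each CLASS
# group is its number of distinct B values ('nunique' would count distinct counts)
v[v.groupby(level=0).transform('size').gt(1)]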
CLASS  B
Car    bal3    2
       bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
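A plain-Python analogue of this filter, using made-up rows: build per-class counts of B (what value_counts returns) and keep the classes whose counter has more than one key. It illustrates that a class's number of distinct B values is the number of entries in its counts, not the number of distinct count values; the hypothetical class Bus below has two B values that each occur once and is still kept.

```python
from collections import Counter, defaultdict

# made-up (CLASS, B) rows
rows = [
    ("Car", "bal3"), ("Car", "bal3"), ("Car", "bal2"),
    ("Ship", "bal1"), ("Ship", "bal1"), ("Ship", "bal2"),
    ("Bus", "bal1"), ("Bus", "bal2"),
    ("Train", "bal4"),
]

# per-class value counts, similar to df_things.groupby('CLASS').B.value_counts()
counts = defaultdict(Counter)
for cls, b in rows:
    counts[cls][b] += 1

# keep classes with more than one distinct B value (more than one key)
kept = {cls: dict(c) for cls, c in counts.items() if len(c) > 1}
print(kept)  # Car, Ship and Bus are kept; Train has a single B value
```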
group and have more than one value in sql
This is pretty simple:
select id
from t
group by id
having min(type_id) <> max(type_id);
This is pretty much a direct translation of your description. (Note: you can use count(distinct) as well, i.e. having count(distinct type_id) > 1, but that incurs more overhead.)
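The min/max comparison works because, for any orderable values, the minimum differs from the maximum exactly when there are at least two distinct values. A Python sketch with made-up rows and a hypothetical helper name; like SQL's MIN and MAX it ignores NULLs (None here), and an id whose type_id values are all NULL is not returned, matching SQL, where min <> max is then NULL.

```python
from collections import defaultdict


def ids_with_multiple_types(rows):
    """Hypothetical helper mirroring
    GROUP BY id HAVING MIN(type_id) <> MAX(type_id)."""
    by_id = defaultdict(list)
    for id_, type_id in rows:
        if type_id is not None:  # MIN/MAX ignore NULLs
            by_id[id_].append(type_id)
    return sorted(id_ for id_, types in by_id.items() if min(types) != max(types))


# made-up (id, type_id) rows
rows = [(1, 10), (1, 10), (2, 10), (2, 20), (3, None), (3, 30), (4, None)]
print(ids_with_multiple_types(rows))  # [2]
```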