Select groups based on number of unique / distinct values
You can build a row selector for sample using ave in many different ways:
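# keep groups whose Value takes a single distinct value
sample[ave(sample$Value, sample$Group, FUN = function(x) length(unique(x))) == 1, ]
or
# all deviations from the first value are zero (abs() stops +/- values cancelling)
sample[ave(sample$Value, sample$Group, FUN = function(x) sum(abs(x - x[1]))) == 0, ]
or
# the range (max - min) of the group is zero
sample[ave(sample$Value, sample$Group, FUN = function(x) diff(range(x))) == 0, ]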
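A small self-contained check of these selectors, using a made-up sample data frame (the Group/Value values are assumptions, not the original asker's data). Group "c" is deliberately built so that its signed deviations from the first value cancel out, which shows why the sum-based selector needs abs():

```r
# Hypothetical data: "a" is constant, "b" varies, and "c" varies but its
# deviations from the first value (2, 1, 3 -> 0, -1, 1) sum to zero
sample <- data.frame(
  Group = c("a", "a", "b", "b", "c", "c", "c"),
  Value = c(5, 5, 1, 2, 2, 1, 3)
)

by_unique <- sample[ave(sample$Value, sample$Group,
                        FUN = function(x) length(unique(x))) == 1, ]
by_range  <- sample[ave(sample$Value, sample$Group,
                        FUN = function(x) diff(range(x))) == 0, ]
# Without abs() the signed deviations in group "c" cancel to zero
naive     <- sample[ave(sample$Value, sample$Group,
                        FUN = function(x) sum(x - x[1])) == 0, ]

print(by_unique)
print(unique(naive$Group))
```

Both by_unique and by_range keep only group "a", while the version without abs() wrongly keeps "c" as well.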
Count distinct values depending on group
You would use count(distinct):
select "group", count(distinct id)
from t
group by "group";
Note that group is a very poor name for a column because it is a SQL keyword (which is why it has to be double-quoted above). Hopefully the real column name is something more reasonable.
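For comparison, a rough base R analogue of the same per-group distinct count is sketched below. The data frame t_df and its values are hypothetical stand-ins for the SQL table t:

```r
# Hypothetical stand-in for the SQL table t (columns "group" and "id")
t_df <- data.frame(
  group = c("x", "x", "x", "y", "y"),
  id    = c(1, 1, 2, 3, 3)
)

# Same idea as: select "group", count(distinct id) from t group by "group"
counts <- aggregate(id ~ group, data = t_df,
                    FUN = function(v) length(unique(v)))
print(counts)
```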
Filter based on number of distinct values per group
We can group by 'names' and keep only the groups where 'sex' has more than one distinct value:
library(dplyr)
df %>%
group_by(names) %>%
filter(n_distinct(sex) > 1)
Another option is to group by 'names' and keep only the groups that contain both 'M' and 'F':
df %>%
group_by(names) %>%
filter(all(c("M", "F") %in% sex))
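A runnable sketch of both filters on a small, made-up df (the names and values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical data: "Ann" appears with both sexes, "Bob" only with "M"
df <- data.frame(
  names = c("Ann", "Ann", "Bob", "Bob"),
  sex   = c("F", "M", "M", "M")
)

# Groups with more than one distinct value of sex
mixed <- df %>%
  group_by(names) %>%
  filter(n_distinct(sex) > 1) %>%
  ungroup()

# Groups that contain both "M" and "F"
both <- df %>%
  group_by(names) %>%
  filter(all(c("M", "F") %in% sex)) %>%
  ungroup()

print(mixed)
```

The two filters agree on this data, but they are not equivalent in general: n_distinct(sex) > 1 would also keep a group containing, say, "M" and "U", while the %in% version requires both "M" and "F" to be present.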
Select groups with more than one distinct value
There are several possibilities; here's my favorite:
library(data.table)
# isTRUE() guards against var() returning NA for single-row groups
setDT(df)[, if (isTRUE(var(number) > 0)) .SD, by = from]
# from number
# 1: 2 1
# 2: 2 2
Basically, for each group we check whether there is any variance; if there is, we return that group's rows (.SD).
With base R, I would go with:
df[as.logical(with(df, ave(number, from, FUN = function(x) isTRUE(var(x) > 0)))), ]
# from number
# 3 2 1
# 4 2 2
Edit: for non-numeric data you could try data.table's uniqueN function (at the time only available in the development version of data.table), or use length(unique(number)) > 1 instead:
setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]
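A self-contained sketch of the data.table variants on hypothetical data (DT, chr and their values are assumptions). Group 3 has a single row, where var() returns NA, which is why isTRUE() is wrapped around the comparison; the second table shows uniqueN() working on a character column, where var() does not apply:

```r
library(data.table)

# Hypothetical data: group 1 is constant, group 2 varies,
# group 3 has a single row (var() returns NA there)
DT <- data.table(from = c(1, 1, 2, 2, 3), number = c(1, 1, 1, 2, 5))

# Groups for which j returns NULL are simply dropped from the result
by_var  <- DT[, if (isTRUE(var(number) > 0)) .SD, by = from]
by_uniq <- DT[, if (uniqueN(number) > 1) .SD, by = from]

# uniqueN() also works for character data
chr <- data.table(from = c(1, 1, 2, 2), label = c("a", "a", "a", "b"))
chr_kept <- chr[, if (uniqueN(label) > 1) .SD, by = from]

print(by_uniq)
```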
Counting unique / distinct values by group in a data frame
This should do the trick:
library(plyr)
ddply(myvec, ~name, summarise, number_of_distinct_orders = length(unique(order_no)))
This requires package plyr.
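A runnable sketch of the same call on a made-up myvec (the names and order numbers are assumptions):

```r
library(plyr)

# Hypothetical data standing in for myvec
myvec <- data.frame(
  name     = c("Amy", "Amy", "Amy", "Ben", "Ben"),
  order_no = c(101, 101, 102, 201, 201)
)

# One row per name with the count of distinct order numbers
res <- ddply(myvec, ~name, summarise,
             number_of_distinct_orders = length(unique(order_no)))
print(res)
```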
Select groups with more than one distinct value per group
Using data.table
library(data.table) #see: https://github.com/Rdatatable/data.table/wiki for more
setDT(data) #convert to native 'data.table' type by reference
data[ , if(uniqueN(category) > 1) .SD, by = ID]
uniqueN is data.table's (fast) native equivalent of length(unique()), and .SD is the subset of the data.table belonging to the current group (all columns except the by column ID; in more general cases it can represent a subset of columns, e.g. when the .SDcols argument is used). So the middle argument (j, the column selection argument) says to return all columns and rows associated with an ID for which there are at least two distinct values of category.
Use the by argument to extend this to cases involving counts of multiple columns.
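A small sketch on hypothetical data (the columns and values are assumptions) showing the filter, plus a check that .SD inside j holds the group's rows without the ID column:

```r
library(data.table)

# Hypothetical data: ID 1 has two categories, ID 2 only one
data <- data.table(
  ID       = c(1, 1, 1, 2, 2),
  category = c("x", "y", "x", "z", "z"),
  value    = 1:5
)

kept <- data[, if (uniqueN(category) > 1) .SD, by = ID]
print(kept)

# Inside j, .SD contains the current group's columns other than ID
sd_cols <- data[, .(cols = paste(names(.SD), collapse = ",")), by = ID]
```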
R - Count unique/distinct values in two columns together per group
You can subset the data from cur_data() and unlist it to get a vector, then use n_distinct to count the number of unique values. (Since dplyr 1.1.0, cur_data() is deprecated; pick(Party, Party2013) selects the same columns.)
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Count = n_distinct(unlist(select(cur_data(),
Party, Party2013)), na.rm = TRUE)) %>%
ungroup
# ID Wave Party Party2013 Count
# <int> <int> <chr> <chr> <int>
#1 1 1 A A 2
#2 1 2 A NA 2
#3 1 3 B NA 2
#4 1 4 B NA 2
#5 2 1 A C 3
#6 2 2 B NA 3
#7 2 3 B NA 3
#8 2 4 B NA 3
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
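On dplyr 1.1.0 or later, the same count can be sketched with pick() instead of cur_data(). This rebuilds the example data above as a plain data.frame so the block runs on its own:

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Same example data as above
df <- data.frame(
  ID        = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  Wave      = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  Party     = c("A", "A", "B", "B", "A", "B", "B", "B"),
  Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA)
)

res <- df %>%
  group_by(ID) %>%
  # pick() returns the group's Party and Party2013 columns as a tibble;
  # unlist() flattens them into a single character vector
  mutate(Count = n_distinct(unlist(pick(Party, Party2013)), na.rm = TRUE)) %>%
  ungroup()

print(res)
```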
R group by | count distinct values grouping by another column
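One way:
library(dplyr)
# distinct() with no arguments dedupes over all columns, so this assumes
# test_df contains only the visit and page columns
test_df |>
distinct() |>
count(post_pagename)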
# post_pagename n
# <fct> <int>
# 1 A 3
# 2 B 2
# 3 C 1
# 4 D 1
Or another way:
test_df |>
group_by(post_pagename) |>
summarise(distinct_visit_ids = n_distinct(visit_id))
# A tibble: 4 x 2
# post_pagename distinct_visit_ids
# <fct> <int>
#1 A 3
#2 B 2
#3 C 1
#4 D 1
*D has one visit, so it must be counted*
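A self-contained sketch of both approaches on a hypothetical visit log (the visit_id and page values are assumptions). Naming the columns in distinct() restricts deduplication to those two columns, so any extra columns in the real data would not inflate the counts:

```r
library(dplyr)  # the native |> pipe needs R >= 4.1

# Hypothetical visit log with repeated (visit_id, page) pairs on A and B
test_df <- data.frame(
  visit_id      = c(1, 1, 2, 3, 1, 2, 2, 4, 5),
  post_pagename = factor(c("A", "A", "A", "A", "B", "B", "B", "C", "D"))
)

# Approach 1: drop duplicate (page, visit) pairs, then count rows per page
way1 <- test_df |>
  distinct(post_pagename, visit_id) |>
  count(post_pagename)

# Approach 2: count distinct visit_ids within each page
way2 <- test_df |>
  group_by(post_pagename) |>
  summarise(distinct_visit_ids = n_distinct(visit_id))

print(way2)
```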