How to check if at least one of a group of rows has a specific value
Add one more condition that checks whether the group has at least one eligible='True' value.
SELECT *
FROM ExampleTable
WHERE Group IN
    (SELECT Group
     FROM ExampleTable
     GROUP BY Group
     HAVING (COUNT(DISTINCT LastName) > 1 OR COUNT(DISTINCT FirstName) > 1)
        AND COUNT(CASE WHEN eligible = 'True' THEN 1 END) >= 1)
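A minimal way to sanity-check this query is Python's stdlib sqlite3 module. The table layout and sample rows below are hypothetical, and "Group" has to be quoted because it is a reserved word:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE ExampleTable ("Group" INT, FirstName TEXT, LastName TEXT, eligible TEXT)')
con.executemany("INSERT INTO ExampleTable VALUES (?, ?, ?, ?)", [
    (1, "Ann", "Smith", "True"),   # group 1: two last names, one eligible row
    (1, "Bob", "Jones", "False"),
    (2, "Cat", "Lee",   "False"),  # group 2: two last names, no eligible row
    (2, "Dan", "Kim",   "False"),
])
result = con.execute('''
    SELECT * FROM ExampleTable
    WHERE "Group" IN
        (SELECT "Group" FROM ExampleTable
         GROUP BY "Group"
         HAVING (COUNT(DISTINCT LastName) > 1 OR COUNT(DISTINCT FirstName) > 1)
            AND COUNT(CASE WHEN eligible = 'True' THEN 1 END) >= 1)
''').fetchall()
# Only group 1 survives: it has >1 distinct last name AND an eligible row.
```

Group 2 is excluded even though it has distinct names, because its eligible count is zero.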
group_by and filter to groups that share at least one matching value
We can compare the length of age within each group to the length of unique(age). If length(age) > length(unique(age)), at least two of the observations share an age.
library(dplyr)
df %>%
group_by(household) %>%
filter(length(age) > length(unique(age)))
# id household age
# <dbl> <dbl> <dbl>
# 1 1 1 19
# 2 2 1 19
# 3 3 1 45
@Dave2e pointed out another, more dplyr-like way to do this:
df %>%
group_by(household) %>%
filter(n() > n_distinct(age))
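The same idea, as a plain-Python sketch: keep a group only when it has more rows than distinct ages. The (id, household, age) rows are hypothetical:

```python
from collections import defaultdict

rows = [(1, 1, 19), (2, 1, 19), (3, 1, 45), (4, 2, 30), (5, 2, 31)]

by_household = defaultdict(list)
for row in rows:
    by_household[row[1]].append(row)

# n() > n_distinct(age): at least two members share an age
kept = [row for grp in by_household.values()
        if len(grp) > len({age for _, _, age in grp})
        for row in grp]
# Household 1 (3 rows, 2 distinct ages) is kept; household 2 is dropped.
```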
MySQL query to select groups containing at least a certain number of elements
You need the GROUP BY and HAVING clauses:
select G_id
from yourtable
where E_id in (1,3)
group by G_id
having count(distinct E_id) = 2
Update: to require that the group contains both 1 and 3 and exactly three distinct elements overall:
select G_id
from yourtable
group by G_id
having count(case when E_id = 1 then 1 end) > 0
and count(case when E_id = 3 then 1 end) > 0
and count(distinct E_id) = 3
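The first query can be verified with stdlib sqlite3 on some hypothetical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (G_id INT, E_id INT)")
con.executemany("INSERT INTO yourtable VALUES (?, ?)",
                [(10, 1), (10, 3),            # has both 1 and 3
                 (20, 1),                     # has only 1
                 (30, 1), (30, 3), (30, 5)])  # has both, plus a 5
groups = con.execute("""
    SELECT G_id FROM yourtable
    WHERE E_id IN (1, 3)
    GROUP BY G_id
    HAVING COUNT(DISTINCT E_id) = 2
""").fetchall()
# Groups 10 and 30 contain both elements; group 20 does not.
```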
GROUP BY Create group if at least one value in group meets condition
With the OP's modified requirement:
select type
from test
group by type
having count(*) > 1 and count(case when color = 'G' then 0 end) > 0
;
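A quick check with stdlib sqlite3 and hypothetical data. Note that COUNT() counts non-NULL values, so "THEN 0" still adds to the count, while rows where color <> 'G' yield NULL and are ignored:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (type TEXT, color TEXT)")
con.executemany("INSERT INTO test VALUES (?, ?)",
                [("a", "G"), ("a", "R"),   # >1 row and contains a 'G'
                 ("b", "R"), ("b", "R"),   # >1 row but no 'G'
                 ("c", "G")])              # has a 'G' but only 1 row
types = con.execute("""
    SELECT type FROM test
    GROUP BY type
    HAVING COUNT(*) > 1
       AND COUNT(CASE WHEN color = 'G' THEN 0 END) > 0
""").fetchall()
# Only type 'a' satisfies both conditions.
```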
Select groups with row containing specific value (with dplyr and pipes)
After grouping by 'id', subset string for its second element and apply %in% with "is" on the left-hand side, returning a single TRUE per group:
library(dplyr)
df %>%
group_by(id) %>%
filter('is' %in% string[2]) %>%
ungroup
Output:
# A tibble: 8 x 2
# id string
# <chr> <chr>
#1 id_1 here
#2 id_1 is
#3 id_1 some
#4 id_1 text
#5 id_2 here
#6 id_2 is
#7 id_2 other
#8 id_2 text
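The same check in plain Python: keep a group only when its second element is "is". The (id, string) rows are hypothetical, with an extra id_3 group that fails the test:

```python
from collections import defaultdict

rows = [("id_1", "here"), ("id_1", "is"), ("id_1", "some"), ("id_1", "text"),
        ("id_2", "here"), ("id_2", "is"), ("id_2", "other"), ("id_2", "text"),
        ("id_3", "nothing"), ("id_3", "here")]

groups = defaultdict(list)
for gid, s in rows:
    groups[gid].append(s)

# Keep groups whose second element is "is" (string[2] in R's 1-based indexing)
kept = {gid for gid, strings in groups.items()
        if len(strings) >= 2 and strings[1] == "is"}
# id_3's second element is "here", so it is dropped.
```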
Group by having at least one of each item
One way of doing this is to count how many different pets each person has and to compare it (i.e. join it) with the total number of different pets:
SELECT person_id
FROM (SELECT person_id, COUNT(DISTINCT pet) AS dp
FROM pets
GROUP BY person_id) a
JOIN (SELECT COUNT(DISTINCT pet) AS dp FROM pets) b ON a.dp = b.dp
EDIT:
If just some pets are considered "ideal", and this list is known upfront, the query can be greatly simplified by introducing this information in a where
clause:
SELECT person_id
FROM pets
WHERE pet IN ('dog', 'cat', 'tiger')
GROUP BY person_id
HAVING COUNT(DISTINCT pet) = 3
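Checking the simplified query with stdlib sqlite3 on hypothetical data. Person 3 owns a duplicate dog, which is why COUNT(DISTINCT pet) matters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pets (person_id INT, pet TEXT)")
con.executemany("INSERT INTO pets VALUES (?, ?)",
                [(1, "dog"), (1, "cat"), (1, "tiger"),
                 (2, "dog"), (2, "cat"),
                 (3, "dog"), (3, "dog"), (3, "cat"), (3, "tiger")])
owners = con.execute("""
    SELECT person_id FROM pets
    WHERE pet IN ('dog', 'cat', 'tiger')
    GROUP BY person_id
    HAVING COUNT(DISTINCT pet) = 3
""").fetchall()
# Persons 1 and 3 own all three pets; person 2 is missing 'tiger'.
```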
Select grouped rows with at least one matching criterion
I would do it like this:
Data_File %>% group_by(Group_ID) %>%
filter(any(Product_Name %in% "ABCD"))
# Source: local data frame [7 x 3]
# Groups: Group_ID [3]
#
# Group_ID Product_Name Qty
# <dbl> <chr> <dbl>
# 1 123 ABCD 2
# 2 123 EFGH 3
# 3 123 XYZ1 4
# 4 123 Z123 5
# 5 234 ABCD 6
# 6 444 ABCD 8
# 7 444 ABCD 9
Explanation: any() returns TRUE if any rows within the group match the condition. That length-1 result is then recycled to the full length of the group, so the entire group is kept. You could also use sum(Product_Name %in% "ABCD") > 0 as the condition, but any() reads very nicely. Use sum() instead if you want a more complicated condition, such as 3 or more matching product names.
I prefer %in% to == for things like this because it behaves better with NA and is easy to expand if you want to check for any of multiple products per group.
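The any() and sum() conditions side by side, sketched in plain Python on hypothetical (Group_ID, Product_Name) rows:

```python
from collections import defaultdict

rows = [(123, "ABCD"), (123, "ABCD"), (123, "ABCD"), (123, "EFGH"),
        (234, "ABCD"), (444, "ABCD"), (444, "XYZ1")]

groups = defaultdict(list)
for gid, name in rows:
    groups[gid].append(name)

# any(): keep groups with at least one matching name
has_any = {g for g, names in groups.items()
           if any(n == "ABCD" for n in names)}
# sum(...) >= 3: keep groups with three or more matching names
has_three = {g for g, names in groups.items()
             if sum(n == "ABCD" for n in names) >= 3}
# All three groups have at least one match; only 123 has three.
```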
If speed and efficiency are an issue, data.table
will be faster. I would do it like this, which relies on a keyed join for the filtering and uses no non-data.table operations, so it should be very fast:
library(data.table)
df = as.data.table(df)
setkey(df)
groups = unique(subset(df, Product_Name %in% "ABCD", Group_ID))
df[groups, nomatch = 0]
# Group_ID Product_Name Qty
# 1: 123 ABCD 2
# 2: 123 EFGH 3
# 3: 123 XYZ1 4
# 4: 123 Z123 5
# 5: 234 ABCD 6
# 6: 444 ABCD 8
# 7: 444 ABCD 9
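The keyed-join filter boils down to a semi-join: first collect the IDs of groups that contain a matching row, then keep only rows from those groups. A plain-Python sketch on hypothetical (Group_ID, Product_Name, Qty) triples:

```python
rows = [(123, "ABCD", 2), (123, "EFGH", 3), (123, "XYZ1", 4), (123, "Z123", 5),
        (234, "ABCD", 6), (444, "ABCD", 8), (444, "ABCD", 9),
        (999, "QRS1", 1)]

# IDs of groups containing at least one "ABCD" row
matching_groups = {gid for gid, name, _ in rows if name == "ABCD"}
# Keep every row belonging to a matching group
kept = [row for row in rows if row[0] in matching_groups]
# Group 999 has no "ABCD" row, so all of its rows are dropped.
```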
Pick groups that have at least one non-missing value in R
This would be better if we had a reproducible example, but let's create a toy version of your data:
DataX <- data.frame(orgcode = rep(LETTERS[1:5], each = 3),
budget = c(NA, 21000, 22000,
30000, NA, 40000,
NA, NA, NA,
12000, 15000, 14000,
NA, NA, NA))
DataX
#> orgcode budget
#> 1 A NA
#> 2 A 21000
#> 3 A 22000
#> 4 B 30000
#> 5 B NA
#> 6 B 40000
#> 7 C NA
#> 8 C NA
#> 9 C NA
#> 10 D 12000
#> 11 D 15000
#> 12 D 14000
#> 13 E NA
#> 14 E NA
#> 15 E NA
We can see that organizations with orgcode C or E have only NA values and should be removed. We can do this with a dummy variable recording whether each group has at least one non-NA budget (i.e. !all(is.na(budget))) and filtering on that:
library(dplyr)
DataX %>%
  group_by(orgcode) %>%
  mutate(has_value = !all(is.na(budget))) %>%
  filter(has_value) %>%
  select(-has_value)
#> # A tibble: 9 x 2
#> # Groups: orgcode [3]
#> orgcode budget
#> <fct> <dbl>
#> 1 A NA
#> 2 A 21000
#> 3 A 22000
#> 4 B 30000
#> 5 B NA
#> 6 B 40000
#> 7 D 12000
#> 8 D 15000
#> 9 D 14000
Created on 2020-07-29 by the reprex package (v0.3.0)
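The same filter in plain Python, with None standing in for NA and hypothetical (orgcode, budget) rows:

```python
from collections import defaultdict

rows = [("A", None), ("A", 21000), ("A", 22000),
        ("C", None), ("C", None), ("C", None),
        ("D", 12000), ("D", 15000), ("D", 14000)]

groups = defaultdict(list)
for org, budget in rows:
    groups[org].append(budget)

# Drop a group only when every value is missing (the all(is.na(budget)) case)
kept = {org: vals for org, vals in groups.items()
        if not all(v is None for v in vals)}
# Group C is all-missing and is removed; A and D survive.
```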