How to Remove Groups of Observations with dplyr::filter()

How to remove groups of observations with dplyr::filter()

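Use filter() in conjunction with base::ave(). Here ave() evaluates all(!is.na(attend)) within each id and repeats the result on every row of that id, so filter() keeps only the ids with no missing attend:

library(dplyr)
ds %>% dplyr::filter(ave(!is.na(attend), id, FUN = all))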

To obtain

    id year attend
 1   1 2007      1
 2   1 2008      1
 3   1 2009      1
 4   1 2010      1
 5   1 2011      1
 6   9 2007      2
 7   9 2008      3
 8   9 2009      3
 9   9 2010      5
 10  9 2011      5
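The same result can usually be obtained with a grouped filter instead of ave(). The following is only a minimal sketch: the data frame ds is a hypothetical stand-in for the original data, and id 5 is invented so that one group has a missing attend value.

```r
library(dplyr)

# Hypothetical stand-in for the original data: id 5 has one missing attend
ds <- data.frame(
  id     = rep(c(1, 5, 9), each = 3),
  year   = rep(2007:2009, times = 3),
  attend = c(1, 1, 1,  2, NA, 4,  2, 3, 3)
)

# Keep only the ids in which no attend value is missing
complete_ids <- ds %>%
  group_by(id) %>%
  filter(!anyNA(attend)) %>%
  ungroup()

complete_ids
```

Here filter() receives a single TRUE or FALSE per group, which is recycled over the rows of that group, so the group is kept or dropped as a whole.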

How to filter out whole groups using dplyr, that have columns satisfying a particular property?

We can group_by carb and remove groups which have any value less than 100 for disp.

library(dplyr)
mtcars %>% group_by(carb) %>% filter(all(disp > 100))
#Or (same result here; the two differ only if a disp value is exactly 100)
#mtcars %>% group_by(carb) %>% filter(!any(disp < 100))

#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
# 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
# 3  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
# 4  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# 5  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
# 6  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
# 7  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
# 8  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
# 9  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
#10  10.4     8  460    215  3     5.42  17.8     0     0     3     4
#11  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
#12  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
#13  15.8     8  351    264  4.22  3.17  14.5     0     1     5     4
#14  19.7     6  145    175  3.62  2.77  15.5     0     1     5     6
#15  15       8  301    335  3.54  3.57  14.6     0     1     5     8
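One caveat worth checking: if the tested column contains NA, all() and any() can themselves return NA, and filter() treats NA like FALSE, so the whole group is dropped. A small sketch with made-up data (the data frame toy and its columns g and x are hypothetical) shows the effect and the na.rm = TRUE workaround:

```r
library(dplyr)

# Hypothetical data: group "a" has a missing value, group "c" has a value below 100
toy <- data.frame(
  g = c("a", "a", "b", "b", "c", "c"),
  x = c(150, NA, 120, 130, 90, 200)
)

# Group "a": all(c(TRUE, NA)) is NA, so the group is dropped
strict <- toy %>%
  group_by(g) %>%
  filter(all(x > 100)) %>%
  ungroup()

# Ignoring missing values in the test keeps group "a" (including its NA row)
lenient <- toy %>%
  group_by(g) %>%
  filter(all(x > 100, na.rm = TRUE)) %>%
  ungroup()
```

Whether a group with missing values should be kept is an analysis decision; the sketch only shows how to make the choice explicit.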

Remove groups based on number of observations below a certain value using dplyr

Try creating a helper variable that counts, within each group, the rows that meet the condition, then filter on it and drop the helper:

library(dplyr)
#Code
new <- df %>% group_by(Group) %>%
  mutate(Var=sum(Count>0)) %>%
  filter(Var>1) %>% select(-Var)

Output:

# A tibble: 5 x 3
# Groups:   Group [1]
  Group  Year Count
  <chr> <dbl> <dbl>
1 B         1    10
2 B         2    15
3 B         3     8
4 B         4     0
5 B         5     6
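The helper column is optional, because filter() can evaluate the per-group summary directly. A minimal sketch, assuming input shaped like the output above (the rows for group A are invented for illustration):

```r
library(dplyr)

# Hypothetical input: group A has one positive Count, group B has four
df <- data.frame(
  Group = rep(c("A", "B"), each = 5),
  Year  = rep(1:5, times = 2),
  Count = c(0, 0, 3, 0, 0,  10, 15, 8, 0, 6)
)

# Keep groups with more than one positive Count; no helper column needed
new <- df %>%
  group_by(Group) %>%
  filter(sum(Count > 0) > 1) %>%
  ungroup()

new
```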

R dplyr: how to remove smaller groups?

You can use n() to get the number of rows per group and filter on it. See ?n: the last example there filters groups by size.

df %>% group_by(group) %>% filter(n() >= 3)

# Source: local data frame [6 x 3]
# Groups: group [2]

#      ID group value
#   <int> <int> <int>
# 1     3     2     0
# 2     4     2     5
# 3     5     2     3
# 4     8     4     3
# 5     9     4     7
# 6    10     4     5

How to delete groups containing less than 3 rows of data in R?

One way to do it is to use the magic n() function within filter:

library(dplyr)

my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))

my_data %>% 
  group_by(Year, Site, Brood) %>% 
  filter(n() >= 3)

The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
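If you would rather not leave the data grouped, dplyr's add_count() offers another route: it appends the group size as a column (named n by default) that can be filtered on and then dropped. A short sketch using the same my_data as above:

```r
library(dplyr)

my_data <- data.frame(Year = 1996, Site = "A", Brood = c(1, 1, 2, 2, 2))

# add_count() appends the group size as column n and returns ungrouped data
big_groups <- my_data %>%
  add_count(Year, Site, Brood) %>%
  filter(n >= 3) %>%
  select(-n)

big_groups
```

Only the three rows with Brood equal to 2 remain, since the Brood 1 group has just two rows.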

Remove groups with less than three unique observations

With data.table you could do:

library(data.table)
DT[, if(uniqueN(Day) >= 3) .SD, by = Group]

which gives:

   Group Day
1:     1   1
2:     1   3
3:     1   5
4:     1   5
5:     3   1
6:     3   2
7:     3   3

Or with dplyr:

library(dplyr)
DT %>% 
  group_by(Group) %>% 
  filter(n_distinct(Day) >= 3)

which gives the same result.
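The difference between counting rows with n() and counting unique values with n_distinct() matters when a group repeats values. A sketch with hypothetical input (groups 1 and 3 are consistent with the output above; group 2 is invented and has three rows but only two distinct days):

```r
library(dplyr)

# Hypothetical input; group 2 has 3 rows but only 2 unique Day values
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 1, 2, 3)
)

# Filtering on row count keeps group 2
by_rows <- DT %>% group_by(Group) %>% filter(n() >= 3) %>% ungroup()

# Filtering on unique days drops group 2
by_unique <- DT %>% group_by(Group) %>% filter(n_distinct(Day) >= 3) %>% ungroup()
```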

dplyr: How to filter groups by subgroup criteria

You could add group_by and filter to the code:

#OP's code (with tidyr > 0.8.1, use -1 instead of -2 in separate() to split off the last character)
d1 <- dadmom %>%
           gather(key, value, named:incm) %>%
           separate(key, c("variable", "type"), -2) %>%
           spread(variable, value, convert = TRUE)

 d1 %>% 
    group_by(famid) %>%
    filter(all(sum(type=='m' & inc > 15000)==sum(type=='m')))

#    famid type   inc name
# 1     2    d 22000  Art
# 2     2    m 18000  Amy
# 3     3    d 25000 Paul
# 4     3    m 50000  Pat

NOTE: The above will also work when there are multiple 'm's per famid (a bit more general)

For the usual case of a single 'd'/'m' pair per famid:

 d1 %>%
     group_by(famid) %>% 
     filter(any(inc >15000 & type=='m'))
 #   famid type   inc name
 #1     2    d 22000  Art
 #2     2    m 18000  Amy
 #3     3    d 25000 Paul
 #4     3    m 50000  Pat

Also, if you wish to use data.table, melt can take multiple value columns. This was introduced in the devel version v1.9.5 and has been on CRAN since v1.9.6.

 library(data.table)
 melt(setDT(dadmom), measure.vars=list(c(2,4), c(3,5)), 
    variable.name='type', value.name=c('name', 'inc'))[,
    type:=c('d', 'm')[type]][, .SD[any(type=='m' & inc >15000)] ,famid]
 #    famid type name   inc
 #1:     2    d  Art 22000
 #2:     2    m  Amy 18000
 #3:     3    d Paul 25000
 #4:     3    m  Pat 50000
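With current tidyr, the gather/separate/spread reshaping can probably be replaced by a single pivot_longer() call. The sketch below is an assumption-laden illustration, not the original answer: the dadmom values are reconstructed (only famid 2 and 3 appear in the output above; famid 1 is assumed), and the regular expression in names_pattern is chosen to match the four column names.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of dadmom; famid 1 is not shown in the output above
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000)
)

# Split each column name into a value column ("name"/"inc") and a type ("d"/"m")
d1 <- dadmom %>%
  pivot_longer(
    cols = -famid,
    names_to = c(".value", "type"),
    names_pattern = "(name|inc)(d|m)"
  )

# Keep families in which an 'm' row has inc above 15000
kept <- d1 %>%
  group_by(famid) %>%
  filter(any(type == "m" & inc > 15000)) %>%
  ungroup()

kept
```

The ".value" entry in names_to tells pivot_longer() to use that part of the column name as the output column, which keeps name as character and inc as numeric.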

Remove group from data.frame if at least one group member meets condition

Try

library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))

Or, as mentioned by @akrun, with data.table:

library(data.table)
setDT(df2)[, if(!any(world == "AF")) .SD, group]

setDT(df2)[, if(all(world != "AF")) .SD, group]

Which gives:

#Source: local data frame [7 x 3]
#Groups: group
#
#  world place group
#1    AB     1     1
#2    AC     1     1
#3    AD     2     1
#4    AB     1     3
#5    AE     2     3
#6    AC     3     3
#7    AE     1     3
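The same idea can also be written in base R by first collecting the offending groups and then excluding them. A minimal sketch, assuming input consistent with the output above (the two rows of group 2, including the "AF" row, are invented):

```r
# Hypothetical input; groups 1 and 3 match the output above, group 2 contains "AF"
df2 <- data.frame(
  world = c("AB", "AC", "AD", "AB", "AF", "AB", "AE", "AC", "AE"),
  place = c(1, 1, 2, 1, 2, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 3, 3, 3, 3)
)

# Groups that contain at least one "AF" row
bad_groups <- unique(df2$group[df2$world == "AF"])

# Drop every row belonging to those groups
kept <- df2[!df2$group %in% bad_groups, ]

kept
```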

Removing NA observations with dplyr::filter()

From @Ben Bolker:

[T]his has nothing specifically to do with dplyr::filter()

From @Marat Talipov:

[A]ny comparison with NA, including NA==NA, will return NA

From a related answer by @farnsy:

The == operator does not treat NA's as you would expect it to.

Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."

R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
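In practice this means missing values should be tested with is.na() rather than with ==. A short sketch with hypothetical data (the data frame obs is invented for illustration):

```r
library(dplyr)

# Hypothetical data with two missing values
obs <- data.frame(id = 1:4, x = c(3, NA, 5, NA))

# Comparing with == yields NA for every row, and filter() drops NA rows
by_equals <- obs %>% filter(x == NA)

# is.na() returns TRUE/FALSE, so this keeps the non-missing rows
no_missing <- obs %>% filter(!is.na(x))

no_missing
```

The first filter returns zero rows regardless of the data, which is why it can look as if filter() itself mishandles NA.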
