How to Remove Groups of Observations with dplyr::filter()

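Use filter() in conjunction with base::ave(). Here ave() applies all() to !is.na(attend) within each id, so only ids with no missing attend values are kept: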

ds %>% dplyr::filter(ave(!is.na(attend), id, FUN = all))

This returns:

   id year attend
1   1 2007      1
2   1 2008      1
3   1 2009      1
4   1 2010      1
5   1 2011      1
6   9 2007      2
7   9 2008      3
8   9 2009      3
9   9 2010      5
10  9 2011      5
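
The same result can be reached with dplyr verbs alone by grouping first. The following is a minimal sketch, assuming a data frame shaped like ds; the sample values (including the id 8 group with missing attend values) are made up for illustration.

```r
library(dplyr)

# Hypothetical sample data: id 8 has missing `attend` values, ids 1 and 9 do not
ds <- data.frame(
  id     = rep(c(1, 8, 9), each = 3),
  year   = rep(2007:2009, times = 3),
  attend = c(1, 1, 1, NA, 2, NA, 2, 3, 3)
)

# Keep only ids whose `attend` values are all non-missing
complete_ids <- ds %>%
  group_by(id) %>%
  filter(!anyNA(attend)) %>%
  ungroup()

# The ave() approach from the answer, for comparison
via_ave <- ds %>% filter(ave(!is.na(attend), id, FUN = all))
```

Both complete_ids and via_ave should contain only the rows for ids 1 and 9.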

How to filter out whole groups using dplyr, that have columns satisfying a particular property?

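We can group_by carb and keep only the groups in which every disp value is greater than 100, which removes any group containing a value of 100 or less. The commented alternative, !any(disp < 100), differs only for values of exactly 100, and mtcars has none, so both give the same result here.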

library(dplyr)
mtcars %>% group_by(carb) %>% filter(all(disp > 100))
#Or
#mtcars %>% group_by(carb) %>% filter(!any(disp < 100))

# mpg cyl disp hp drat wt qsec vs am gear carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
# 3 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
# 4 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# 5 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
# 6 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
# 7 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
# 8 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
# 9 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#10 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#11 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#12 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
#13 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
#14 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
#15 15 8 301 335 3.54 3.57 14.6 0 1 5 8
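
The two conditions are not interchangeable in general. As a small sketch of the boundary case, assuming a made-up data frame toy (not part of the original answer) in which one group contains a value of exactly 100:

```r
library(dplyr)

# Hypothetical data: group "a" contains a value of exactly 100
toy <- data.frame(
  g    = c("a", "a", "b", "b", "c", "c"),
  disp = c(100, 150, 120, 300, 90, 200)
)

# Every value strictly greater than 100: drops "a" (has 100) and "c" (has 90)
strict <- toy %>%
  group_by(g) %>%
  filter(all(disp > 100)) %>%
  ungroup()

# No value below 100: keeps "a", drops only "c"
lenient <- toy %>%
  group_by(g) %>%
  filter(!any(disp < 100)) %>%
  ungroup()
```

Which form to use depends on whether a value equal to the threshold should disqualify the group.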

Remove groups based on number of observations below a certain value using dplyr

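Count the rows in each group that meet the condition (here, Count > 0) in a temporary variable, keep the groups where that count exceeds 1, then drop the helper column:

library(dplyr)
# Count positive values per group, keep groups with more than one, drop the helper
new <- df %>%
  group_by(Group) %>%
  mutate(Var = sum(Count > 0)) %>%
  filter(Var > 1) %>%
  select(-Var)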

Output:

# A tibble: 5 x 3
# Groups: Group [1]
Group Year Count
<chr> <dbl> <dbl>
1 B 1 10
2 B 2 15
3 B 3 8
4 B 4 0
5 B 5 6
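
The helper column is optional, since filter() can evaluate the per-group summary directly. A minimal sketch, assuming a df like the one in the question; the rows for group A are invented here, and only group B matches the output shown above:

```r
library(dplyr)

# Hypothetical input: group A has a single positive Count, group B has four
df <- data.frame(
  Group = rep(c("A", "B"), each = 5),
  Year  = rep(1:5, times = 2),
  Count = c(0, 0, 4, 0, 0, 10, 15, 8, 0, 6)
)

# Keep groups with more than one positive Count, without a temporary column
new <- df %>%
  group_by(Group) %>%
  filter(sum(Count > 0) > 1) %>%
  ungroup()
```

Here ungroup() is added so the result is a plain tibble; omit it to keep the grouping as in the output above.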

R dplyr: how to remove smaller groups?

You can use n() to get the number of rows per group and filter on it. See ?n: the last example of n() usage there filters on group size:

df %>% group_by(group) %>% filter(n() >= 3)

# Source: local data frame [6 x 3]
# Groups: group [2]

# ID group value
# <int> <int> <int>
# 1 3 2 0
# 2 4 2 5
# 3 5 2 3
# 4 8 4 3
# 5 9 4 7
# 6 10 4 5

How to delete groups containing less than 3 rows of data in R?

One way to do it is to use the magic n() function within filter:

library(dplyr)

my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))

my_data %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3)

The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
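
To make the grouped versus ungrouped behaviour of n() concrete, here is a short sketch that reuses my_data from above; the names total, per_group and kept are introduced only for this illustration, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

my_data <- data.frame(Year = 1996, Site = "A", Brood = c(1, 1, 2, 2, 2))

# Without grouping, n() is the total number of rows
total <- my_data %>% summarise(rows = n())

# With grouping, n() is evaluated once per group
per_group <- my_data %>%
  group_by(Year, Site, Brood) %>%
  summarise(rows = n(), .groups = "drop")

# Filtering on group size keeps only Brood 2, which has three rows
kept <- my_data %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%
  ungroup()
```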

Remove groups with less than three unique observations

With data.table you could do:

library(data.table)
DT[, if(uniqueN(Day) >= 3) .SD, by = Group]

which gives:

   Group Day
1: 1 1
2: 1 3
3: 1 5
4: 1 5
5: 3 1
6: 3 2
7: 3 3

Or with dplyr:

library(dplyr)
DT %>%
  group_by(Group) %>%
  filter(n_distinct(Day) >= 3)

which gives the same result.
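
Counting unique values is not the same as counting rows. The sketch below contrasts n() with n_distinct() on made-up input: the original DT is not shown in the answer, so group 2 here is an assumption, chosen to have three rows but only two distinct days.

```r
library(dplyr)

# Hypothetical input: group 2 has three rows but only two distinct days
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Day   = c(1, 3, 5, 5, 1, 1, 2, 1, 2, 3)
)

# Filtering on row count keeps all three groups
by_rows <- DT %>%
  group_by(Group) %>%
  filter(n() >= 3) %>%
  ungroup()

# Filtering on distinct days drops group 2
by_unique <- DT %>%
  group_by(Group) %>%
  filter(n_distinct(Day) >= 3) %>%
  ungroup()
```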

dplyr: How to filter groups by subgroup criteria

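You could add group_by and filter to the code:

library(dplyr)
library(tidyr)

#OP's code
# (sep = -2 matches the tidyr version the OP used; tidyr releases after 0.8.1
# count from the right starting at -1, so sep = -1 may be needed there)
d1 <- dadmom %>%
  gather(key, value, named:incm) %>%
  separate(key, c("variable", "type"), -2) %>%
  spread(variable, value, convert = TRUE)

# Keep families in which every 'm' row has inc > 15000
d1 %>%
  group_by(famid) %>%
  filter(all(sum(type == 'm' & inc > 15000) == sum(type == 'm')))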

# famid type inc name
# 1 2 d 22000 Art
# 2 2 m 18000 Amy
# 3 3 d 25000 Paul
# 4 3 m 50000 Pat

NOTE: The above will also work when there are multiple 'm's per famid (a bit more general)

For the usual case of a single 'd'/'m' pair per famid:

d1 %>%
  group_by(famid) %>%
  filter(any(inc > 15000 & type == 'm'))
# famid type inc name
#1 2 d 22000 Art
#2 2 m 18000 Amy
#3 3 d 25000 Paul
#4 3 m 50000 Pat
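
The two filters only differ when a family has more than one 'm' row. A minimal sketch of that difference, using an invented long-format data frame d_toy (not the OP's dadmom) and all(inc[type == "m"] > 15000) as an alternative spelling of the "every 'm' row" condition:

```r
library(dplyr)

# Hypothetical data: famid 1 has two 'm' rows, one of them below the threshold
d_toy <- data.frame(
  famid = c(1, 1, 1, 2, 2),
  type  = c("d", "m", "m", "d", "m"),
  inc   = c(30000, 20000, 12000, 22000, 18000)
)

# Every 'm' row must exceed 15000: famid 1 is dropped
every_m <- d_toy %>%
  group_by(famid) %>%
  filter(all(inc[type == "m"] > 15000)) %>%
  ungroup()

# At least one 'm' row exceeds 15000: famid 1 is kept
some_m <- d_toy %>%
  group_by(famid) %>%
  filter(any(type == "m" & inc > 15000)) %>%
  ungroup()
```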

Also, if you wish to use data.table, melt can take multiple value columns. At the time of the answer this required the devel version, v1.9.5; the feature is part of releases from v1.9.6 onward.

library(data.table)
melt(setDT(dadmom), measure.vars = list(c(2, 4), c(3, 5)),
     variable.name = 'type', value.name = c('name', 'inc'))[,
     type := c('d', 'm')[type]][, .SD[any(type == 'm' & inc > 15000)], famid]
# famid type name inc
#1: 2 d Art 22000
#2: 2 m Amy 18000
#3: 3 d Paul 25000
#4: 3 m Pat 50000

Remove group from data.frame if at least one group member meets condition

Try

library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))

Or, as mentioned by @akrun, with data.table:

library(data.table)
setDT(df2)[, if(!any(world == "AF")) .SD, group]

Or

setDT(df2)[, if(all(world != "AF")) .SD, group]

Which gives:

#Source: local data frame [7 x 3]
#Groups: group
#
# world place group
#1 AB 1 1
#2 AC 1 1
#3 AD 2 1
#4 AB 1 3
#5 AE 2 3
#6 AC 3 3
#7 AE 1 3

Removing NA observations with dplyr::filter()

From @Ben Bolker:

[T]his has nothing specifically to do with dplyr::filter()

From @Marat Talipov:

[A]ny comparison with NA, including NA==NA, will return NA

From a related answer by @farnsy:

The == operator does not treat NA's as you would expect it to.

Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."

R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
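
In practice this means testing for missing values with is.na() rather than with ==. A short sketch, using a made-up data frame obs to show why filtering on == NA returns nothing:

```r
library(dplyr)

# Comparisons involving NA yield NA rather than TRUE or FALSE
NA == NA
3 > NA

# Hypothetical data with missing values
obs <- data.frame(id = 1:4, attend = c(1, NA, 3, NA))

# filter() keeps only rows where the condition is TRUE, and
# attend == NA is NA for every row, so nothing is kept
wrong <- obs %>% filter(attend == NA)

# is.na() tests for missingness directly
missing_rows  <- obs %>% filter(is.na(attend))
complete_rows <- obs %>% filter(!is.na(attend))
```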


