How to remove groups of observations with dplyr::filter()
Use filter in conjunction with base::ave:
ds %>% dplyr::filter(ave(!is.na(attend), id, FUN = all))
to obtain:
id year attend
1 1 2007 1
2 1 2008 1
3 1 2009 1
4 1 2010 1
5 1 2011 1
6 9 2007 2
7 9 2008 3
8 9 2009 3
9 9 2010 5
10 9 2011 5
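As a rough illustration of why this works, the sketch below builds a small hypothetical ds (the ids, years and attend values are invented, not the asker's data) and compares the ave() approach with a group_by() version. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: id 2 has one missing attend value, so the whole group should be dropped
ds <- data.frame(
  id     = rep(c(1, 2, 9), each = 3),
  year   = rep(2007:2009, times = 3),
  attend = c(1, 1, 1, 2, NA, 4, 2, 3, 3)
)

# ave() applies all() to !is.na(attend) within each id and
# recycles the single TRUE/FALSE result to every row of that id
kept_ave <- ds %>% filter(ave(!is.na(attend), id, FUN = all))

# The same idea written with explicit grouping
kept_grouped <- ds %>%
  group_by(id) %>%
  filter(all(!is.na(attend))) %>%
  ungroup()

unique(kept_ave$id)
```

Both versions keep only the ids with no missing attend values; which one to prefer is mostly a matter of taste.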
How to filter out whole groups using dplyr, that have columns satisfying a particular property?
We can group_by carb and remove groups which have any value less than 100 for disp.
library(dplyr)
mtcars %>% group_by(carb) %>% filter(all(disp > 100))
# Or, equivalently as long as no disp value is exactly 100:
# mtcars %>% group_by(carb) %>% filter(!any(disp < 100))
# mpg cyl disp hp drat wt qsec vs am gear carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
# 3 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
# 4 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# 5 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
# 6 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
# 7 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
# 8 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
# 9 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#10 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#11 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#12 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
#13 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
#14 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
#15 15 8 301 335 3.54 3.57 14.6 0 1 5 8
Remove groups based on number of observations below a certain value using dplyr
Try creating a helper variable that counts, per group, the rows meeting the condition, then filter on it and drop it again:
library(dplyr)
new <- df %>%
  group_by(Group) %>%
  mutate(Var = sum(Count > 0)) %>%
  filter(Var > 1) %>%
  select(-Var)
Output:
# A tibble: 5 x 3
# Groups: Group [1]
Group Year Count
<chr> <dbl> <dbl>
1 B 1 10
2 B 2 15
3 B 3 8
4 B 4 0
5 B 5 6
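The helper column is optional: the same per-group condition can go straight into filter(). A minimal sketch, assuming a hypothetical df whose values are invented for illustration (the original question's data is not shown here):

```r
library(dplyr)

# Hypothetical input: group A has one positive Count, group B has four
df <- data.frame(
  Group = rep(c("A", "B"), each = 5),
  Year  = rep(1:5, times = 2),
  Count = c(0, 3, 0, 0, 0, 10, 15, 8, 0, 6)
)

# sum(Count > 0) is computed once per group; groups with at most one
# positive Count are removed as a whole
new <- df %>%
  group_by(Group) %>%
  filter(sum(Count > 0) > 1) %>%
  ungroup()

new
```

The trailing ungroup() is a design choice: it returns a plain tibble so later steps are not silently evaluated per group.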
R dplyr: how to remove smaller groups?
You can use n() to get the number of rows per group and filter on it. Take a look at ?n(); the last example there filters based on the size of groups:
df %>% group_by(group) %>% filter(n() >= 3)
# Source: local data frame [6 x 3]
# Groups: group [2]
# ID group value
# <int> <int> <int>
# 1 3 2 0
# 2 4 2 5
# 3 5 2 3
# 4 8 4 3
# 5 9 4 7
# 6 10 4 5
How to delete groups containing less than 3 rows of data in R?
One way to do it is to use the magic n() function within filter:
library(dplyr)
my_data <- data.frame(Year = 1996, Site = "A", Brood = c(1, 1, 2, 2, 2))
my_data %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3)
The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
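To see the grouped and ungrouped behaviour of n() side by side, here is a small sketch that reuses the my_data frame from the answer; the names total, per_group and kept are introduced only for this illustration.

```r
library(dplyr)

my_data <- data.frame(Year = 1996, Site = "A", Brood = c(1, 1, 2, 2, 2))

# Ungrouped: n() is the total number of rows
total <- my_data %>% summarise(rows = n())

# Grouped: n() is evaluated separately for each Brood
per_group <- my_data %>%
  group_by(Brood) %>%
  summarise(rows = n())

# Only the Brood with at least 3 rows survives the filter
kept <- my_data %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%
  ungroup()

per_group
```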
Remove groups with less than three unique observations
With data.table you could do:
library(data.table)
DT[, if(uniqueN(Day) >= 3) .SD, by = Group]
which gives:
Group Day
1: 1 1
2: 1 3
3: 1 5
4: 1 5
5: 3 1
6: 3 2
7: 3 3
Or with dplyr:
library(dplyr)
DT %>%
  group_by(Group) %>%
  filter(n_distinct(Day) >= 3)
which gives the same result.
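The difference between counting rows and counting unique values is easy to miss, so here is a hedged sketch on a hypothetical DT (values invented for illustration) in which group 2 has three rows but only one distinct Day:

```r
library(dplyr)

# Hypothetical data: group 2 has 3 rows, all with the same Day
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Day   = c(1, 3, 5, 5, 1, 1, 1, 1, 2, 3)
)

# Filtering on row count keeps group 2 ...
by_rows <- DT %>%
  group_by(Group) %>%
  filter(n() >= 3) %>%
  ungroup()

# ... while filtering on the number of distinct Day values removes it
by_unique <- DT %>%
  group_by(Group) %>%
  filter(n_distinct(Day) >= 3) %>%
  ungroup()

unique(by_unique$Group)
```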
dplyr: How to filter groups by subgroup criteria
You could add group_by and filter to the code:
# OP's code
d1 <- dadmom %>%
  gather(key, value, named:incm) %>%
  separate(key, c("variable", "type"), -2) %>%
  spread(variable, value, convert = TRUE)
# Keep a family only if every 'm' row has inc above 15000
d1 %>%
  group_by(famid) %>%
  filter(sum(type == 'm' & inc > 15000) == sum(type == 'm'))
# famid type inc name
# 1 2 d 22000 Art
# 2 2 m 18000 Amy
# 3 3 d 25000 Paul
# 4 3 m 50000 Pat
NOTE: The above also works when there are multiple 'm' rows per famid, so it is a bit more general.
For the normal case of a single 'd'/'m' pair per famid:
d1 %>%
  group_by(famid) %>%
  filter(any(inc > 15000 & type == 'm'))
# famid type inc name
#1 2 d 22000 Art
#2 2 m 18000 Amy
#3 3 d 25000 Paul
#4 3 m 50000 Pat
Also, if you wish to use data.table, melt from the devel version (v1.9.5) can take multiple value columns.
library(data.table)
melt(setDT(dadmom), measure.vars = list(c(2, 4), c(3, 5)),
     variable.name = 'type', value.name = c('name', 'inc'))[
  , type := c('d', 'm')[type]][
  , .SD[any(type == 'm' & inc > 15000)], famid]
# famid type name inc
#1: 2 d Art 22000
#2: 2 m Amy 18000
#3: 3 d Paul 25000
#4: 3 m Pat 50000
Remove group from data.frame if at least one group member meets condition
Try:
library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))
Or, as mentioned by @akrun, with data.table:
library(data.table)
setDT(df2)[, if (!any(world == "AF")) .SD, group]
Or:
setDT(df2)[, if (all(world != "AF")) .SD, group]
Which gives:
#Source: local data frame [7 x 3]
#Groups: group
#
# world place group
#1 AB 1 1
#2 AC 1 1
#3 AD 2 1
#4 AB 1 3
#5 AE 2 3
#6 AC 3 3
#7 AE 1 3
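The two conditions used above, !any(world == "AF") and all(world != "AF"), select the same groups. A minimal dplyr sketch on a hypothetical df2 (the rows are invented, not the asker's data) makes this concrete:

```r
library(dplyr)

# Hypothetical data: only group 2 contains an "AF" row
df2 <- data.frame(
  world = c("AB", "AC", "AD", "AB", "AF", "AB", "AE"),
  place = c(1, 1, 2, 1, 2, 1, 2),
  group = c(1, 1, 1, 2, 2, 3, 3)
)

# "No row is AF" ...
not_any <- df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF")) %>%
  ungroup()

# ... is the same statement as "every row is something other than AF"
all_not <- df2 %>%
  group_by(group) %>%
  filter(all(world != "AF")) %>%
  ungroup()

unique(not_any$group)
```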
Removing NA observations with dplyr::filter()
From @Ben Bolker:
[T]his has nothing specifically to do with dplyr::filter()
From @Marat Talipov:
[A]ny comparison with NA, including NA==NA, will return NA
From a related answer by @farnsy:
The == operator does not treat NA's as you would expect it to.
Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."
R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
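The points quoted above can be checked directly. The sketch below uses a small hypothetical data frame x (invented for illustration) to show how NA propagates through comparisons and what that means for filter(), which keeps only rows where the condition is TRUE:

```r
library(dplyr)

# Comparisons involving NA yield NA, including NA == NA
cmp_eq <- NA == NA
cmp_gt <- 3 > NA

# is.na() is the supported way to test for missingness
is.na(NA)

# Hypothetical data with two missing attend values
x <- data.frame(id = 1:4, attend = c(1, NA, 3, NA))

# attend != 1 is NA for the missing rows, so filter() drops them
# together with the row where attend is 1
ne_one <- x %>% filter(attend != 1)

# To remove only the missing rows, test for NA explicitly
not_missing <- x %>% filter(!is.na(attend))

not_missing
```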