How does dplyr’s between work?
between() is nothing special: any other function in R would have led to the same problem. Your confusion stems from the fact that dplyr has many functions that let you refer to data.frame column names as if they were ordinary variables; for instance:
filter(flights, month > 9)
However, between() is not one of these functions. As mentioned, it is simply a normal function, so to use it you need to provide its arguments in the conventional way; for instance:
between(flights$month, 7, 9)
This will return a logical vector, and you can now use it to index your data.frame:
flights[between(flights$month, 7, 9), ]
Or, more dplyr-like:
flights %>% filter(between(month, 7, 9))
Note that non-standard evaluation is now in play, but it is performed by filter(), not by between(). filter() resolves month to the column and then calls between() using standard evaluation.
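The distinction can be checked on a small example. The sketch below uses a made-up data frame called toy in place of flights (an assumption, so that no extra data package is needed) and compares calling between() directly with calling it inside filter().

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
toy <- data.frame(month = c(1, 7, 8, 9, 12), delay = c(5, 10, 0, 3, 8))

# Called directly, between() needs the column passed explicitly
in_range <- between(toy$month, 7, 9)
in_range

# Inside filter(), the bare name month is resolved by filter(),
# which then hands the resulting vector to between()
via_filter <- toy %>% filter(between(month, 7, 9))

# Both routes select the same rows
identical(toy$month[in_range], via_filter$month)
```

Calling between(month, 7, 9) on its own, outside filter(), would fail because no object named month exists in the calling environment.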
How do I filter a range of numbers in R?
You can use %in% or, as has been mentioned, dplyr's between():
library(dplyr)
new_frame <- Mydata %>% filter(x %in% (3:7) )
new_frame
# x y
# 1 3 45
# 2 4 54
# 3 5 65
# 4 6 78
# 5 7 97
While %in% works great for integers (or other equally spaced sequences), if you need to filter on floats, or on any value between and including your two end points, or just want an alternative that's a bit more explicit than %in%, use dplyr's between():
new_frame2 <- Mydata%>% filter( between(x, 3, 7) )
new_frame2
# x y
# 1 3 45
# 2 4 54
# 3 5 65
# 4 6 78
# 5 7 97
To further clarify, note that %in% checks for membership in a set of values:
3 %in% 3:7
# [1] TRUE
5 %in% 3:7
# [1] TRUE
5.0 %in% 3:7
# [1] TRUE
These all return TRUE because 3:7 is shorthand for seq(3, 7), which produces:
3:7
# [1] 3 4 5 6 7
seq(3, 7)
# [1] 3 4 5 6 7
As such, if you use %in% to check for a value not produced by :, it returns FALSE:
4.5 %in% 3:7
# [1] FALSE
4.15 %in% 3:7
# [1] FALSE
Whereas between() checks against the end points and all values in between:
between(3, 3, 7)
# [1] TRUE
between(7, 3, 7)
# [1] TRUE
between(5, 3, 7)
# [1] TRUE
between(5.0, 3, 7)
# [1] TRUE
between(4.5, 3, 7)
# [1] TRUE
between(4.15, 3, 7)
# [1] TRUE
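Put differently, between(x, left, right) behaves like the inclusive comparison x >= left & x <= right. A minimal sketch with a made-up vector x (including a missing value) to compare the two forms:

```r
library(dplyr)

# Made-up values: below range, both end points, inside, above range, missing
x <- c(2.9, 3, 4.15, 7, 7.1, NA)

by_between <- between(x, 3, 7)
by_compare <- x >= 3 & x <= 7

# The two forms should agree element by element, missing value included
identical(by_between, by_compare)
```

If you need an exclusive bound, write the comparison out by hand (for example x > 3 & x <= 7), since between() always includes both end points.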
How to use select() inside between() inside filter() to subset data dplyr r
Combine the conditions for each site with & and join the alternatives with |:
library(dplyr)
data %>%
  filter(SiteID == "A" & between(Seconds, 2, 8) |
         SiteID == "B" & between(Seconds, 3, 6) |
         SiteID == "C" & between(Seconds, 8, 10) |
         SiteID == "D" & between(Seconds, 1, 6) |
         SiteID == "E" & between(Seconds, 2, 9))
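When the list of sites grows, one alternative is to keep the per-site ranges in a small lookup table and join it on. This is only a sketch: the toy data, the ranges table, and the column names lo and hi are all made up for illustration. The filter uses explicit comparisons rather than between() so that it does not depend on how between() handles vector bounds.

```r
library(dplyr)

# Hypothetical observations; column names follow the snippet above
data <- data.frame(
  SiteID  = c("A", "A", "B", "B", "C"),
  Seconds = c(1, 5, 4, 7, 9),
  stringsAsFactors = FALSE
)

# One row per site with its allowed range (lo and hi are made-up names)
ranges <- data.frame(
  SiteID = c("A", "B", "C"),
  lo     = c(2, 3, 8),
  hi     = c(8, 6, 10),
  stringsAsFactors = FALSE
)

kept <- data %>%
  inner_join(ranges, by = "SiteID") %>%       # attach each row's own bounds
  filter(Seconds >= lo & Seconds <= hi) %>%   # inclusive range check per row
  select(SiteID, Seconds)

kept
```

Adding a site then means adding a row to ranges instead of another line to the filter condition.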
conditional matching between variables in dplyr
Try this:
parties %>%
  group_by(name) %>%
  filter("K" %in% class,
         "R" %in% class,
         "L" %in% class) %>%
  summarise()
# A tibble: 2 x 1
name
<chr>
1 Party2
2 Party4
EDIT: If you want to work with more than 3 parties you can also use:
mask = c("K", "R", "L")
parties %>%
  group_by(name) %>%
  filter(all(mask %in% class)) %>%
  summarise()
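To see why all(mask %in% class) works, it helps to run it on a tiny data set. The parties table below is invented for illustration; only the column names name and class come from the answer above.

```r
library(dplyr)

# Made-up data: Party1 lacks class "L", Party2 has all three classes
parties <- data.frame(
  name  = c("Party1", "Party1", "Party2", "Party2", "Party2"),
  class = c("K", "R", "K", "R", "L"),
  stringsAsFactors = FALSE
)

mask <- c("K", "R", "L")

# Within each group, mask %in% class yields one TRUE/FALSE per required
# class; all() keeps the group only if every required class is present
complete <- parties %>%
  group_by(name) %>%
  filter(all(mask %in% class)) %>%
  ungroup() %>%
  distinct(name) %>%
  pull(name)

complete
```

Here distinct(name) plays the same role as the empty summarise() call: it reduces the surviving rows to one row per group.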
Filter between multiple date ranges
Drawing on the question "Efficient way to filter one data frame by ranges in another", I came up with the following solutions.
The first is very slow on large datasets. It takes the data provided above and uses rowwise():
filtered3 <- df %>%
  rowwise() %>%
  filter(any(datetime >= start & datetime <= end))
As I mentioned, with more than 3 million rows in my data, this was very slow.
Another option, also from the answer linked above, is the data.table package, which provides an %inrange% operator. This one is much faster.
library(data.table)
range <- data.table(start = start, end = end)
filtered4 <- setDT(df)[datetime %inrange% range]
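Both versions answer the same question: does a timestamp fall inside at least one of the (start, end) pairs? A plain base R sketch of that check, using made-up dates, can serve as a reference for verifying the faster approach on a small sample. It is not meant to be fast.

```r
# Made-up timestamps and two made-up ranges (January and March 2020)
datetime <- as.Date(c("2020-01-05", "2020-02-10", "2020-03-15", "2020-04-20"))
start    <- as.Date(c("2020-01-01", "2020-03-01"))
end      <- as.Date(c("2020-01-31", "2020-03-31"))

# For each timestamp, test it against every range and keep it
# if at least one range contains it (end points included)
in_any_range <- vapply(
  seq_along(datetime),
  function(i) any(datetime[i] >= start & datetime[i] <= end),
  logical(1)
)

datetime[in_any_range]
```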
Filtering dates in dplyr
If Date is properly stored as a Date, your first try works:
p2p_dt_SKILL_A <-read.table(text="Patch,Date,Prod_DL
P1,9/4/2015,3.43
P11,9/11/2015,3.49
P12,9/18/2015,3.45
P13,12/6/2015,3.57
P14,12/13/2015,3.43
P15,12/20/2015,3.47
",sep=",",stringsAsFactors =FALSE, header=TRUE)
p2p_dt_SKILL_A$Date <-as.Date(p2p_dt_SKILL_A$Date,"%m/%d/%Y")
p2p_dt_SKILL_A %>%
  select(Patch, Date, Prod_DL) %>%
  filter(Date > "2015-09-04" & Date < "2015-09-18")
Patch Date Prod_DL
1 P11 2015-09-11 3.49
It still works if the data is of type tbl_df (a tibble). Note that tbl_df() has since been deprecated in dplyr in favor of as_tibble().
p2p_dt_SKILL_A <-tbl_df(p2p_dt_SKILL_A)
p2p_dt_SKILL_A %>%
  select(Patch, Date, Prod_DL) %>%
  filter(Date > "2015-09-04" & Date < "2015-09-18")
Source: local data frame [1 x 3]
Patch Date Prod_DL
(chr) (date) (dbl)
1 P11 2015-09-11 3.49
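The comparisons above rely on R converting the strings to dates for you. If you prefer not to depend on that, a sketch with explicit Date bounds follows; the data frame dates and the names lower and upper are made up for illustration. It also shows how strict and inclusive bounds differ.

```r
library(dplyr)

# Made-up subset of the data above, with Date parsed from m/d/Y strings
dates <- data.frame(
  Patch = c("P1", "P11", "P12"),
  Date  = as.Date(c("9/4/2015", "9/11/2015", "9/18/2015"), "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Explicit Date bounds (hypothetical names)
lower <- as.Date("2015-09-04")
upper <- as.Date("2015-09-18")

# Strict bounds drop rows that fall exactly on an end point
strict <- dates %>% filter(Date > lower & Date < upper)

# Inclusive bounds keep them
inclusive <- dates %>% filter(Date >= lower & Date <= upper)

strict$Patch
inclusive$Patch
```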
combining loops and some dplyr functions
We can use map to loop over the 'keywords', then filter where the 'word' is that keyword and the frequency is greater than 0, then, grouped by 'TI', get the tally and the number of rows:
library(purrr)
library(dplyr)
map(keywords, ~ df %>%
      filter(word == .x, frequency > 0) %>%
      group_by(TI) %>%
      tally() %>%
      nrow())
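A small worked example may make the result easier to read. The data frame and keywords below are invented; only the column names TI, word, and frequency come from the answer. As a variation, the sketch uses map_int() with named input so the counts come back as a named integer vector instead of a list.

```r
library(purrr)
library(dplyr)

# Made-up data: one row per (TI, word) pair with its frequency
df <- data.frame(
  TI        = c(1, 1, 2, 3),
  word      = c("apple", "pear", "apple", "pear"),
  frequency = c(2, 0, 1, 3),
  stringsAsFactors = FALSE
)

keywords <- c("apple", "pear")

# For each keyword, count how many distinct TI groups contain it
# with a frequency above zero
counts <- map_int(set_names(keywords), ~ df %>%
                    filter(word == .x, frequency > 0) %>%
                    group_by(TI) %>%
                    tally() %>%
                    nrow())

counts
```

Here "apple" appears in two TI groups, while "pear" counts only once because its row for TI 1 has a frequency of 0.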
Error in `dplyr::between()`: 'left' must be length 1
between() needs a single value for each bound, but Tmin supplies the entire vector of values for each group. To solve the problem, use a function that extracts one value from the vector. Since every value in the vector is the same within a group, many functions work, e.g. min() or first():
TimeTempReprod %>%
  group_by(Date, Station) %>%
  mutate(y = between(Temperature, min(Tmin), min(Tmin) + 2))
which gives:
# A tibble: 96 × 8
# Groups: Date, Station [2]
Station Date Time Temperature Tmin Tmed Tmax y
<chr> <date> <time> <dbl> <dbl> <dbl> <dbl> <lgl>
1 F 2021-10-15 00:11:46 16.8 15.2 17.1 20.4 TRUE
2 F 2021-10-15 00:41:46 16.5 15.2 17.1 20.4 TRUE
3 F 2021-10-15 01:11:46 16.2 15.2 17.1 20.4 TRUE
4 F 2021-10-15 01:41:46 15.6 15.2 17.1 20.4 TRUE
5 F 2021-10-15 02:11:46 15.9 15.2 17.1 20.4 TRUE
6 F 2021-10-15 02:41:46 16.1 15.2 17.1 20.4 TRUE
7 F 2021-10-15 03:11:46 16.4 15.2 17.1 20.4 TRUE
8 F 2021-10-15 03:41:46 16.2 15.2 17.1 20.4 TRUE
9 F 2021-10-15 04:11:46 16 15.2 17.1 20.4 TRUE
10 F 2021-10-15 04:41:46 16 15.2 17.1 20.4 TRUE
# … with 86 more rows
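If you would rather not collapse Tmin to a single value, the same check can be written as an explicit row-by-row comparison, which works however between() treats its bounds in your dplyr version. The sketch below uses an invented data frame temps and made-up output column names (y_first, y_compare) to show that both approaches agree.

```r
library(dplyr)

# Made-up readings: Tmin is constant within each station
temps <- data.frame(
  Station     = c("F", "F", "F", "G", "G"),
  Temperature = c(15.5, 16.8, 18.0, 10.2, 13.5),
  Tmin        = c(15.2, 15.2, 15.2, 10.0, 10.0),
  stringsAsFactors = FALSE
)

flagged <- temps %>%
  group_by(Station) %>%
  mutate(
    # One scalar bound per group, as in the answer above
    y_first   = between(Temperature, first(Tmin), first(Tmin) + 2),
    # Element-wise comparison against each row's own Tmin
    y_compare = Temperature >= Tmin & Temperature <= Tmin + 2
  ) %>%
  ungroup()

flagged
```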