Filtering dates in dplyr
If Date is properly formatted as a date, your first try works:
library(dplyr)

p2p_dt_SKILL_A <- read.table(text = "Patch,Date,Prod_DL
P1,9/4/2015,3.43
P11,9/11/2015,3.49
P12,9/18/2015,3.45
P13,12/6/2015,3.57
P14,12/13/2015,3.43
P15,12/20/2015,3.47
", sep = ",", stringsAsFactors = FALSE, header = TRUE)
# Parse the m/d/Y strings into Date class before comparing
p2p_dt_SKILL_A$Date <- as.Date(p2p_dt_SKILL_A$Date, "%m/%d/%Y")
# The character bounds are converted to Date when compared with a Date column
p2p_dt_SKILL_A %>%
  select(Patch, Date, Prod_DL) %>%
  filter(Date > "2015-09-04" & Date < "2015-09-18")
Patch Date Prod_DL
1 P11 2015-09-11 3.49
It still works if the data is of type tbl_df:
# tbl_df() is deprecated in dplyr >= 1.0.0; as_tibble() is the current equivalent
p2p_dt_SKILL_A <- tbl_df(p2p_dt_SKILL_A)
p2p_dt_SKILL_A %>%
  select(Patch, Date, Prod_DL) %>%
  filter(Date > "2015-09-04" & Date < "2015-09-18")
Source: local data frame [1 x 3]
Patch Date Prod_DL
(chr) (date) (dbl)
1 P11 2015-09-11 3.49
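Why the conversion matters can be sketched in base R. The vector below is hypothetical sample data in the same m/d/Y format: left as character strings, the dates compare alphabetically, while Date objects compare chronologically (a character bound such as "2015-09-04" is converted to Date when the other side is a Date).

```r
# Hypothetical dates in the question's m/d/Y format
raw <- c("9/4/2015", "9/11/2015", "12/6/2015")

# Character comparison is alphabetical: "12/6/2015" sorts before "2015-09-04",
# so the December date is wrongly excluded
as_text <- raw > "2015-09-04"

# After parsing, the comparison is chronological; the character bound is
# converted to Date because the left-hand side is a Date
dates <- as.Date(raw, "%m/%d/%Y")
chrono <- dates > "2015-09-04"

print(as_text)  # TRUE TRUE FALSE
print(chrono)   # FALSE TRUE TRUE
```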
Filtering dataset for a specific date with lubridate and dplyr
You need to convert the datetime object to Date class and then compare. Also, use & to combine multiple conditions.
library(dplyr)
library(lubridate)
# as_date() drops the time of day, so every timestamp on 2019-11-02 matches;
# here the lower and upper bounds are the same day
d %>%
  filter(as_date(Date) >= as_date("2019-11-02") &
           as_date(Date) <= as_date("2019-11-02"))
# Date ID
# <dttm> <int>
#1 2019-11-02 07:19:19 1
#2 2019-11-02 14:02:02 2
#3 2019-11-02 15:59:23 3
#4 2019-11-02 19:22:58 4
#5 2019-11-02 20:49:25 5
#6 2019-11-02 21:06:07 6
#7 2019-11-02 21:27:12 7
Alternatively, use between():
d %>%
filter(between(as_date(Date), as_date("2019-11-02"), as_date("2019-11-02")))
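A base-R sketch of the same idea on hypothetical sample data: converting POSIXct to Date drops the time of day, and an equivalent approach that avoids the conversion keeps the timestamps and filters on a half-open interval from midnight to the next midnight. The time zone affects which calendar day a timestamp falls on; UTC is assumed here.

```r
# Hypothetical timestamps stored in UTC; IDs 2 and 3 fall on 2019-11-02
d <- data.frame(
  Date = as.POSIXct(c("2019-11-01 23:59:59", "2019-11-02 07:19:19",
                      "2019-11-02 21:27:12", "2019-11-03 00:00:01"),
                    tz = "UTC"),
  ID = 1:4
)

# Option 1: drop the time of day, then compare calendar dates
on_day <- d[as.Date(d$Date, tz = "UTC") == as.Date("2019-11-02"), ]

# Option 2: keep POSIXct and use a half-open interval [midnight, next midnight)
lo <- as.POSIXct("2019-11-02 00:00:00", tz = "UTC")
hi <- as.POSIXct("2019-11-03 00:00:00", tz = "UTC")
in_window <- d[d$Date >= lo & d$Date < hi, ]

print(on_day$ID)  # 2 3
```

The half-open form avoids converting every row and does not depend on how the upper bound's time of day is written.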
Filter by a date condition with an unspecified year in R
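Here is an alternative that converts the dates to month-day character strings, so the year is ignored:
# Keep rows after January 5 in any year; zero-padded "%m%d" strings sort in calendar order
df[format(as.Date(df$date), "%m%d") > "0105",]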
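A small worked example on hypothetical data shows which rows survive the month-day comparison. Note that when the data frame has a single column, `df[rows, ]` returns a vector, so `drop = FALSE` is added here to keep a data frame.

```r
# Hypothetical sample data spanning several years
df <- data.frame(date = c("2018-01-03", "2019-01-06", "2020-12-31", "2021-01-05"),
                 stringsAsFactors = FALSE)

# "%m%d" yields zero-padded strings like "0103", which sort in calendar order
month_day <- format(as.Date(df$date), "%m%d")

# drop = FALSE keeps a data frame even though df has a single column
after_jan5 <- df[month_day > "0105", , drop = FALSE]
print(after_jan5$date)  # "2019-01-06" "2020-12-31"
```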
I have a column of dates, how to filter dates before a specific day? R
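# setDT() comes from data.table, not lubridate
library(data.table)
# your_dataset and mydates stand for your own data and its Date column
df <- setDT(your_dataset)
# Keep only September 2020; for "before a specific day" use mydates < as.Date('2020-09-01')
df_sep <- df[mydates >= as.Date('2020-09-01') & mydates <= as.Date('2020-09-30')]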
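Without data.table, the same filters can be sketched with base R subsetting. The data frame below is hypothetical and only assumes a Date column called mydates, as in the snippet above.

```r
# Hypothetical data with a Date column named mydates
dates_df <- data.frame(
  mydates = as.Date(c("2020-08-31", "2020-09-01", "2020-09-15", "2020-10-01")),
  value = 1:4
)

# Rows strictly before a specific day
before_sep <- dates_df[dates_df$mydates < as.Date("2020-09-01"), ]

# Rows inside September 2020, inclusive on both ends
in_sep <- dates_df[dates_df$mydates >= as.Date("2020-09-01") &
                     dates_df$mydates <= as.Date("2020-09-30"), ]
```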
How do I filter dates in R using dplyr?
The date should be quoted, and it is better to compare values of the same type, i.e. convert it to Date class with as.Date():
library(dplyr)
df_gather %>%
  filter(Country.Region %in% c("Italy")) %>%
  arrange(desc(Date)) %>%
  # Compare Date with Date; an unquoted 2020-12-12 would be evaluated as arithmetic
  filter(Date %in% as.Date(c("2020-12-12")))
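To see why the quotes matter, a short base-R check (the date vector is hypothetical): without quotes R evaluates 2020-12-12 as subtraction, so the filter would compare dates against the number 1996.

```r
# Unquoted, 2020-12-12 is just subtraction: 2020 - 12 - 12
unquoted <- 2020-12-12
print(unquoted)  # 1996

x <- as.Date(c("2020-12-11", "2020-12-12"))
# Comparing Date with Date gives the intended match
matched <- x %in% as.Date("2020-12-12")
print(matched)  # FALSE TRUE
```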
dplyr not filtering dates correctly
Basically, != does not match NA values: a comparison involving NA returns NA, and filter() drops rows where the condition is NA. Check this post for more information; here is an example with your data:
library(dplyr)
library(lubridate)
> test_df %>% filter(dob == ymd('1899-01-01')) %>% nrow()
[1] 14
> test_df %>% filter(dob != ymd('1899-01-01')) %>% nrow()
[1] 15
> test_df %>% filter(is.na(dob)) %>% nrow()
[1] 16
> test_df %>% filter(dob != ymd('1899-01-01') | is.na(dob)) %>% nrow()
[1] 31
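The NA behaviour can be reproduced in base R with a hypothetical three-row data frame. subset() is used here because, like dplyr::filter(), it drops rows whose condition evaluates to NA.

```r
# Hypothetical birth dates, one of them missing
test_df <- data.frame(dob = as.Date(c("1899-01-01", "1950-06-15", NA)))
placeholder <- as.Date("1899-01-01")

# Comparison with NA yields NA, not TRUE
neq <- test_df$dob != placeholder
print(neq)  # FALSE TRUE NA

# subset(), like dplyr::filter(), drops rows where the condition is NA
only_known <- subset(test_df, dob != placeholder)

# Adding | is.na() keeps the missing values as well (NA | TRUE is TRUE)
keep <- test_df$dob != placeholder | is.na(test_df$dob)
with_missing <- test_df[keep, , drop = FALSE]
```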
Filter between multiple date ranges
With some inspiration from the question "Efficient way to filter one data frame by ranges in another", I came up with the following solutions.
The first one takes the data provided above and uses rowwise(); it is very slow with very large datasets:
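library(dplyr)
# start and end are vectors of range bounds; a row is kept if its datetime
# falls inside any [start, end] pair
filtered3 <- df %>%
  rowwise() %>%
  filter(any(datetime >= start & datetime <= end))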
As I mentioned, with more than 3 million rows in my data, this was very slow.
Another option, also from the answer linked above, uses the data.table package, which has an inrange() function (also available as the %inrange% operator). This one works much faster:
library(data.table)
# Two-column table of [start, end] bounds
ranges <- data.table(start = start, end = end)
filtered4 <- setDT(df)[datetime %inrange% ranges]
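For readers without data.table, here is a base-R sketch of the same filter on hypothetical data. The first version is vectorised over rows and loops only over the ranges; the second uses findInterval() and assumes the ranges are sorted by start and do not overlap. Neither is the approach from the linked answer, and neither has been benchmarked against %inrange%.

```r
# Hypothetical timestamps and two non-overlapping ranges, sorted by start
df <- data.frame(
  datetime = as.POSIXct(c("2020-01-01 03:00:00", "2020-01-02 00:00:00",
                          "2020-01-03 12:00:00", "2020-01-03 19:00:00"),
                        tz = "UTC"),
  id = 1:4
)
start <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-03 12:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2020-01-01 06:00:00", "2020-01-03 18:00:00"), tz = "UTC")

# Vectorised over rows, looping only over the (usually few) ranges
in_any <- Reduce(`|`, lapply(seq_along(start), function(i) {
  df$datetime >= start[i] & df$datetime <= end[i]
}))

# For sorted, non-overlapping ranges: findInterval() finds the last start
# at or before each row, then one comparison checks that range's end
idx <- findInterval(df$datetime, start)
in_any2 <- idx > 0 & df$datetime <= end[pmax(idx, 1)]

filtered <- df[in_any, ]
print(filtered$id)  # 1 3
```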
How to filter by range of dates in R?
Here is one way you could do it:
- use lubridate's dmy function to parse the dates
- group by id and arrange
- calculate the difference to the first date
- add a row number column row
- filter for your conditions!
library(dplyr)
library(lubridate)
df %>%
  mutate(date_dd.mm.yyyy = dmy(date_dd.mm.yyyy)) %>%   # parse "dd.mm.yyyy" strings
  group_by(id) %>%
  arrange(date_dd.mm.yyyy, .by_group = TRUE) %>%
  mutate(diff = date_dd.mm.yyyy - first(date_dd.mm.yyyy)) %>%   # days since first date
  mutate(row = row_number()) %>%
  filter(row <= 4 | diff < 90) %>%   # first 4 rows per id, or within 90 days
  select(-diff, -row)
id date_dd.mm.yyyy
<int> <date>
1 1 2021-01-01
2 1 2021-02-01
3 1 2021-02-02
4 1 2021-02-03
5 2 2021-03-03
6 2 2021-07-05
7 2 2021-07-07
8 2 2021-12-04
9 8 2021-07-06
10 12 2021-05-01
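The same per-group logic can be sketched in base R with ave(). The data below is hypothetical and gives id 1 six dates, so both conditions matter: the fifth date is kept because it is within 90 days, the sixth is dropped.

```r
# Hypothetical data in the question's dd.mm.yyyy format
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 2, 2),
  date_dd.mm.yyyy = c("01.01.2021", "01.02.2021", "02.02.2021", "03.02.2021",
                      "10.03.2021", "15.06.2021", "03.03.2021", "05.07.2021"),
  stringsAsFactors = FALSE
)

# Parse the dates, then sort by id and date (like group_by() + arrange())
df$date <- as.Date(df$date_dd.mm.yyyy, format = "%d.%m.%Y")
df <- df[order(df$id, df$date), ]

# Days since each id's first date, and a per-id row counter
df$diff <- ave(as.numeric(df$date), df$id, FUN = function(x) x - x[1])
df$row <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Keep the first 4 rows per id, or any row within 90 days of the first date
kept <- df[df$row <= 4 | df$diff < 90, c("id", "date")]
```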
Filtering dates for time series plot using dplyr
If I remember correctly, between() didn't work with Dates at one point, even when the left and right arguments were converted with as.Date().
Here are some alternatives. Since all of your sample data fall between the specified years, these all filter for dates between 2013-02-04 and 2013-02-12. Adjust accordingly.
library(dplyr)
roadsalt_data <- as_tibble(roadsalt_data) # not necessary, just convenient console output
roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
filter(stdate >= "2013-02-04", stdate <= "2013-02-12")
#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen
roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
filter(between(stdate, as.Date("2013-02-04"), as.Date("2013-02-12")))
#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen
# How I would've done it
library(lubridate)
roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
# filter(between(year(stdate), 1996, 2015)) # for years instead of days
# day() is the day of the month only, so this matches days 4-12 of any month
filter(between(day(stdate), 4, 12))
#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen
# If {lubridate} isn't installed, this is essentially what year() and day() do...
# (tz() is itself a lubridate function; for Date input, as.POSIXlt() works in UTC)
get_day <- function(date) as.POSIXlt(date)$mday
# get_year <- function(date) as.POSIXlt(date)$year + 1900 # for years instead of days
roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
# filter(between(get_year(stdate), 1996, 2015)) # for years instead of days
filter(between(get_day(stdate), 4, 12))
#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen
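Filtering on day() or get_day() only gives the same result here because every sample row is from February 2013; the day of the month ignores the month and year. A small base-R check with hypothetical dates shows the difference:

```r
# Hypothetical dates in two different months
dates <- as.Date(c("2013-02-04", "2013-02-20", "2013-03-05"))

# Day-of-month filter: 2013-03-05 passes because only the day (5) is checked
mday <- as.POSIXlt(dates)$mday
by_day <- mday >= 4 & mday <= 12

# Full-date filter: only dates inside 2013-02-04 .. 2013-02-12 pass
by_date <- dates >= as.Date("2013-02-04") & dates <= as.Date("2013-02-12")

print(by_day)   # TRUE FALSE TRUE
print(by_date)  # TRUE FALSE FALSE
```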
# Base R
roadsalt_data <- roadsalt_data[, c("orgid", "stdate", "locid", "charnam")]
roadsalt_data[roadsalt_data$stdate >= as.Date("2013-02-04") & roadsalt_data$stdate <= as.Date("2013-02-12") ,]
#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen
Created on 2018-05-23 by the reprex package (v0.2.0).
If none of these work, there's something else entirely going on.