Filtering Dates in Dplyr

If Date is properly formatted as a date, your first try works:

library(dplyr)

p2p_dt_SKILL_A <- read.table(text = "Patch,Date,Prod_DL
P1,9/4/2015,3.43
P11,9/11/2015,3.49
P12,9/18/2015,3.45
P13,12/6/2015,3.57
P14,12/13/2015,3.43
P15,12/20/2015,3.47
", sep = ",", stringsAsFactors = FALSE, header = TRUE)

# Convert the month/day/year strings to class Date
p2p_dt_SKILL_A$Date <- as.Date(p2p_dt_SKILL_A$Date, "%m/%d/%Y")

p2p_dt_SKILL_A %>%
  select(Patch, Date, Prod_DL) %>%
  filter(Date > "2015-09-04" & Date < "2015-09-18")
#   Patch       Date Prod_DL
# 1   P11 2015-09-11    3.49



It still works if the data is of type tbl_df.

# tbl_df() is deprecated in current dplyr; as_tibble() is the modern equivalent
p2p_dt_SKILL_A <- tbl_df(p2p_dt_SKILL_A)

p2p_dt_SKILL_A %>%
  select(Patch, Date, Prod_DL) %>%
  filter(Date > "2015-09-04" & Date < "2015-09-18")
# Source: local data frame [1 x 3]
#
#   Patch       Date Prod_DL
#   (chr)     (date)   (dbl)
# 1   P11 2015-09-11    3.49
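
To see why the conversion with as.Date() matters, here is a minimal sketch using three of the date strings from the sample data (the variable names are made up for illustration). Compared as text, the dates sort alphabetically; compared as Date, they sort chronologically.

```r
# Illustration only: the same dates compared as text vs. as Date
raw <- c("9/4/2015", "9/11/2015", "12/6/2015")

# As character, the comparison is alphabetical: "9..." sorts after "1..."
raw_cmp <- raw[1] > raw[3]

# As Date, the comparison is chronological
dates <- as.Date(raw, format = "%m/%d/%Y")
date_cmp <- dates[1] > dates[3]

print(raw_cmp)   # TRUE: September wrongly looks "later" than December
print(date_cmp)  # FALSE: 2015-09-04 is before 2015-12-06
```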

Filtering dataset for a specific date with lubridate and dplyr

Convert the datetime column to Date class before comparing, and use & to combine multiple conditions.

library(dplyr)
library(lubridate)

d %>%
filter(as_date(Date) >= as_date("2019-11-02") &
as_date(Date) <= as_date("2019-11-02"))

# Date ID
# <dttm> <int>
#1 2019-11-02 07:19:19 1
#2 2019-11-02 14:02:02 2
#3 2019-11-02 15:59:23 3
#4 2019-11-02 19:22:58 4
#5 2019-11-02 20:49:25 5
#6 2019-11-02 21:06:07 6
#7 2019-11-02 21:27:12 7

Alternatively, use between():

d %>%
filter(between(as_date(Date), as_date("2019-11-02"), as_date("2019-11-02")))
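
Since both bounds are the same day, the condition amounts to an equality test on the date part. A small self-contained sketch, assuming a made-up tibble standing in for d (the values below are not the asker's data):

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data standing in for `d`
d <- tibble(
  Date = ymd_hms(c("2019-11-01 23:59:59", "2019-11-02 07:19:19",
                   "2019-11-02 21:27:12", "2019-11-03 00:00:00")),
  ID = 1:4
)

target <- as_date("2019-11-02")

# Drop the time-of-day part, then keep rows that fall on the target day
one_day <- d %>% filter(as_date(Date) == target)
print(one_day)
```

Only the two rows stamped on 2019-11-02 remain; the rows just before and just after midnight are excluded.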

Filter by a date condition with an unspecified year in R

Here is an alternative, converting to character.

df[format(as.Date(df$date), "%m%d") > "0105",]
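
The trick is that format(..., "%m%d") produces zero-padded month-day strings, so comparing them as text ignores the year. A minimal sketch with made-up dates (the data frame below is an assumption, not the asker's data):

```r
# Hypothetical data spanning several years
df <- data.frame(
  date = c("2018-01-03", "2019-01-05", "2020-01-06", "2021-12-31"),
  stringsAsFactors = FALSE
)

# Zero-padded "mmdd" strings sort in calendar order within a year
md <- format(as.Date(df$date), "%m%d")

# Keep every row that falls after January 5, whatever the year
after_jan5 <- df[md > "0105", , drop = FALSE]
print(after_jan5)
```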

I have a column of dates, how to filter dates before a specific day? R

library(data.table)  # setDT() comes from data.table, not lubridate

df <- setDT(your_dataset)

df_sep <- df[mydates >= as.Date('2020-09-01') & mydates <= as.Date('2020-09-30')]

How do I filter dates in R using dplyr?

The date should be quoted, and it is better to compare values of the same type, i.e. convert it to Date class with as.Date():

library(dplyr)
df_gather %>%
filter(Country.Region %in% c("Italy")) %>%
arrange(desc(Date)) %>%
filter(Date %in% as.Date(c("2020-12-12")))
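
A short base R sketch of why the quotes matter (the variable names are illustrative): without quotes, R reads the date as a subtraction.

```r
# Unquoted, 2020-12-12 is arithmetic: 2020 minus 12 minus 12
unquoted <- 2020-12-12
print(unquoted)  # 1996

# Quoted and converted, it is a real Date
quoted <- as.Date("2020-12-12")

dates <- as.Date(c("2020-12-11", "2020-12-12", "2020-12-13"))
hits <- dates[dates == quoted]
print(hits)
```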

dplyr not filtering dates correctly

Basically, filter() drops rows where the condition evaluates to NA, and NA != x is itself NA, so rows with a missing dob are returned by neither == nor !=. Here is an example with your data:

library(dplyr)
library(lubridate)

> test_df %>% filter(dob == ymd('1899-01-01')) %>% nrow()
[1] 14

> test_df %>% filter(dob != ymd('1899-01-01')) %>% nrow()
[1] 15

> test_df %>% filter(is.na(dob)) %>% nrow()
[1] 16

> test_df %>% filter(dob != ymd('1899-01-01') | is.na(dob)) %>% nrow()
[1] 31
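
The same behaviour can be reproduced on a tiny made-up data set (the six values below are an assumption chosen for illustration, not the asker's test_df):

```r
library(dplyr)

# Hypothetical data: two matches, two non-matches, two missing values
test_df <- tibble(dob = as.Date(c("1899-01-01", "1899-01-01",
                                  "1990-05-17", "2001-10-30", NA, NA)))
cutoff <- as.Date("1899-01-01")

n_equal     <- test_df %>% filter(dob == cutoff) %>% nrow()
n_not_equal <- test_df %>% filter(dob != cutoff) %>% nrow()

# Adding is.na() keeps the rows that != silently drops
n_with_na   <- test_df %>% filter(dob != cutoff | is.na(dob)) %>% nrow()

print(c(n_equal, n_not_equal, n_with_na))  # 2 2 4
```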

Filter between multiple date ranges

Inspired by the question "Efficient way to filter one data frame by ranges in another", I came up with the following solutions.

The first is very slow on large datasets.

It takes the data provided above and uses rowwise():

filtered3 <- df %>% 
rowwise() %>%
filter(any(datetime >= start & datetime <= end))

As I mentioned, with more than 3 million rows in my data, this was very slow.

Another option, also from the answer linked above, includes using the data.table package, which has an inrange function. This one works much faster.

library(data.table)
range <- data.table(start = start, end = end)
filtered4 <- setDT(df)[datetime %inrange% range]
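
Here is a small self-contained sketch of the %inrange% approach, with a row-by-row base R check alongside for comparison. The timestamps and ranges are made up for illustration, and the table is called ranges rather than range to avoid masking base::range().

```r
library(data.table)

# Hypothetical timestamps and two ranges of interest
df <- data.table(
  datetime = as.POSIXct(c("2021-01-01 10:00:00", "2021-01-02 10:00:00",
                          "2021-01-05 10:00:00", "2021-01-09 10:00:00"),
                        tz = "UTC")
)
start <- as.POSIXct(c("2021-01-01 00:00:00", "2021-01-05 00:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2021-01-01 23:59:59", "2021-01-06 00:00:00"), tz = "UTC")

# Vectorised: is each datetime inside ANY of the [start, end] ranges?
ranges <- data.table(start = start, end = end)
kept_dt <- df[datetime %inrange% ranges]

# Row-by-row cross-check in base R (the slow pattern)
in_any <- vapply(
  seq_len(nrow(df)),
  function(i) any(df$datetime[i] >= start & df$datetime[i] <= end),
  logical(1)
)

print(kept_dt)
```

Both approaches keep the same two rows here; the difference only shows up in run time on large inputs.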

How to filter by range of dates in R?

Here is one way you could do it:

  1. use lubridate's dmy() function to parse the dates
  2. group by id and arrange by date
  3. calculate the difference to the first date in each group
  4. add a row number column, row
  5. filter for your conditions!

library(dplyr)
library(lubridate)
df %>%
mutate(date_dd.mm.yyyy = dmy(date_dd.mm.yyyy)) %>%
group_by(id) %>%
arrange(date_dd.mm.yyyy, .by_group = TRUE) %>%
mutate(diff = date_dd.mm.yyyy-first(date_dd.mm.yyyy)) %>%
mutate(row = row_number()) %>%
filter(row <=4 | diff < 90) %>%
select(-diff, -row)
      id date_dd.mm.yyyy
<int> <date>
1 1 2021-01-01
2 1 2021-02-01
3 1 2021-02-02
4 1 2021-02-03
5 2 2021-03-03
6 2 2021-07-05
7 2 2021-07-07
8 2 2021-12-04
9 8 2021-07-06
10 12 2021-05-01
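
To try the pipeline without the original data, here is a runnable sketch on a made-up single-id data set (the values are assumptions). It converts the difference to a plain number of days explicitly, which is a small deviation from the code above.

```r
library(dplyr)
library(lubridate)

# Hypothetical data in day.month.year text form
df <- tibble(
  id = c(1, 1, 1, 1, 1, 1),
  date_dd.mm.yyyy = c("01.01.2021", "01.02.2021", "02.02.2021",
                      "03.02.2021", "01.03.2021", "01.06.2021")
)

result <- df %>%
  mutate(date_dd.mm.yyyy = dmy(date_dd.mm.yyyy)) %>%
  group_by(id) %>%
  arrange(date_dd.mm.yyyy, .by_group = TRUE) %>%
  mutate(
    # Days elapsed since the first date within each id
    diff = as.numeric(date_dd.mm.yyyy - first(date_dd.mm.yyyy), units = "days"),
    row  = row_number()
  ) %>%
  # Keep the first four rows per id, plus anything within 90 days of the first date
  filter(row <= 4 | diff < 90) %>%
  ungroup() %>%
  select(-diff, -row)

print(result)
```

The 1 March row survives because it is 59 days after the first date; the 1 June row is both past the fourth position and more than 90 days out, so it is dropped.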

Filtering dates for time series plot using dplyr

If I remember correctly, between() didn't work with Dates at one point, even when the left and right arguments were converted with as.Date().

Here are some alternatives. Since all of your sample data fall between the specified years, these all filter for dates between 2013-02-04 and 2013-02-12. Adjust accordingly.

library(dplyr)

roadsalt_data <- as_tibble(roadsalt_data) # not necessary, just convenient console output

roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
filter(stdate >= "2013-02-04", stdate <= "2013-02-12")

#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen

roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
filter(between(stdate, as.Date("2013-02-04"), as.Date("2013-02-12")))

#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen

# How I would've done it
# Note: day() extracts the day of the month, so this matches days 4-12 of ANY month and year
library(lubridate)

roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
# filter(between(year(stdate), 1996, 2015)) # for years instead of days
filter(between(day(stdate), 4, 12))

#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen

# If {lubridate} isn't installed, these base R helpers do what day() and year() do
# (tz() is itself a lubridate function, so it is not used here)
get_day <- function(date) as.POSIXlt(date)$mday
# get_year <- function(date) as.POSIXlt(date)$year + 1900 # for years instead of days

roadsalt_data %>%
select(orgid, stdate, locid, charnam) %>%
# filter(between(get_year(stdate), 1996, 2015)) # for years instead of days
filter(between(get_day(stdate), 4, 12))

#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen

# Base R
roadsalt_data <- roadsalt_data[, c("orgid", "stdate", "locid", "charnam")]
roadsalt_data[roadsalt_data$stdate >= as.Date("2013-02-04") & roadsalt_data$stdate <= as.Date("2013-02-12") ,]

#> # A tibble: 8 x 4
#> orgid stdate locid charnam
#> <chr> <date> <chr> <chr>
#> 1 USGS-NJ 2013-02-12 USGS-01380075 Nitrogen, mixed forms (NH3), (NH4), or~
#> 2 USGS-NJ 2013-02-12 USGS-01368820 Nitrogen, mixed forms (NH3), (NH4), or~
#> 3 USGS-NJ 2013-02-12 USGS-01409815 Nitrogen, mixed forms (NH3), (NH4), or~
#> 4 USGS-NJ 2013-02-12 USGS-01411400 Nitrogen, mixed forms (NH3), (NH4), or~
#> 5 USGS-NJ 2013-02-04 USGS-01458570 Inorganic nitrogen (nitrate and nitrit~
#> 6 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 7 USGS-NJ 2013-02-04 USGS-01458570 Phosphorus
#> 8 USGS-NJ 2013-02-04 USGS-01445160 Kjeldahl nitrogen

Created on 2018-05-23 by the reprex package (v0.2.0).
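
One caveat about the day() variants: they only agree with the range filter when every row falls in the same month and year, as in the sample data. A minimal sketch with made-up dates (not taken from roadsalt_data) shows the difference:

```r
library(dplyr)
library(lubridate)

# Hypothetical dates: two inside the target range, two outside it
x <- tibble(stdate = as.Date(c("2013-02-04", "2013-02-12",
                               "2013-03-05", "2014-02-10")))

# Filter on the full date
by_range <- x %>%
  filter(between(stdate, as.Date("2013-02-04"), as.Date("2013-02-12")))

# Filter on day of month only: also matches March 2013 and February 2014
by_day <- x %>% filter(between(day(stdate), 4, 12))

print(nrow(by_range))  # 2
print(nrow(by_day))    # 4
```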

==============================================================

If none of these work, there's something else entirely going on.


