Subset Dataframe Based on POSIXct Date and Time Greater Than Datetime Using dplyr

ymd_hms() returns POSIXct times in the "UTC" timezone by default, whereas as.POSIXct() uses the system timezone (an Australian one, in my case). Either use ymd_hms() consistently or pass tz = "UTC" to as.POSIXct(), as per Dave's suggestion in the comments.

For example, both of the filters below work:

library(lubridate)
library(dplyr)

date_test <- seq(ymd_hms('2016-07-01 00:30:00'), ymd_hms('2016-07-01 01:30:00'), by = '15 min')
date_test <- data.frame(datetime = date_test)
date_test

#              datetime
# 1 2016-07-01 00:30:00
# 2 2016-07-01 00:45:00
# 3 2016-07-01 01:00:00
# 4 2016-07-01 01:15:00
# 5 2016-07-01 01:30:00

date_test %>%
  filter(datetime > as.POSIXct("2016-07-01 01:00:00", tz = "UTC"))

date_test %>%
  filter(datetime > ymd_hms("2016-07-01 01:00:00"))

#              datetime
# 1 2016-07-01 01:15:00
# 2 2016-07-01 01:30:00
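
To see why the timezone matters, here is a minimal sketch in base R only, so it runs without extra packages. It builds the same clock time in two timezones and counts how many rows of a UTC column pass each cutoff. The "Australia/Sydney" zone is an assumption standing in for the system timezone mentioned above.

```r
# Same wall-clock string, two different timezones (zone names are illustrative).
cut_utc <- as.POSIXct("2016-07-01 01:00:00", tz = "UTC")
cut_syd <- as.POSIXct("2016-07-01 01:00:00", tz = "Australia/Sydney")

# The two values are different instants: in July, Sydney is 10 hours ahead of UTC.
gap_hours <- as.numeric(difftime(cut_utc, cut_syd, units = "hours"))

# A UTC column like date_test$datetime, built here without lubridate.
datetime <- seq(as.POSIXct("2016-07-01 00:30:00", tz = "UTC"),
                as.POSIXct("2016-07-01 01:30:00", tz = "UTC"),
                by = "15 min")

n_utc <- sum(datetime > cut_utc)  # rows later than 01:00 UTC
n_syd <- sum(datetime > cut_syd)  # rows later than 01:00 Sydney time
```

With the UTC cutoff only the last two rows qualify; with the Sydney cutoff every row does, which is the kind of surprise a mismatched timezone produces.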

Subset data frame by DateTime column?

By default, as.POSIXct() uses the local timezone, so you may want to pass a specific timezone to it.

library(dplyr)

df %>%
  filter(DateTime >= as.POSIXct('2019-07-25 01:45', tz = 'GMT') &
         DateTime <= as.POSIXct('2019-07-25 23:45', tz = 'GMT'))

Moreover, if you need the GMT/UTC timezone, you can use lubridate::ymd_hms(), which has UTC as its default timezone.
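
If you are unsure which timezone the column carries, one option is to read it off the column and reuse it for the bounds. The sketch below uses base R only; the DateTime vector is hypothetical data standing in for df$DateTime.

```r
# Hypothetical column; in practice this would be df$DateTime.
DateTime <- as.POSIXct(c("2019-07-25 00:15", "2019-07-25 12:00", "2019-07-26 00:15"),
                       tz = "GMT")

# The timezone a POSIXct vector is displayed in is stored in its "tzone" attribute.
col_tz <- attr(DateTime, "tzone")

# Build both bounds in that same timezone, then filter.
lower <- as.POSIXct("2019-07-25 01:45", tz = col_tz)
upper <- as.POSIXct("2019-07-25 23:45", tz = col_tz)
in_range <- DateTime[DateTime >= lower & DateTime <= upper]
```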

Subsetting POSIXct date and time returns wrong date

That is because "2014-01-01 00:00:00" is a string, so you are comparing a POSIXct value with a character value. Convert the string to POSIXct and it should work:

subset(x, Date2 <= as.POSIXct("2014-01-01 00:00:00", tz = "UTC"))

#         date     time               Date2
# 1 2013-12-12 23:00:00 2013-12-12 23:00:00
# 2 2014-01-01 00:00:00 2014-01-01 00:00:00

Here, both rows are selected because both are less than or equal to the date-time they are compared with.
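
A self-contained version of the same idea, as a sketch: the data frame below is hypothetical and only mimics the Date2 column, with a third row added for contrast. Converting the cutoff explicitly, in the same timezone as the column, sidesteps any implicit conversion of the string.

```r
# Hypothetical data standing in for x; the third row is added for contrast.
x <- data.frame(Date2 = as.POSIXct(c("2013-12-12 23:00:00",
                                     "2014-01-01 00:00:00",
                                     "2014-01-01 01:00:00"), tz = "UTC"))

# Convert the cutoff explicitly, in the same timezone as the column.
cutoff <- as.POSIXct("2014-01-01 00:00:00", tz = "UTC")
kept <- subset(x, Date2 <= cutoff)
```

Only the first two rows remain in kept; the 01:00 row falls after the cutoff.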

Filtering POSIXct POSIXt column based on value and NA in R

Try this approach:

library(dplyr)
# Code
newdf <- df %>%
  filter(TERM_DATE > as.POSIXct("2018-12-31") | is.na(TERM_DATE))

Output:

     GEOGCD           OPER_DATE  TERM_DATE
1 E05006867 2009-01-01 00:00:00 2019-03-31
2 E05006868 2009-01-01 00:00:00 2019-03-31
3 E05000064 2018-05-01 22:00:00 <NA>

The smart solution from @StupidWolf also works:

# Code 2
df %>%
  filter(TERM_DATE > as.Date("2018-12-31") | is.na(TERM_DATE))

Output:

     GEOGCD           OPER_DATE  TERM_DATE
1 E05006867 2009-01-01 00:00:00 2019-03-31
2 E05006868 2009-01-01 00:00:00 2019-03-31
3 E05000064 2018-05-01 22:00:00 <NA>

The output expected from OP can be reached using:

# Code 3
newdf <- df %>%
  filter(TERM_DATE < as.POSIXct("2018-12-31") | is.na(TERM_DATE))

Output:

     GEOGCD           OPER_DATE  TERM_DATE
1 E05000066 2009-01-01 00:00:00 2018-05-02
2 E05000067 2009-01-01 00:00:00 2018-05-02
3 E05000068 2009-01-01 00:00:00 2018-05-02
4 E05000064 2018-05-01 22:00:00 <NA>

The same result can be obtained with as.Date(), as long as the comparison is changed to <.
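
The reason for the extra is.na() term: a comparison against NA yields NA, and filter() drops rows where the condition is NA. A minimal base R sketch of the logic, using hypothetical values in place of df$TERM_DATE:

```r
# Hypothetical values standing in for df$TERM_DATE.
term_date <- as.POSIXct(c("2019-03-31", NA, "2018-05-02"), tz = "UTC")
cutoff <- as.POSIXct("2018-12-31", tz = "UTC")

# Comparing a missing date gives NA rather than TRUE or FALSE.
after_cutoff <- term_date > cutoff

# NA | TRUE evaluates to TRUE, so the missing date is kept.
keep <- after_cutoff | is.na(term_date)
```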

R subsetting a big dataframe based on date values

You can convert the date column to POSIXct and then subset.

You can do this using base R:

# Parse the column in the same timezone as the bounds it is compared with
humm$date <- as.POSIXct(humm$date, format = '%m/%d/%Y %H:%M', tz = 'GMT')
subset(humm, date >= as.POSIXct('02/24/2020 17:00', format = '%m/%d/%Y %H:%M', tz = 'GMT') &
             date <= as.POSIXct('02/26/2020 02:00', format = '%m/%d/%Y %H:%M', tz = 'GMT'))

Or with dplyr and lubridate:

library(dplyr)
library(lubridate)

humm %>%
  mutate(date = mdy_hm(date)) %>%
  filter(between(date, mdy_hm('02/24/2020 07:00'), mdy_hm('02/26/2020 02:00')))
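
A runnable sketch of the base R route, with a small hypothetical humm data frame in the same month/day/year format. It also shows that a string which does not match the format parses to NA, which is worth checking for before subsetting.

```r
# Hypothetical data in the same month/day/year format as humm$date.
humm <- data.frame(date = c("02/24/2020 16:00", "02/24/2020 18:30",
                            "02/25/2020 09:00", "02/26/2020 03:00"),
                   stringsAsFactors = FALSE)

fmt <- "%m/%d/%Y %H:%M"
humm$date <- as.POSIXct(humm$date, format = fmt, tz = "GMT")

# A value that does not match the format parses to NA rather than raising an error.
bad <- as.POSIXct("2020-02-24 17:00", format = fmt, tz = "GMT")

# Bounds parsed with the same format and timezone as the column.
lower <- as.POSIXct("02/24/2020 17:00", format = fmt, tz = "GMT")
upper <- as.POSIXct("02/26/2020 02:00", format = fmt, tz = "GMT")
in_window <- subset(humm, date >= lower & date <= upper)
```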

How to select date range with POSIXct type

If DATE contains only dates (no time component, as in your example data), you can convert it to the Date class and then operate on it:

model.weather$DATE <- as.Date(model.weather$DATE)

model.weather <- subset(model.weather, DATE >= "2006-04-01" & DATE <= "2011-03-01")

or, equivalently, with bracket indexing (which avoids subset()'s non-standard evaluation):

model.weather <- model.weather[model.weather$DATE >= "2006-04-01" & model.weather$DATE <= "2011-03-01", ]
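
The comparisons above rely on R coercing the strings to Date. A sketch with explicit Date bounds, using a hypothetical model.weather (the TEMP column is invented for illustration):

```r
# Hypothetical data standing in for model.weather.
model.weather <- data.frame(DATE = c("2006-03-31", "2006-04-01",
                                     "2009-06-15", "2011-03-02"),
                            TEMP = c(4.1, 5.3, 18.2, 6.0),
                            stringsAsFactors = FALSE)
model.weather$DATE <- as.Date(model.weather$DATE)

# Explicit Date bounds make the intent clear and avoid relying on string coercion.
start <- as.Date("2006-04-01")
end   <- as.Date("2011-03-01")
in_range <- model.weather[model.weather$DATE >= start & model.weather$DATE <= end, ]
```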

R - subset dataframe by Time only

Assuming that the start and end dates are always the same and only the times differ, and that you want the rows whose time starts at or after 8:00 and ends before 9:30, convert the date/time values to character strings of the form HH:MM and compare:

subset(DF, format(`Start Date`, "%H:%M") >= "08:00" &
           format(`End Date`, "%H:%M") < "09:30")

giving:

  Duration            End Date          Start Date
1      228 2013-01-03 09:10:00 2013-01-03 09:06:00

Note: We used the following for DF. (Next time please use dput to provide your data in reproducible form.)

DF <- structure(list(Duration = c(228L, 1675L, 393L, 426L, 827L, 780L
), `End Date` = structure(c(1357222200, 1357338840, 1357340040,
1357315800, 1357074780, 1357074780), class = c("POSIXct", "POSIXt"
), tzone = ""), `Start Date` = structure(c(1357221960, 1357337160,
1357339680, 1357315380, 1357073940, 1357074000), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("Duration", "End Date", "Start Date"
), row.names = c(NA, -6L), class = "data.frame")
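
Because DF stores its times with an empty tzone, format() renders them in the local timezone, so the selected rows can differ between machines. A sketch of the same HH:MM comparison with the timezone fixed explicitly; the timestamps and the choice of "UTC" are assumptions for illustration:

```r
# Hypothetical timestamps; the tz is fixed so the result does not depend on the machine.
start <- as.POSIXct(c("2013-01-03 09:06:00", "2013-01-04 17:06:00",
                      "2013-01-01 07:59:00"), tz = "UTC")
end   <- as.POSIXct(c("2013-01-03 09:10:00", "2013-01-04 17:34:00",
                      "2013-01-01 08:13:00"), tz = "UTC")

# Zero-padded "HH:MM" strings sort the same way as the times they represent.
start_hm <- format(start, "%H:%M", tz = "UTC")
end_hm   <- format(end, "%H:%M", tz = "UTC")
keep <- start_hm >= "08:00" & end_hm < "09:30"
```

Only the first pair starts at or after 08:00 and ends before 09:30, so keep selects that row alone.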

