Subset dataframe based on POSIXct date and time greater than datetime using dplyr
ymd_hms uses POSIXct times in the "UTC" timezone by default, whereas as.POSIXct uses the system timezone (e.g. Australia for me). You need to use ymd_hms consistently, or change to the "UTC" timezone as per Dave's suggestion in the comments.
For example, both of these work:
library(dplyr)
library(lubridate)
date_test <- seq(ymd_hms('2016-07-01 00:30:00'), ymd_hms('2016-07-01 01:30:00'), by = '15 min')
date_test <- data.frame(datetime = date_test)
date_test
# datetime
#1 2016-07-01 00:30:00
#2 2016-07-01 00:45:00
#3 2016-07-01 01:00:00
#4 2016-07-01 01:15:00
#5 2016-07-01 01:30:00
date_test %>%
  filter(datetime > as.POSIXct("2016-07-01 01:00:00", tz = "UTC"))
date_test %>%
  filter(datetime > ymd_hms("2016-07-01 01:00:00"))
# datetime
#1 2016-07-01 01:15:00
#2 2016-07-01 01:30:00
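To see why mixing timezones changes the result, here is a minimal base R sketch. It parses the same wall-clock string in two zones and measures the gap between the resulting instants. "Australia/Sydney" is only an assumed example zone, and the variable names are illustrative rather than taken from the question.

```r
# Same wall-clock string, parsed in two different timezones.
# "Australia/Sydney" is an assumed example zone for illustration.
utc_time <- as.POSIXct("2016-07-01 01:00:00", tz = "UTC")
syd_time <- as.POSIXct("2016-07-01 01:00:00", tz = "Australia/Sydney")

# The two values are different instants, so a filter threshold built
# from one will not match a column stored in the other.
offset_hours <- as.numeric(difftime(utc_time, syd_time, units = "hours"))
print(offset_hours)

# Printing in another timezone changes the label, not the instant.
print(format(syd_time, tz = "UTC", usetz = TRUE))
```

If the column and the threshold are both created with the same tz, the offset is zero and the comparison behaves as expected.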
Subset data frame by DateTime column?
By default, as.POSIXct uses the local timezone, so you may want to pass a specific timezone to it.
library(dplyr)
df %>% filter(DateTime >= as.POSIXct('2019-07-25 01:45', tz = 'GMT') &
DateTime <= as.POSIXct('2019-07-25 23:45', tz = 'GMT'))
Moreover, if you need the GMT/UTC timezone, you can use lubridate::ymd_hms, which has UTC as its default timezone.
Subsetting POSIXct date and time returns wrong date
That is because "2014-01-01 00:00:00" is a string, and you are comparing a POSIXct class with a character class. Convert it to POSIXct and it should work:
subset(x, Date2 <= as.POSIXct("2014-01-01 00:00:00", tz = "UTC"))
# date time Date2
#1 2013-12-12 23:00:00 2013-12-12 23:00:00
#2 2014-01-01 00:00:00 2014-01-01 00:00:00
Here, both rows are selected since both are less than or equal to the date-time they are compared against.
Filtering POSIXct POSIXt column based on value and NA in R
Try this approach:
library(dplyr)
#Code
newdf <- df%>%
filter(TERM_DATE> as.POSIXct("2018-12-31") | is.na(TERM_DATE))
Output:
GEOGCD OPER_DATE TERM_DATE
1 E05006867 2009-01-01 00:00:00 2019-03-31
2 E05006868 2009-01-01 00:00:00 2019-03-31
3 E05000064 2018-05-01 22:00:00 <NA>
The smart solution from @StupidWolf also works:
#Code 2
df%>%
filter(TERM_DATE> as.Date("2018-12-31") | is.na(TERM_DATE))
Output:
GEOGCD OPER_DATE TERM_DATE
1 E05006867 2009-01-01 00:00:00 2019-03-31
2 E05006868 2009-01-01 00:00:00 2019-03-31
3 E05000064 2018-05-01 22:00:00 <NA>
The output the OP expected can be obtained with:
#Code 3
newdf <- df%>%
filter(TERM_DATE< as.POSIXct("2018-12-31") | is.na(TERM_DATE))
Output:
GEOGCD OPER_DATE TERM_DATE
1 E05000066 2009-01-01 00:00:00 2018-05-02
2 E05000067 2009-01-01 00:00:00 2018-05-02
3 E05000068 2009-01-01 00:00:00 2018-05-02
4 E05000064 2018-05-01 22:00:00 <NA>
Or use as.Date(). Either way, you need to change the comparison to <.
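The `| is.na(TERM_DATE)` part matters because a comparison against NA yields NA, and both filter() and subset() drop rows where the condition is NA. A small base R sketch with made-up values (the vector below is illustrative, not the OP's data):

```r
# Illustrative values only; tz is fixed so the result does not depend
# on the machine's local timezone.
cutoff    <- as.POSIXct("2018-12-31", tz = "UTC")
term_date <- as.POSIXct(c("2019-03-31", NA, "2018-05-02"), tz = "UTC")

# Comparing NA with anything gives NA, not FALSE.
after_cutoff <- term_date > cutoff
print(after_cutoff)

# OR-ing with is.na() turns the NA into TRUE so the row is kept.
keep <- after_cutoff | is.na(term_date)
print(keep)
```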
R subsetting a big dataframe based on date values
You can convert the date column to POSIXct and then subset.
You can do this using base R:
# Parse with the same timezone that the comparison values below use
humm$date <- as.POSIXct(humm$date, format = '%m/%d/%Y %H:%M', tz = 'GMT')
subset(humm, date >= as.POSIXct('02/24/2020 17:00', format = '%m/%d/%Y %H:%M', tz = 'GMT') &
date <= as.POSIXct('02/26/2020 02:00', format = '%m/%d/%Y %H:%M', tz = 'GMT'))
Or with dplyr and lubridate:
library(dplyr)
library(lubridate)
humm %>%
mutate(date = mdy_hm(date)) %>%
filter(between(date, mdy_hm('02/24/2020 07:00'), mdy_hm('02/26/2020 02:00')))
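dplyr's between(x, left, right) is inclusive at both ends, so it behaves like x >= left & x <= right. The base R sketch below shows that boundary behaviour with a few made-up timestamps (the values and variable names are assumptions for illustration), all parsed with the same format and timezone:

```r
fmt <- "%m/%d/%Y %H:%M"

# Made-up timestamps around the range boundaries.
stamps <- as.POSIXct(
  c("02/24/2020 16:59", "02/24/2020 17:00", "02/26/2020 02:01"),
  format = fmt, tz = "GMT"
)
lower <- as.POSIXct("02/24/2020 17:00", format = fmt, tz = "GMT")
upper <- as.POSIXct("02/26/2020 02:00", format = fmt, tz = "GMT")

# Inclusive range test: the lower bound itself is kept.
in_range <- stamps >= lower & stamps <= upper
print(in_range)
```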
How to select date range with POSIXct type
If DATE contains only dates (without hours etc., as in your example data above), you can convert it to Date class with as.Date and then operate on it:
model.weather$DATE <- as.Date(model.weather$DATE)
model.weather <- subset(model.weather, DATE >= "2006-04-01" & DATE <= "2011-03-01")
or (a better solution)
model.weather <- model.weather[model.weather$DATE >= "2006-04-01" & model.weather$DATE <= "2011-03-01", ]
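The comparisons above rely on R converting the character bounds to Date for you. If you prefer to make that conversion explicit, a minimal sketch looks like this (the dates vector is made up for illustration):

```r
# Made-up dates around the range boundaries.
dates <- as.Date(c("2006-03-31", "2006-04-01", "2011-03-01", "2011-03-02"))

# Explicit Date bounds; both ends are inclusive.
start <- as.Date("2006-04-01")
end   <- as.Date("2011-03-01")

in_range <- dates >= start & dates <= end
print(in_range)
```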
R - subset dataframe by Time only
Assuming that the start and end dates are always the same and only the times differ, and that you want the rows whose time starts at or after 8:00 and ends before 9:30, convert the date/time values to character strings of the form HH:MM and compare:
subset(DF, format(`Start Date`, "%H:%M") >= "08:00" &
format(`End Date`, "%H:%M") < "09:30")
giving:
Duration End Date Start Date
1 228 2013-01-03 09:10:00 2013-01-03 09:06:00
Note: We used the following for DF. (Next time please use dput to provide your data in reproducible form.)
DF <- structure(list(Duration = c(228L, 1675L, 393L, 426L, 827L, 780L
), `End Date` = structure(c(1357222200, 1357338840, 1357340040,
1357315800, 1357074780, 1357074780), class = c("POSIXct", "POSIXt"
), tzone = ""), `Start Date` = structure(c(1357221960, 1357337160,
1357339680, 1357315380, 1357073940, 1357074000), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("Duration", "End Date", "Start Date"
), row.names = c(NA, -6L), class = "data.frame")
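The string comparison works because zero-padded HH:MM strings sort in the same order as the clock times they represent. A small self-contained sketch with made-up timestamps (fixed to UTC here as an assumption, so the result does not depend on the local timezone):

```r
# Made-up timestamps around the 08:00 and 09:30 boundaries.
stamps <- as.POSIXct(
  c("2013-01-03 07:59:00", "2013-01-03 08:00:00",
    "2013-01-03 09:29:00", "2013-01-03 09:30:00"),
  tz = "UTC"
)

# format() renders each time in the object's own timezone.
clock <- format(stamps, "%H:%M")

# At or after 08:00 and strictly before 09:30.
in_window <- clock >= "08:00" & clock < "09:30"
print(in_window)
```

Note that format() uses the timezone stored on the object, so columns created with tzone = "" (as in DF above) are rendered in the local timezone.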