Valid time zones in lubridate
Take a search through the OlsonNames()
in the standard base
R package, which provides a list of all the valid timezones on the host system. e.g.:
grep("Brazil", OlsonNames(), value=TRUE)
...provides four possible results for Brazil.
Digging into R package: Time Zones in lubridate
You can find a list of the timezones available by lubridate
with:
lubridate::olson_time_zones()
This information is easily found by typing
??timezones
in the console.
The link describing this function is the second entry in the list of the help page in my case:
Another possibility is to search the manual of lubridate
for "available time zones".
Understanding timezone strings in R
CEST probably stands for Central European Summer Time.
So during daylight saving time, CET becomes CEST, and in winter it does not:
as.POSIXct(c("2016-1-1 13:00", "2016-3-1 13:00",
"2016-5-1 13:00", "2016-6-1 13:00",
"2016-9-1 13:00","2016-11-1 13:00"), tz="CET")
returns:
"2016-01-01 13:00:00 CET" "2016-03-01 13:00:00 CET" "2016-05-01 13:00:00 CEST"
"2016-06-01 13:00:00 CEST" "2016-09-01 13:00:00 CEST" "2016-11-01 13:00:00 CET"
However, as @Matt_Johnson explained, CEST is not on official timezone,
so as.POSIXct("2016-1-1 13:00, tz="CEST")
fails.
What does remain strange, that CEST is acceptable in a string, even if the time is outside daylight saving time:
as.POSIXct("2016-1-1 13:00 CEST")
[1] "2016-01-01 13:00:00 CET"
The help files from as.POSIXct
and strptime
don't offer any explaination here.
Why does lubridate appear to change time zones for two dates combined into a vector?
Using c()
will remove the timezone attribute. Hence you have to reassign it:
xy <- c(x,y)
attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
Source and more information: Peter Ehlers on R Help
Transform string in date through Lubridate with variation in month, day, year hour min am/pm and time zone
First thing I did was to setup a tibble with your 2 date vectors
tibble(
date1 = c("February 11th 2017, 6:05am PST", "April 24th 2018, 4:09pm PDT"),
date2 = c("2013-12-14 00:58:00 CET", "2013-06-19 18:00:00 CEST"),
) %>%
{. ->> my_dates}
my_dates
# # A tibble: 2 x 2
# date1 date2
# <chr> <chr>
# February 11th 2017, 6:05am PST 2013-12-14 00:58:00 CET
# April 24th 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST
Then, make a tibble of the timezone abbreviations and their offset from UTC
# setup timezones and UTC offsets
tribble(
~tz, ~offset,
'PST', -8,
'PDT', -7,
'CET', +1,
'CEST', +2
) %>%
{. ->> my_tz}
my_tz
# # A tibble: 4 x 2
# tz offset
# <chr> <dbl>
# PST -8
# PDT -7
# CET 1
# CEST 2
Then, we tidy the datetimes up by removing the character suffix after the day number in date1
(the 'th' bit after '11th'). We also pull out the timezone code and put that in a separate column; the timezone column allows us to left_join()
my_tz
in, giving us the UTC offset.
We use string-handling functions from the stringr
package, and regex expressions to find, extract and replace the components. A very handy tool for testing regex patterns can be found here https://regex101.com/r/5pr3LL/1/
my_dates %>%
mutate(
# remove the character suffix after the day number (eg 11th)
day_suffix = str_extract(date1, '[0-9]+[a-z]+') %>% str_extract('[a-z]+'),
date1 = str_replace(date1, day_suffix, ''),
day_suffix = NULL,
# extract timezone info
date1_tz = str_extract(date1, '[a-zA-Z]+$'),
date2_tz = str_extract(date2, '[a-zA-Z]+$'),
) %>%
# join in timezones for date1
left_join(my_tz, by = c('date1_tz' = 'tz')) %>%
rename(
offset_date1 = offset
) %>%
# join in timezones for date2
left_join(my_tz, by = c('date2_tz' = 'tz')) %>%
rename(
offset_date2 = offset
) %>%
{. ->> my_dates_info}
my_dates_info
# # A tibble: 2 x 6
# date1 date2 date1_tz date2_tz offset_date1 offset_date2
# <chr> <chr> <chr> <chr> <dbl> <dbl>
# February 11 2017, 6:05am PST 2013-12-14 00:58:00 CET PST CET -8 1
# April 24 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST PDT CEST -7 2
So now, we can use lubridate::as_datetime()
to convert date1
and date2
to dttm
(datetime) format. as_datetime()
takes a character-format datetime and converts it to datetime format. You must specify the format of the character string using symbols and abbreviations explained here. For example, here we use %B
to refer to the full name of the month, %d
is the day number and %Y
is the (4-digit) year number etc.
Note: because we don't specify the timezone inside as_datetime()
, the underlying timezone stored with these datetimes defaults to UTC
(as seen by using tz()
). This is why we call these columns date*_orig
, to remind us the timezone is the original datetime's timezone. Then we add the offset to the datetime object, so we now have these times in UTC (and the underlying timezone signature of these values is UTC
, so that's ideal).
# now define datetimes in local and UTC timezones (note: technically the tz is UTC for both)
my_dates_info %>%
mutate(
date1_orig = as_datetime(date1, format = '%B %d %Y, %I:%M%p '),
date1_utc = date1_orig + hours(offset_date1),
date2_orig = as_datetime(date2, format = '%Y-%m-%d %H:%M:%S'),
date2_utc = date2_orig + hours(offset_date2),
) %>%
{. ->> my_dates_utc}
my_dates_utc
# # A tibble: 2 x 10
# date1 date2 date1_tz date2_tz offset_date1 offset_date2 date1_orig date1_utc date2_orig date2_utc
# <chr> <chr> <chr> <chr> <dbl> <dbl> <dttm> <dttm> <dttm> <dttm>
# February 11 2017, 6:05am PST 2013-12-14 00:58:00 CET PST CET -8 1 2017-02-11 06:05:00 2017-02-10 22:05:00 2013-12-14 00:58:00 2013-12-14 01:58:00
# April 24 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST PDT CEST -7 2 2018-04-24 16:09:00 2018-04-24 09:09:00 2013-06-19 18:00:00 2013-06-19 20:00:00
Now that we have both sets of dates in datetime format, and in the same timezone, we can calculate time differences between them.
# now calculate difference between them
my_dates_utc %>%
select(date1_utc, date2_utc) %>%
mutate(
difference_days = interval(start = date1_utc, end = date2_utc) %>% time_length(unit = 'days')
)
# # A tibble: 2 x 3
# date1_utc date2_utc difference_days
# <dttm> <dttm> <dbl>
# 2017-02-10 22:05:00 2013-12-14 01:58:00 -1155.
# 2018-04-24 09:09:00 2013-06-19 20:00:00 -1770.
This should be fine for small-scale operations. If you had more than 2 different datetime format vectors, it would be worth considering a more complex operation where you transform the data from wide to long format. This would save repeating the same/similar code for each column, like we have done for date1
and date2
in this example.
Related Topics
Object Not Found Error with Ggplot2
Apply a Summarise Condition to a Range of Columns When Using Dplyr Group_By
Fitting a Lognormal Distribution to Truncated Data in R
How to Change the Size of the Strip on Facets in a Ggplot
Disconnected from Server in Shinyapps, But Local's Working
Plot Curved Lines Between Two Locations in Ggplot2
From [Package] Import [Function] in R
How to Run a Function Every Second
Predict.Svm Does Not Predict New Data
How Fill Part of a Circle Using Ggplot2
Caret: There Were Missing Values in Resampled Performance Measures
Concatenate Values Across Columns in Data.Table, Row by Row
Nan Is Removed When Using Na.Rm=True