Parsing ISO 8601 Date and Time Format in R

Parsing ISO8601 date and time format in R

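%z is the signed offset from UTC in hours and minutes, in the format hhmm, not hh:mm. Here's one way to remove the last colon: the regular expression below drops the third-to-last character of the string, which here is the ":" in the offset.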

# Turn "-07:00" into "-0700" so that %z can read it
newstring <- gsub("(.*).(..)$", "\\1\\2", "2013-04-05T07:49:54-07:00")
# Parse, converting the result to UTC
(timep <- strptime(newstring, "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))
# [1] "2013-04-05 14:49:54 UTC"

Also note that you don't have to remove the "T": it is matched literally by the T in the format string.
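If the same data can also end in "Z" instead of a numeric offset, a small wrapper can normalise both forms before handing them to %z. This is a minimal base R sketch; parse_iso8601() is a hypothetical helper name, not part of base R or any package. It assumes every string carries either "Z" or a "+hh:mm"/"-hh:mm" offset, so strings with no offset at all are not handled.

```r
# Hypothetical helper (not from base R or a package): normalise the offset
# so that strptime()'s %z can read it, then parse and convert to UTC.
parse_iso8601 <- function(x) {
  x <- sub("Z$", "+0000", x)                           # "Z" means UTC, i.e. +0000
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)  # "-07:00" -> "-0700"
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

parse_iso8601(c("2013-04-05T07:49:54-07:00", "2013-04-05T14:49:54Z"))
# [1] "2013-04-05 14:49:54 UTC" "2013-04-05 14:49:54 UTC"
```

Both strings describe the same instant, so both parse to the same UTC time.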

Parse ISO 8601 date-time in format YYYY-MM-DDTHH-MM-SSZ

You can simply parse the timestamp by specifying the format in as.POSIXct (or strptime):

as.POSIXct("2019-05-15T01:42:15.072Z", format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
#[1] "2019-05-15 01:42:15 UTC"

Explanation:

%Y, %m and %d denote the year (with century), month and day; %H, %M and %OS denote the hours, minutes and seconds (including milliseconds). The T and Z are simply added to the format string, because

Any character in the format string not part of a conversion specification is interpreted literally

See ?strptime for the different conversion specifications.
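The printed result above shows no milliseconds, but %OS does keep them; print() just hides fractional seconds by default. The following is a small sketch that makes them visible with %OS3 in the output format. Because of floating-point representation, the last printed digit may come out one lower than expected (R truncates rather than rounds here), so treat the exact digits with some caution.

```r
# Same parse as above: literal "T" and "Z", %OS for seconds with a fraction
ts <- as.POSIXct("2019-05-15T01:42:15.072Z",
                 format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# The fractional part is stored internally (approximately 0.072)
as.numeric(ts) %% 1

# Ask for three decimal places of seconds when formatting
format(ts, "%Y-%m-%d %H:%M:%OS3")
```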

A comment on timezones

As the Z denotes UTC times, we have manually added tz = "UTC" to as.POSIXct (as pointed out by @BennyJobigan). If you want the timestamp converted to your local (target) time zone, you can do:

# In timezone of target, i.e. convert from UTC to local
lubridate::with_tz(
as.POSIXct("2019-05-15T01:42:15.072Z", format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"),
tz = Sys.timezone())
# [1] "2019-05-15 11:42:15 AEST"

(Obviously the output depends on your local timezone and might be different from what I get.)
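If you'd rather not depend on lubridate, base R can do the same conversion. Here is a sketch of two options: format() with a tz argument returns a string for another zone, while setting the "tzone" attribute changes only how the POSIXct is displayed, which is similar in effect to lubridate::with_tz(). "Australia/Sydney" is only an example zone name. Note that Sys.timezone() can return NA on some systems.

```r
ts <- as.POSIXct("2019-05-15T01:42:15.072Z",
                 format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Option 1: format the same instant in another time zone (returns a string)
format(ts, tz = "Australia/Sydney", usetz = TRUE)
# [1] "2019-05-15 11:42:15 AEST"

# Option 2: change the display time zone of the POSIXct itself;
# the underlying instant (seconds since the epoch) is unchanged
local_ts <- ts
attr(local_ts, "tzone") <- Sys.timezone()
local_ts
```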

Parsing ISO8601 in R

Strictly speaking, you can't. I don't need to know anything about R or CRAN (or even what they are) to tell you that, because I know ISO 8601 well enough to know that knowing something is "ISO 8601" is not enough to know unambiguously what is meant by it, especially in the shorter forms.

Find out what profile of ISO 8601 the other party is using. If they don't know what you're talking about, then you will be doing them a favour when you point out what I just said in the paragraph above. As I wrote once elsewhere,

Unfortunately many people think of a particular profile they are familiar with when they hear “ISO 8601”, other people know that using 8601 is a Good Thing but are not familiar with the details of implementation. Hence a spec or requirements document might mention 8601 but not be more explicit than that. In such cases it’s important to seek clarification rather than assume that the format you think of as “ISO 8601” is the correct one to use.

So, tell them "'ISO 8601' is not specific enough; I need to know exactly what you are doing and what your limits on precision are." (And possibly what your policy is on dates prior to 1582, and perhaps again prior to 0001, on leap seconds, and on a few other things left open by the standard.)

Then whatever you're dealing with should be easy enough: aside from this point of ambiguity, it is a pretty straightforward standard. It should be thought of as a standard for defining date formats, more than one that defines a single date format.
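One concrete instance of this ambiguity: ISO 8601 allows a date-time with no UTC offset, which it treats as local time, so a parser has to know whose local time is meant. The sketch below parses one such string under two assumed time zones ("UTC" and "America/Los_Angeles", chosen only for illustration) and gets two different instants.

```r
# An ISO 8601 date-time without an offset ("local time")
x <- "2013-04-05T07:49:54"

# Two readers assuming different time zones get different instants
a <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
b <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "America/Los_Angeles")

difftime(b, a, units = "hours")
# Time difference of 7 hours
```

Agreeing on a profile (for example, "always include an offset, or use Z") removes this particular guesswork.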

Parsing string/ISO 8601 Date

You can provide more formats with format=, and in this case add %z, which ?strptime defines as

     '%z' Signed offset in hours and minutes from UTC, so '-0800' is 8 hours behind UTC. Values up to '+1400' are accepted. (Standard only for output. For input R currently supports it on all platforms.)
library(lubridate)
as_datetime("20200603231413-0400", format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
# [1] "2020-06-03 23:14:13 EDT"
as_datetime("20200628203000+0000", format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
# [1] "2020-06-28 16:30:00 EDT"
as_datetime("20200528203116+0000", format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
# [1] "2020-05-28 16:31:16 EDT"
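The %z conversion itself is part of base R's strptime(), so the same parse also works without lubridate. The following is a minimal sketch: %z reads the offset, and tz only controls the zone the result is expressed in.

```r
x <- c("20200603231413-0400", "20200628203000+0000", "20200528203116+0000")

# %z applies each string's own offset; tz sets the zone of the result
pt <- as.POSIXct(x, format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
pt
# [1] "2020-06-03 23:14:13 EDT" "2020-06-28 16:30:00 EDT" "2020-05-28 16:31:16 EDT"
```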

Extract month from ISO-8601 format date using R

You have a small logic error. You take your input data and parse it -- so far, so good:

> v <- c("2019-12-31T17:05:00Z", "2019-12-31T23:14:00Z", "2020-01-01T11:00:00Z")
> pt <- as.POSIXct(v)
> pt
[1] "2019-12-31 CST" "2019-12-31 CST" "2020-01-01 CST"
>

Now that you have date(time) information, you can format() or strftime() as you like:

> strftime(pt, "%m")
[1] "12" "12" "01"
> strftime(pt, "%b")
[1] "Dec" "Dec" "Jan"
> strftime(pt, "%B")
[1] "December" "December" "January"
>

Converting the result from character to integer (for example with as.integer()) is also easy. Note that all this was done with base R, without any additional packages.
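One caveat about the parse above is that the printed values show only dates. Without a format argument, as.POSIXct() tries a list of default formats, and for these strings the first one that fits appears to be the date-only "%Y-%m-%d". As a result, the times are dropped and the dates are read in the local time zone (CST here) instead of as UTC. For month extraction this happens not to matter, but the following sketch keeps the full timestamp by giving the format explicitly.

```r
v <- c("2019-12-31T17:05:00Z", "2019-12-31T23:14:00Z", "2020-01-01T11:00:00Z")

# Literal T and Z in the format; tz = "UTC" because Z denotes UTC
pt <- as.POSIXct(v, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
pt
# [1] "2019-12-31 17:05:00 UTC" "2019-12-31 23:14:00 UTC" "2020-01-01 11:00:00 UTC"

# Month as a character string, then as an integer
format(pt, "%m")
as.integer(format(pt, "%m"))
# [1] 12 12  1
```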

Calculating time difference in R with ISO 8601 data format

As the error message suggests, to use difftime you need values of class POSIXct. In the data, the values are of type character and stored inside a list. Another thing to notice is that the data contains timestamps from different time zones (-04:00 and -05:00). Thankfully, ymd_hms from lubridate handles this automatically: it applies each offset and returns the times in UTC.

library(dplyr)
library(lubridate)

recordingdata %>%
mutate(across(c(start, end), unlist),
across(c(start, end), ymd_hms),
difference = difftime(end, start, units = 'mins'))

# start end difference
#1 2018-10-04 18:00:12 2018-10-04 19:18:07 77.91420 mins
#2 2018-10-25 15:05:29 2018-10-25 16:16:14 70.73660 mins
#3 2018-10-11 15:04:04 2018-10-11 16:13:38 69.55299 mins
#4 2019-01-24 19:02:47 2019-01-24 20:16:50 74.04655 mins
#5 2019-01-16 20:31:36 2019-01-16 21:21:15 49.65784 mins
#6 2018-11-27 16:04:25 2018-11-27 17:16:36 72.17727 mins
#7 2018-09-27 17:59:20 2018-09-27 19:16:23 77.04634 mins
#8 2018-10-23 15:03:57 2018-10-23 16:18:01 74.06905 mins
#...
#...
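For comparison, here is a base R sketch of the same idea. The timestamps are hypothetical sample values, not the question's data, and to_posixct() is a hypothetical helper name. It flattens the list, removes the colon from the offset so %z can read it, and lets difftime() work on the resulting POSIXct values. The third pair has the same wall-clock time with different offsets (across the US DST change on 2018-11-04), so it comes out one hour apart.

```r
# Hypothetical sample data stored as lists, with mixed offsets
start_chr <- list("2018-10-04T14:00:12-04:00", "2019-01-24T14:02:47-05:00",
                  "2018-11-04T01:30:00-04:00")
end_chr   <- list("2018-10-04T15:18:07-04:00", "2019-01-24T15:16:50-05:00",
                  "2018-11-04T01:30:00-05:00")

# Hypothetical helper: flatten the list, "-04:00" -> "-0400", parse with %z
to_posixct <- function(x) {
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", unlist(x))
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC")
}

# Roughly 77.92, 74.05 and 60 minutes
difftime(to_posixct(end_chr), to_posixct(start_chr), units = "mins")
```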

Current time in ISO 8601 format

as.POSIXlt (and as.POSIXct) are for input. Use either format or strftime for output. See ?strftime for details on format strings:

tm <- as.POSIXlt(Sys.time(), "UTC")
strftime(tm , "%Y-%m-%dT%H:%M:%S%z")
#[1] "2015-04-08T15:11:22+0000"

The third parameter of as.POSIXlt, format, is only used when the first parameter is a string-like value that needs to be parsed. Since we are passing in a POSIXct value from Sys.time(), the format is ignored.

I don't think that the colon in the time zone offset is a requirement of the ISO 8601 format, but I could be wrong on that point. The help page says the standard is POSIX 1003.1. If a colon is needed, you could insert it with a regex substitution.

After looking at http://dotat.at/tmp/ISO_8601-2004_E.pdf, I see that there is no colon in the "basic format" time zone representation, but there is one in the "extended format".
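If you need the extended format, the regex substitution can be wrapped in a small function. This is a sketch; iso8601_extended() is a hypothetical helper name. It formats with %z and then inserts the colon into the trailing "+hhmm"/"-hhmm". For UTC, you can also write the "Z" designator literally.

```r
# Hypothetical helper: ISO 8601 extended format with a "+hh:mm" offset.
# tz = "" means the session's local time zone.
iso8601_extended <- function(t, tz = "") {
  s <- format(t, "%Y-%m-%dT%H:%M:%S%z", tz = tz)   # e.g. "...T08:11:22-0700"
  sub("([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", s)    # -> "...T08:11:22-07:00"
}

iso8601_extended(Sys.time())          # local time zone
iso8601_extended(Sys.time(), "UTC")   # offset "+00:00"

# For UTC, the "Z" designator can be written literally instead
format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
```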