Parsing ISO8601 date and time format in R
%z is the signed offset from UTC in hours and minutes, in the format hhmm, not hh:mm. Here's one way to remove the last colon:
newstring <- gsub("(.*).(..)$","\\1\\2","2013-04-05T07:49:54-07:00")
(timep <- strptime(newstring, "%Y-%m-%dT%H:%M:%S%z", tz="UTC"))
# [1] "2013-04-05 14:49:54 UTC"
Also note that you don't have to remove the "T".
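A slightly more targeted variant, as a sketch: the pattern above drops whichever character sits third from the end, so anchoring on the sign and digits of the offset avoids touching strings that do not end in an hh:mm offset. The helper name normalize_offset is made up for this example, and mapping a trailing Z to +0000 is an assumption about the inputs you expect.

```r
# Hypothetical helper: rewrite a trailing "+hh:mm" / "-hh:mm" as "+hhmm" / "-hhmm"
# so that %z can read it; a trailing "Z" is mapped to "+0000".
normalize_offset <- function(x) {
  x <- sub("Z$", "+0000", x)
  sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
}

stamps <- c("2013-04-05T07:49:54-07:00", "2013-04-05T14:49:54Z")
parsed <- as.POSIXct(normalize_offset(stamps),
                     format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2013-04-05 14:49:54" "2013-04-05 14:49:54"
```

Both inputs describe the same instant, which is why the two results agree.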
Parse ISO 8601 date-time in format YYYY-MM-DDTHH-MM-SSZ
You can simply parse the timestamp by specifying the format in as.POSIXct (or strptime):
as.POSIXct("2019-05-15T01:42:15.072Z", format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
#[1] "2019-05-15 01:42:15 UTC"
Explanation: %Y, %m and %d denote the year (with century), month and day; %H, %M and %OS denote the hours, minutes and seconds (including milliseconds). The T and Z are simply added to the format string, because
Any character in the format string not part of a conversion specification is interpreted literally
See ?strptime for the different conversion specifications.
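The printed result above shows no milliseconds, which can look as if they were lost. A minimal sketch of how to check that the fraction was actually parsed; %OS3 is the output form documented in ?strptime, and the last displayed digit can differ by one because the value is stored as a binary floating-point number, so no expected output is shown for that line.

```r
x <- as.POSIXct("2019-05-15T01:42:15.072Z",
                format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Default printing hides the fraction; %OS3 asks for three decimal places.
format(x, "%Y-%m-%d %H:%M:%OS3")

# The fraction as stored, recovered from the numeric value
# (seconds since the epoch).
frac <- as.numeric(x) %% 1
round(frac, 3)
# [1] 0.072
```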
A comment on timezones
As the Z denotes UTC times, we have manually added tz = "UTC" to as.POSIXct (as pointed out by @BennyJobigan). If you want the timestamp converted to your local (target) timezone, you can do
# In timezone of target, i.e. convert from UTC to local
lubridate::with_tz(
  as.POSIXct("2019-05-15T01:42:15.072Z", format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"),
  tz = Sys.timezone())
# [1] "2019-05-15 11:42:15 AEST"
(Obviously the output depends on your local timezone and might be different from what I get.)
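If you would rather not load lubridate for this, base R can do the same conversion. The sketch below hard-codes "Australia/Brisbane" as the target zone (an assumption chosen to match the AEST output above; substitute Sys.timezone() or any other zone name) and relies on that name being in your system's time zone database.

```r
utc <- as.POSIXct("2019-05-15T01:42:15.072Z",
                  format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Option 1: render in another zone without modifying the object.
format(utc, "%Y-%m-%d %H:%M:%S", tz = "Australia/Brisbane")
# [1] "2019-05-15 11:42:15"

# Option 2: copy the object and change its "tzone" attribute.
# Only the display changes; the underlying instant is the same.
brisbane <- utc
attr(brisbane, "tzone") <- "Australia/Brisbane"
as.numeric(brisbane) == as.numeric(utc)
# [1] TRUE
```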
Parsing ISO8601 in R
Strictly speaking, you can't. I don't need to know anything about R or CRAN (or even what they are) to tell you that, because I know ISO 8601 well enough to know that merely knowing something is ISO 8601 is not enough to know unambiguously what is meant by it, especially in the shorter forms.
Find out what profile of ISO 8601 the other party is using. If they don't know what you're talking about, then you will be doing them a favour when you point out what I just said in the paragraph above. As I wrote once elsewhere,
Unfortunately many people think of a particular profile they are familiar with when they hear “ISO 8601”, other people know that using 8601 is a Good Thing but are not familiar with the details of implementation. Hence a spec or requirements document might mention 8601 but not be more explicit than that. In such cases it’s important to seek clarification rather than assume that the format you think of as “ISO 8601” is the correct one to use.
So, tell them: "'ISO 8601' is not specific enough; I need to know exactly what you are doing and what your limits on precision are." (And possibly what your policy is on dates prior to 1582 and perhaps again prior to 0001, your policy on leap seconds, and a few other things left open by the standard.)
Then whatever you're dealing with should be easy enough: aside from this point of ambiguity, it is a pretty straightforward standard. It should be thought of as a standard about defining date formats, more than one that defines a date format.
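To make the point concrete in R terms, here is a small sketch: the same instant written in two valid ISO 8601 forms needs two different format strings, so the code has to know which profiles were agreed. The two profiles listed and the helper name parse_with_profiles are assumptions for illustration, not an exhaustive treatment of the standard.

```r
# Two illustrative profiles (assumptions, not an exhaustive list).
profiles <- c(
  extended = "%Y-%m-%dT%H:%M:%SZ",  # e.g. 2013-04-05T07:49:54Z
  basic    = "%Y%m%dT%H%M%SZ"       # e.g. 20130405T074954Z
)

# Hypothetical helper: use the first profile that parses every element.
parse_with_profiles <- function(x, formats, tz = "UTC") {
  for (fmt in formats) {
    parsed <- as.POSIXct(x, format = fmt, tz = tz)
    if (!anyNA(parsed)) return(parsed)
  }
  stop("input matches none of the agreed profiles")
}

a <- parse_with_profiles("2013-04-05T07:49:54Z", profiles)
b <- parse_with_profiles("20130405T074954Z", profiles)
a == b
# [1] TRUE
```

Anything outside the agreed list fails loudly instead of being guessed at, which is the behaviour you want once the other party has told you what they actually send.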
Parsing string/ISO 8601 Date
You can provide more formats with format=, and in this case add %z, which ?strptime defines as
'%z' Signed offset in hours and minutes from UTC, so '-0800' is 8
hours behind UTC. Values up to '+1400' are accepted.
(Standard only for output. For input R currently supports it
on all platforms.)
library(lubridate)
as_datetime("20200603231413-0400", format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
# [1] "2020-06-03 23:14:13 EDT"
as_datetime("20200628203000+0000", format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
# [1] "2020-06-28 16:30:00 EDT"
as_datetime("20200528203116+0000", format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
# [1] "2020-05-28 16:31:16 EDT"
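The same parse also works without lubridate. A minimal base R sketch, assuming the "America/New_York" zone is available on your system; it also shows that %z fixes the instant, while tz only decides how that instant is displayed.

```r
x <- as.POSIXct("20200603231413-0400",
                format = "%Y%m%d%H%M%S%z", tz = "America/New_York")
format(x, "%Y-%m-%d %H:%M:%S")
# [1] "2020-06-03 23:14:13"

# The same instant rendered in UTC is four hours later.
format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# [1] "2020-06-04 03:14:13"
```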
Extract month from ISO-8601 format date using R
You have a small logic error. You take your input data and parse it -- so far, so good:
> v <- c("2019-12-31T17:05:00Z", "2019-12-31T23:14:00Z", "2020-01-01T11:00:00Z")
> pt <- as.POSIXct(v)
> pt
[1] "2019-12-31 CST" "2019-12-31 CST" "2020-01-01 CST"
>
Now that you have date(time) information, you can format() or strftime() as you like:
> strftime(pt, "%m")
[1] "12" "12" "01"
> strftime(pt, "%b")
[1] "Dec" "Dec" "Jan"
> strftime(pt, "%B")
[1] "December" "December" "January"
>
Converting from character to int is also easy. Note that all this was done with base R without any additional packages.
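For completeness, a sketch of that last step. It also spells out the format and tz = "UTC", because in the session above the bare as.POSIXct(v) call kept only the date and used the local zone; the explicit format is an addition here, not part of the original answer.

```r
v <- c("2019-12-31T17:05:00Z", "2019-12-31T23:14:00Z", "2020-01-01T11:00:00Z")

# Spell out the format so the time of day is kept, and read the Z as UTC.
pt <- as.POSIXct(v, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Month as an integer, via the formatted string.
month_int <- as.integer(format(pt, "%m"))
month_int
# [1] 12 12  1

# Alternative: POSIXlt stores the month zero-based, hence the + 1.
as.POSIXlt(pt)$mon + 1
# [1] 12 12  1
```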
Calculating time difference in R with ISO 8601 data format
As the error message suggests, to use difftime you need values of class POSIXct. In the data, the values are of type character, stored inside a list. Another thing to notice is that the data contains timestamps from different timezones (-04:00 and -05:00). Thankfully, ymd_hms from lubridate can handle that automatically for us.
library(dplyr)
library(lubridate)
recordingdata %>%
  mutate(across(c(start, end), unlist),
         across(c(start, end), ymd_hms),
         difference = difftime(end, start, units = 'mins'))
# start end difference
#1 2018-10-04 18:00:12 2018-10-04 19:18:07 77.91420 mins
#2 2018-10-25 15:05:29 2018-10-25 16:16:14 70.73660 mins
#3 2018-10-11 15:04:04 2018-10-11 16:13:38 69.55299 mins
#4 2019-01-24 19:02:47 2019-01-24 20:16:50 74.04655 mins
#5 2019-01-16 20:31:36 2019-01-16 21:21:15 49.65784 mins
#6 2018-11-27 16:04:25 2018-11-27 17:16:36 72.17727 mins
#7 2018-09-27 17:59:20 2018-09-27 19:16:23 77.04634 mins
#8 2018-10-23 15:03:57 2018-10-23 16:18:01 74.06905 mins
#...
#...
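Since recordingdata itself is not shown here, the following is a self-contained base R sketch of the same idea on made-up rows: character timestamps in list columns, with -04:00 and -05:00 offsets. The sample values and the helper name to_utc are hypothetical; the helper strips the colon from the offset so that %z can read it, which stands in for what ymd_hms does for you.

```r
# Made-up sample in the same shape as the question's data.
recordingdata <- data.frame(id = 1:2)
recordingdata$start <- list("2018-10-04T14:00:12-04:00", "2019-01-24T14:02:47-05:00")
recordingdata$end   <- list("2018-10-04T15:18:07-04:00", "2019-01-24T15:16:50-05:00")

# Hypothetical helper: flatten the list, drop the colon in the offset,
# then parse; every value ends up as an instant displayed in UTC.
to_utc <- function(x) {
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", unlist(x))
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

difference <- difftime(to_utc(recordingdata$end), to_utc(recordingdata$start),
                       units = "mins")
round(as.numeric(difference), 2)
# [1] 77.92 74.05
```

Because each timestamp carries its own offset, the differences come out right even when start and end were recorded under different offsets.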
Current time in ISO 8601 format
as.POSIXlt (and as.POSIXct) are for input. Use either format or strftime for output. See ?strftime for details on format strings:
tm <- as.POSIXlt(Sys.time(), "UTC")
strftime(tm , "%Y-%m-%dT%H:%M:%S%z")
#[1] "2015-04-08T15:11:22+0000"
The third parameter of as.POSIXlt, format, is used when the first parameter is a string-like value that needs to be parsed. Since we are passing in a date-time (POSIXct) value from Sys.time, the format is ignored.
I don't think that the colon in the timezone output is a requirement of the ISO 8601 format, but I could be wrong on that point. The help page says the standard is POSIX 1003.1. You may need to put in the colon with a regex substitution if it is needed.
After looking at http://dotat.at/tmp/ISO_8601-2004_E.pdf I see that there is no colon in the "basic format" timezone representation, but there is one in the "extended format".
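If you do need the extended form, the substitution could look like the sketch below. It uses a fixed instant instead of Sys.time() so the result is reproducible, and it calls format() with an explicit tz = "UTC"; both choices are for illustration only.

```r
# A fixed instant keeps the example reproducible; Sys.time() works the same way.
tm <- as.POSIXct("2015-04-08 15:11:22", tz = "UTC")

basic <- format(tm, "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
basic
# [1] "2015-04-08T15:11:22+0000"

# Insert a colon between the hours and minutes of the trailing offset.
extended <- sub("([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", basic)
extended
# [1] "2015-04-08T15:11:22+00:00"
```

When the output is always UTC, writing a literal Z in the format string ("%Y-%m-%dT%H:%M:%SZ") is a simpler alternative.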