Accurately Converting from Character->POSIXct->Character with Sub-Millisecond Datetimes

Two things:

1) @statquant is right, as is @Aaron in his comment (and the otherwise well-known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), but that will not be important for the main question here:

POSIXlt by design is definitely more accurate in storing times than POSIXct: as its seconds are always in [0, 60), it has a granularity of about 6e-15 seconds, i.e., 6 femtoseconds, which is tens of millions of times finer than that of POSIXct.
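As a rough check of these orders of magnitude, the spacing between adjacent double-precision numbers can be computed directly. This is only a sketch: ulp() is a helper defined here for illustration (it is not a base R function), and the two sample magnitudes are assumptions chosen to represent a present-day epoch count and a seconds-within-minute value.

```r
# Hypothetical helper: spacing between adjacent doubles near x
# (IEEE 754 double precision, 52 explicit mantissa bits).
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

ct_secs <- 1.5e9  # assumed: seconds since 1970, as a POSIXct stores them
lt_secs <- 59     # assumed: the sec component of a POSIXlt

ulp_ct <- ulp(ct_secs)  # 2^-22, roughly 2.4e-07 seconds
ulp_lt <- ulp(lt_secs)  # 2^-47, roughly 7.1e-15 seconds
ulp_ct / ulp_lt         # 2^25, i.e. tens of millions
```

The ratio of the two spacings is what the "tens of millions" figure refers to.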

However, this is not very relevant here (and for current R): almost all operations, notably numeric ones, use the Ops group method (not known to beginners, but well documented). Just look at Ops.POSIXt, which indeed trashes the extra precision by first coercing to POSIXct. In addition, format() and print() use at most 6 decimals after the ".", and hence also do not distinguish between the internally higher precision of POSIXlt and the "only" roughly 100 nanosecond granularity of POSIXct.
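A small sketch of that loss of precision, assuming a current R in which Ops.POSIXt coerces POSIXlt operands to POSIXct as described. The date and the one-nanosecond offset are arbitrary illustration values.

```r
# Two POSIXlt objects that differ by one nanosecond in their sec component.
lt  <- as.POSIXlt("2020-01-01 00:00:00", tz = "UTC")
lt2 <- lt
lt2$sec <- lt2$sec + 1e-9

# The list component itself keeps the extra nanosecond.
sec_diff <- lt2$sec - lt$sec

# Comparison is dispatched to Ops.POSIXt, which works on POSIXct values.
same_after_ops <- lt2 == lt

# After coercion to POSIXct the two values are the same double.
ct_diff <- as.numeric(as.POSIXct(lt2)) - as.numeric(as.POSIXct(lt))
```

Here sec_diff is non-zero while ct_diff is zero, which is the sense in which the extra precision exists internally but is not usable in ordinary arithmetic.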

(For the above reason, both Dirk and Joshua were led to their wrong assertion: for all simple practical uses, the precision of *lt and *ct is made the same.)

2) I do tend to agree that we (R Core) should improve the format()ing and hence print()ing of POSIXt objects with such fractions of seconds (even after the bug fix mentioned by @Aaron above).

But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)

From character to date in R

You do not need to eliminate the +0000: it is the signed offset from UTC in hours and minutes (the %z conversion). Sometimes (though not here, where it is 0000) it is very useful, because it lets the data be converted properly.

Here is a working example with strptime:

x <- "Fri Sep 18 17:01:33 +0000 2015"
strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
[1] "2015-09-18 17:01:33 UTC"
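To see what a non-zero offset does, here is a hedged variation on the example above. The input strings are made up for illustration, and they use numeric fields only so that the result does not depend on locale-specific day and month names.

```r
# Same wall-clock time, once with a zero offset and once two hours ahead of UTC.
x <- c("2015-09-18 17:01:33 +0000",
       "2015-09-18 17:01:33 +0200")

# %z reads the signed hhmm offset; the result is expressed in the requested tz.
p <- strptime(x, "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
utc_times <- format(p, "%Y-%m-%d %H:%M:%S")
utc_times
```

The second element comes out two hours earlier in UTC, which is the conversion that would be lost if the offset were stripped from the string.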

Unexpected behavior with POSIXct datetimes under diff

There is a units<- function for difftime objects:

> units(del) <- 'hours'
> table(del)
del
0 1
1 46

The ?difftime help page says:

If units = "auto", a suitable set of units is chosen, the largest possible (excluding "weeks") in which all the absolute differences are greater than one.

So perhaps the logic of the function got sidetracked by the 0 value in your case and the units got set to seconds.
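A minimal reproduction of this effect, using made-up timestamps (the original del is not shown in the question, so tm below is an assumption): one duplicated timestamp produces a gap of 0, and the automatically chosen units fall back to seconds.

```r
# Hypothetical timestamps: the duplicate creates a zero difference.
tm <- as.POSIXct(c("2020-01-01 00:00:00",
                   "2020-01-01 00:00:00",
                   "2020-01-01 01:00:00"), tz = "UTC")

del <- diff(tm)
auto_units <- units(del)   # units picked automatically by difftime

# Convert in place with the units<- replacement function.
units(del) <- "hours"
hours <- as.numeric(del)
```

Passing units explicitly, for example difftime(tm[-1], tm[-length(tm)], units = "hours"), avoids relying on the automatic choice.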

Odd behavior with POSIXct/POSIXlt and subsecond accuracy

@GSee is right, this is a floating point arithmetic problem. And Gavin Simpson's answer is correct in that it's how the object is printed.

R> options(digits=17)
R> .index(x)
[1] 1295589600.0009999 1295589600.0020001 1295589600.0030000 1295589600.0039999
[5] 1295589600.0050001 1295589600.0060000 1295589600.0070000 1295589600.0079999
[9] 1295589600.0090001 1295589600.0100000

All the precision is there, but these lines in format.POSIXlt cause options(digits.secs=6) to not be honored.

np <- getOption("digits.secs")
if (is.null(np))
    np <- 0L
else
    np <- min(6L, np)
if (np >= 1L) {
    for (i in seq_len(np) - 1L) {
        if (all(abs(secs - round(secs, i)) < 1e-06)) {
            np <- i
            break
        }
    }
}

Due to precision issues, in your example np is reset to 3 in the above for loop. And the format "%Y-%m-%d %H:%M:%OS3" yields the times you posted. You can see the times are accurate if you use the "%Y-%m-%d %H:%M:%OS6" format.

R> format(as.POSIXlt(index(x)[1:2]), "%Y-%m-%d %H:%M:%OS3")
[1] "2011-01-21 00:00:00.000" "2011-01-21 00:00:00.002"
R> format(as.POSIXlt(index(x)[1:2]), "%Y-%m-%d %H:%M:%OS6")
[1] "2011-01-21 00:00:00.000999" "2011-01-21 00:00:00.002000"
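The digit-selection loop quoted from format.POSIXlt can be isolated to see why it settles on 3 digits. This is a sketch only: pick_digits() is a hypothetical wrapper around the quoted lines, and frac approximates the fractional seconds of the example by subtracting the whole-second part of the index values shown above.

```r
# Hypothetical wrapper around the loop quoted from format.POSIXlt.
pick_digits <- function(secs, digits.secs = 6L) {
  np <- min(6L, digits.secs)
  if (np >= 1L) {
    for (i in seq_len(np) - 1L) {
      # Stop at the first digit count that reproduces secs to within 1e-6.
      if (all(abs(secs - round(secs, i)) < 1e-06)) {
        np <- i
        break
      }
    }
  }
  np
}

epoch <- 1295589600
# Fractional parts as they survive storage next to a large epoch value.
frac <- (epoch + (1:10) / 1000) - epoch

sprintf("%.10f", frac[1:2])   # shows the values are not exactly 0.001, 0.002
n_digits <- pick_digits(frac)
n_digits
```

Because every value is within 1e-6 of a multiple of 0.001, the loop stops at 3 digits, and truncation to 3 digits then turns a value just below 0.001 into .000.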

In R, is the %OSn time format only valid for formatting, but not parsing?

This is expected behavior, not a bug. "%OSn" is for output. "%OS" is for input, and includes fractional seconds, as it says in your second blockquote:

Further, for strptime %OS will input seconds including fractional seconds.

options(digits.secs=6)
as.POSIXct("2015-06-09 11:24:19.002", "America/New_York", "%Y-%m-%d %H:%M:%OS")
# [1] "2015-06-09 11:24:19.002 EDT"
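To check that the fraction really is read on input, independent of how it is printed, one can inspect the sec component of the parsed value. A small sketch, using UTC rather than the time zone above purely to keep the example self-contained:

```r
# Parse with %OS (no digit) and look at the stored seconds directly.
p <- strptime("2015-06-09 11:24:19.002", "%Y-%m-%d %H:%M:%OS", tz = "UTC")
parsed_sec <- p$sec
parsed_sec
```

The printed form still depends on options(digits.secs), but parsed_sec holds the fractional seconds either way.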

Also note that "EST" is an ambiguous timezone, and probably not what you expect. See the Time zone names section of ?timezone.

Milliseconds puzzle when calling strptime in R

This is related to R-FAQ 7.31, though it takes a different-than-usual guise.

The behavior you are seeing results from a combination of: (a) the inexact representation of (most) decimal values by binary computers; and (b) the documented behavior of strftime and strptime, which is to truncate rather than round the fractional parts of seconds, to the specified number of decimal places.

From the ?strptime help file (the key word being 'truncated'):

Specific to R is ‘%OSn’, which for output gives the seconds
truncated to ‘0 <= n <= 6’ decimal places (and if ‘%OS’ is not
followed by a digit, it uses the setting of
‘getOption("digits.secs")’, or if that is unset, ‘n = 3’).

An example will probably illustrate what's going on more effectively than further explanation:

strftime('2011-10-11 07:49:36.3', format="%Y-%m-%d %H:%M:%OS6")
[1] "2011-10-11 07:49:36.299999"

strptime('2012-01-16 12:00:00.3', format="%Y-%m-%d %H:%M:%OS1")
[1] "2012-01-16 12:00:00.2"

In the example above, the fractional '.3' is best approximated by a binary number that is slightly less than '0.300000000000000000', something like '0.29999999999999999'. Because strptime and strftime truncate rather than round to the specified decimal place, 0.3 will be converted to 0.2 if the number of decimal places is set to 1. The same logic holds for your example times, of which half exhibit this behavior, as would (on average) be expected.
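The stored value can be inspected directly to confirm that it sits just below .3. The sketch below assumes UTC for reproducibility. The last line shows one possible workaround, shifting by half of the last printed unit before formatting so that truncation behaves like rounding; this is an illustration, not something the help page prescribes.

```r
x <- as.POSIXct("2012-01-16 12:00:00.3", tz = "UTC",
                format = "%Y-%m-%d %H:%M:%OS")

# Fractional part of the stored number of seconds since the epoch.
frac <- as.numeric(x) - floor(as.numeric(x))
sprintf("%.12f", frac)
below <- frac < 0.3

# Possible workaround (assumption): add half a unit of the last printed
# digit, here 0.05 s for one decimal place, before formatting.
format(x + 0.05, "%H:%M:%OS1")
```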

as.POSIXct/as.POSIXlt doesn't like .61 milliseconds

If you change the format for the time part to be %H%M%OS instead of %H%M%S.%OS, it seems to parse correctly. You may have to adjust your options to see this:

as.POSIXlt(vec, tz = "EST", format = "%Y%m%d.%H%M%OS")
#[1] "2015-01-01 01:01:01 EST" "2015-01-01 01:01:01 EST"
#[3] "2015-01-01 01:01:01 EST"

options(digits.secs = 2)
as.POSIXlt(vec, tz = "EST", format = "%Y%m%d.%H%M%OS")
# [1] "2015-01-01 01:01:01.60 EST" "2015-01-01 01:01:01.61 EST"
# [3] "2015-01-01 01:01:01.62 EST"
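The definition of vec is not shown in the answer. The following self-contained sketch uses a hypothetical vec, inferred from the format string and the printed results, and UTC instead of "EST" to avoid the ambiguity of that abbreviation. It checks the parsed seconds directly instead of relying on printing options.

```r
# Hypothetical input, inferred from the format and output above.
vec <- c("20150101.010101.60",
         "20150101.010101.61",
         "20150101.010101.62")

p <- as.POSIXlt(vec, tz = "UTC", format = "%Y%m%d.%H%M%OS")
parsed_secs <- p$sec   # fractional seconds are kept even if not printed
parsed_secs
```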

