Milliseconds Puzzle When Calling Strptime in R

This is related to R-FAQ 7.31, though it takes a different-than-usual guise.

The behavior you are seeing results from a combination of: (a) the inexact representation of (most) decimal values by binary computers; and (b) the documented behavior of strftime and strptime, which is to truncate rather than round the fractional parts of seconds, to the specified number of decimal places.

From the ?strptime help file (the key word being 'truncated'):

Specific to R is ‘%OSn’, which for output gives the seconds
truncated to ‘0 <= n <= 6’ decimal places (and if ‘%OS’ is not
followed by a digit, it uses the setting of
‘getOption("digits.secs")’, or if that is unset, ‘n = 3’).

An example will probably illustrate what's going on more effectively than further explanation:

strftime('2011-10-11 07:49:36.3', format="%Y-%m-%d %H:%M:%OS6")
[1] "2011-10-11 07:49:36.299999"

strftime('2012-01-16 12:00:00.3', format="%Y-%m-%d %H:%M:%OS1")
[1] "2012-01-16 12:00:00.2"

In the example above, the fractional '.3' must be best approximated by a binary number that is slightly less than '0.300000000000000000' -- something like '0.29999999999999999'. Because strptime and strftime truncate rather than round to the specified decimal place, 0.3 will be converted to 0.2, if the number of decimal places is set to 1. The same logic holds for your example times, of which half exhibit this behavior, as would (on average) be expected.
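To see the effect directly, here is a minimal sketch that pulls the fractional second out of a parsed timestamp and compares truncating with rounding. The variable names are illustrative only, and the exact digits printed may vary by platform.

```r
# Parse a timestamp with a fractional second; the time zone is fixed so the
# result does not depend on the local setting.
x <- as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC")

# POSIXct stores seconds since 1970-01-01 as a double.
secs <- as.numeric(x)
frac <- secs - floor(secs)

# Print the fractional part with more digits than the input had; on a
# typical IEEE-754 platform it is close to, but not exactly, 0.3.
sprintf("%.10f", frac)

# Truncating versus rounding that fraction to one decimal place.
truncated <- floor(frac * 10) / 10
rounded   <- round(frac, 1)
c(truncated = truncated, rounded = rounded)
```

If the stored fraction lands just below 0.3, the truncated value is 0.2 while the rounded value is 0.3, which is the discrepancy described above.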

R issue with rounding milliseconds

I don't see that:

> options(digits.secs = 4)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"
> options(digits.secs = 3)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"

with

> sessionInfo()
R version 2.15.0 Patched (2012-04-14 r59019)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

With the "%OSn" format strings, one forces truncation. If the fractional second cannot be represented exactly in floating point, then the truncation may very well go the wrong way. If you see things going the wrong way, you can also round explicitly to the unit you want, or add half of the fraction you wish to operate at (in the case shown, 0.0005):

> t1 <- as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
> t1
[1] "2012-06-07 13:29:56.061 UTC"
> t1 + 0.0005
[1] "2012-06-07 13:29:56.061 UTC"

(But as I said, I don't see the problem here.)

This latter point was made by Simon Urbanek on the R-Devel mailing list on 30-May-2012.
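The add-half-a-unit idea can be wrapped in a small helper. This is only a sketch: format_half_up is a hypothetical name, not a base R function, and it assumes POSIXct input and 1 to 6 decimal places.

```r
# Hypothetical helper (not part of base R): shift the time by half of the
# last printed unit so that the truncation done by "%OSn" acts like rounding.
format_half_up <- function(x, digits = 3, prefix = "%Y-%m-%d %H:%M:") {
  stopifnot(inherits(x, "POSIXct"), digits %in% 1:6)
  shifted <- x + 0.5 * 10^(-digits)
  format(shifted, paste0(prefix, "%OS", digits))
}

t1 <- as.POSIXct("2012-06-07 13:29:56.061", tz = "UTC")
format_half_up(t1, digits = 3)
```

The shift is tied to the number of printed digits, so the same helper can be reused at other precisions without choosing a new constant by hand.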

Adding milliseconds to Time Stamp in R

As explained here (and by @nrussel in comments above) this is caused by strftime()'s truncation (not rounding!) of the machine's slightly imprecise floating point representation of fractional seconds.

As a perhaps slightly kludgy fix, you could write a small printing function that adds a very small value -- greater than .Machine$double.eps but less than .01 -- to each time value before printing it.

strftime2 <- function(x, ...) {
strftime(x + 1e-6, ...)
}

## Compare results
strftime(strptime(Time,format="%H:%M:%OS")+(RTime %% 1),format="%H:%M:%OS2")
# [1] "09:33:23.77" "09:35:25.27" "09:36:26.98"
strftime2(strptime(Time,format="%H:%M:%OS")+(RTime %% 1),format="%H:%M:%OS2")
# [1] "09:33:23.78" "09:35:25.28" "09:36:26.98"

How to parse milliseconds?

Courtesy of the ?strptime help file (with the example changed to your value):

> z <- strptime("2010-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS")
> z # prints without fractional seconds
[1] "2010-01-15 13:55:23 UTC"

> op <- options(digits.secs=3)
> z
[1] "2010-01-15 13:55:23.975 UTC"

> options(op) #reset options
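The fractional part is stored whether or not it is printed. As a minimal sketch (the variable names are illustrative, and tz = "UTC" is an assumption made for reproducibility), the milliseconds can be read back from the sec component of the POSIXlt object:

```r
# strptime() returns a POSIXlt object; its "sec" component keeps the
# fractional seconds regardless of the digits.secs print option.
z <- strptime("2010-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS", tz = "UTC")

whole_secs <- floor(z$sec)

# Round rather than truncate, since the stored fraction is only an
# approximation of .975.
ms <- round((z$sec %% 1) * 1000)

c(seconds = whole_secs, milliseconds = ms)
```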

Timestamp R Sequence Milliseconds

Due to rounding down issues with milliseconds in R (see this post), you need to add a tiny fractional amount to the vector. And the length.out should be n+1, not n.

df_blank  <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1) + 0.0001)
head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.000
#2 2018-06-01 00:00:00.100
#3 2018-06-01 00:00:00.200
#4 2018-06-01 00:00:00.300
#5 2018-06-01 00:00:00.400
#6 2018-06-01 00:00:00.500

Without the addition of the tiny amount, you can see the problem.

df_blank  <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1))

head(format(df_blank, "%Y-%m-%d %H:%M:%OS6"))
# Timestamp
#1 2018-06-01 00:00:00.000000
#2 2018-06-01 00:00:00.099999
#3 2018-06-01 00:00:00.200000
#4 2018-06-01 00:00:00.299999
#5 2018-06-01 00:00:00.400000
#6 2018-06-01 00:00:00.500000

And without the formatting, you see what appears to be a very strange sequence.

head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.0
#2 2018-06-01 00:00:00.0
#3 2018-06-01 00:00:00.2
#4 2018-06-01 00:00:00.2
#5 2018-06-01 00:00:00.4
#6 2018-06-01 00:00:00.5
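An alternative that avoids the tiny offset is to keep the millisecond offsets as integers and only build the labels as text. The sketch below assumes a start time and a 100 ms step chosen for illustration; start, ms and labels are hypothetical names, not objects from the question.

```r
# Assumed start time; the time zone is fixed for reproducibility.
start <- as.POSIXct("2018-06-01 00:00:00", tz = "UTC")

# Integer millisecond offsets: 0, 100, ..., 500.
ms <- 0:5 * 100L

# Format the whole seconds and the milliseconds separately, so no
# fractional double is ever passed to "%OSn".
whole  <- format(start + ms %/% 1000L, "%Y-%m-%d %H:%M:%S")
labels <- sprintf("%s.%03d", whole, ms %% 1000L)
labels
```

This only produces character labels. If POSIXct values are needed for arithmetic, start + ms / 1000 gives them, with the usual caveat about how they print.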

zoo/xts microsecond read issue

You want the index to be a time class such as POSIXct or POSIXlt. Also, your format argument wasn't quite right. Try this:

read.zoo("~/sample.txt", header = TRUE, format="%H:%M:%OS", FUN=as.POSIXct)

Which, for the sample data provided, gives

read.zoo(text="           Time  Set1    Set2   
10:19:38.551629 16234 16236
10:19:41.408010 16234 16236
10:19:47.264204 16234 16236 ", header = TRUE, format="%H:%M:%OS", FUN=as.POSIXct)
# Set1 Set2
#2012-06-21 10:19:38.551629 16234 16236
#2012-06-21 10:19:41.408010 16234 16236
#2012-06-21 10:19:47.264204 16234 16236

Strptime fails when working with a dataframe

You only need to cast your calls to strptime to POSIXct explicitly:

aDateInPOSIXct <- as.POSIXct(strptime("2018-12-31", format = "%Y-%m-%d"))
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))

df[1,1] <- bDateInPOSIXct

Check the R documentation which says:

Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct".
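The difference between the two classes can be checked directly. The following is a small sketch with illustrative names; tz = "UTC" is an assumption added so the numeric values are reproducible.

```r
# strptime() returns POSIXlt, which is a list of date-time components.
lt <- strptime("2018-12-31", format = "%Y-%m-%d", tz = "UTC")
class(lt)
is.list(unclass(lt))

# as.POSIXct() turns it into a single number of seconds since the epoch,
# which fits into one data frame column.
ct <- as.POSIXct(lt)
class(ct)
as.numeric(ct)

df <- data.frame(when = ct, label = "asdf", stringsAsFactors = FALSE)

# Assigning another POSIXct value into the column keeps its class.
df[1, "when"] <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d", tz = "UTC"))
class(df$when)
```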

Accurately converting from character -> POSIXct -> character with sub millisecond datetimes

Two things:

1) @statquant is right, as is @Aaron in his comment (and the otherwise known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), but that will not be important for the main question here:

POSIXlt by design is definitely more accurate in storing times than POSIXct: as its seconds are always in [0, 60), it has a granularity of about 6e-15, i.e., 6 femtoseconds, which is dozens of millions of times finer than that of POSIXct.

However, this is not very relevant here (and for current R): almost all operations, notably numeric ones, use the Ops group method (yes, not known to beginners, but well documented); just look at Ops.POSIXt, which indeed trashes the extra precision by first coercing to POSIXct. In addition, the format()/print()ing uses 6 decimals after the "." at most, and hence also does not distinguish between the internally higher precision of POSIXlt and the "only" 100 nanosecond granularity of POSIXct.

(For the above reason, both Dirk and Joshua were led to their wrong assertion: for all simple practical uses, the precision of *lt and *ct is made the same.)
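The orders of magnitude quoted above can be reproduced from the spacing of double-precision numbers. This is a rough sketch assuming IEEE-754 doubles; ulp is a hypothetical helper name, and 59 and 1.5e9 are stand-ins for "seconds within a minute" and "seconds since 1970".

```r
# Hypothetical helper: spacing between adjacent doubles near x, assuming
# IEEE-754 double precision with a 52-bit fraction.
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

# POSIXlt keeps seconds in [0, 60), so the relevant magnitude is below 60.
ulp(59)      # about 7.1e-15

# POSIXct keeps seconds since 1970, currently on the order of 1.5e9.
ulp(1.5e9)   # about 2.4e-7

# Ratio between the two granularities.
ulp(1.5e9) / ulp(59)
```

The ratio comes out at 2^25, roughly 34 million, which matches the "dozens of millions" figure.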

2) I do tend to agree that we (R Core) should improve the format()ing and hence print()ing of such fractions of seconds POSIXt objects (still after the bug fix mentioned by @Aaron above).

But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)

In R, is the %OSn time format only valid for formatting, but not parsing?

This is expected behavior, not a bug. "%OSn" is for output. "%OS" is for input, and includes fractional seconds, as it says in your second blockquote:

Further, for strptime %OS will input seconds including fractional seconds.

options(digits.secs=6)
as.POSIXct("2015-06-09 11:24:19.002", "America/New_York", "%Y-%m-%d %H:%M:%OS")
# [1] "2015-06-09 11:24:19.002 EDT"

Also note that "EST" is an ambiguous timezone, and probably not what you expect. See the Time zone names section of ?timezone.


