R Xts: .001 Millisecond in Index

I suspect this is a rounding/floating point issue:

Browse[2]> print(head(as.numeric(order.by)), digits = 20)
[1] 1332234170.0009999275 1332234170.0009999275 1332234170.0009999275
[4] 1332234170.0009999275 1332234170.0009999275 1332234170.0009999275

That was achieved by debugging xts() on the call

foo <- xts(1:180, rep(as.POSIXlt("2012-03-20 09:02:50.001"), 180),
           unique = FALSE)

but you can see the problem clearly via

> print(as.numeric(as.POSIXlt("2012-03-20 09:02:50.001")))
[1] 1332234170
> print(as.numeric(as.POSIXlt("2012-03-20 09:02:50.001")), digits = 20)
[1] 1332234170.0009999275

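This indicates that your fractional number of seconds can't be created or stored as exactly .001 seconds (1 millisecond): the stored value is slightly less than .001, so truncating to 3 dp drops it to .000. Truncating to 3 dp will keep .002, however, as it is stored as: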

> print(as.numeric(as.POSIXlt("2012-03-20 09:02:50.002")), digits = 20)
[1] 1332234170.0020000935

Truncating or rounding that to 3 dp will preserve the .002 part. This is one of the issues you have to deal with when working with floating point numbers on computers.

Do note that this appears to just be an issue in the printed representation of the index dates:

> print(as.numeric(index(foo)[1]), digits = 20)
[1] 1332234170.0009999275

The precision (with floating point issues) is preserved in the actual object storing the index times - you just can't see that when printing the times to the console.
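
As a rough sketch of how to confirm this yourself (base R only; the names x, frac and shown are just illustrative, and the half-millisecond shift is applied for display only, not stored):

```r
# Illustrative sketch; `x`, `frac` and `shown` are hypothetical names.
x <- as.POSIXct("2012-03-20 09:02:50.001", tz = "UTC")

# The fractional part is stored, just not exactly: it differs from 0.001
# by far less than a microsecond.
frac <- as.numeric(x) - floor(as.numeric(x))
abs(frac - 0.001) < 1e-6

# Formatting with %OS3 truncates, so shift by half a millisecond before
# formatting. The object `x` itself is left untouched.
shown <- format(x + 0.0005, "%Y-%m-%d %H:%M:%OS3")
shown
```

The same shift can be applied to index(foo) when all you want is a printed index that matches the input timestamps.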

R xts: millisecond index

This works with package zoo so I suspect it works also with xts as the latter builds upon the former.

> ## create some times with milliseconds
> times <- Sys.time() + seq(0, by = 0.1, length = 10)
> times
[1] "2012-03-19 22:10:57.763 GMT" "2012-03-19 22:10:57.863 GMT"
[3] "2012-03-19 22:10:57.963 GMT" "2012-03-19 22:10:58.063 GMT"
[5] "2012-03-19 22:10:58.163 GMT" "2012-03-19 22:10:58.263 GMT"
[7] "2012-03-19 22:10:58.363 GMT" "2012-03-19 22:10:58.463 GMT"
[9] "2012-03-19 22:10:58.563 GMT" "2012-03-19 22:10:58.663 GMT"
> ZOO <- zoo(1:10, order = times)
> index(ZOO)
[1] "2012-03-19 22:10:57.763 GMT" "2012-03-19 22:10:57.863 GMT"
[3] "2012-03-19 22:10:57.963 GMT" "2012-03-19 22:10:58.063 GMT"
[5] "2012-03-19 22:10:58.163 GMT" "2012-03-19 22:10:58.263 GMT"
[7] "2012-03-19 22:10:58.363 GMT" "2012-03-19 22:10:58.463 GMT"
[9] "2012-03-19 22:10:58.563 GMT" "2012-03-19 22:10:58.663 GMT"

The trick to see the milliseconds is to alter the digits.secs option via options(). The output above was produced with:

> getOption("digits.secs")
[1] 3

Which is set using

> opts <- options(digits.secs = 3)

You can restore the previous value (the default is 0) by doing options(opts). With that default R doesn't print sub-second information. The data are recorded to sub-second accuracy though, even if not printed.
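
A small sketch of that save-and-restore pattern, using base R only. The timestamp is made up, and half a second was picked because it is exactly representable in floating point. Note that digits.secs is a maximum: trailing zeros are not printed.

```r
# Illustrative sketch; the timestamp is made up and .5 s is exactly
# representable in floating point, so no rounding surprises here.
x <- as.POSIXct("2012-03-19 22:10:57.5", tz = "GMT")

old <- options(digits.secs = 3)   # options() returns the previous settings
with_frac <- format(x)            # at most 3 sub-second digits are shown
options(digits.secs = 0)
without_frac <- format(x)         # whole seconds only
options(old)                      # put back whatever was set before

with_frac
without_frac
```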

If this is not what you meant, can you explain what you did that was not working?

R: xts timestamp differ from real data timestamp by 1 millisecond

This is similar to the question "R issue with rounding milliseconds". One simple solution would be adding 0.5 ms, as suggested there:

tt_ts <- strptime(tt[,1],"%Y-%m-%d %H:%M:%OS") + 0.0005
xts::xts(x=tt[,c(-1)], order.by=tt_ts)
# [,1]
# 2018-03-01 09:51:59.969 30755.5
# 2018-03-01 09:51:59.969 30755.0
# 2018-03-01 09:51:59.970 30755.5
# 2018-03-01 09:51:59.971 30756.0
# 2018-03-01 09:51:59.987 30756.5
# 2018-03-01 09:51:59.988 30756.5

We can see this from a simple example:

st <- strptime("2018-03-01 09:51:59.971", "%Y-%m-%d %H:%M:%OS")
format(st, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"
pt <- as.POSIXct(st)
format(pt, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.970"

After conversion to POSIXct the ms is wrong. Increasing the output precision, we see that the floating point number used to represent the time is just below the required value, but R truncates the number instead of rounding it:

format(pt, "%Y-%m-%d %H:%M:%OS6")
#> [1] "2018-03-01 09:51:59.970999"

Shifting by one half of the required precision fixes this.

format(pt + 0.0005, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"

Generally, if x is a number with 3 decimal digits, any number within the open range (x - 0.0005, x + 0.0005) would be rounded to x. On truncation, that would still work for those within [x, x + 0.0005). But those within (x - 0.0005, x) would be represented by x - 0.001 as you observed. If we shift the relevant number by 0.0005 before truncation, we are speaking about the range (x, x + 0.001). All these numbers will be truncated to x as wanted.

I am excluding the points x ± 0.0005 since there are different rules for rounding them and the actual floating point number representing the time point will be a lot closer to the desired value than this.
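
The shift-then-truncate argument can be wrapped in a small helper. This is only a sketch: format_frac_secs and its digits argument are names I made up, not part of base R or xts.

```r
# Hypothetical helper: shift by half of the last displayed unit, then let
# format() truncate. `digits` is the number of decimal places to show.
format_frac_secs <- function(x, digits = 3L) {
  stopifnot(inherits(x, "POSIXct"), digits >= 1L, digits <= 6L)
  fmt <- paste0("%Y-%m-%d %H:%M:%OS", digits)
  format(x + 0.5 * 10^(-digits), fmt)
}

stamps <- c("2018-03-01 09:51:59.969",
            "2018-03-01 09:51:59.970",
            "2018-03-01 09:51:59.971")
pt <- as.POSIXct(stamps, tz = "UTC")
format_frac_secs(pt)   # reproduces the input strings
```

The shift only affects the formatted string; pt keeps its original values, so any arithmetic on it is unchanged.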

EDIT: Concerning the question in the comments about taking differences: There it should not matter whether you add half a milli-second or not if you add it to both points. Example with a time point that needs adjustment on its own:

st1 <- strptime("2018-03-01 09:51:59.971", "%Y-%m-%d %H:%M:%OS")
format(st1, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"
pt1 <- as.POSIXct(st1)
format(pt1, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.970"
format(pt1 + 0.0005, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"

And a time point that does not need adjustment:

st2 <- strptime("2018-03-01 09:51:59.969", "%Y-%m-%d %H:%M:%OS")
format(st2, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.969"
pt2 <- as.POSIXct(st2)
format(pt2, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.969"
format(pt2 + 0.0005, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.969"

Difference is the same independent of any adjustment:

difftime(pt1, pt2, units = "secs")
#> Time difference of 0.001999855 secs
difftime(pt1 + 0.0005, pt2 + 0.0005, units = "secs")
#> Time difference of 0.001999855 secs

R issue with rounding milliseconds

I don't see that:

> options(digits.secs = 4)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"
> options(digits.secs = 3)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"

with

> sessionInfo()
R version 2.15.0 Patched (2012-04-14 r59019)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

With the "%OSn" format strings, one forces truncation. If the fractional second cannot be represented exactly in floating point then the truncation may very well go the wrong way. If you see things going the wrong way you can also round explicitly to the unit you want or add a half of the fraction you wish to operate at (in the case shown 0.0005):

> t1 <- as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
> t1
[1] "2012-06-07 13:29:56.061 UTC"
> t1 + 0.0005
[1] "2012-06-07 13:29:56.061 UTC"

(but as I said, I don't see the problem here.)

This latter point was made by Simon Urbanek on the R-Devel mailing list on 30-May-2012.
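
For the "round explicitly" route, one possible sketch is to round the time to a whole number of milliseconds and build the string from that, instead of relying on "%OS3". The helper name format_rounded_ms is hypothetical, and this is just one way of doing it.

```r
# Hypothetical helper: round to the nearest millisecond, then print the
# whole seconds and the millisecond count separately.
format_rounded_ms <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  total_ms <- round(as.numeric(x) * 1000)            # whole milliseconds
  whole <- .POSIXct(total_ms %/% 1000, tz = attr(x, "tzone"))
  sprintf("%s.%03d",
          format(whole, "%Y-%m-%d %H:%M:%S"),
          as.integer(total_ms %% 1000))
}

t1 <- as.POSIXct("2012-06-07 13:29:56.061", tz = "UTC")
format_rounded_ms(t1)
```

Working in whole milliseconds keeps the arithmetic on integer-valued doubles, so the carry into the next second is handled by the integer division rather than by string formatting.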

Timestamp R Sequence Milliseconds

Due to rounding down issues with milliseconds in R (see this post), you need to add a tiny fractional amount to the vector. And the length.out should be n+1, not n.

df_blank  <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1) + 0.0001)
head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.000
#2 2018-06-01 00:00:00.100
#3 2018-06-01 00:00:00.200
#4 2018-06-01 00:00:00.300
#5 2018-06-01 00:00:00.400
#6 2018-06-01 00:00:00.500

Without the addition of the tiny amount, you can see the problem.

df_blank  <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1))

head(format(df_blank, "%Y-%m-%d %H:%M:%OS6"))
# Timestamp
#1 2018-06-01 00:00:00.000000
#2 2018-06-01 00:00:00.099999
#3 2018-06-01 00:00:00.200000
#4 2018-06-01 00:00:00.299999
#5 2018-06-01 00:00:00.400000
#6 2018-06-01 00:00:00.500000

And without the formatting, you see what appears to be a very strange sequence.

head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.0
#2 2018-06-01 00:00:00.0
#3 2018-06-01 00:00:00.2
#4 2018-06-01 00:00:00.2
#5 2018-06-01 00:00:00.4
#6 2018-06-01 00:00:00.5
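
A self-contained version of the above, as a sketch. The values of Time1, Time2 and n are my assumptions, chosen to match the printed output; the original post does not show them.

```r
# Assumed inputs: the original post does not show Time1, Time2 or n.
Time1 <- as.POSIXct("2018-06-01 00:00:00", tz = "UTC")
Time2 <- Time1 + 1      # one second later
n <- 10                 # ten intervals of 0.1 s, so n + 1 time points

stamps <- seq(Time1, Time2, length.out = n + 1)

raw     <- format(stamps, "%H:%M:%OS1")            # truncated as stored
shifted <- format(stamps + 0.0001, "%H:%M:%OS1")   # nudged before truncation

data.frame(raw = raw, shifted = shifted)
```

The shifted column steps evenly from 00:00:00.0 to 00:00:01.0, while the raw column can repeat or skip a tenth wherever the stored value sits just below the intended one.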

Odd behavior with POSIXct/POSIXlt and subsecond accuracy

@GSee is right, this is a floating point arithmetic problem. And Gavin Simpson's answer is correct in that it's how the object is printed.

R> options(digits=17)
R> .index(x)
[1] 1295589600.0009999 1295589600.0020001 1295589600.0030000 1295589600.0039999
[5] 1295589600.0050001 1295589600.0060000 1295589600.0070000 1295589600.0079999
[9] 1295589600.0090001 1295589600.0100000

All the precision is there, but these lines in format.POSIXlt cause options(digits.secs=6) to not be honored.

np <- getOption("digits.secs")
if (is.null(np))
    np <- 0L
else
    np <- min(6L, np)
if (np >= 1L) {
    for (i in seq_len(np) - 1L) {
        if (all(abs(secs - round(secs, i)) < 1e-06)) {
            np <- i
            break
        }
    }
}

Due to precision issues, in your example np is reset to 3 in the above for loop. And the format "%Y-%m-%d %H:%M:%OS3" yields the times you posted. You can see the times are accurate if you use the "%Y-%m-%d %H:%M:%OS6" format.

R> format(as.POSIXlt(index(x)[1:2]), "%Y-%m-%d %H:%M:%OS3")
[1] "2011-01-21 00:00:00.000" "2011-01-21 00:00:00.002"
R> format(as.POSIXlt(index(x)[1:2]), "%Y-%m-%d %H:%M:%OS6")
[1] "2011-01-21 00:00:00.000999" "2011-01-21 00:00:00.002000"
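
To see how np ends up at 3, the loop quoted above can be pulled out into a standalone function. This is a paraphrase for illustration only: effective_digits is a hypothetical name, and the real code lives inside format.POSIXlt.

```r
# Paraphrase of the quoted loop as a standalone function; the name
# `effective_digits` is hypothetical and not part of base R.
effective_digits <- function(secs, digits_secs = 6L) {
  np <- min(6L, digits_secs)
  if (np >= 1L) {
    for (i in seq_len(np) - 1L) {
      # Stop at the first i where every value is within 1e-06 of its
      # own rounding to i decimal places.
      if (all(abs(secs - round(secs, i)) < 1e-06)) {
        np <- i
        break
      }
    }
  }
  np
}

# Seconds fields like those in the example: 0.001, 0.002, ..., 0.010
effective_digits((1:10) / 1000)
```

The floating point error in each value is far below the 1e-06 tolerance, so the loop decides that 3 digits are enough, and the truncating "%OS3" format then exposes values that sit just below a whole millisecond.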

R lubridate ymd_hms millisecond diff

Edit: the answer to "Milliseconds in POSIXct Class" addresses what is happening with POSIXct:

(Note that you get rounding errors, and R's datetime formatting always rounds downwards, so if you show fewer decimal places it sometimes looks like you've lost a millisecond.)


The problem seems to exist with ymd_hms and also as.POSIXct.

If I call strptime directly, or use as.POSIXlt, the milliseconds parse correctly:

strptime(time, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki")

as.POSIXlt(time, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki")

Either of those options should fix your problem, giving:

"2019-01-14 10:58:23.438 EET"

POSIXlt and POSIXct are behaving differently however:

library(magrittr)  # provides the %>% pipe
as.POSIXlt(time, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki") %>%
  format(., "%Y-%m-%d %H:%M:%OS6")

[1] "2019-01-14 10:58:23.438000"

as.POSIXct(time, "%Y-%m-%d %H:%M:%OS", tz = "Europe/Helsinki") %>%
format(., "%Y-%m-%d %H:%M:%OS6")

[1] "2019-01-14 10:58:23.437999"
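
My understanding of the difference, as a sketch: POSIXlt keeps the seconds as their own field, so 23.438 is stored about as accurately as a double allows, whereas POSIXct stores one large count of seconds since 1970 and has fewer bits left for the fraction. I use tz = "UTC" below only to keep the example independent of the installed time zone data; the fractional part is the same in any zone with a whole-second offset.

```r
# Illustrative sketch; tz = "UTC" is my substitution for Europe/Helsinki.
time <- "2019-01-14 10:58:23.438"
lt <- as.POSIXlt(time, tz = "UTC", format = "%Y-%m-%d %H:%M:%OS")
ct <- as.POSIXct(lt)

lt$sec                          # seconds field, stored on its own
ct_frac <- as.numeric(ct) %% 1  # fraction recovered from the epoch count

abs(lt$sec - 23.438)            # essentially zero
abs(ct_frac - 0.438)            # small, but enough to fall below .438
```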

