R Issue with Rounding Milliseconds

I don't see that:

> options(digits.secs = 4)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"
> options(digits.secs = 3)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"

with

> sessionInfo()
R version 2.15.0 Patched (2012-04-14 r59019)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

With the "%OSn" format strings, one forces truncation. If the fractional second cannot be represented exactly in floating point, then the truncation may very well go the wrong way. If you see things going the wrong way, you can also round explicitly to the unit you want, or add half of the fraction you wish to operate at (in the case shown, 0.0005):

> t1 <- as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
> t1
[1] "2012-06-07 13:29:56.061 UTC"
> t1 + 0.0005
[1] "2012-06-07 13:29:56.061 UTC"

(But as I said, I don't see the problem here.)

This latter point was made by Simon Urbanek on the R-Devel mailing list on 30-May-2012.
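As a rough illustration of the "round explicitly" alternative, the sketch below rounds to whole milliseconds first and only then builds the string, so no truncation step is involved. Note that format_ms is a hypothetical helper name (not part of base R), and the date is simply the one shown in the session output above.

```r
# Hypothetical helper: format a POSIXct value to millisecond precision by
# rounding (not truncating) the fractional seconds.
format_ms <- function(x, tz = "UTC") {
  ms_total <- round(as.numeric(x) * 1000)   # whole milliseconds since the epoch
  whole    <- ms_total %/% 1000             # whole seconds
  frac_ms  <- ms_total %% 1000              # remaining milliseconds, 0..999
  stamp <- format(as.POSIXct(whole, origin = "1970-01-01", tz = tz),
                  "%Y-%m-%d %H:%M:%S", tz = tz)
  paste0(stamp, sprintf(".%03d", as.integer(frac_ms)))
}

t1 <- as.POSIXct("2012-06-07 13:29:56.061", tz = "UTC")
format_ms(t1)
# [1] "2012-06-07 13:29:56.061"
```

Because the rounding happens on the numeric value, it does not matter whether the stored double sits slightly above or slightly below the intended millisecond.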

Milliseconds puzzle when calling strptime in R

This is related to R-FAQ 7.31, though it takes a different-than-usual guise.

The behavior you are seeing results from a combination of: (a) the inexact representation of (most) decimal values by binary computers; and (b) the documented behavior of strftime and strptime, which is to truncate rather than round the fractional parts of seconds, to the specified number of decimal places.

From the ?strptime help file (the key word being 'truncated'):

Specific to R is ‘%OSn’, which for output gives the seconds
truncated to ‘0 <= n <= 6’ decimal places (and if ‘%OS’ is not
followed by a digit, it uses the setting of
‘getOption("digits.secs")’, or if that is unset, ‘n = 3’).

An example will probably illustrate what's going on more effectively than further explanation:

strftime('2011-10-11 07:49:36.3', format="%Y-%m-%d %H:%M:%OS6")
[1] "2011-10-11 07:49:36.299999"

strptime('2012-01-16 12:00:00.3', format="%Y-%m-%d %H:%M:%OS1")
[1] "2012-01-16 12:00:00.2"

In the example above, the fractional '.3' must be best approximated by a binary number that is slightly less than '0.300000000000000000' -- something like '0.29999999999999999'. Because strptime and strftime truncate rather than round to the specified decimal place, 0.3 will be converted to 0.2, if the number of decimal places is set to 1. The same logic holds for your example times, of which half exhibit this behavior, as would (on average) be expected.
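To make the truncate-versus-round distinction concrete, here is a small sketch that looks at the fractional seconds as they are actually stored. It assumes the timestamp is parsed in UTC (the time zone is my choice, not something stated above) and works on the plain numeric value rather than on the formatted string.

```r
# Parse a timestamp whose fractional part (.3) has no exact binary form.
tm <- as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC")

frac <- as.numeric(tm) %% 1  # fractional seconds as actually stored
sprintf("%.10f", frac)       # print more digits than the default
frac < 0.3                   # TRUE: the stored value is just below .3

floor(frac * 10) / 10        # truncating to one decimal place gives 0.2
round(frac, 1)               # rounding to one decimal place gives 0.3
```

The truncated value is what the "%OS1" format reports; the rounded value is what most people expect to see.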

Converting milliseconds to hh:mm format by considering the rounding in r

You can use round_date from the lubridate package to round to the nearest minute after converting the milliseconds to a POSIXct object and before formatting it with strftime:

library(lubridate)

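x <- c(3159763, 2839300, 3821900)   # durations in milliseconds
# convert to seconds since the epoch; call the generic, not the S3 method
y <- as.POSIXct(x / 1000, origin = '1970-01-01', tz = 'GMT')
z <- lubridate::round_date(y, unit = 'minute')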

strftime(z, format = '%R', tz='GMT')

[1] "00:53" "00:47" "01:04"
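If you would rather not pull in a date-time class at all, the same result can be had with plain arithmetic. This is only a sketch of an alternative, not what round_date does internally; unlike strftime, it does not wrap the hours at 24.

```r
x <- c(3159763, 2839300, 3821900)   # durations in milliseconds

total_min <- round(x / 60000)       # nearest whole minute
hhmm <- sprintf("%02d:%02d",
                as.integer(total_min %/% 60),   # hours
                as.integer(total_min %% 60))    # minutes
hhmm
# [1] "00:53" "00:47" "01:04"
```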

Rounding milliseconds of POSIXct in data.table v1.9.2 (ok in 1.8.10)

Yes, I reproduced your result with v1.9.2.

library(data.table)

DT <- data.table(timestamp=c(as.POSIXct("2013-01-01 17:51:00.707"),
                             as.POSIXct("2013-01-01 17:51:59.996"),
                             as.POSIXct("2013-01-01 17:52:00.059"),
                             as.POSIXct("2013-01-01 17:54:23.901"),
                             as.POSIXct("2013-01-01 17:54:23.914")))

options(digits.secs=3) # usually placed in .Rprofile

DT
timestamp
1: 2013-01-01 17:51:00.707
2: 2013-01-01 17:51:59.996
3: 2013-01-01 17:52:00.059
4: 2013-01-01 17:54:23.901
5: 2013-01-01 17:54:23.914

duplicated(DT)
## [1] FALSE FALSE FALSE FALSE TRUE

Update from v1.9.3 from Matt

There was a change to rounding in v1.9.2 which affected milliseconds of POSIXct. More info here:

Grouping very small numbers (e.g. 1e-28) and 0.0 in data.table v1.8.10 vs v1.9.2

Large integers in data.table. Grouping results different in 1.9.2 compared to 1.8.10

So, the workaround now available in v1.9.3 is:

> setNumericRounding(1)   # default is 2
> duplicated(DT)
[1] FALSE FALSE FALSE FALSE FALSE

Hope you understand why the change was made and agree that we're going in the right direction.

Of course, you shouldn't have to call setNumericRounding(), that's just a workaround.

I've filed a new item on the tracker:

#5445 numeric rounding should be 0 or 1 automatically for POSIXct
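As a sanity check that the five timestamps really are distinct at millisecond resolution, they can be compared in base R on a whole-millisecond key. This is just a sketch outside data.table (it says nothing about how data.table groups numerics), and the UTC time zone is an assumption of mine.

```r
ts <- as.POSIXct(c("2013-01-01 17:51:00.707",
                   "2013-01-01 17:51:59.996",
                   "2013-01-01 17:52:00.059",
                   "2013-01-01 17:54:23.901",
                   "2013-01-01 17:54:23.914"), tz = "UTC")

ms_key <- round(as.numeric(ts) * 1000)  # whole milliseconds since the epoch
duplicated(ms_key)
# [1] FALSE FALSE FALSE FALSE FALSE
```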

how to safely store millisecond differences between timestamps?

Some considerations, some I think you already know:

  • floating-point will rarely give you perfectly 58 milliseconds (due to R FAQ 7.31 and IEEE-754);

  • display of the data can be managed on the console with options(digits.secs=3) (and digits=3) and in reports with sprintf, format, or round;

  • calculation "goodness" can be improved if you round before calculation; while this is a little more onerous, as long as we can safely assume that the data is accurate to at least milliseconds, this holds mathematically.
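A minimal sketch of the "round before calculation" point, using two made-up timestamps that are 58 milliseconds apart (the values and the UTC time zone are hypothetical, chosen only for illustration):

```r
# Two hypothetical timestamps, 58 ms apart.
t1 <- as.POSIXct("2020-03-06 15:13:23.439", tz = "UTC")
t2 <- as.POSIXct("2020-03-06 15:13:23.497", tz = "UTC")

# Subtracting the raw doubles carries the representation error along.
raw_diff <- as.numeric(t2) - as.numeric(t1)
raw_diff == 0.058            # FALSE

# Rounding each value to whole milliseconds first gives an exact result.
ms_diff <- round(as.numeric(t2) * 1000) - round(as.numeric(t1) * 1000)
ms_diff == 58                # TRUE
```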

If you're concerned about introducing errors in the data, though, an alternative is to encode as milliseconds (instead of the R norm of seconds). If you can choose an arbitrary and recent (under 24 days) reference point, then you can do it with a normal integer, but if that is insufficient or you prefer to use epoch milliseconds, then you need to jump to 64-bit integers, perhaps with bit64.

now <- Sys.time()
as.integer(now)
# [1] 1583507603
as.integer(as.numeric(now) * 1000)
# Warning: NAs introduced by coercion to integer range
# [1] NA
bit64::as.integer64(as.numeric(now) * 1000)
# integer64
# [1] 1583507603439
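For the other option, a recent reference point with a normal integer, a sketch might look like the following. The reference point and timestamp are hypothetical; the only real constraint is that the offset stays below .Machine$integer.max milliseconds.

```r
# Hypothetical reference point and timestamp, both in UTC.
ref <- as.POSIXct("2020-03-06 00:00:00", tz = "UTC")
now <- as.POSIXct("2020-03-06 15:13:23.439", tz = "UTC")

# Milliseconds since the reference point, stored as a plain 32-bit integer.
ms_since_ref <- as.integer(round((as.numeric(now) - as.numeric(ref)) * 1000))
ms_since_ref
# [1] 54803439

# Largest span that fits: a little under 25 days.
max_days <- .Machine$integer.max / (1000 * 86400)
```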

Round from milliseconds to minutes

If you're rounding up, then the last element should go up to 4 (since 30.02 seconds rounds to 1 minute). Here's an idea using strptime(), rounding the minutes.

## replace the last colon with a decimal point
st <- sub("(.*):(.*)", "\\1.\\2", StartTime)
## convert to POSIXlt and grab the rounded minutes
round(strptime(st, "%H:%M:%OS"), "mins")$min
# [1] 0 0 2 4
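The same rounding can be cross-checked with plain arithmetic. In the sketch below the StartTime values are hypothetical (the original vector is not shown here); I am assuming an "HH:MM:SS:fraction" layout, and to_seconds is a made-up helper name. Note that this returns total minutes rather than the minute-of-hour field that $min gives.

```r
# Hypothetical input in "HH:MM:SS:fraction" form.
StartTime <- c("00:00:05:10", "00:00:20:00", "00:01:45:00", "00:03:30:02")

# Hypothetical helper: turn one split timestamp into seconds.
to_seconds <- function(p) {
  as.numeric(p[1]) * 3600 + as.numeric(p[2]) * 60 +
    as.numeric(paste0(p[3], ".", p[4]))   # seconds plus fractional part
}

secs <- vapply(strsplit(StartTime, ":", fixed = TRUE), to_seconds, numeric(1))
mins <- floor(secs / 60 + 0.5)            # round half up to whole minutes
mins
# [1] 0 0 2 4
```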

Timestamp R Sequence Milliseconds

Due to rounding down issues with milliseconds in R (see this post), you need to add a tiny fractional amount to the vector. And the length.out should be n+1, not n.

df_blank <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1) + 0.0001)
head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.000
#2 2018-06-01 00:00:00.100
#3 2018-06-01 00:00:00.200
#4 2018-06-01 00:00:00.300
#5 2018-06-01 00:00:00.400
#6 2018-06-01 00:00:00.500

Without the addition of the tiny amount, you can see the problem.

df_blank <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1))

head(format(df_blank, "%Y-%m-%d %H:%M:%OS6"))
# Timestamp
#1 2018-06-01 00:00:00.000000
#2 2018-06-01 00:00:00.099999
#3 2018-06-01 00:00:00.200000
#4 2018-06-01 00:00:00.299999
#5 2018-06-01 00:00:00.400000
#6 2018-06-01 00:00:00.500000

And without the formatting, you see what appears to be a very strange sequence.

head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.0
#2 2018-06-01 00:00:00.0
#3 2018-06-01 00:00:00.2
#4 2018-06-01 00:00:00.2
#5 2018-06-01 00:00:00.4
#6 2018-06-01 00:00:00.5
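The effect can also be checked numerically, without relying on how the console prints. In this sketch Time1, Time2 and n are hypothetical stand-ins (one second split into ten steps, in UTC), since the original values are not shown.

```r
# Hypothetical inputs; the original Time1, Time2 and n are not shown.
Time1 <- as.POSIXct("2018-06-01 00:00:00", tz = "UTC")
Time2 <- Time1 + 1          # one second later
n     <- 10                 # ten steps of 0.1 s

s      <- seq(Time1, Time2, length.out = n + 1)
offset <- as.numeric(s) - as.numeric(Time1)    # seconds since Time1

raw_ms     <- floor(offset * 1000)             # truncate to milliseconds
shifted_ms <- floor((offset + 0.0001) * 1000)  # truncate after the small shift

raw_ms      # some entries fall 1 ms short, e.g. 99 instead of 100
shifted_ms  # 0 100 200 ... 1000
```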

R: xts timestamp differ from real data timestamp by 1 millisecond

This is similar to R issue with rounding milliseconds. One simple solution would be adding 0.5 ms as suggested there:

tt_ts <- strptime(tt[,1],"%Y-%m-%d %H:%M:%OS") + 0.0005
xts::xts(x=tt[,c(-1)], order.by=tt_ts)
# [,1]
# 2018-03-01 09:51:59.969 30755.5
# 2018-03-01 09:51:59.969 30755.0
# 2018-03-01 09:51:59.970 30755.5
# 2018-03-01 09:51:59.971 30756.0
# 2018-03-01 09:51:59.987 30756.5
# 2018-03-01 09:51:59.988 30756.5

We can see this from a simple example:

st <- strptime("2018-03-01 09:51:59.971", "%Y-%m-%d %H:%M:%OS")
format(st, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"
pt <- as.POSIXct(st)
format(pt, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.970"

After conversion to POSIXct the ms is wrong. Increasing the output precision, we see that the floating point number used to represent the time is just below the required value, but R truncates the number instead of rounding it:

format(pt, "%Y-%m-%d %H:%M:%OS6")
#> [1] "2018-03-01 09:51:59.970999"

Shifting by one half of the required precision fixes this.

format(pt + 0.0005, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"

Generally, if x is a number with 3 decimal digits, any number within the open range (x - 0.0005, x + 0.0005) would be rounded to x. On truncation, that would still work for those within [x, x + 0.0005). But those within (x - 0.0005, x) would be represented by x - 0.001 as you observed. If we shift the relevant number by 0.0005 before truncation, we are speaking about the range (x, x + 0.001). All these numbers will be truncated to x as wanted.

I am excluding the points x ± 0.0005 since there are different rules for rounding them and the actual floating point number representing the time point will be a lot closer to the desired value than this.
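The interval argument can be tried out on plain numbers. In this sketch trunc_ms is a hypothetical helper that mimics truncation to milliseconds, and the two test values are made up to sit on either side of x = 59.971.

```r
# Hypothetical helper: truncate a number of seconds to whole milliseconds.
trunc_ms <- function(v) floor(v * 1000)

x_below <- 59.970999   # just below 59.971, inside (x - 0.0005, x)
x_above <- 59.971400   # just above 59.971, inside [x, x + 0.0005)

trunc_ms(x_below)            # 59970: one millisecond too low
trunc_ms(x_below + 0.0005)   # 59971
trunc_ms(x_above)            # 59971
trunc_ms(x_above + 0.0005)   # 59971: the shift does not overshoot
```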

EDIT: Concerning the question in the comments about taking differences: there it should not matter whether you add half a millisecond or not, as long as you add it to both points. Example with a time point that needs adjustment on its own:

st1 <- strptime("2018-03-01 09:51:59.971", "%Y-%m-%d %H:%M:%OS")
format(st1, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"
pt1 <- as.POSIXct(st1)
format(pt1, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.970"
format(pt1 + 0.0005, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.971"

And a time point that does not need adjustment:

st2 <- strptime("2018-03-01 09:51:59.969", "%Y-%m-%d %H:%M:%OS")
format(st2, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.969"
pt2 <- as.POSIXct(st2)
format(pt2, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.969"
format(pt2 + 0.0005, "%Y-%m-%d %H:%M:%OS3")
#> [1] "2018-03-01 09:51:59.969"

Difference is the same independent of any adjustment:

# units must be passed by name; the third positional argument of difftime is tz
difftime(pt1, pt2, units = "secs")
#> Time difference of 0.001999855 secs
difftime(pt1 + 0.0005, pt2 + 0.0005, units = "secs")
#> Time difference of 0.001999855 secs

How to prevent R from rounding seconds

You can use the lubridate package to maintain the full precision of your data.

library(lubridate)

times <-
  c(
    "19:56:05.938836",
    "19:56:06.269024",
    "19:56:06.868525",
    "19:56:15.080690",
    "19:56:15.422007",
    "19:56:16.132036"
  )

# parse as periods (hours, minutes, seconds) rather than as date-times
hms(times)
# [1] "19H 56M 5.938836S" "19H 56M 6.269024S" "19H 56M 6.868525S" "19H 56M 15.08069S"
# [5] "19H 56M 15.422007S" "19H 56M 16.132036S"
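If lubridate is not available, a base R sketch along the same lines is to convert each time of day to seconds since midnight. Those numbers are small enough that a double holds the microseconds comfortably, and sprintf rounds rather than truncates when printing. The variable names here are my own.

```r
times <- c("19:56:05.938836", "19:56:06.269024")

# Seconds since midnight: hours * 3600 + minutes * 60 + seconds.
parts <- strsplit(times, ":", fixed = TRUE)
secs  <- vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)),
                numeric(1))

secs_txt <- sprintf("%.6f", secs)   # six decimals, rounded
secs_txt
# [1] "71765.938836" "71766.269024"
```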

