Milliseconds puzzle when calling strptime in R
This is related to R-FAQ 7.31, though it takes a different-than-usual guise.
The behavior you are seeing results from a combination of: (a) the inexact representation of (most) decimal values by binary computers; and (b) the documented behavior of strftime and strptime, which is to truncate rather than round the fractional parts of seconds, to the specified number of decimal places.
From the ?strptime help file (the key word being 'truncated'):
Specific to R is ‘%OSn’, which for output gives the seconds
truncated to ‘0 <= n <= 6’ decimal places (and if ‘%OS’ is not
followed by a digit, it uses the setting of
‘getOption("digits.secs")’, or if that is unset, ‘n = 3’).
An example will probably illustrate what's going on more effectively than further explanation:
strftime('2011-10-11 07:49:36.3', format="%Y-%m-%d %H:%M:%OS6")
[1] "2011-10-11 07:49:36.299999"
strptime('2012-01-16 12:00:00.3', format="%Y-%m-%d %H:%M:%OS1")
[1] "2012-01-16 12:00:00.2"
In the example above, the fractional '.3' is best approximated by a binary number that is slightly less than '0.300000000000000000' -- something like '0.29999999999999999'. Because strptime and strftime truncate rather than round to the specified decimal place, 0.3 will be converted to 0.2 if the number of decimal places is set to 1. The same logic holds for your example times, of which half exhibit this behavior, as would (on average) be expected.
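To see the mechanism in numbers, here is a minimal sketch (my own illustration, not taken from the help file). It rebuilds the first timestamp as seconds since the epoch and compares truncation with rounding. The variable names and the choice of UTC are assumptions made for the example.

```r
# Seconds since the epoch for 2011-10-11 07:49:36 (UTC is assumed here),
# plus the fractional .3 from the example.
secs <- as.numeric(as.POSIXct("2011-10-11 07:49:36", tz = "UTC")) + 0.3

# The stored fractional part is close to, but not exactly, 0.3.
frac <- secs - floor(secs)
sprintf("%.10f", frac)

# Truncating to one decimal place drops to 0.2 ...
truncated <- trunc(frac * 10) / 10
# ... whereas rounding gives the 0.3 one would expect.
rounded <- round(frac, 1)

c(truncated = truncated, rounded = rounded)
```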
R issue with rounding milliseconds
I don't see that:
> options(digits.secs = 4)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"
> options(digits.secs = 3)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"
with
> sessionInfo()
R version 2.15.0 Patched (2012-04-14 r59019)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
With the "%OSn" format strings, one forces truncation. If the fractional second cannot be represented exactly in floating point, then the truncation may very well go the wrong way. If you see things going the wrong way, you can also round explicitly to the unit you want, or add half of the unit you wish to operate at (in the case shown, 0.0005):
> t1 <- as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
> t1
[1] "2012-06-07 13:29:56.061 UTC"
> t1 + 0.0005
[1] "2012-06-07 13:29:56.061 UTC"
(but as I said, I don't see the problem here.)
This latter point was made by Simon Urbanek on the R-Devel mailing list on 30-May-2012.
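The "add half a unit" idea can be wrapped in a small function. This is only a sketch: format_rounded is a hypothetical helper (it is not in base R), and the default of three digits and the UTC time zone are assumptions for the example.

```r
# Hypothetical helper (not part of base R): shift by half a unit of the
# last printed digit so that the truncation done by "%OSn" acts as rounding.
format_rounded <- function(x, digits = 3, tz = "UTC") {
  stopifnot(inherits(x, "POSIXct"), digits %in% 0:6)
  half_unit <- 0.5 * 10^(-digits)
  format(x + half_unit,
         format = paste0("%Y-%m-%d %H:%M:%OS", digits),
         tz = tz)
}

# A time with 61 milliseconds, printed to three decimal places.
t1 <- as.POSIXct("2012-06-07 13:29:56", tz = "UTC") + 0.061
format_rounded(t1)

# The 0.3 case from the first answer, printed to one decimal place.
t2 <- as.POSIXct("2012-01-16 12:00:00", tz = "UTC") + 0.3
format_rounded(t2, digits = 1)
```

The shift is much larger than the representation error, so it does not matter which side of the intended value the stored number falls on.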
Adding milliseconds to Time Stamp in R
As explained here (and by @nrussel in comments above), this is caused by strftime()'s truncation (not rounding!) of the machine's slightly imprecise floating point representation of fractional seconds.
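As a perhaps slightly kludgy fix, you could write a small printing function that adds a very small value to each time value before printing it. The value has to exceed the representation error, which for present-day timestamps is roughly 2.4e-7 seconds (so far more than .Machine$double.eps), while staying below the .01 resolution being printed.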
strftime2 <- function(x, ...) {
strftime(x + 1e-6, ...)
}
## Compare results
strftime(strptime(Time,format="%H:%M:%OS")+(RTime %% 1),format="%H:%M:%OS2")
# [1] "09:33:23.77" "09:35:25.27" "09:36:26.98"
strftime2(strptime(Time,format="%H:%M:%OS")+(RTime %% 1),format="%H:%M:%OS2")
# [1] "09:33:23.78" "09:35:25.28" "09:36:26.98"
How to parse milliseconds?
Courtesy of the ?strptime help file (with the example changed to your value):
> z <- strptime("2010-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS")
> z # prints without fractional seconds
[1] "2010-01-15 13:55:23 UTC"
> op <- options(digits.secs=3)
> z
[1] "2010-01-15 13:55:23.975 UTC"
> options(op) #reset options
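The fraction is stored whether or not it is printed. As a minimal sketch (the variable ms and the explicit tz = "UTC" are my additions), the seconds component of the POSIXlt result can be inspected directly:

```r
# Parse with "%OS"; the result is a POSIXlt whose 'sec' field keeps the fraction.
z <- strptime("2010-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Seconds component, fraction included, independent of digits.secs.
z$sec

# Milliseconds as a whole number; round() guards against representation error.
ms <- round((z$sec %% 1) * 1000)
ms
```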
Timestamp R Sequence Milliseconds
Due to rounding down issues with milliseconds in R (see this post), you need to add a tiny fractional amount to the vector. Also, length.out should be n+1, not n.
df_blank <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1) + 0.0001)
head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.000
#2 2018-06-01 00:00:00.100
#3 2018-06-01 00:00:00.200
#4 2018-06-01 00:00:00.300
#5 2018-06-01 00:00:00.400
#6 2018-06-01 00:00:00.500
Without the addition of the tiny amount, you can see the problem.
df_blank <- data.frame(Timestamp = seq.POSIXt(Time1, Time2, length.out=n+1))
head(format(df_blank, "%Y-%m-%d %H:%M:%OS6"))
# Timestamp
#1 2018-06-01 00:00:00.000000
#2 2018-06-01 00:00:00.099999
#3 2018-06-01 00:00:00.200000
#4 2018-06-01 00:00:00.299999
#5 2018-06-01 00:00:00.400000
#6 2018-06-01 00:00:00.500000
And without the formatting, you see what appears to be a very strange sequence.
head(df_blank)
# Timestamp
#1 2018-06-01 00:00:00.0
#2 2018-06-01 00:00:00.0
#3 2018-06-01 00:00:00.2
#4 2018-06-01 00:00:00.2
#5 2018-06-01 00:00:00.4
#6 2018-06-01 00:00:00.5
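For a version that runs on its own, here is a sketch with assumed inputs: Time1, Time2 and n are not defined in this answer, so the values below (a one-second span split into ten steps, in UTC) are placeholders chosen to reproduce the 0.1 second spacing.

```r
# Assumed inputs; the originals come from the question and are not shown here.
Time1 <- as.POSIXct("2018-06-01 00:00:00", tz = "UTC")
Time2 <- Time1 + 1
n <- 10

# n intervals need n + 1 points.
ts <- seq(Time1, Time2, length.out = n + 1)

# Truncated output of the raw values; some entries may print one unit low.
shown_raw <- format(ts, "%H:%M:%OS3")

# Shifting by a small amount before formatting avoids the low values.
shown_fix <- format(ts + 0.0001, "%H:%M:%OS3")

data.frame(raw = shown_raw, shifted = shown_fix)
```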
zoo/xts microsecond read issue
You want the index to be a time class such as POSIXct or POSIXlt. Also, your format argument wasn't quite right. Try this:
read.zoo("~/sample.txt", header = TRUE, format="%H:%M:%OS", FUN=as.POSIXct)
Which, for the sample data provided, gives
read.zoo(text=" Time Set1 Set2
10:19:38.551629 16234 16236
10:19:41.408010 16234 16236
10:19:47.264204 16234 16236 ", header = TRUE, format="%H:%M:%OS", FUN=as.POSIXct)
# Set1 Set2
#2012-06-21 10:19:38.551629 16234 16236
#2012-06-21 10:19:41.408010 16234 16236
#2012-06-21 10:19:47.264204 16234 16236
Strptime fails when working with a dataframe
You only need to cast your calls to strptime to POSIXct explicitly:
aDateInPOSIXct <- as.POSIXct(strptime("2018-12-31", format = "%Y-%m-%d"))
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))
df[1,1] <- bDateInPOSIXct
Check the R documentation which says:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct".
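The reason the cast matters is that strptime returns a "POSIXlt" object, which is a list of date-time components, while "POSIXct" is a single number per timestamp and so fits into a data frame cell. A small sketch to check this (the names lt and ct, and the UTC time zone, are just for the example):

```r
# strptime() gives POSIXlt: a list of components (sec, min, hour, ...).
lt <- strptime("2018-12-31", format = "%Y-%m-%d", tz = "UTC")
class(lt)
is.list(unclass(lt))

# as.POSIXct() gives one number per timestamp (seconds since the epoch).
ct <- as.POSIXct(lt)
class(ct)
is.numeric(unclass(ct))
```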
Accurately converting from character->POSIXct->character with sub millisecond datetimes
Two things:
1) @statquant is right, as is @Aaron in his comment (and the otherwise known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), but that will not be important for the main question here:
POSIXlt by design is definitely more accurate in storing times than POSIXct: as its seconds are always in [0, 60), it has a granularity of about 6e-15, i.e., 6 femtoseconds, which is dozens of millions of times finer than that of POSIXct.
However, this is not very relevant here (and for current R): almost all operations, notably numeric ones, use the Ops group method (yes, not known to beginners, but well documented); just look at Ops.POSIXt, which indeed trashes the extra precision by first coercing to POSIXct. In addition, format()/print()ing uses 6 decimals after the "." at most, and hence also does not distinguish between the internally higher precision of POSIXlt and the "only" 100 nanosecond granularity of POSIXct.
(For the above reason, both Dirk and Joshua were led to their wrong assertion: for all simple practical uses, the precision of *lt and *ct is made the same.)
2) I do tend to agree that we (R Core) should improve the format()ing and hence print()ing of such fractions of seconds of POSIXt objects (still after the bug fix mentioned by @Aaron above).
But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)
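The granularity figures in point 1 can be checked roughly from the spacing of double precision numbers. This is a back-of-the-envelope sketch: ulp is a hypothetical helper written for the example, and 59.9 and 1.5e9 are stand-ins for "seconds within a minute" and "seconds since the epoch around the present".

```r
# Hypothetical helper: spacing between adjacent doubles near v
# (52 explicit mantissa bits in IEEE 754 double precision).
ulp <- function(v) 2^(floor(log2(abs(v))) - 52)

# POSIXlt keeps seconds in [0, 60): spacing of a few femtoseconds.
ulp(59.9)

# POSIXct keeps seconds since 1970, currently around 1.5e9:
# spacing of a few hundred nanoseconds.
ulp(1.5e9)

# Ratio between the two: tens of millions.
ulp(1.5e9) / ulp(59.9)
```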
In R, is the %OSn time format only valid for formatting, but not parsing?
This is expected behavior, not a bug. "%OSn" is for output. "%OS" is for input, and includes fractional seconds, as it says in your second blockquote:
Further, for strptime %OS will input seconds including fractional seconds.
options(digits.secs=6)
as.POSIXct("2015-06-09 11:24:19.002", "America/New_York", "%Y-%m-%d %H:%M:%OS")
# [1] "2015-06-09 11:24:19.002 EDT"
Also note that "EST" is an ambiguous timezone, and probably not what you expect. See the Time zone names section of ?timezone.
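Putting the two directives side by side, as a minimal sketch: plain "%OS" goes in the parsing format and the digit count goes in the output format only. I use tz = "UTC" here instead of "America/New_York" purely so the example does not depend on the local time zone database; that substitution is my assumption.

```r
# Input side: plain "%OS" reads the seconds with their fractional part.
x <- as.POSIXct("2015-06-09 11:24:19.002", tz = "UTC",
                format = "%Y-%m-%d %H:%M:%OS")

# The fraction was read in (approximately 0.002, up to representation error).
as.numeric(x) %% 1

# Output side: "%OS3" asks for three (truncated) decimal places.
format(x, "%Y-%m-%d %H:%M:%OS3")
```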