Weird as.POSIXct behavior depending on daylight savings time

If you really want the result printed with the time component, you can always parse it as UTC:

as.POSIXct("26/03/2006 02:05:38", format="%d/%m/%Y %H:%M:%S", tz = "UTC")
#[1] "2006-03-26 02:05:38 UTC"

Just make sure you do this for all conversions for consistency.

As Wikipedia states:

UTC does not change with a change of seasons, but local time or civil
time may change if a time zone jurisdiction observes daylight saving
time (summer time). For example, local time on the east coast of the
United States is five hours behind UTC during winter, but four hours
behind while daylight saving is observed there.
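The offsets in the Wikipedia quote can be checked directly in R. This is a small sketch: the two dates are arbitrary examples chosen for illustration, and it assumes the "America/New_York" zone is available in the installed time zone database.

```r
# Two instants defined in UTC, one in winter and one in summer (arbitrary example dates)
winter <- as.POSIXct("2006-01-15 12:00:00", tz = "UTC")
summer <- as.POSIXct("2006-07-15 12:00:00", tz = "UTC")

# Display the same instants as US east coast local time, with the UTC offset (%z)
format(winter, "%H:%M %z", tz = "America/New_York")  # "07:00 -0500"
format(summer, "%H:%M %z", tz = "America/New_York")  # "08:00 -0400"
```

The stored instants never change; only the local rendering moves by an hour.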

Behaviour of as.POSIXct in R with inconsistent daylight saving time data

The problem does not lie in the diff function. It lies with as.POSIX* combined with DST (daylight saving time). R does not handle this automatically.

On 25 March 2018 at 02:00:00, CET clocks are set one hour forward, officially changing to CEST. This means 2018-03-25 02:00:00 CET simply does not exist.

Why does this happen?

When calling as.POSIXct(), some parameters take default values. One of them is tz (the time zone), which defaults to the system's setting (mine is CET).

To clarify, I edited your dataset

ts <- c("2018-03-25 01:45:00", "2018-03-25 02:00:00", "2018-03-25 03:00:00")

Now we run the following line

as.POSIXct(ts)
#"2018-03-25 CET" "2018-03-25 CET" "2018-03-25 CET"

No format parameter is given, so R tries a series of standard formats and settles on one that drops the time portion. So what if we force a format that includes the time? Running the following line results in:

as.POSIXct(ts, format = "%Y-%m-%d %H:%M:%OS")
# "2018-03-25 01:45:00 CET" NA "2018-03-25 03:00:00 CEST"

Note that the second value (a time that does not actually exist) is coerced to NA. This also explains the earlier result without a format: because R cannot parse this value with "%Y-%m-%d %H:%M:%OS", it falls back to an easier format ("%Y-%m-%d") for the whole vector. Also note that the third value is in CEST, since it falls after the DST switch. Running the same conversion with a different time zone, one that has no DST, succeeds:

as.POSIXct(ts[1:3], format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
#"2018-03-25 01:45:00 UTC" "2018-03-25 02:00:00 UTC" "2018-03-25 03:00:00 UTC"
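Since the original problem showed up when calling diff, here is a minimal sketch of the UTC-parsed vector fed back into diff. It assumes the timestamps really were recorded without DST shifts; if they were recorded in local time, parsing as UTC changes which instants they refer to.

```r
ts <- c("2018-03-25 01:45:00", "2018-03-25 02:00:00", "2018-03-25 03:00:00")

# Parse in UTC, where every wall-clock time exists
utc <- as.POSIXct(ts, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# No NA values, so the differences are all defined
gaps_min <- as.numeric(diff(utc), units = "mins")
gaps_min  # 15 60
```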

Different parsing behaviour for the first day of April in R as.POSIXct and as.POSIXlt, is R april fooling me?

This is almost certainly a daylight savings time issue. Not sure why POSIXct and POSIXlt behave differently though. From your profile, it looks like you're in Mexico.

From here:

most of Mexico, including capital Mexico City, will set the clocks 1 hour forward 3 weeks later, on Sunday, April 1, 2012.

So the problem is that 2:58 AM on 1 April 2012 did not exist in the time zone that is currently active in your locale.

Unless there is something specific having to do with the POSIXct/POSIXlt difference, this should probably be closed as a duplicate of e.g.:

  • What is wrong with this date and time?
  • R POSIXct returns NA with "03/12/2017 02:17:13"
  • PosixCT conversion in R fails
  • Weird as.POSIXct behavior depending on daylight savings time
  • Strange strptime behavior in R
  • as.POSIX error, can not convert a particular date
  • Weird POSIX behaviour for two closely time strings with and without specifying the format

And this r help question

If you want to deal with this e.g. by setting all times to UTC (i.e. ignoring your local time zone settings), I believe there are lots of suggestions on Stack Overflow (now that you know to search for "daylight savings time" it should be easy to find them).

Weird POSIX behaviour for two closely time strings with and without specifying the format

  1. Why the time zone differs between the two lines

As said in the comments, it differs due to daylight savings. Since you don't include the zone in the call to as.POSIXct, you are prone to many problems. When at all possible, be explicit with timezone. This is a no-kidding moment: if you know it (and it is not part of the string), never assume it will be inferred correctly. In my experience, it will get it wrong enough to be really annoying and very difficult to detect, find, and fix.
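To illustrate why an explicit time zone matters, the sketch below parses one string under two zones. The noon timestamp is an arbitrary example, and "Asia/Jerusalem" is used here as the tz database name for Israel.

```r
s <- "2017-03-24 12:00:00"

# The same text refers to different instants depending on the zone
a <- as.POSIXct(s, tz = "UTC")
b <- as.POSIXct(s, tz = "Asia/Jerusalem")

# Noon in Israel on that day is three hours before noon UTC
offset_hours <- as.numeric(difftime(a, b, units = "hours"))
offset_hours  # 3
```

If tz is omitted, the result silently depends on the machine running the code.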



  2. Why when no format is given it ignores the times' portion

It does not, though it might look like it. This is only a symptom of how it is printed, not stored. (This is common in many of R's functions, for instance how it shows pi with only a handful of decimal places while it is certainly storing many more. Without this "representation versus actual precision" model, R's console would be unnecessarily full of decimal places and such, all the time.)

If I update your code to explicitly include zone:

as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel")
# [1] "2017-03-24 IST" "2017-03-24 IST"
as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel") + 1
# [1] "2017-03-24 00:00:01 IST" "2017-03-24 00:00:01 IST"

In the second case, I added one second to the times, and you see the time is now there. You can look at the internals to see it in a different way:

dput(as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel"))
# structure(c(1490306400, 1490306400), class = c("POSIXct", "POSIXt"
# ), tzone = "Israel")
dput(as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel")+1)
# structure(c(1490306401, 1490306401), tzone = "Israel", class = c("POSIXct",
# "POSIXt"))

Times are stored as floating point numbers and a special class. Between the two (without and with a 1-second addition), you can see that the numbers are just off-by-one.
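The storage-versus-printing distinction can be seen in isolation with a midnight value. This is only a sketch; UTC is used so the result does not depend on any DST rules.

```r
x <- as.POSIXct("2017-03-24 00:00:00", tz = "UTC")

# The underlying value is a plain number of seconds since 1970-01-01 UTC
as.numeric(x)  # 1490313600

# With no explicit format, a vector of exact midnights prints without the time
format(x)      # "2017-03-24"

# One second later the time portion appears again
format(x + 1)  # "2017-03-24 00:00:01"
```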

A third way to confirm is to take the "missing time" POSIXct objects and explicitly format them as strings (the result is character, no longer POSIXct, but it's just for demo):

a <- as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel")
a
# [1] "2017-03-24 IST" "2017-03-24 IST"
format(a, format="the time is %Y-%m-%d %H:%M:%S")
# [1] "the time is 2017-03-24 00:00:00" "the time is 2017-03-24 00:00:00"


  3. Why does it fail to convert the first string when the format is specified?

As @Dave2e commented, according to the daylight savings conversions, that time "never happened".

According to https://www.timeanddate.com/time/change/israel/jerusalem?year=2017:

Mar 24, 2017 - Daylight Saving Time Started

When local standard time was about to reach
Friday, March 24, 2017, 2:00:00 am clocks were turned forward 1 hour to
Friday, March 24, 2017, 3:00:00 am local daylight time instead.

I interpret that to mean that the clock shifted from 01:59:59 to 03:00:00, so 02:**:** never happened. R is telling you with the NA that that time should not have occurred. There are certainly ways (hacks) you can infer that this is the case: find all NA values, then attempt to re-convert using plus or minus an hour; if the new value is not NA, then you found another instance where R thinks that time is not possible. If it is still NA, then there must be something else about the string (additional characters, different order, etc).
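The re-conversion hack described above could be sketched roughly as follows. The function name classify_na and its labels are hypothetical, and it only tries plus one hour. Note that whether a nonexistent time comes back as NA at all depends on the platform, so on some systems the gap is never flagged.

```r
# Hypothetical helper: label each string as ok, probable DST gap, or other parse problem
classify_na <- function(x, tz, format = "%Y-%m-%d %H:%M:%S") {
  local <- as.POSIXct(x, format = format, tz = tz)
  # Parse the same text as UTC to get the wall-clock reading without DST rules
  wall <- as.POSIXct(x, format = format, tz = "UTC")
  # Shift the wall-clock reading by one hour and try the target zone again
  retry <- as.POSIXct(format(wall + 3600, format), format = format, tz = tz)
  status <- rep("ok", length(x))
  status[is.na(local) & !is.na(retry)] <- "probable DST gap"
  status[is.na(local) & is.na(retry)] <- "other parse problem"
  status
}

res <- classify_na(c("2017-03-24 12:00:00", "2017-03-24 02:30:00", "garbage"),
                   tz = "Asia/Jerusalem")
res
```

Strings that still fail after the shift point to a formatting problem rather than a DST gap.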

In my experience, I have not found this logic to ever be incorrect (though I don't know with certainty that it is flawless), even if it seems annoying. When I thought it might have been incorrect, I have always found something else that explained why I think I have that precise time:

  • data collection stored the wrong TZ
  • data collection failed to store the TZ, and I inferred incorrectly
  • some conversion in the pipeline mis-converted the times and/or zone(s)
  • likely something else I haven't rooted out

Handling dates when we switch to daylight savings time and back

I had a similar problem with hydrological data from a sensor. My timestamps were in UTC+1 (CET) and did not switch to daylight saving time (UTC+2, CEST). As I didn't want my data to be one hour off (which would be the case if UTC were used), I used the %z conversion specification of strptime. In ?strptime you'll find:

%z Signed offset in hours and minutes from UTC, so -0800 is 8 hours
behind UTC.

For example: in 2012, the switch from standard time to DST occurred on 2012-03-25, so there is no 02:00 on this day. If you try to convert "2012-03-25 02:00:00" to a POSIXct object,

> as.POSIXct("2012-03-25 02:00:00", tz="Europe/Vienna")
[1] "2012-03-25 CET"

you don't get an error or a warning, you just get the date without the time (this behavior is documented).

Including %z in the format gives the desired result:

> as.POSIXct("2012-03-25 02:00:00 +0100", format="%F %T %z", tz="Europe/Vienna")
[1] "2012-03-25 03:00:00 CEST"

In order to facilitate this import, I wrote a small function with appropriate default values:

as.POSIXct.no.dst <- function (x, tz = "", format="%Y-%m-%d %H:%M", offset="+0100", ...)
{
  # Append the fixed UTC offset to every timestamp and parse it via %z
  x <- paste(x, offset)
  format <- paste(format, "%z")
  as.POSIXct(x, tz, format=format, ...)
}

> as.POSIXct.no.dst(c("2012-03-25 00:00", "2012-03-25 01:00", "2012-03-25 02:00", "2012-03-25 03:00"))
[1] "2012-03-25 00:00:00 CET" "2012-03-25 01:00:00 CET" "2012-03-25 03:00:00 CEST"
[4] "2012-03-25 04:00:00 CEST"
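A different route to the same result, not the one used above, is to parse the strings in a fixed-offset zone. This sketch assumes the tz database zone "Etc/GMT-1" is available; despite the inverted sign in its name, it means UTC+1 all year.

```r
# Parse in a zone that is always UTC+1, so 02:00 exists on the switch day
x <- as.POSIXct("2012-03-25 02:00:00", format = "%Y-%m-%d %H:%M:%S",
                tz = "Etc/GMT-1")

# Display the same instant in a zone that observes DST
shown <- format(x, "%Y-%m-%d %H:%M:%S", tz = "Europe/Vienna")
shown  # "2012-03-25 03:00:00"
```

The stored instant is the same as with the %z approach; only the tzone attribute on the result differs.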

Weird POSIXct error

This is a daylight savings time issue: apparently 2 AM on 2013-03-10 doesn't exist in that time zone. Nevertheless, it's mildly interesting (at least to me) that as.POSIXct doesn't complain, but silently returns a slightly odd answer. One problem may be that R typically uses system libraries for some of this stuff, and so is at the whim of the underlying libraries ...

Incorporating useful information from the comments: @JoshUlrich points out that you can get around this (provided that the original data are really in GMT) by using Sys.setenv(TZ="GMT") before importing the data, since RODBC uses the system-level timezone rather than allowing you to specify it ...
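A rough sketch of the Sys.setenv workaround, with the previous setting restored afterwards. The as.POSIXct call stands in for the RODBC import, which is not reproduced here, and the variable names are only illustrative.

```r
# Remember the current TZ setting (NA if the variable is not set at all)
old_tz <- Sys.getenv("TZ", unset = NA)

Sys.setenv(TZ = "GMT")
# Stand-in for the import step: with no tz argument, the session zone (now GMT) is used
x <- as.POSIXct("2013-03-10 02:00:00", format = "%Y-%m-%d %H:%M:%S")
parsed <- format(x, "%Y-%m-%d %H:%M:%S")

# Restore the earlier state so later code is unaffected
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)

parsed  # "2013-03-10 02:00:00"
```

The formatted string is captured before TZ is restored because x carries no explicit tzone and would otherwise be displayed in the local zone again.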

Is there a reliable way to detect POSIXlt objects representing a time which does not exist due to DST?

The value of as.POSIXct(test) seems to be platform dependent, which adds a layer of complexity to finding a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) produces NA, as also reported by the OP. However, on my Linux platform (same R version), I get the following:

times <- c("2015-03-29 01:00",
           "2015-03-29 02:00",
           "2015-03-29 03:00")

test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")

test
#[1] "2015-03-29 01:00:00 CET" "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
as.POSIXct(test)
#[1] "2015-03-29 01:00:00 CET" "2015-03-29 01:00:00 CET" "2015-03-29 03:00:00 CEST"
as.character(test)
#[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
as.character(as.POSIXct(test))
#[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"

The one thing that we can rely on is not the actual value of as.POSIXct(test), but that it will be different from test when test is an invalid date/time:

(as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
# TRUE FALSE TRUE

I'm not sure that as.character is strictly necessary here, but I include it just to ensure that we don't fall foul of any other odd behaviours of POSIX objects.
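The same round-trip idea can be wrapped in a helper that starts from the character input. This is a variation rather than the exact comparison above: it takes the wall-clock reading from a UTC parse instead of from the POSIXlt object. The name is_nonexistent is hypothetical, and "Europe/Berlin" is used as an assumed stand-in for "CET".

```r
# Hypothetical helper: TRUE where a parseable string names a local time skipped by DST
is_nonexistent <- function(x, tz, format = "%Y-%m-%d %H:%M") {
  out_fmt <- "%Y-%m-%d %H:%M:%S"
  # Wall-clock reading of the input, parsed in UTC where every time exists
  wall <- format(as.POSIXct(x, format = format, tz = "UTC"), out_fmt)
  # Round trip through the target zone: NA or a shifted time if the input was invalid
  back <- format(as.POSIXct(x, format = format, tz = tz), out_fmt, tz = tz)
  !is.na(wall) & (is.na(back) | back != wall)
}

times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
flags <- is_nonexistent(times, tz = "Europe/Berlin")
flags  # FALSE TRUE FALSE
```

Because it accepts either NA or a shifted value as evidence, the check should not depend on which of the two platform behaviours is in effect.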

PosixCT conversion in R fails

Not sure what you mean. It works just fine:

as.POSIXct("11MAR18:02:00:00",format="%d%b%y:%H:%M:%S")
#[1] "2018-03-11 02:00:00 AEDT"

as.POSIXct("10MAR18:02:00:00",format="%d%b%y:%H:%M:%S")
#[1] "2018-03-10 02:00:00 AEDT"

This also works with a different time zone, e.g. tz = "UTC".

If this is not what you're after, can you please clarify?

Trouble dealing with POSIXct timezones and truncating the time out of POSIXct objects

If you don't specify a timezone, then R will use your system's time zone setting, as POSIXct objects must have a timezone. The difference between CEST and CET is that one is summertime and one is not. That means if you define a date during the part of the year defined as summertime, then R will decide to use the summertime version of the timezone. If you want to set dates that don't use summertime versions, then define them as GMT from the beginning.

formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString), tz="GMT")
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString), tz="GMT")

If you want to truncate out the time, be careful with as.Date on a POSIXct object: it returns a Date object (which isn't the same as a POSIXct object), and its tz argument defaults to UTC, so the calendar date can differ from the local one. If you want to truncate POSIXct objects with base R and keep the class, then you'll have to wrap either round or trunc in as.POSIXct, but I would recommend checking out the lubridate package for dealing with dates and times (specifically POSIXct objects).
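A minimal base R sketch of the truncation, using one of the example timestamps defined in GMT:

```r
x <- as.POSIXct("2013-11-23 23:10:38", tz = "GMT")

# trunc() returns a POSIXlt object, so wrap it to get POSIXct back
day_start <- as.POSIXct(trunc(x, units = "days"))
format(day_start, "%Y-%m-%d %H:%M:%S")  # "2013-11-23 00:00:00"

# If a Date is acceptable, pass the zone explicitly instead of relying on the default
d <- as.Date(x, tz = "GMT")
format(d)  # "2013-11-23"
```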

If you want to keep CET but never use CEST you can use a location that doesn't observe daylight savings. According to http://www.timeanddate.com/time/zones/cet your only options are Algeria and Tunisia. According to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones the valid tz would be "Africa/Algiers". Therefore you could do

formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString), tz="Africa/Algiers")
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString), tz="Africa/Algiers")

and both x and y would be in CET.

One more thing about setting timezones. If you tell R you want a generic timezone then it won't override daylight savings settings. That's why setting attr(y, "tzone") <- "CET" didn't have the desired result. If you did attr(y, "tzone") <- "Africa/Algiers" then it would have worked as you expected. Do be careful with conversions though because when you change the timezone it will change the time to account for the new timezone. The package lubridate has the function force_tz which changes the timezone without changing the time for cases where the initial timezone setting was wrong but the time was right.
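The difference between converting a time zone and replacing it can be sketched in base R. The second half is only an approximation of what force_tz does, written with format and as.POSIXct; the variable names are illustrative.

```r
y <- as.POSIXct("2015-07-17 01:43:38", tz = "UTC")

# Changing the tzone attribute keeps the instant and changes the displayed clock time
converted <- y
attr(converted, "tzone") <- "Africa/Algiers"
format(converted, "%H:%M:%S")  # "02:43:38"

# Re-parsing the printed clock time in the new zone keeps the clock time
# and changes the instant (roughly what force_tz is for)
forced <- as.POSIXct(format(y, "%Y-%m-%d %H:%M:%S"), tz = "Africa/Algiers")
format(forced, "%H:%M:%S")         # "01:43:38"
as.numeric(y) - as.numeric(forced)  # 3600
```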


