R as.POSIXct Parsing Error

R as.POSIXct parsing error

You cannot parse non-existing times. 02:00:10 did not exist as we had 'spring forward' this Saturday night / Sunday morning with the switch to daylight-savings. R knows this:

R> t_1 = "03/13/2011 01:00:10"; as.POSIXct(t_1, format = time_format)
[1] "2011-03-13 01:00:10 CST"
R> t_2 = "03/13/2011 02:00:10"; as.POSIXct(t_2, format = time_format)
[1] "2011-03-13 01:00:10 CST"
R> t_3 = "03/13/2011 03:00:10"; as.POSIXct(t_3, format = time_format)
[1] "2011-03-13 03:00:10 CDT"
R>

On Linux, my timezone library seems to cope -- 02:00:10 becomes 01:00:10 as an hour is subtracted.
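
One way to flag such strings before they are silently shifted is a round-trip check: parse the string, format it back in the same zone, and compare. This is only a minimal sketch and not part of the original answer. The helper name is_nonexistent is hypothetical, and both the format string and the zone America/Chicago (picked because the output above shows CST/CDT) are assumptions. It also assumes the input strings are zero-padded exactly as the format would print them.

```r
# Hypothetical helper: TRUE when a local time string parses to NA or
# does not survive a parse/format round trip in the given zone.
is_nonexistent <- function(s, fmt, tz) {
  parsed <- as.POSIXct(s, format = fmt, tz = tz)
  is.na(parsed) | format(parsed, fmt) != s
}

fmt <- "%m/%d/%Y %H:%M:%S"   # assumed to match the strings above
tz  <- "America/Chicago"      # assumed US Central zone

times <- c("03/13/2011 01:00:10",
           "03/13/2011 02:00:10",
           "03/13/2011 03:00:10")
gap <- is_nonexistent(times, fmt, tz)
print(gap)
```

With an up-to-date time zone database, only the middle element should be flagged, whether the platform returns NA or shifts the time by an hour.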

`as.POSIXct` get error with ` %Y-%m-%d %H:%M:%S ` format

The easy ones first:

  • optional = FALSE is the default: therefore #1 == #2 and #4 == #5
  • #6 needs no explanation: you need the argument origin = as the error states
  • #3 returns different results because of the time zone (the tz= argument), so the displayed time is 8 hours earlier.

Now, the problem is #4 and #5 (which are the same as I stated before):

as.POSIXct(dates,"%Y-%m-%d %H:%M:%S",tz="Asia/Shanghai",origin="1970-01-01")
#> [1] NA NA NA NA NA NA NA NA

To understand how this works you need to look at the function as.POSIXct, which, when called with a numeric x (like in this case), calls the method: as.POSIXct.numeric.

as.POSIXct.numeric

#> function (x, tz = "", origin, ...)
#> {
#>     if (missing(origin)) {
#>         if (!length(x))
#>             return(.POSIXct(numeric(), tz))
#>         if (!any(is.finite(x)))
#>             return(.POSIXct(x, tz))
#>         stop("'origin' must be supplied")
#>     }
#>     .POSIXct(as.POSIXct(origin, tz = "GMT", ...) + x, tz)
#> }
#> <bytecode: 0x55df7f23b390>
#> <environment: namespace:base>

Focus on this line:

#> .POSIXct(as.POSIXct(origin, tz = "GMT", ...) + x, tz)

In particular:

as.POSIXct(origin, tz = "GMT", ...) + x

As you can see, the function converts origin to a date-time and then adds the numeric input you supplied. Every additional argument you provide falls into the ... argument.

The function tries to convert 1970-01-01 to datetime using the format you provided: %Y-%m-%d %H:%M:%S.
Since the origin 1970-01-01 has format %Y-%m-%d, the function can't convert the origin from string to POSIX, thus returning NA. (That's where NAs are generated!)

When you convert a numeric to POSIX, the format you pass as an argument doesn't apply to the output (since it will always be a POSIX object) nor to the input, but rather to the origin. Thus, origin and format need to match.

To solve your problem, you need to use origin with the format %Y-%m-%d %H:%M:%S.
Like this:

as.POSIXct(dates,"%Y-%m-%d %H:%M:%S",tz="Asia/Shanghai",origin="1970-01-01 00:00:00")
#> [1] "2021-07-19 01:38:57 CST" "2021-07-19 01:38:58 CST" "2021-07-19 01:38:59 CST" "2021-07-19 01:39:00 CST"
#> [5] "2021-07-19 01:39:01 CST" "2021-07-19 01:39:02 CST" "2021-07-19 01:39:03 CST" "2021-07-19 01:39:04 CST"

Or you need to use this format: %Y-%m-%d
Like this:

as.POSIXct(dates,"%Y-%m-%d",tz="Asia/Shanghai",origin="1970-01-01")
#> [1] "2021-07-19 01:38:57 CST" "2021-07-19 01:38:58 CST" "2021-07-19 01:38:59 CST" "2021-07-19 01:39:00 CST"
#> [5] "2021-07-19 01:39:01 CST" "2021-07-19 01:39:02 CST" "2021-07-19 01:39:03 CST" "2021-07-19 01:39:04 CST"

The results are then equal to #1 and #2.
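
To see the mechanism in isolation, the sketch below parses the origin on its own with each format, then does the numeric conversion with named arguments and no format at all. The values in secs are hypothetical epoch seconds chosen for illustration; they are not the asker's data.

```r
# Hypothetical epoch seconds (not taken from the question)
secs <- c(1626629937, 1626629938)

# The origin string has no time part, so this format cannot parse it.
# This NA is what would propagate to every element of the result.
origin_bad <- as.POSIXct("1970-01-01", tz = "GMT",
                         format = "%Y-%m-%d %H:%M:%S")
print(is.na(origin_bad))

# A format that matches the origin string parses fine
origin_ok <- as.POSIXct("1970-01-01", tz = "GMT", format = "%Y-%m-%d")
print(as.numeric(origin_ok))

# Simplest form for numeric input: name the arguments, skip the format
res <- as.POSIXct(secs, origin = "1970-01-01", tz = "Asia/Shanghai")
print(format(res, "%Y-%m-%d %H:%M:%S"))
```

Naming tz and origin and leaving the format out avoids the mismatch entirely, since the default parsing of the origin already handles "1970-01-01".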

Different parsing behaviour for the first day of April in R as.POSIXct and as.POSIXlt, is R april fooling me?

This is almost certainly a daylight savings time issue. Not sure why POSIXct and POSIXlt behave differently though. From your profile, it looks like you're in Mexico.

From here:

most of Mexico, including capital Mexico City, will set the clocks 1 hour forward 3 weeks later, on Sunday, April 1, 2012.

So the problem is that 2:58 AM on 1 April 2012 did not exist in the time zone that is currently active in your locale.

Unless there is something specific having to do with the POSIXct/POSIXlt difference, this should probably be closed as a duplicate of e.g.:

  • What is wrong with this date and time?
  • R POSIXct returns NA with "03/12/2017 02:17:13"
  • PosixCT conversion in R fails
  • Weird as.POSIXct behavior depending on daylight savings time
  • Strange strptime behavior in R
  • as.POSIX error, can not convert a particular date
  • Weird POSIX behaviour for two closely time strings with and without specifying the format

And this r help question

If you want to deal with this e.g. by setting all times to UTC (i.e. ignoring your local time zone settings), I believe there are lots of suggestions on Stack Overflow (now that you know to search for "daylight savings time" it should be easy to find them).
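
As a minimal sketch of the UTC approach, assuming the timestamps can be treated as UTC rather than as local wall-clock time (whether that is appropriate depends on how the data was recorded):

```r
# Wall-clock strings around the spring-forward gap described above
x <- c("2012-04-01 02:58:00", "2012-04-01 03:02:00")

# UTC has no daylight saving time, so no clock time is ever skipped
utc <- as.POSIXct(x, tz = "UTC")
print(format(utc, "%Y-%m-%d %H:%M:%S", usetz = TRUE))

# Differences are plain arithmetic on the underlying seconds
elapsed <- as.numeric(difftime(utc[2], utc[1], units = "mins"))
print(elapsed)
```

Passing tz = "UTC" explicitly keeps the result independent of the local time zone setting of the machine running the code.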


as.POSIX error, can not convert a particular date

@HongOoi is right, "daylight saving shenanigans". Midnight on June 1, 1940, just did not exist in some time zones.

Ultimately, since "1940-06-01" is interpreted by the as.POSIX* functions as "1940-06-01 00:00:00" in the particular time zone, we can say that that time did not exist (according to the tzdata time zone database).

I just tried this with all 562 of the time zones included in OlsonNames(), and it fails for 10 of them:

str(lapply(head(OlsonNames(), 3),
           function(tz) as.POSIXlt("1940-06-01", tz = tz)))
# List of 3
# $ : POSIXlt[1:1], format: "1940-06-01"
# $ : POSIXlt[1:1], format: "1940-06-01"
# $ : POSIXlt[1:1], format: "1940-06-01"

### this time through, if it works return "", if it errors return the tz instead
tzoops <- lapply(OlsonNames(), function(tz) tryCatch({
  ignore <- as.POSIXlt("1940-06-01", tz = tz)
  ""
}, error = function(e) tz))

### "" indicates no error occurred
head(tzoops, 2)
# [[1]]
# [1] ""
# [[2]]
# [1] ""

### these zones failed for some reason
unlist(Filter(nzchar, tzoops))
# [1] "Asia/Chongqing" "Asia/Chungking" "Asia/Gaza" "Asia/Harbin" "Asia/Hebron"
# [6] "Asia/Jerusalem" "Asia/Shanghai" "Asia/Tel_Aviv" "Israel" "PRC"

I'll pick one at random, Israel, and do a little research at timeanddate.com (see note 1 below), which says that up through 1939, Israel's time zone had

No changes, UTC +2 hours all of the period

but in 1940, Sat, Jun 1 at 12:00 am marks the daylight-savings change from IST to IDT. (And then in 1941 Israel shifted to UTC+3, not relevant for this dilemma.)

With this, we can determine that 1940-05-31 23:59:59 existed, but one second later it shifted an hour for DST:

as.POSIXct("1940-05-31 23:59:59 PST", tz = "Israel")
# [1] "1940-05-31 23:59:59 IST"
as.POSIXct("1940-05-31 23:59:59 PST", tz = "Israel") + 1 # one second later
# [1] "1940-06-01 01:00:00 IDT"

We can verify similar results for all of the other time zones by parsing one second before midnight and then adding 1 second. (I've added "UTC" at the top to show what I would expect to happen if we were not dealing with a DST issue.)

str(lapply(setNames(nm = c("UTC", unlist(Filter(nzchar, tzoops)))),
           function(tz) as.POSIXct("1940-05-31 23:59:59 PST", tz = tz) + c(0, 1)))
# List of 11
# $ UTC : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 00:00:00"
# $ Asia/Chongqing: POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Chungking: POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Gaza : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Harbin : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Hebron : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Jerusalem: POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Shanghai : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Asia/Tel_Aviv : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ Israel : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"
# $ PRC : POSIXct[1:2], format: "1940-05-31 23:59:59" "1940-06-01 01:00:00"

So lacking further detailed research on each one of those time zones, my guess is that they have similar changes on that day.
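
If the strings are really dates with no time component, one possible workaround (a sketch, not something the original answer proposes) is to keep them as Date objects, or to parse them in UTC, so that the nonexistent local midnight never comes into play:

```r
# A calendar date carries no time zone, so there is no midnight to miss
d <- as.Date("1940-06-01")
print(format(d))

# If a date-time is required, UTC has no DST, so midnight always exists
p <- as.POSIXct("1940-06-01", tz = "UTC")
print(format(p, "%Y-%m-%d %H:%M:%S"))
```

Whether UTC is an acceptable stand-in depends on what the dates represent; for pure calendar dates the Date class is usually the closer fit.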


Notes:

  1. https://www.timeanddate.com/time/zone/israel/jerusalem?year=1940


strptime, as.POSIXct and as.Date return unexpected NA

I think it is exactly as you guessed, strptime fails to parse your date-time string because of your locales. Your string contains both abbreviated weekday (%a) and abbreviated month name (%b). These time specifications are described in ?strptime:

Details

%a: Abbreviated weekday name in the current locale on this
platform

%b: Abbreviated month name in the current locale on this platform.

"Note that abbreviated names are platform-specific (although the
standards specify that in the C locale they must be the first three
letters of the capitalized English name:"

"Knowing what the abbreviations are is essential if you wish to use
%a, %b or %h as part of an input format: see the examples for
how to check."

See also

[...] locales to query or set a locale.

The issue of locales is relevant also for as.POSIXct, as.POSIXlt and as.Date.

From ?as.POSIXct:

Details

If format is specified, remember that some of the format
specifications are locale-specific, and you may need to set the
LC_TIME category appropriately via Sys.setlocale. This most often
affects the use of %b, %B (month names) and %p (AM/PM).

From ?as.Date:

Details

Locale-specific conversions to and from character strings are used
where appropriate and available. This affects the names of the days
and months.


Thus, if weekdays and month names in the string differ from those in the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and NA is returned.

However, you may solve this issue by changing the locales:

# First save your current locale
loc <- Sys.getlocale("LC_TIME")

# Set correct locale for the strings to be parsed
# (in this particular case: English)
# so that weekdays (e.g "Thu") and abbreviated month (e.g "Nov") are recognized
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")

#Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"

# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)

With my personal locale I can reproduce your error:

Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"

strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] NA
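
The save/set/restore steps above can also be wrapped in a small function so the old locale is restored even if parsing fails. This is a sketch under assumptions: the name parse_in_c_locale is hypothetical, and tz = "UTC" is chosen only to make the result independent of the local time zone.

```r
# Hypothetical helper: parse English day/month abbreviations by switching
# LC_TIME to "C" for the duration of the call, then restoring it.
parse_in_c_locale <- function(x, fmt, tz = "UTC") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # runs even on error
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(strptime(x, fmt, tz = tz))
}

before <- Sys.getlocale("LC_TIME")
res <- parse_in_c_locale("Thu Nov 8 15:41:45 2012", "%a %b %d %H:%M:%S %Y")
print(format(res, "%Y-%m-%d %H:%M:%S"))

# The session locale is unchanged after the call
print(identical(Sys.getlocale("LC_TIME"), before))
```

Using on.exit() keeps the locale change local to the function, which avoids surprising other code that formats or parses dates later in the session.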

lubridate::as_datetime() fails when as.POSIXct() works?

Why one works and the other doesn't, I don't know, but you can help as_datetime() understand the input by supplying a format string, which specifies the format of the text string.

lubridate::as_datetime("2020-10-27 20:25", format = "%Y-%m-%d %H:%M")

Check out the documentation for as_datetime() and strptime() on how to write the format-string.

edit: It seems that the format argument defaults to NULL for as_datetime; a similar error is generated by as.POSIXct() if format = NULL is supplied.

as.POSIXct("2020-10-27 20:25", format = NULL)
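
For comparison, here is the same explicit-format idea in base R, as a small sketch. The tz = "UTC" argument is an assumption added so the result does not depend on the local time zone.

```r
x <- "2020-10-27 20:25"

# Spell out the format so the missing seconds are not a problem
parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
print(format(parsed, "%Y-%m-%d %H:%M:%S"))
```

The format string follows the conventions documented in ?strptime, the same ones used by the format argument of as_datetime().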

