strptime, as.POSIXct and as.Date return unexpected NA

I think it is exactly as you guessed: strptime fails to parse your date-time string because of your locale. Your string contains both an abbreviated weekday name (%a) and an abbreviated month name (%b). These conversion specifications are described in ?strptime:

Details

%a: Abbreviated weekday name in the current locale on this
platform

%b: Abbreviated month name in the current locale on this platform.

"Note that abbreviated names are platform-specific (although the
standards specify that in the C locale they must be the first three
letters of the capitalized English name: [...])"

"Knowing what the abbreviations are is essential if you wish to use
%a, %b or %h as part of an input format: see the examples for
how to check."

See also

[...] locales to query or set a locale.
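
The help text above points to the examples for checking the abbreviations. A minimal sketch of that check, using the same `format(ISOdate(...))` idiom as the ?strptime examples, might look like the following. The variable names are only illustrative, and no output is shown because it depends on your locale and platform.

```r
# Month and weekday abbreviations as the current LC_TIME locale spells them.
# 2000-01-02 was a Sunday, so days 2:8 cover one full week.
month_abbr   <- format(ISOdate(2000, 1:12, 1), "%b")
weekday_abbr <- format(ISOdate(2000, 1, 2:8), "%a")

print(Sys.getlocale("LC_TIME"))  # the locale the names come from
print(month_abbr)                # what %b expects on input
print(weekday_abbr)              # what %a expects on input
```

If the names printed here do not match the ones in your strings, %a and %b will not parse them.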

The locale also matters for as.POSIXct, as.POSIXlt and as.Date.

From ?as.POSIXct:

Details

If format is specified, remember that some of the format
specifications are locale-specific, and you may need to set the
LC_TIME category appropriately via Sys.setlocale. This most often
affects the use of %b, %B (month names) and %p (AM/PM).

From ?as.Date:

Details

Locale-specific conversions to and from character strings are used
where appropriate and available. This affects the names of the days
and months.


Thus, if weekdays and month names in the string differ from those in the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and NA is returned.

However, you can work around this by changing the locale:

# First save your current locale
loc <- Sys.getlocale("LC_TIME")

# Set the locale that matches the strings to be parsed
# (in this particular case: English)
# so that abbreviated weekdays (e.g. "Thu") and months (e.g. "Nov") are recognized
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")

# Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"

# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)
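
The save, set and restore steps can also be wrapped in a small helper so the old locale is restored even if parsing fails. This is only a sketch: `parse_in_c_locale` is a hypothetical name, not a base R function, and it assumes the strings use English names, which the "C" locale provides.

```r
# Hypothetical helper: parse English date-time strings regardless of the
# session's LC_TIME setting, restoring the previous locale on exit.
parse_in_c_locale <- function(x, format, tz = "UTC") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # runs even on error
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(x, format = format, tz = tz)
}

res <- parse_in_c_locale("Thu Nov 8 15:41:45 2012", "%a %b %d %H:%M:%S %Y")
format(res, "%Y-%m-%d %H:%M:%S")
# [1] "2012-11-08 15:41:45"
```

Using on.exit keeps the locale change local to the function call, so the rest of the session keeps its original setting.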

With my own locale (French) I can reproduce your error:

Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"

strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] NA

R: strptime() and is.na() unexpected results

The problem is likely that all the times that return NA do not exist in whatever timezone you're using, due to daylight saving time.

Check with the data source to determine the timezone the data were recorded in, then set the tz argument to that value in your call to strptime.
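
As a hedged illustration, assume the data contain 02:30 on 2012-03-11, a clock time that was skipped in the America/New_York zone when daylight saving time began. Both the timestamp and the zone are assumptions made for this sketch. The same string always parses in UTC. Whether the local parse comes back as NA depends on the platform and R version, so no output is shown for it.

```r
# Assumed example timestamp: skipped by the DST transition in New York
x   <- "2012-03-11 02:30:00"
fmt <- "%Y-%m-%d %H:%M:%S"

# UTC has no daylight saving time, so every clock time exists there
utc_time <- as.POSIXct(x, format = fmt, tz = "UTC")
is.na(utc_time)
# [1] FALSE

# The same clock time in a zone where it was skipped; result is platform-dependent
local_time <- as.POSIXct(x, format = fmt, tz = "America/New_York")
is.na(local_time)
```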

Converting date in R with as.POSIXct() returns NA

This is a locale problem: month names in your language differ from the English ones, and %B matches month names in the current locale. That is why it fails. R cannot recognize "July" in the apollo string as a month, because it looks for month names in your language.

Try setting an English locale for dates and times by running the following (the locale name "English" is the Windows form; on Linux or macOS use a name such as "en_US.UTF-8" or "C"):

Sys.setlocale(category = "LC_TIME", locale = "English")

or set English locale for all categories (monetary, numeric etc.):

Sys.setlocale(category = "LC_ALL", locale = "English")

For details, see ?Sys.setlocale.

See this example (my default locale is Czech, so your code returns NA in my case as well):

apollo <- "July 20, 1969, 20:17:39"
apollo.fmt <- "%B %d, %Y, %H:%M:%S"
xct <- as.POSIXct(apollo, format = apollo.fmt, tz = "UTC")
xct
#> [1] NA

Sys.setlocale(category = "LC_TIME", locale = "English")
#> [1] "English_United States.1252"

apollo <- "July 20, 1969, 20:17:39"
apollo.fmt <- "%B %d, %Y, %H:%M:%S"
xct <- as.POSIXct(apollo, format = apollo.fmt, tz = "UTC")
xct
#> [1] "1969-07-20 20:17:39 UTC"

Created on 2020-07-18 by the reprex package (v0.3.0)
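
If you would rather not touch the locale at all, one alternative (a different technique from the answer above) is to translate the English month name into a number with the built-in constant month.name, which is always English, and then parse with the numeric %m. A minimal sketch, assuming the month name is the first space-separated token of the string:

```r
apollo <- "July 20, 1969, 20:17:39"

# Split off the month name and look it up in the English constant month.name
parts     <- strsplit(apollo, " ", fixed = TRUE)[[1]]
month_num <- match(parts[1], month.name)

# Rebuild the string with a two-digit month: "07 20, 1969, 20:17:39"
numeric_apollo <- paste(sprintf("%02d", month_num),
                        paste(parts[-1], collapse = " "))

# %m is numeric, so this format does not depend on LC_TIME
xct <- as.POSIXct(numeric_apollo, format = "%m %d, %Y, %H:%M:%S", tz = "UTC")
format(xct, "%Y-%m-%d %H:%M:%S")
# [1] "1969-07-20 20:17:39"
```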

Why does as.Date return NA in one case, and doesn't return in another?

The parsing of date strings depends on the machine's language settings. If you want to work with English date strings, set the locale to (British or American) English:

> Sys.setlocale("LC_ALL", 'en_GB.UTF-8')
[1] "LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=es_ES.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=es_ES.UTF-8;LC_IDENTIFICATION=C"
> as.Date('Dec 15, 2000', format = '%b %d, %Y')
[1] "2000-12-15"

Edit

To be more specific, the locale category LC_TIME is the one that determines how date strings are parsed:

Sys.setlocale("LC_TIME", 'en_GB.UTF-8')
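
A short sketch of the same idea, showing that changing LC_TIME alone is enough for parsing and leaves other categories such as LC_COLLATE untouched. It uses the "C" locale, an assumption made here because it is available on every platform and provides English names.

```r
old_time    <- Sys.getlocale("LC_TIME")
old_collate <- Sys.getlocale("LC_COLLATE")

# Change only the date/time category
Sys.setlocale("LC_TIME", "C")
d <- as.Date("Dec 15, 2000", format = "%b %d, %Y")

# Other categories are not affected by the LC_TIME change
collate_unchanged <- identical(Sys.getlocale("LC_COLLATE"), old_collate)

# Restore the previous date/time locale
Sys.setlocale("LC_TIME", old_time)

format(d, "%Y-%m-%d")
# [1] "2000-12-15"
```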

R strptime issue when using %b to format a date

The simplest solution will be this:

as.Date(x = paste0("01-", "Dec-18"),
        format = "%d-%b-%y")
#> [1] "2018-12-01"

format(x = as.Date(x = paste0("01-", "Dec-18"),
                   format = "%d-%b-%y"),
       format = "%b-%y")
#> [1] "Dec-18"

Created on 2019-05-15 by the reprex package (v0.2.1)

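R doesn't recognise Dec-18 as a date, because the string contains only a month and a year, and when a month is given the day of the month must be given as well. Prepend 01- so that it can be parsed as a date, then format the result as you prefer.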

How to solve as.POSIXct return in NA?

The only cause I can think of is your system's locale. The example below shows what happens with my locale.

The posted data string.

x <- "05:39:18 23-Oct-2016"

The error reproduced.

as.POSIXct(x, format = "%H:%M:%S %d-%b-%Y", tz = "GMT")
#[1] NA

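This workaround does not depend on the default locale: it temporarily switches LC_TIME to an English locale (the locale must be installed on your system).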

old_loc <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "en_US.UTF-8")

as.POSIXct(x, format = "%H:%M:%S %d-%b-%Y", tz = "GMT")
#[1] "2016-10-23 05:39:18 GMT"

Then restore the original locale.

Sys.setlocale("LC_TIME", old_loc)

What is wrong is the month abbreviation. In my country the correct one would be "out" ("outubro"), so the following works at the first try, without fiddling with locale settings.

y <- "05:39:18 23-out-2016"
as.POSIXct(y, format = "%H:%M:%S %d-%b-%Y", tz = "GMT")
#[1] "2016-10-23 05:39:18 GMT"

