strptime, as.POSIXct and as.Date return unexpected NA
I think it is exactly as you guessed: strptime fails to parse your date-time string because of your locale. Your string contains both an abbreviated weekday (%a) and an abbreviated month name (%b). These time specifications are described in ?strptime:
Details
%a: Abbreviated weekday name in the current locale on this platform.
%b: Abbreviated month name in the current locale on this platform.
"Note that abbreviated names are platform-specific (although the standards specify that in the C locale they must be the first three letters of the capitalized English name:"
"Knowing what the abbreviations are is essential if you wish to use %a, %b or %h as part of an input format: see the examples for how to check."
See also
[...]
locales to query or set a locale.
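As a quick way to check which abbreviations your current locale expects, you can format known dates with %b and %a. This is only a sketch: the variable names are illustrative, and the printed names depend on your platform and locale, so no output is shown.

```r
# Month abbreviations as the current LC_TIME locale spells them
month_abbr <- format(ISOdate(2000, 1:12, 1), "%b")
print(month_abbr)

# Weekday abbreviations (2 to 8 January 2000 covers one full week)
weekday_abbr <- format(ISOdate(2000, 1, 2:8), "%a")
print(weekday_abbr)

# The locale currently used for date-time parsing and formatting
print(Sys.getlocale("LC_TIME"))
```

If the names printed here differ from those in your input strings, parsing with %a or %b will return NA.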
The issue of locales is also relevant for as.POSIXct, as.POSIXlt and as.Date.
From ?as.POSIXct:
Details
If format is specified, remember that some of the format specifications are locale-specific, and you may need to set the LC_TIME category appropriately via Sys.setlocale. This most often affects the use of %b, %B (month names) and %p (AM/PM).
From ?as.Date:
Details
Locale-specific conversions to and from character strings are used
where appropriate and available. This affects the names of the days
and months.
Thus, if weekday and month names in the string differ from those in the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and NA is returned.
However, you may solve this issue by changing the locale:
# First save your current locale
loc <- Sys.getlocale("LC_TIME")
# Set correct locale for the strings to be parsed
# (in this particular case: English)
# so that abbreviated weekdays (e.g. "Thu") and months (e.g. "Nov") are recognized
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")
# Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"
# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)
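The save, set and restore steps above can be wrapped in a small helper so that the old locale is restored even if parsing fails. This is a minimal sketch, not part of base R: the function name with_time_locale is hypothetical, and it assumes the "C" locale, which uses English names and is available on every platform.

```r
# Hypothetical helper: evaluate `expr` with LC_TIME temporarily set to `locale`
with_time_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous locale when the function exits, even on error
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  # `expr` is lazily evaluated, so it runs only now, under the new locale
  force(expr)
}

x <- "Thu Nov 8 15:41:45 2012"
parsed <- with_time_locale("C", strptime(x, "%a %b %d %H:%M:%S %Y", tz = "UTC"))
print(parsed)
```

Because the expression is passed as an unevaluated argument, the call to strptime happens inside the helper, after the locale has been switched.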
With my personal locale I can reproduce your error:
Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] NA
R: strptime() and is.na () unexpected results
The problem is likely that all the times that return NA do not exist in whatever timezone you're using, due to daylight saving time.
Check with the data source to determine the timezone the data were recorded in, then set the tz argument to that value in your call to strptime.
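To illustrate, here is a sketch with an assumed timestamp and timezone (neither comes from the original question). At 02:00 on 2013-03-10 clocks in America/New_York jumped forward to 03:00, so 02:30 never occurred there. Whether R reports NA for such a time depends on the platform and R version, so no output is shown for that case.

```r
# Assumed example: a wall-clock time inside the spring-forward gap
stamp <- "2013-03-10 02:30:00"
fmt <- "%Y-%m-%d %H:%M:%S"

# UTC has no daylight saving time, so this time exists and parses
utc_time <- strptime(stamp, fmt, tz = "UTC")
print(is.na(utc_time))

# In a zone with DST the same wall-clock time may not exist;
# the result here is platform- and version-dependent
local_time <- strptime(stamp, fmt, tz = "America/New_York")
print(is.na(local_time))
```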
Converting date in R with as.POSIXct() returns NA
This is a locale problem: months in your language have different names than in English (this matters for %B in the date format), which is why parsing fails. It simply cannot recognize "July" in the apollo string as a month, because it searches for month names in your language.
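Try setting an English locale for dates and times by running the following (the name "English" is a Windows locale name; on Linux or macOS use a name such as "en_US.UTF-8"):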
Sys.setlocale(category = "LC_TIME", locale = "English")
or set English locale for all categories (monetary, numeric etc.):
Sys.setlocale(category = "LC_ALL", locale = "English")
For details, see ?Sys.setlocale.
See this example (my default locale is Czech, so your code returns NA in my case as well):
apollo <- "July 20, 1969, 20:17:39"
apollo.fmt <- "%B %d, %Y, %H:%M:%S"
xct <- as.POSIXct(apollo, format = apollo.fmt, tz = "UTC")
xct
#> [1] NA
Sys.setlocale(category = "LC_TIME", locale = "English")
#> [1] "English_United States.1252"
apollo <- "July 20, 1969, 20:17:39"
apollo.fmt <- "%B %d, %Y, %H:%M:%S"
xct <- as.POSIXct(apollo, format = apollo.fmt, tz = "UTC")
xct
#> [1] "1969-07-20 20:17:39 UTC"
Created on 2020-07-18 by the reprex package (v0.3.0)
Why does as.Date return NA in one case, and doesn't return in another?
The parsing of date strings depends on the machine's language settings. If you want to work with English date strings, set the locale to (British or American) English:
> Sys.setlocale("LC_ALL", 'en_GB.UTF-8')
[1] "LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=es_ES.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=es_ES.UTF-8;LC_IDENTIFICATION=C"
> as.Date('Dec 15, 2000', format = '%b %d, %Y')
[1] "2000-12-15"
Edit
To be more specific, the LC_TIME locale category is the one that determines the parsing behaviour of date strings:
Sys.setlocale("LC_TIME", 'en_GB.UTF-8')
R strptime issue when using %b to format a date
The simplest solution will be this:
as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y")
#> [1] "2018-12-01"
format(x = as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y"),
format = "%b-%y")
#> [1] "Dec-18"
Created on 2019-05-15 by the reprex package (v0.2.1)
R doesn't recognise Dec-18 as a date because it has no day component. Prepend 01- so that it can be parsed as a date, and then display it in the format you prefer.
How to solve as.POSIXct return in NA?
The only thing I can think of that is going wrong has to do with your system's locale. The example below shows what happens with my locale.
The posted data string.
x <- "05:39:18 23-Oct-2016"
The error reproduced.
as.POSIXct(x, format = "%H:%M:%S %d-%b-%Y", tz = "GMT")
#[1] NA
This solution works regardless of your default locale.
old_loc <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "en_US.UTF-8")
as.POSIXct(x, format = "%H:%M:%S %d-%b-%Y", tz = "GMT")
#[1] "2016-10-23 05:39:18 GMT"
And back to the original locale.
Sys.setlocale("LC_TIME", old_loc)
What is wrong is the month abbreviation: in my country the correct one would be "out" ("outubro"). So the following works at the first try, without fiddling with locale settings.
y <- "05:39:18 23-out-2016"
as.POSIXct(y, format = "%H:%M:%S %d-%b-%Y", tz = "GMT")
#[1] "2016-10-23 05:39:18 GMT"