R: strptime() and is.na() unexpected results
The problem is likely that all the times that return NA do not exist in whatever time zone you're using, due to daylight saving time.
Check with the data source to determine the time zone the data were recorded in, then set the tz argument to that value in your call to strptime.
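A minimal sketch of the check, using hypothetical timestamps and assuming US Eastern time as the zone in question: in 2007, clocks in America/New_York jumped from 02:00 to 03:00 on 11 March, so 02:30 that day never existed there. Whether such a value comes back as NA depends on the platform, so only the UTC result below is certain.

```r
# Hypothetical timestamps; 02:30 on 2007-03-11 fell inside the US DST gap
x <- c("03/11/2007 01:30", "03/11/2007 02:30", "03/12/2007 02:30")
fmt <- "%m/%d/%Y %H:%M"

# In a DST-observing zone the middle value may be NA (platform-dependent)
local_times <- strptime(x, fmt, tz = "America/New_York")
print(is.na(local_times))

# UTC has no DST, so every wall-clock time exists and none is NA
utc_times <- strptime(x, fmt, tz = "UTC")
print(is.na(utc_times))
```

Only use UTC (or GMT) if the data really were recorded in it; otherwise the times would be silently reinterpreted.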
strptime, as.POSIXct and as.Date return unexpected NA
I think it is exactly as you guessed: strptime fails to parse your date-time string because of your locale. Your string contains both an abbreviated weekday name (%a) and an abbreviated month name (%b). These conversion specifications are described in ?strptime:
Details
%a: Abbreviated weekday name in the current locale on this platform.
%b: Abbreviated month name in the current locale on this platform.
"Note that abbreviated names are platform-specific (although the standards specify that in the C locale they must be the first three letters of the capitalized English name: [...])." "Knowing what the abbreviations are is essential if you wish to use %a, %b or %h as part of an input format: see the examples for how to check."
See also
[...]
locales to query or set a locale.
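Following that hint, one way to list the abbreviations for the current LC_TIME locale is to format known dates. This is a sketch in the spirit of the check ?strptime mentions, not a copy of its example; it relies on 2 January 2000 having been a Sunday.

```r
# Noon GMT on the 1st of every month in 2000 (ISOdate() defaults to 12:00 GMT)
month_abb <- format(ISOdate(2000, 1:12, 1), "%b")
# 2000-01-02 was a Sunday, so 2..8 January spans Sunday to Saturday
day_abb <- format(ISOdate(2000, 1, 2:8), "%a")
print(month_abb)  # "Jan" "Feb" ... in an English or C locale
print(day_abb)
```

If these do not match the strings you are parsing, %a and %b will fail and return NA.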
The locale issue is also relevant for as.POSIXct, as.POSIXlt and as.Date.
From ?as.POSIXct:
Details
If format is specified, remember that some of the format
specifications are locale-specific, and you may need to set the
LC_TIME category appropriately via Sys.setlocale. This most often
affects the use of %b, %B (month names) and %p (AM/PM).
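To illustrate the %p case, here is a sketch that temporarily switches LC_TIME to "C" (where the indicators are "AM" and "PM") and then restores the previous setting. The timestamp is hypothetical; note that %p is meant to be combined with %I (12-hour clock), not %H.

```r
old_time_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical 12-hour timestamp with an English AM/PM marker
pm_time <- as.POSIXct("11/08/2012 03:41 PM",
                      format = "%m/%d/%Y %I:%M %p", tz = "UTC")
print(format(pm_time, "%Y-%m-%d %H:%M"))

# Restore whatever LC_TIME setting was active before
invisible(Sys.setlocale("LC_TIME", old_time_locale))
```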
From ?as.Date:
Details
Locale-specific conversions to and from character strings are used
where appropriate and available. This affects the names of the days
and months.
Thus, if the weekday and month names in the string differ from those of the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and NA is returned.
However, you can solve this issue by changing the locale:
# First save your current locale
loc <- Sys.getlocale("LC_TIME")
# Set correct locale for the strings to be parsed
# (in this particular case: English)
# so that weekdays (e.g. "Thu") and abbreviated months (e.g. "Nov") are recognized
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")
# Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"
# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)
With my personal locale I can reproduce your error:
Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] NA
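To avoid forgetting the final restore step, the save/set/restore sequence can be wrapped in a function that uses on.exit(), so the previous LC_TIME is restored even if parsing throws an error. parse_with_locale() is a hypothetical helper, and it defaults to the "C" locale, which the C standard requires every platform to provide.

```r
# Hypothetical helper: parse with a temporary LC_TIME locale, then restore it
parse_with_locale <- function(x, format, locale = "C", tz = "") {
  old <- Sys.getlocale("LC_TIME")
  # Runs when the function exits, even after an error
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  strptime(x, format, tz = tz)
}

x <- "Thu Nov 8 15:41:45 2012"
parsed <- parse_with_locale(x, "%a %b %d %H:%M:%S %Y", tz = "UTC")
print(parsed)
```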
NA while using strptime function
I found another thread that gave me a solution. Because my system language is not English, R cannot recognize "October". After I changed it to "10", the code ran perfectly. Thanks for all the help.
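The same workaround can be automated: R's built-in month.name constant always holds the English month names, whatever the locale, so match() can turn them into numbers before parsing. A sketch with hypothetical input in "day month year" form:

```r
# Hypothetical dates with full English month names
x <- c("15 October 2019", "3 March 2020")
parts <- strsplit(x, " ", fixed = TRUE)

day   <- as.integer(vapply(parts, `[`, character(1), 1))
# month.name is "January", ..., "December" regardless of the locale
month <- match(vapply(parts, `[`, character(1), 2), month.name)
year  <- as.integer(vapply(parts, `[`, character(1), 3))

# Only numeric fields remain, so the locale-dependent %B is not needed
dates <- as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
print(dates)
```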
Using Filter(Negate(is.na), x) on a list results in unexpected behaviour
is.na returns a vector as long as each element (three values for 1:3, for instance), not a single TRUE/FALSE per element; you want anyNA (or perhaps exactlyNA as defined below):
l1 <- list("a", NA, 1:3, NA)
l2 <- list("a", NULL, 1:3, NULL)
Filter(Negate(is.na), l1)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] 1 2 3
#>
#> [[3]]
#> [1] NA
#>
#> [[4]]
#> NULL
Filter(Negate(anyNA), l1)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] 1 2 3
exactlyNA <- function(x) identical(x, NA)
Filter(Negate(exactlyNA), l1)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] 1 2 3
Created on 2018-11-13 by the reprex package (v0.2.1)
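Your first example effectively selects the 1st, 3rd, 4th, and 5th elements of your list: Filter() unlists the per-element results into one logical vector, and is.na(1:3) contributes three values to it. The 5th element does not exist, which is where the NULL comes from. It has nothing to do with NA.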
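A sketch of why, mirroring what Filter() does internally (it unlists the results of f into a single logical vector and indexes with which()): because is.na(1:3) contributes three values, the index vector refers to positions beyond the end of the list.

```r
l1 <- list("a", NA, 1:3, NA)

# One result per element, but is.na(1:3) returns three values
per_element <- lapply(l1, Negate(is.na))
keep <- unlist(per_element)   # length 6, not 4
idx <- which(keep)
print(idx)                    # 1 3 4 5

# Position 5 does not exist in a length-4 list, so it yields NULL
print(l1[idx])
```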
R strptime issue when using %b to format a date
The simplest solution is this:
as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y")
#> [1] "2018-12-01"
format(x = as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y"),
format = "%b-%y")
#> [1] "Dec-18"
Created on 2019-05-15 by the reprex package (v0.2.1)
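R doesn't recognise Dec-18 as a date: when a month is parsed, strptime also needs the day (via %d or %e), because the current day of the month need not be valid for that month. Prepend 01- so that it can be parsed as a date, then display it however you prefer.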
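Applied to a whole vector, the same trick also lets month-year labels be sorted chronologically rather than alphabetically. A sketch with hypothetical labels; it temporarily sets LC_TIME to "C" because %b only matches the month abbreviations of the active locale.

```r
old_time_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))   # %b must match English abbreviations

# Hypothetical month-year labels, deliberately out of order
labels <- c("Jan-19", "Dec-18", "Feb-19")
# A day is required whenever a month is parsed, so prepend "01-"
first_of_month <- as.Date(paste0("01-", labels), format = "%d-%b-%y")
# Sort as real dates, then print in the original "Mon-yy" style
sorted_labels <- format(sort(first_of_month), "%b-%y")
print(sorted_labels)

invisible(Sys.setlocale("LC_TIME", old_time_locale))
```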
as.Date produces unexpected result in a sequence of week-based dates
Working with weeks of the year can become very tricky. You may try to convert the dates using the ISOweek package:
# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))
The result
#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"
is of class Date.
Note that the ISO week-based date format is yyyy-Www-d, with a capital W preceding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.
So, in order to convert the date strings provided by the OP using ISOweek2date(), it is necessary to insert a W after the first hyphen, which is accomplished by replacing the first - with -W in each string.
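For readers who would rather avoid a dependency, the conversion can also be sketched in base R, using the ISO 8601 rule that week 1 is the week containing 4 January. iso_week_to_date() is a hypothetical helper, not part of ISOweek, and it expects the OP's "yyyy-ww-d" strings without the W.

```r
# Hypothetical helper: "yyyy-ww-d" (ISO year, ISO week, ISO weekday) -> Date
iso_week_to_date <- function(s) {
  p <- do.call(rbind, strsplit(s, "-", fixed = TRUE))
  year <- as.integer(p[, 1])
  week <- as.integer(p[, 2])
  wday <- as.integer(p[, 3])
  # ISO week 1 is the week that contains 4 January
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # POSIXlt$wday is 0 for Sunday; convert to ISO numbering (Monday = 1 ... Sunday = 7)
  jan4_iso <- (as.POSIXlt(jan4)$wday + 6L) %% 7L + 1L
  week1_monday <- jan4 - (jan4_iso - 1L)
  week1_monday + (week - 1L) * 7L + (wday - 1L)
}

wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
print(iso_week_to_date(wd))
```

The package remains the more thoroughly tested option; this only shows the arithmetic behind it.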
Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year to which an ISO week belongs may differ from the calendar year. This can be seen in the sample dates above, where the week-based date 2016-W52-7 is converted to 2017-01-01.
About the ISOweek package
Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying, as I had to prepare weekly reports including week-on-week comparisons. I spent hours finding a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits, as the aforementioned formats are ignored on input (see ?strptime for details).
Europe/Moscow time zone issue with strptime
In this case, since you're interested only in the date and not in the time, you can use as.Date:
> as.Date(strange_days,"%m/%d/%Y")
[1] "1984-02-01" "1984-03-01" "1984-04-01" "1984-05-01" "1984-06-01"
The error you're confronted with is (as you already noticed) most likely due to daylight saving time: it so happens that DST in Russia in 1984 started specifically on the first of April (source).
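One way to confirm the DST explanation is to compare the UTC offset on either side of the switch. This sketch assumes the system's time zone database includes the historical Europe/Moscow rules; noon is used because it is safely away from the transition.

```r
# Noon on the days before and after Moscow's 1984 switch to summer time
around_switch <- as.POSIXct(c("1984-03-31 12:00", "1984-04-02 12:00"),
                            format = "%Y-%m-%d %H:%M", tz = "Europe/Moscow")
# %z prints the UTC offset in effect at each instant
offsets <- format(around_switch, "%z")
print(offsets)
```

A jump from +0300 to +0400 means an hour of wall-clock time was skipped between the two dates.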
That being said, on Mac OS X 10.7.5 running R 2.14.2 (yes, a little outdated), this error is not reproducible:
> strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984")
> Sys.setenv(TZ='Europe/Moscow')
> d <- strptime(strange_days, '%m/%d/%Y')
> d
[1] "1984-02-01 MSK" "1984-03-01 MSK" "1984-04-01 MSD" "1984-05-01 MSD" "1984-06-01 MSD"
> as.numeric(d)
[1] 444430800 446936400 449611200 452203200 454881600
This suggests that one of the changes made to strptime between R versions 2.14.2 and 3.1.0 modified this behaviour. I'm currently looking for it in the changelogs but have no definite evidence yet. Another possibility is that the behaviour is platform-specific.
Additionally, here is an excerpt from ?strptime:
Remember that in most timezones some times do not occur and some occur
twice because of transitions to/from summer time. strptime does not
validate such times (it does not assume a specific timezone), but
conversion by as.POSIXct will do so. Conversion by strftime and
formatting/printing uses OS facilities and may (and does on Windows)
return nonsensical results for non-existent times at DST transitions.
POSIXct date conversion error
Try using a time zone that does not observe daylight saving time:
as.POSIXct(t, format = "%m/%d/%Y %H:%M", tz = "GMT")
## [1] "2007-03-11 01:30:00 GMT" "2007-03-11 02:00:00 GMT" "2007-04-11 02:00:00 GMT"