R: strptime() and is.na() Unexpected Results


The problem is likely that all the times that return NA do not exist in whatever timezone you're using, due to daylight saving time.

Check with the data source to determine the timezone the data were recorded in, then set the tz argument to that value in your call to strptime.
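To find which strings are affected before choosing a tz, one option is to parse the same strings twice: once in UTC, which has no daylight saving time, and once in the zone you suspect. This is a minimal sketch. It uses as.POSIXct because, unlike strptime, it validates times against the zone. find_dst_gaps and the sample timestamps are hypothetical. Whether a non-existent time comes back as NA or is silently shifted can depend on the platform, so an empty result does not prove that no gap times exist.

```r
# Hypothetical helper: return the strings that parse in UTC (no DST there)
# but come back NA in the given zone, i.e. likely daylight-saving gaps.
find_dst_gaps <- function(x, format, tz) {
  in_utc  <- as.POSIXct(x, format = format, tz = "UTC")
  in_zone <- as.POSIXct(x, format = format, tz = tz)
  x[!is.na(in_utc) & is.na(in_zone)]
}

# Hypothetical timestamps around the 2007-03-11 US spring-forward change
stamps <- c("2007-03-11 01:30", "2007-03-11 02:30", "2007-04-11 02:00")
find_dst_gaps(stamps, "%Y-%m-%d %H:%M", tz = "America/New_York")
```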

strptime, as.POSIXct and as.Date return unexpected NA

I think it is exactly as you guessed: strptime fails to parse your date-time string because of your locale. Your string contains both an abbreviated weekday name (%a) and an abbreviated month name (%b). These format specifications are described in ?strptime:

Details

%a: Abbreviated weekday name in the current locale on this platform.

%b: Abbreviated month name in the current locale on this platform.

"Note that abbreviated names are platform-specific (although the standards specify that in the C locale they must be the first three letters of the capitalized English name:"

"Knowing what the abbreviations are is essential if you wish to use %a, %b or %h as part of an input format: see the examples for how to check."

See also

[...] locales to query or set a locale.
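Following that advice, here is a minimal sketch for checking which abbreviations %a and %b expect under a given locale. time_abbreviations is a hypothetical helper. It assumes the locale name you pass is installed on your system and stops otherwise.

```r
# Hypothetical helper: list the weekday (%a) and month (%b) abbreviations
# used by a given LC_TIME locale, restoring the previous locale on exit.
time_abbreviations <- function(locale = "C") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  if (Sys.setlocale("LC_TIME", locale) == "") {
    stop("locale not available on this platform: ", locale)
  }
  days   <- as.Date("2000-01-03") + 0:6           # 2000-01-03 was a Monday
  months <- as.Date(sprintf("2000-%02d-01", 1:12))
  list(weekdays = format(days, "%a"), months = format(months, "%b"))
}

time_abbreviations("C")
```

Comparing time_abbreviations("C") with the output for your own locale shows whether strings like "Thu" and "Nov" can be parsed as they are.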

Locales are also relevant for as.POSIXct, as.POSIXlt and as.Date.

From ?as.POSIXct:

Details

If format is specified, remember that some of the format specifications are locale-specific, and you may need to set the LC_TIME category appropriately via Sys.setlocale. This most often affects the use of %b, %B (month names) and %p (AM/PM).

From ?as.Date:

Details

Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months.


Thus, if the weekday and month names in the string differ from those of the current locale, strptime, as.POSIXct and as.Date cannot parse the string and return NA.

However, you can solve this issue by changing the locale:

# First save your current locale
loc <- Sys.getlocale("LC_TIME")

# Set a locale matching the strings to be parsed
# (in this particular case: English)
# so that weekdays (e.g. "Thu") and abbreviated months (e.g. "Nov") are recognized.
# Locale names are platform-specific; "C" is available on every platform.
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")

# Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"

# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)

With my own locale I can reproduce your error:

Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"

strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] NA
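If you need this in more than one place, the save/set/restore steps can be wrapped in a function. This is a minimal sketch. strptime_in_locale is a hypothetical name, and on.exit() restores the caller's locale even if strptime() throws an error.

```r
# Hypothetical wrapper: parse under a temporary LC_TIME locale ("C" gives the
# English abbreviations) and always restore the caller's locale afterwards.
strptime_in_locale <- function(x, format, locale = "C", tz = "") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  strptime(x, format, tz = tz)
}

x <- "Thu Nov 8 15:41:45 2012"
strptime_in_locale(x, "%a %b %d %H:%M:%S %Y", tz = "UTC")
```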

NA while using strptime function

I found another thread that gave me the solution. Because my system language is not English, R could not recognize "October". After I changed it to "10", the code ran perfectly. Thanks for all the help.
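Instead of editing month names by hand, a locale-independent alternative is to map them to numbers with R's built-in month.name constant, which is always English. This is a minimal sketch that assumes strings of the hypothetical form "15 October 2018". parse_english_dmy is a hypothetical helper and does no further validation.

```r
# Hypothetical helper: parse "day MonthName year" strings such as "15 October 2018".
# month.name is a built-in constant that is always English, whatever the locale.
parse_english_dmy <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  field <- function(i) vapply(parts, function(p) p[i], character(1))
  day   <- as.integer(field(1))
  month <- match(tolower(field(2)), tolower(month.name))  # NA if not a month name
  year  <- as.integer(field(3))
  # Rebuild a purely numeric date string, which needs no locale to parse
  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

parse_english_dmy(c("15 October 2018", "3 march 2020"))
```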

Using Filter(Negate(is.na), x) on a list results in unexpected behaviour

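is.na returns a vector for each element of the list (for 1:3, a vector of length 3), and Filter flattens these results into a single logical vector, so its positions no longer line up with the list elements. You want anyNA (or perhaps exactlyNA as defined below):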

l1 <- list("a", NA, 1:3, NA)
l2 <- list("a", NULL, 1:3, NULL)
Filter(Negate(is.na), l1)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] 1 2 3
#>
#> [[3]]
#> [1] NA
#>
#> [[4]]
#> NULL
Filter(Negate(anyNA), l1)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] 1 2 3
exactlyNA <- function(x) identical(x, NA)
Filter(Negate(exactlyNA), l1)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] 1 2 3

Created on 2018-11-13 by the reprex package (v0.2.1)

Your first example effectively selects the 1st, 3rd, 4th, and 5th elements of your list (there is no 5th element, hence the NULL). This has nothing to do with NA.
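To see where those indices come from, the steps below roughly mirror what base R's Filter() does internally (apply the predicate with lapply, unlist, then which). They are followed by a vapply()-based alternative that forces exactly one logical value per element.

```r
l1 <- list("a", NA, 1:3, NA)

# Apply the predicate per element; is.na(1:3) yields three values
per_element <- lapply(l1, Negate(is.na))
lengths(per_element)                 # 1 1 3 1

# Flattening gives 6 logicals for a 4-element list, so positions drift
idx <- which(unlist(per_element))    # 1 3 4 5
l1[idx]                              # index 5 is out of range, hence NULL

# Safer: vapply() insists on exactly one logical per element
keep <- !vapply(l1, function(el) identical(el, NA), logical(1))
l1[keep]
```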

R strptime issue when using %b to format a date

The simplest solution is this:

as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y")
#> [1] "2018-12-01"

format(x = as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y"),
format = "%b-%y")
#> [1] "Dec-18"

Created on 2019-05-15 by the reprex package (v0.2.1)

R doesn't recognise "Dec-18" as a date because it has no day. Prepend "01-" so that it can be parsed as a date, and then display it however you prefer.
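For a whole vector of such strings, the same trick can be wrapped in a function. This is a minimal sketch. parse_month_year is a hypothetical helper. Because %b is locale-specific, it assumes English abbreviations and switches LC_TIME to "C" only for the duration of the call.

```r
# Hypothetical helper: month-year strings such as "Dec-18" -> Date on the 1st.
parse_month_year <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")    # so that %b matches English "Dec", "Jan", ...
  # ?strptime requires a day (%d or %e) whenever a month is given,
  # so "%b-%y" alone yields NA; prepending "01-" supplies the day.
  as.Date(paste0("01-", x), format = "%d-%b-%y")
}

d <- parse_month_year(c("Dec-18", "Jan-19"))
d
```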

as.Date produces unexpected result in a sequence of week-based dates

Working with weeks of the year can get very tricky. You may try converting the dates with the ISOweek package:

# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))

The result

#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"

is of class Date.

Note that the ISO week-based date format is yyyy-Www-d, with a capital W preceding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.

So, to convert the date strings provided by the OP with ISOweek2date(), a W has to be inserted after the first hyphen, which is done by replacing the first - with -W in each string.

Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.
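If adding a dependency is not an option, the ISO rule can also be applied by hand. Week 1 is the week containing 4 January, so you find its Monday first and then add the week and day offsets. This is a minimal base-R sketch. iso_week_to_date is a hypothetical helper that takes the OP's yyyy-ww-d strings without the W. It does not validate its input (for example, week 53 in a 52-week year).

```r
# Hypothetical helper: ISO week-based "yyyy-ww-d" strings -> Date
iso_week_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  year <- as.integer(parts[, 1])
  week <- as.integer(parts[, 2])
  day  <- as.integer(parts[, 3])           # 1 = Monday ... 7 = Sunday
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # ISO weekday of 4 January; 1970-01-01 (day 0) was a Thursday (= 4)
  jan4_wday <- (as.integer(jan4) + 3L) %% 7L + 1L
  week1_monday <- jan4 - (jan4_wday - 1L)
  week1_monday + 7L * (week - 1L) + (day - 1L)
}

wd <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1", "2016-52-7")
iso_week_to_date(wd)
```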

About the ISOweek package

Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying, as I had to prepare weekly reports including week-on-week comparisons. I spent hours looking for a way to deal with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits, as the aforementioned formats are ignored on input (see ?strptime for details).

Europe/Moscow time zone issue with strptime

In this case, since you're not interested in the time but only in the date you can use as.Date:

> as.Date(strange_days,"%m/%d/%Y")
[1] "1984-02-01" "1984-03-01" "1984-04-01" "1984-05-01" "1984-06-01"

The error you're confronted with is (as you already noticed) most likely due to daylight saving time: it so happens that DST in Russia in 1984 started on the first of April (source).

That said, on Mac OS X 10.7.5 running R 2.14.2 (yes, a little outdated), this error is not reproducible:

> strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984") 
> Sys.setenv(TZ='Europe/Moscow')
> d <- strptime(strange_days, '%m/%d/%Y')
> d
[1] "1984-02-01 MSK" "1984-03-01 MSK" "1984-04-01 MSD" "1984-05-01 MSD" "1984-06-01 MSD"
> as.numeric(d)
[1] 444430800 446936400 449611200 452203200 454881600

This suggests that one of the changes made to strptime between R 2.14.2 and 3.1.0 modified this behaviour. I'm currently looking for it in the changelogs but have no definite evidence yet. Another possibility is that the behaviour is platform-specific.

Additionally, here is an excerpt from ?strptime:

Remember that in most timezones some times do not occur and some occur twice because of transitions to/from summer time. strptime does not validate such times (it does not assume a specific timezone), but conversion by as.POSIXct will do so. Conversion by strftime and formatting/printing uses OS facilities and may (and does on Windows) return nonsensical results for non-existent times at DST transitions.
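To see which entries are affected in a given zone, you can compare the time-zone-free Date parse with a POSIXct parse in that zone. This is a minimal sketch that reuses strange_days from the question. Per the source cited above, DST started on 1 April 1984, so a missing midnight on that day is the likely culprit. Whether as.POSIXct actually returns NA there depends on the platform and R version.

```r
strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984")

# Date objects carry no time of day and no time zone, so DST never applies
as_dates <- as.Date(strange_days, format = "%m/%d/%Y")

# as.POSIXct validates midnight against the zone; 1984-04-01 00:00 likely did
# not exist in Moscow, so that entry may be NA or shifted depending on platform
as_moscow <- as.POSIXct(strange_days, format = "%m/%d/%Y", tz = "Europe/Moscow")

# Valid dates whose local midnight could not be represented
dst_suspects <- strange_days[!is.na(as_dates) & is.na(as_moscow)]
dst_suspects
```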

POSIXct date conversion error

Try using a time zone that does not observe daylight saving time:

as.POSIXct(t, format = "%m/%d/%Y  %H:%M", tz = "GMT")
## [1] "2007-03-11 01:30:00 GMT" "2007-03-11 02:00:00 GMT" "2007-04-11 02:00:00 GMT"
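There is one caveat with this workaround. Parsing in GMT keeps the wall-clock readings but treats them as GMT instants, so durations that span a DST change in the original zone are off by an hour. This is a minimal sketch with hypothetical readings. It assumes they were taken in a zone such as America/New_York, which skipped 02:00-02:59 on 2007-03-11.

```r
# Hypothetical readings either side of the 2007-03-11 US spring-forward change
x   <- c("3/11/2007 01:30", "3/11/2007 03:30")
fmt <- "%m/%d/%Y %H:%M"

# GMT has no DST gap, so every wall-clock value is kept as written
gmt <- as.POSIXct(x, format = fmt, tz = "GMT")
format(gmt, "%H:%M")                        # "01:30" "03:30"

# Arithmetic treats them as GMT instants: 2 hours apart, although only one
# real hour elapsed in a zone that skipped 02:00-02:59 that night
difftime(gmt[2], gmt[1], units = "hours")   # 2 hours
```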

