Difference between as.POSIXct/as.POSIXlt and strptime for converting character vectors to POSIXct/POSIXlt

Well, the functions do different things.

First, there are two internal implementations of date/time: POSIXct, which stores the number of seconds since the UNIX epoch (plus some attributes, such as the time zone), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.

strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.

as.POSIXlt converts a variety of data types to POSIXlt. It tries to be intelligent and do the sensible thing - in the case of character, it acts as a wrapper to strptime.

as.POSIXct converts a variety of data types to POSIXct. It also tries to be intelligent and do the sensible thing - in the case of character, it runs strptime first, then does the conversion from POSIXlt to POSIXct.
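
To make the relationship between the three functions concrete, here is a minimal sketch. The input string and the choice of tz = "UTC" are made up for illustration; it parses the same string three ways and compares the results:

```r
# Hypothetical input; tz = "UTC" is an assumption to keep the result reproducible
x <- "2015-06-01 12:30:00"
fmt <- "%Y-%m-%d %H:%M:%S"

lt1 <- strptime(x, format = fmt, tz = "UTC")     # always returns POSIXlt
lt2 <- as.POSIXlt(x, format = fmt, tz = "UTC")   # POSIXlt, via strptime for character input
ct  <- as.POSIXct(x, format = fmt, tz = "UTC")   # POSIXct

class(lt1)  # "POSIXlt" "POSIXt"
class(ct)   # "POSIXct" "POSIXt"

# Both representations describe the same instant
same_instant <- as.numeric(as.POSIXct(lt1)) == as.numeric(ct)
same_instant
```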

It makes sense that strptime is faster, because strptime only handles character input whilst the others try to determine which method to use from input type. It should also be a bit safer in that being handed unexpected data would just give an error, instead of trying to do the intelligent thing that might not be what you want.

Differences between subsetting POSIXlt and POSIXct in R

You misunderstand a critical difference between POSIXlt and POSIXct:

  • POSIXlt is a 'list type' with components you can access as you do
  • POSIXct is a 'compact type' that is essentially just a number

You almost always want POSIXct for comparison and efficient storage (e.g. in a data.frame, or to index a zoo or xts object) and can use POSIXlt to access components. Be warned, though, that the components follow C library conventions: the year is stored as years since 1900 (so 2015 is 115 and you always need to add 1900), weekdays start at zero, and so on.

Doing str() or unclass() on these is illuminating. For historical reasons, strptime() returns a POSIXlt. I wish it would return a POSIXct.
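
As a small illustration of the two layouts, the sketch below uses a made-up timestamp and assumes UTC so the output does not depend on the local zone. It unclasses both types and reads a few POSIXlt components:

```r
# Hypothetical timestamp; UTC is assumed so the output does not depend on the local zone
ct <- as.POSIXct("2015-03-04 05:06:07", tz = "UTC")
lt <- as.POSIXlt(ct)

unclass(ct)            # a single number (seconds since 1970-01-01 UTC) plus a tzone attribute
str(unclass(lt))       # a list with sec, min, hour, mday, mon, year, wday, yday, ...

lt$year                # 115, i.e. years since 1900
lt$year + 1900         # 2015
lt$mon + 1             # 3 (months are stored as 0-11)
lt$wday                # 3 (0 = Sunday, so 3 = Wednesday)
```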

Why do some dates become NA when converted from character to POSIXlt?

I updated my code to specify the GMT time zone, as the data is collected in GMT with no switch to or from daylight saving time.

dateValue <- strptime(dateString, format='%m/%d/%y %I:%M:%S %p', tz="GMT")

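This ensures that properly formatted date-time values no longer come back as NA, i.e. is.na() no longer returns TRUE for them. The likely cause is that, without tz, the strings are interpreted in the local time zone, where clock times skipped by the daylight saving change do not exist.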
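
A minimal sketch of the effect follows. The input string and the "America/New_York" zone are assumptions chosen for illustration, and the AM/PM marker assumes an English or C locale. The same clock time is valid in GMT but falls inside a daylight saving gap in the other zone:

```r
# Hypothetical input: 2:30 am on the day US clocks jumped from 2:00 to 3:00 in 2017
dateString <- "03/12/17 02:30:00 AM"
fmt <- "%m/%d/%y %I:%M:%S %p"

# Parsed as GMT, the time exists and is not NA
gmt_value <- strptime(dateString, format = fmt, tz = "GMT")
is.na(gmt_value)   # FALSE

# Parsed in a zone with daylight saving time, this clock time does not exist;
# whether the result is NA depends on the platform and R version
ny_value <- strptime(dateString, format = fmt, tz = "America/New_York")
is.na(ny_value)
```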

Converting datetime from character to POSIXct object

For your real data issue replace the %m% with %m:

## Reading in the file:
fpath <- "c:/r/data/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
names(x) <- c("date", "time", "bscat", "scat_coef", "pressure_mbar", "temp_K", "CH1", "CH2")

## issue was the %m% - fixed
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

## Here too - fixed
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
head(x)
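
Since the real file is not available here, the following sketch uses a made-up two-row data frame in place of it. It shows the same paste-then-parse step, and what happens when a string does not match the format:

```r
# Made-up rows standing in for the date and time columns of the real file
x <- data.frame(date = c("2014-03-01", "2014-03-01"),
                time = c("00:00:10", "13:45:00"),
                stringsAsFactors = FALSE)

fmt <- "%Y-%m-%d %H:%M:%S"
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = fmt, tz = "UTC")

# A string that does not match the format yields NA rather than an error
bad <- as.POSIXct("01/03/2014 00:00:10", format = fmt, tz = "UTC")

str(x$datetime1)
is.na(bad)   # TRUE
```

A silent NA like this is usually the first sign that the format string and the data disagree.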

Determine and set timezone in POSIXct, POSIXlt, strptime, etc. in R

If you do not specify a time zone, POSIXct and POSIXlt will refer to your local time zone. However, this is not entirely reliable. POSIXlt will not display the time zone in the output string.

Note that the tzone attribute is not set.

t.ct <- as.POSIXct("2009-01-05 14:19 +1200", format="%Y-%m-%d %H:%M %z")
t.lt <- as.POSIXlt("2009-01-05 14:19 +1200", format="%Y-%m-%d %H:%M %z")
t.ct
t.lt
attr(t.ct,"tzone") #""
attr(t.lt,"tzone") #NULL

If you want to avoid ambiguous behaviour, you have to specify a time zone. The output string will still be different (by default POSIXlt shows no time zone), but the attribute is the same:

t.ct <- as.POSIXct("2009-01-05 14:19 +1200", format="%Y-%m-%d %H:%M %z", tz="Europe/Helsinki")
t.lt <- as.POSIXlt("2009-01-05 14:19 +1200", format="%Y-%m-%d %H:%M %z", tz="Europe/Helsinki")
t.ct
t.lt
attr(t.ct,"tzone") #Europe/Helsinki
attr(t.lt,"tzone") #Europe/Helsinki

Now, if you want to change time zones after the original assignment:

attr(t.ct, "tzone") <- "UTC" #this will SHIFT the displayed time to UTC (same instant)
attr(t.lt, "tzone") <- "UTC" #this will REPLACE the time zone label with UTC (same clock time)
t.ct
t.lt
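
For the POSIXct case, a quick check (a sketch with an assumed example instant) confirms that changing the tzone attribute only changes how the value is displayed, not the stored number of seconds:

```r
# Assumed example instant; Helsinki is UTC+2 in January
t.ct <- as.POSIXct("2009-01-05 14:19", format = "%Y-%m-%d %H:%M", tz = "Europe/Helsinki")
secs_before <- as.numeric(t.ct)

attr(t.ct, "tzone") <- "UTC"
secs_after <- as.numeric(t.ct)

secs_before == secs_after          # TRUE: the stored instant is unchanged
format(t.ct, "%Y-%m-%d %H:%M")     # "2009-01-05 12:19": only the display moved
```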

As for your problem with strftime and %z, this does not give you the time zone attribute. The difference in your case probably comes from a combination of string formatting, object conversions and time zone formatting, IMO. But maybe somebody more knowledgeable can clarify this.

Converting dates with R using as.POSIXct

Two mistakes:

  • you used %H where you want %I for the dreaded 12-hour format
  • you omitted %p to catch the "pm" marker

With that corrected:

R> date_string <- "03/11/2017, 3:14:32 pm"
R> as.POSIXct(date_string, format = "%m/%d/%Y, %I:%M:%S %p",tz="PST8PDT")
[1] "2017-03-11 15:14:32 PST"
R>
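
To see why both fixes matter, this sketch parses the same string with and without them. It uses tz = "UTC" instead of "PST8PDT" purely as an assumption to keep it portable, and the "pm" marker assumes an English or C locale:

```r
date_string <- "03/11/2017, 3:14:32 pm"

# %H reads the leading "3" as hour 3 on a 24-hour clock; the trailing "pm" is left unparsed
wrong <- as.POSIXct(date_string, format = "%m/%d/%Y, %H:%M:%S", tz = "UTC")

# %I plus %p reads a 12-hour clock together with its am/pm marker
right <- as.POSIXct(date_string, format = "%m/%d/%Y, %I:%M:%S %p", tz = "UTC")

format(wrong, "%H:%M:%S")
format(right, "%H:%M:%S")   # "15:14:32"
```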

NA for 1 particular date when converting dates from character format to POSIXct with as.POSIXct

We can specify %T for the time. The string also contains minutes, seconds and milliseconds, so %H alone matches only the hour part.

as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %T")
#[1] "2017-03-26 02:00:00 EDT"

Or to take care of the milliseconds as well

as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %H:%M:%OS")
#[1] "2017-03-26 02:00:00 EDT"

Or using lubridate

library(lubridate)
ymd_hms("2017-03-26 02:00:00.000")
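
The default print method hides fractional seconds, so it can be hard to tell whether %OS actually kept them. A small check follows; the input value and tz = "UTC" are assumptions for illustration:

```r
# tz = "UTC" is an assumption so that the clock time exists regardless of local DST rules
x <- as.POSIXct("2017-03-26 02:00:00.250", format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

as.numeric(x) %% 1             # fractional part of the stored seconds
format(x, "%H:%M:%OS3")        # print with three decimal places
```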

