Dealing with Timestamps in R

You want the standard POSIXt type from base R. It comes in a compact form, POSIXct (essentially a double holding fractional seconds since the epoch), and a long form, POSIXlt (a list of named components such as sec, min and hour). The cool thing is that arithmetic, comparison and the like are defined on both; see help(DateTimeClasses).

Quick example:

R> now <- Sys.time()
R> now
[1] "2009-12-25 18:39:11 CST"
R> as.numeric(now)
[1] 1.262e+09
R> now + 10 # adds 10 seconds
[1] "2009-12-25 18:39:21 CST"
R> as.POSIXlt(now)
[1] "2009-12-25 18:39:11 CST"
R> str(as.POSIXlt(now))
POSIXlt[1:9], format: "2009-12-25 18:39:11"
R> unclass(as.POSIXlt(now))
$sec
[1] 11.79

$min
[1] 39

$hour
[1] 18

$mday
[1] 25

$mon
[1] 11

$year
[1] 109

$wday
[1] 5

$yday
[1] 358

$isdst
[1] 0

attr(,"tzone")
[1] "America/Chicago" "CST" "CDT"
R>
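To make the two representations concrete, here is a minimal sketch that uses a fixed instant instead of Sys.time(), so the result is reproducible. The UTC time zone and the variable names are assumptions chosen for illustration. It also shows the two POSIXlt quirks visible in the output above: $year counts from 1900 and $mon runs from 0 to 11.

```r
# A fixed instant (the time zone is chosen for illustration only)
x <- as.POSIXct("2009-12-25 18:39:11", tz = "UTC")

# POSIXct is a number of seconds under the hood, so arithmetic is in seconds
later <- x + 10
shift <- as.numeric(later) - as.numeric(x)

# POSIXlt exposes the broken-down components
lt <- as.POSIXlt(x)
year  <- lt$year + 1900   # $year counts from 1900
month <- lt$mon + 1       # $mon runs from 0 to 11
day   <- lt$mday
```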

To read them in, see help(strptime).

Differences are easy too:

R> Jan1 <- strptime("2009-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
R> difftime(now, Jan1, units="weeks")
Time difference of 51.25 weeks
R>
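A reproducible variant of the same idea, as a sketch: both endpoints are parsed in a fixed time zone (UTC is an assumption for the example, as are the variable names), and the difftime result is then converted to a plain number in the unit you need.

```r
# Parse two timestamps in a fixed time zone (UTC is an assumption here)
jan1 <- as.POSIXct(strptime("2009-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", tz = "UTC"))
xmas <- as.POSIXct(strptime("2009-12-25 00:00:00", "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# difftime() keeps the unit as metadata on the result
d <- difftime(xmas, jan1, units = "weeks")

# Convert to a plain number in whichever unit is needed
days_elapsed  <- as.numeric(d, units = "days")
weeks_elapsed <- as.numeric(d, units = "weeks")
```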

Lastly, the zoo package is an extremely versatile and well-documented container for vectors and matrices with associated date/time indices.

How to change a timestamp from character to datetime in R and add missing timestamps

What you tried is correct as long as you assign the result back to the column of your data frame. This is what you should do:

> data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP, format="%Y-%m-%d %H:%M:%S")

After that, the TIMESTAMP column will have the desired class:

> class(data$TIMESTAMP)
[1] "POSIXct" "POSIXt"

To fill in the missing rows, first build a new data.frame containing every expected time, then merge it with your initial data. Below, min and max give the range of the date-times, and seq.POSIXt with by='min' generates the full minute-by-minute sequence. The merge keeps the existing price values from your initial data frame and leaves NA where a timestamp was missing:

> data_full <- data.frame(TIMESTAMP = seq.POSIXt(from=min(data$TIMESTAMP), to=max(data$TIMESTAMP), by='min'))
> data_complete <- merge(data_full, data, all.x = TRUE)
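The whole procedure can be sketched end to end on toy data. The PRICE column, the sample values and the UTC time zone are assumptions made up for the example; only the TIMESTAMP column name comes from the answer.

```r
# Toy data with the 00:02 row missing (PRICE and the values are hypothetical)
data <- data.frame(
  TIMESTAMP = c("2020-01-01 00:00:00", "2020-01-01 00:01:00", "2020-01-01 00:03:00"),
  PRICE = c(10, 11, 13)
)

# Character to POSIXct, assigned back to the column
data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Every minute between the first and last observation
data_full <- data.frame(
  TIMESTAMP = seq(min(data$TIMESTAMP), max(data$TIMESTAMP), by = "min")
)

# Left join: all expected minutes are kept, PRICE is NA where no row existed
data_complete <- merge(data_full, data, by = "TIMESTAMP", all.x = TRUE)
missing_rows <- data_complete[is.na(data_complete$PRICE), ]
```

Passing by = "TIMESTAMP" explicitly is a small safeguard: without it, merge joins on every column name the two data frames share.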

Dealing with twitter timestamps in R

The format above is an epoch time. Assuming it is in milliseconds since the epoch (you would have to double-check with the Twitter API), you can convert it with the anytime function from the anytime package as shown below. Displayed in UTC, the result is "2015-01-01 14:08:15 UTC". Note that anytime uses your local time zone unless you pass tz.

anytime(1420121295000*0.001, tz = "UTC") # times 0.001 to convert milliseconds to seconds
format(anytime(1420121295000*0.001), tz = "America/New_York", usetz=TRUE) # the same instant shown in New York time (EST)
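If you would rather not add a package, the same conversion can be sketched in base R with as.POSIXct and an explicit origin. This is an alternative to anytime, not what the answer uses, and it rests on the same assumption that the value is in milliseconds; the variable names are illustrative.

```r
ms <- 1420121295000   # assumed to be milliseconds since the epoch

# Divide by 1000 to get seconds, then anchor at the Unix epoch in UTC
t_utc <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# The instant is the same; only the displayed time zone changes
utc_text <- format(t_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE)
ny_text  <- format(t_utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York", usetz = TRUE)
```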

How to convert specific time format to timestamp in R?

First of all, we need to replace the colon separating the milliseconds from the seconds with a dot, otherwise the final step won't work (thanks to Dirk Eddelbuettel for this one). Since R prints the result with its own separators anyway, to be quicker I'll just go ahead and replace all the colons with dots:

x <- "27.05.2009 14:03:25:777"  # this is a simplified version of your data
y <- gsub(":", ".", x) # this is your vector with the aforementioned substitution

By the way, this is how your vector should look after gsub:

> y
[1] "27.05.2009 14.03.25.777"

Now, to have it show the milliseconds, you first need to set an R option and then use strptime, which converts your character vector to POSIXlt (an R date-time class). Because every colon in y is now a dot, the format string uses dots between hours, minutes and seconds as well. Just do the following:

> options(digits.secs = 3)           # print 3 decimal digits for seconds
> strptime(y, "%d.%m.%Y %H.%M.%OS") # parse the dotted string into a date-time
[1] "2009-05-27 14:03:25.777"

I learned this nice trick from another answer. A different answer says you can skip the options setting and use, for example, strptime(y, "%d.%m.%Y %H.%M.%OS3"), but it doesn't work for me. Henrik noted that the function's help page, ?strptime, states that the %OS3 bit is OS-dependent; the same page documents %OSn for output, while on input plain %OS already reads fractional seconds. I'm using an updated Ubuntu 13.04 and using %OS3 yields NA.
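A variant of the same trick, as a sketch: replace only the last colon (the one in front of the milliseconds) so the usual colon-separated time format still applies. The regular expression, the UTC time zone and the variable names are assumptions for illustration. The milliseconds are read back from the $sec component rather than from printed output, because printed fractional seconds can be affected by floating-point representation.

```r
x <- "27.05.2009 14:03:25:777"

# Replace only the final colon, the one before the milliseconds
y <- sub(":([0-9]+)$", ".\\1", x)

# On input, %OS reads seconds including the fractional part
parsed <- strptime(y, "%d.%m.%Y %H:%M:%OS", tz = "UTC")

# Recover the millisecond part numerically
whole_seconds <- floor(parsed$sec)
millis <- round((parsed$sec - whole_seconds) * 1000)
```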

When using strptime (or other functions that take a format string, such as as.Date), keep in mind some of the most common conversion specifications (edited for brevity, as suggested by DWin; the complete list is at ?strptime):

  • %a Abbreviated weekday name in the current locale.
  • %A Full weekday name in the current locale.
  • %b Abbreviated month name in the current locale.
  • %B Full month name in the current locale.
  • %d Day of the month as decimal number (01–31).
  • %H Hours as decimal number (00–23). Times such as 24:00:00 are accepted for input.
  • %I Hours as decimal number (01–12).
  • %j Day of year as decimal number (001–366).
  • %m Month as decimal number (01–12).
  • %M Minute as decimal number (00–59).
  • %p AM/PM indicator in the locale. Used in conjunction with %I and not with %H.
  • %S Second as decimal number (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
  • %U Week of the year as decimal number (00–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
  • %w Weekday as decimal number (0–6, Sunday is 0).
  • %W Week of the year as decimal number (00–53) using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
  • %y Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19.
  • %Y Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC).
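A few of these specifications in action, as a small sketch. Only locale-independent codes are used so the results do not depend on the session's language settings; the sample instant, the UTC time zone and the variable names are assumptions for illustration.

```r
x <- as.POSIXct("2009-12-25 18:39:11", tz = "UTC")

# Output side: format() turns a date-time into text
day_of_year <- format(x, "%j")   # day of year, 001-366
weekday_num <- format(x, "%w")   # 0-6, Sunday is 0
week_sunday <- format(x, "%U")   # week of year, weeks starting on Sunday
hour_12     <- format(x, "%I")   # hour on the 12-hour clock

# Input side: two-digit years 00-68 become 20xx, 69-99 become 19xx
y68 <- format(as.Date("01/01/68", format = "%d/%m/%y"), "%Y")
y69 <- format(as.Date("01/01/69", format = "%d/%m/%y"), "%Y")
```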

Finding the mean of timestamp data in r for time series data

Since you want to aggregate per second, all you have to do is convert to a proper date-time and use it as your grouping variable, i.e.

df$grp <- as.POSIXct(paste(as.character(df$ID), as.character(df$Time_Stamp)), format = "%d/%m/%Y %H:%M:%OS")

aggregate(list(mean1 = df$A, mean2 = df$B, mean3 = df$C), list(df$grp), mean)

#               Group.1    mean1     mean2 mean3
# 1 2018-02-02 07:45:00 122.3333  455.3333   411
# 2 2018-02-02 07:45:01 112.0000 2323.0000  2323
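A self-contained sketch of the same grouping on made-up data. The values, the sub-second format of Time_Stamp and the UTC time zone are assumptions, since the original data is not shown here. Because %OS keeps fractional seconds, the sketch truncates each stamp to the whole second before grouping, which is an extra step not present in the answer.

```r
# Hypothetical readings with sub-second timestamps
df <- data.frame(
  ID = rep("02/02/2018", 4),
  Time_Stamp = c("07:45:00.100", "07:45:00.600", "07:45:01.200", "07:45:01.900"),
  A = c(10, 20, 30, 50)
)

# Combine date and time, then parse (fractional seconds are preserved by %OS)
stamp <- as.POSIXct(paste(df$ID, df$Time_Stamp),
                    format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")

# Drop the fraction so every reading within a second shares one group key
df$grp <- as.POSIXct(trunc(stamp, units = "secs"))

res <- aggregate(list(mean1 = df$A), list(grp = df$grp), mean)
```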

Extract year from date

If all your dates are the same width, you can put them in a vector and use substring:

a <- c("01/01/2009", "01/01/2010", "01/01/2011")
substring(a, 7, 10) # keeps only the characters from position 7 to position 10

Output:

[1] "2009" "2010" "2011"
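When the widths are not guaranteed to match, a sketch of the alternative is to parse the strings as dates and format the year back out. The day/month/year ordering is an assumption (the sample values are ambiguous), and the variable names are illustrative.

```r
a <- c("01/01/2009", "01/01/2010", "01/01/2011")

# Parse as dates (day/month/year order is assumed here)
d <- as.Date(a, format = "%d/%m/%Y")

# Year as text, and as an integer if you need to compute with it
years_chr <- format(d, "%Y")
years_int <- as.integer(years_chr)
```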

How to convert this timestamp format to standard timestamp format?

You can use the strptime function this way:

my_time = strptime("1/28/15 16:34", "%m/%d/%y %H:%M")

Note in particular %m and %y: %m is the month as a decimal number (on input the leading zero is optional, so both "1" and "01" are accepted), and %y is the year written with 2 digits.

For example, if you need to convert "01/28/2015" you keep %m and switch to %Y for the 4-digit year (%M would mean minutes, not months):

my_time = strptime('01/28/2015 16:34', '%m/%d/%Y %H:%M')

To extract the day of week and the hour:

library(lubridate)
week_day = wday(my_time) # or wday(my_time, label = TRUE) if you want the weekday label (Wed in this case)
day_hour = hour(my_time)
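The same extraction can be sketched in base R, since strptime returns a POSIXlt whose components are directly accessible. The UTC time zone and the iso_stamp name are assumptions for illustration.

```r
# Parse the short format (a single-digit month is accepted by %m on input)
my_time <- strptime("1/28/15 16:34", "%m/%d/%y %H:%M", tz = "UTC")

# POSIXlt components
week_day <- my_time$wday   # 0-6, Sunday is 0
day_hour <- my_time$hour

# The standard timestamp representation as text
iso_stamp <- format(my_time, "%Y-%m-%d %H:%M:%S")
```

Note the different numbering: $wday counts from 0 with Sunday as 0, whereas lubridate's wday() counts from 1 with Sunday as 1 by default.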

