Dealing with timestamps in R
You want the (standard) POSIXt type from base R, which is available in 'compact form' as POSIXct (essentially a double representing fractional seconds since the epoch) or in long form as POSIXlt (which contains sub-elements). The cool thing is that arithmetic etc. is defined on these types; see help(DateTimeClasses).
Quick example:
R> now <- Sys.time()
R> now
[1] "2009-12-25 18:39:11 CST"
R> as.numeric(now)
[1] 1.262e+09
R> now + 10 # adds 10 seconds
[1] "2009-12-25 18:39:21 CST"
R> as.POSIXlt(now)
[1] "2009-12-25 18:39:11 CST"
R> str(as.POSIXlt(now))
POSIXlt[1:9], format: "2009-12-25 18:39:11"
R> unclass(as.POSIXlt(now))
$sec
[1] 11.79
$min
[1] 39
$hour
[1] 18
$mday
[1] 25
$mon
[1] 11
$year
[1] 109
$wday
[1] 5
$yday
[1] 358
$isdst
[1] 0
attr(,"tzone")
[1] "America/Chicago" "CST" "CDT"
R>
As for reading them in, see help(strptime)
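A minimal sketch of reading timestamp strings with strptime, assuming they follow the "%Y-%m-%d %H:%M:%S" layout. The sample strings and the choice of tz = "UTC" are illustrative assumptions, not part of the original example:

```r
# Hypothetical input strings in "YYYY-MM-DD HH:MM:SS" layout
stamps <- c("2009-12-25 18:39:11", "2009-12-25 18:40:00")

# strptime() returns POSIXlt; an explicit tz avoids depending on the session time zone
parsed_lt <- strptime(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# as.POSIXct() gives the compact numeric form, convenient for data frame columns
parsed_ct <- as.POSIXct(parsed_lt)

# A string that does not match the format becomes NA instead of raising an error
bad <- strptime("25/12/2009", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Elapsed time between the two stamps, as a difftime
diff(parsed_ct)
```

Checking for NA after parsing is a cheap way to catch a format string that does not match the data.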
As for difference, easy too:
R> Jan1 <- strptime("2009-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
R> difftime(now, Jan1, units="weeks")
Time difference of 51.25 weeks
R>
Lastly, the zoo package is an extremely versatile and well-documented container for matrices with associated date/time indices.
How to change a timestamp from character to datetime in R and add missing timestamps
What you tried is correct as long as you assign the result back to the column of your data frame. This is what you should do:
> data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP, format="%Y-%m-%d %H:%M:%S")
After that, the TIMESTAMP column will have the desired class:
> class(data$TIMESTAMP)
[1] "POSIXct" "POSIXt"
To complete your data frame with the missing lines, you can first build a new data.frame with all the expected times and then merge it with your initial data. Below I'm using min and max to find the range of date-times, then seq.POSIXt by minute to generate the full set of date-times. The merge will then pick up the already existing price values from your initial data frame, and timestamps with no match get NA:
> data_full <- data.frame(TIMESTAMP = seq.POSIXt(from=min(data$TIMESTAMP), to=max(data$TIMESTAMP), by='min'))
> data_complete <- merge(data_full, data, all.x = TRUE)
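The approach can be tried end to end on a small, made-up data frame. The column name PRICE and the sample values are assumptions for illustration only:

```r
# Hypothetical data with a gap: the 10:02 row is missing
data <- data.frame(
  TIMESTAMP = c("2020-01-01 10:00:00", "2020-01-01 10:01:00", "2020-01-01 10:03:00"),
  PRICE = c(1.5, 1.6, 1.8),
  stringsAsFactors = FALSE
)
data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Full minute-by-minute grid between the first and last observed timestamps
data_full <- data.frame(
  TIMESTAMP = seq(from = min(data$TIMESTAMP), to = max(data$TIMESTAMP), by = "min")
)

# Left join: every grid row is kept, PRICE is NA where no observation exists
data_complete <- merge(data_full, data, by = "TIMESTAMP", all.x = TRUE)
data_complete
```

The result has four rows, with NA in PRICE for the 10:02 row.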
Dealing with twitter timestamps in R
The value above is an epoch timestamp. Assuming it is in milliseconds since the epoch (you would have to double-check with the Twitter API), you can convert it to a date-time using the anytime function from the anytime package, as shown below. The result corresponds to "2015-01-01 14:08:15 UTC"; note that anytime displays it in your session's time zone unless told otherwise.
anytime(1420121295000*0.001) #times 0.001 to convert to seconds
format(anytime(1420121295000*0.001), tz = "America/New_York", usetz=TRUE) #converting from UTC to EST timezone.
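If you prefer to stay in base R, a sketch of the same conversion is shown here. It rests on the same assumption that the value is in milliseconds since the epoch:

```r
# Epoch value assumed to be in milliseconds (as in the example above)
epoch_ms <- 1420121295000

# Base R expects seconds, so divide by 1000; tz = "UTC" fixes the display time zone
ts_utc <- as.POSIXct(epoch_ms / 1000, origin = "1970-01-01", tz = "UTC")

# Same instant rendered in two time zones
utc_text <- format(ts_utc, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
ny_text  <- format(ts_utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York", usetz = TRUE)

utc_text  # "2015-01-01 14:08:15 UTC"
ny_text   # "2015-01-01 09:08:15 EST"
```

Passing tz explicitly keeps the output independent of the machine's local settings.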
How to convert specific time format to timestamp in R?
First of all, we need to replace the colon separating the milliseconds from the seconds with a dot, otherwise the final step won't work (thanks to Dirk Eddelbuettel for this one). Since in the end R will use the separators it wants, to be quicker, I'll just go ahead and replace all the colons with dots:
x <- "27.05.2009 14:03:25:777" # this is a simplified version of your data
y <- gsub(":", ".", x) # this is your vector with the aforementioned substitution
By the way, this is how your vector should look after gsub:
> y
[1] "27.05.2009 14.03.25.777"
Now, in order to have it show the milliseconds, you first need to adjust an R option and then use a function called strptime, which will convert your date vector to POSIXlt, an R-friendly format. Just do the following:
> options(digits.secs = 3) # this tells R you want it to consider 3 digits for seconds.
> strptime(y, "%d.%m.%Y %H.%M.%OS") # this finally formats your vector; the format uses dots because gsub replaced every colon in y
[1] "2009-05-27 14:03:25.777"
I learned this nice trick here. This other answer also says you can skip the options setting and use, for example, strptime(y, "%d.%m.%Y %H.%M.%OS3"), but it doesn't work for me. Henrik noted that the function's help page, ?strptime, states that the %OS3 bit is OS-dependent. I'm using an updated Ubuntu 13.04, and using %OS3 yields NA.
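As an alternative sketch, you can replace only the final colon (the one in front of the milliseconds) and keep the usual colons in the time part. The regular expression is my own assumption about the input layout, namely that the string always ends in ":" followed by digits:

```r
x <- "27.05.2009 14:03:25:777"  # same simplified input as above

# Replace only the last colon, the one in front of the milliseconds
y <- sub(":([0-9]+)$", ".\\1", x)

# %OS on input reads seconds including the fractional part
parsed <- strptime(y, "%d.%m.%Y %H:%M:%OS", tz = "UTC")

y           # "27.05.2009 14:03:25.777"
parsed$sec  # seconds with the fractional part retained
```

How many fractional digits are printed still depends on options(digits.secs), but the parsed value keeps the milliseconds either way.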
When using strptime (or other POSIX-related functions such as as.Date), keep in mind some of the most common conversions used (edited for brevity, as suggested by DWin; the complete list is at ?strptime):
%a: Abbreviated weekday name in the current locale.
%A: Full weekday name in the current locale.
%b: Abbreviated month name in the current locale.
%B: Full month name in the current locale.
%d: Day of the month as decimal number (01–31).
%H: Hours as decimal number (00–23). Times such as 24:00:00 are accepted for input.
%I: Hours as decimal number (01–12).
%j: Day of year as decimal number (001–366).
%m: Month as decimal number (01–12).
%M: Minute as decimal number (00–59).
%p: AM/PM indicator in the locale. Used in conjunction with %I and not with %H.
%S: Second as decimal number (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
%U: Week of the year as decimal number (00–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
%w: Weekday as decimal number (0–6, Sunday is 0).
%W: Week of the year as decimal number (00–53) using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%y: Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19.
%Y: Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC).
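To see what some of these conversions produce, here is a small sketch that formats one fixed timestamp with several of the locale-independent codes. The timestamp is borrowed from the first example on this page and the tz = "UTC" setting is an assumption to keep the output reproducible:

```r
stamp <- as.POSIXct("2009-12-25 18:39:11", tz = "UTC")

# Locale-independent conversion codes from the list above
codes <- c("%d", "%H", "%I", "%j", "%m", "%M", "%S", "%w", "%y", "%Y")

# Format the same instant once per code; the result is a named character vector
out <- vapply(codes, function(code) format(stamp, code), character(1))
out
```

Note that %j is 1-based ("359" here), whereas the yday element of a POSIXlt object is 0-based (358 in the earlier output).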
Finding the mean of timestamp data in r for time series data
Since you want to aggregate per second, the only thing you have to do is convert to a proper date-time and use it as your grouping variable, i.e.
df$grp <- as.POSIXct(paste(as.character(df$ID), as.character(df$Time_Stamp)), format = "%d/%m/%Y %H:%M:%OS")
aggregate(list(mean1 = df$A, mean2 = df$B, mean3 = df$C), list(df$grp), mean)
# Group.1 mean1 mean2 mean3
#1 2018-02-02 07:45:00 122.3333 455.3333 411
#2 2018-02-02 07:45:01 112.0000 2323.0000 2323
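If the raw timestamps carry fractional seconds, one possible variation is to truncate them to whole seconds before grouping. The sketch below uses a made-up data frame with a single combined timestamp column; the column names and values are assumptions, not the asker's data:

```r
# Hypothetical data: three readings, two of them within the same second
df <- data.frame(
  Time_Stamp = c("02/02/2018 07:45:00.100",
                 "02/02/2018 07:45:00.600",
                 "02/02/2018 07:45:01.200"),
  A = c(120, 124, 112),
  stringsAsFactors = FALSE
)

# Parse including fractional seconds, then drop the fraction
ts <- as.POSIXct(df$Time_Stamp, format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")
df$grp <- as.POSIXct(trunc(ts, units = "secs"))

# One mean per whole second
res <- aggregate(A ~ grp, data = df, FUN = mean)
res
```

Here the first two rows fall into 07:45:00 and average to 122, while the third forms its own group.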
Extract year from date
If all your dates are the same width, you can put the dates in a vector and use substring.
Date
a <- c("01/01/2009", "01/01/2010" , "01/01/2011")
substring(a, 7, 10) # keeps only the characters from position 7 to position 10 of each string
output
[1] "2009" "2010" "2011"
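When the strings are not guaranteed to have a fixed width, a sketch that parses them as dates first may be more robust. It assumes the strings are in day/month/year order, which the sample values do not disambiguate:

```r
a <- c("01/01/2009", "01/01/2010", "01/01/2011")

# Parse to Date (assumed day/month/year order), then format the year back out
dates <- as.Date(a, format = "%d/%m/%Y")
years <- format(dates, "%Y")

# Convert to integer if you need to do arithmetic on the years
years_num <- as.integer(years)
years
```

This yields the same "2009" "2010" "2011" as the substring approach, and malformed strings surface as NA.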
How to convert this timestamp format to standard timestamp format?
You can use the strptime function in this way:
my_time = strptime("1/28/15 16:34", "%m/%d/%y %H:%M")
Note in particular %m and %y: %m is the month as a decimal number (on input the leading zero is optional, so both "1" and "01" are accepted), and %y is the year written with 2 digits.
For example, to convert "01/28/2015" you still use %m for the month (%M means minutes) and switch to %Y for the 4-digit year:
my_time = strptime('01/28/2015 16:34', '%m/%d/%Y %H:%M')
To extract the day of week and the hour:
library(lubridate)
week_day = wday(my_time) # or wday(my_time, label=TRUE) if you want the weekday label (Wed in this case)
day_hour = hour(my_time)
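Without lubridate, the same information can be read from the POSIXlt object that strptime returns. This is a sketch in base R, with tz = "UTC" added as an assumption so the result does not depend on the session time zone:

```r
my_time <- strptime("1/28/15 16:34", "%m/%d/%y %H:%M", tz = "UTC")

# POSIXlt stores its components as list elements
week_day <- my_time$wday   # 0-6, Sunday is 0, so Wednesday is 3
day_hour <- my_time$hour   # hour of the day, 0-23

week_day
day_hour
```

Be aware that the numbering differs: lubridate's wday counts from 1 with Sunday as 1 by default, while the wday element of POSIXlt counts from 0.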