Aggregate 1-minute data into 5-minute average data
Using the dplyr package, and assuming your data is stored in a data frame named df:
library(dplyr)
df %>%
  group_by(DeviceTime = cut(DeviceTime, breaks = "5 min")) %>%
  summarize(Concentration = mean(Concentration))
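For comparison, the same 5-minute averaging can be sketched in pandas (the DeviceTime and Concentration column names come from the answer above; the sample values are invented):

```python
import pandas as pd

# Hypothetical 1-minute readings mirroring the df in the answer above.
df = pd.DataFrame({
    "DeviceTime": pd.date_range("2024-01-01 00:00", periods=10, freq="min"),
    "Concentration": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# Floor each timestamp to its 5-minute bucket (the analogue of
# cut(DeviceTime, breaks = "5 min")) and average within each bucket.
out = (df.groupby(df["DeviceTime"].dt.floor("5min"))["Concentration"]
         .mean()
         .reset_index())
print(out)
```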
Use T-SQL window functions to retrieve 5-minute averages from 1-minute data
The gist of the problem is rounding datetime values to a 5-minute boundary, which (assuming the datatype is datetime) can be done with DATEADD(MINUTE, DATEDIFF(MINUTE, 0, time) / 5 * 5, 0). The rest is basic grouping/window functions:
WITH cte AS (
    SELECT clamped_time
         , [Open]
         , [Close]
         , [High]
         , [Low]
         , [Volume]
         , rn1 = ROW_NUMBER() OVER (PARTITION BY clamped_time ORDER BY [Time])
         , rn2 = ROW_NUMBER() OVER (PARTITION BY clamped_time ORDER BY [Time] DESC)
    FROM t
    CROSS APPLY (
        SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, time) / 5 * 5, 0)
    ) AS x(clamped_time)
)
SELECT clamped_time
     , MIN(CASE WHEN rn1 = 1 THEN [Open] END) AS [Open]
     , MIN(CASE WHEN rn2 = 1 THEN [Close] END) AS [Close]
     , MAX([High]) AS [High]
     , MIN([Low]) AS [Low]
     , AVG([Volume]) AS [Volume]
FROM cte
GROUP BY clamped_time
Demo on db<>fiddle
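The DATEDIFF/DATEADD trick is just integer division on the minute count since a fixed epoch; a quick sketch of the same idea in Python (the function name and epoch default are illustrative — SQL Server's date 0 is 1900-01-01):

```python
from datetime import datetime, timedelta

def clamp_to_5min(ts: datetime, epoch: datetime = datetime(1900, 1, 1)) -> datetime:
    """Mirror DATEADD(MINUTE, DATEDIFF(MINUTE, 0, ts) / 5 * 5, 0):
    whole minutes since the epoch, floored to a multiple of 5, added back."""
    minutes = int((ts - epoch).total_seconds()) // 60  # DATEDIFF(MINUTE, 0, ts)
    return epoch + timedelta(minutes=minutes // 5 * 5)

print(clamp_to_5min(datetime(2024, 1, 1, 9, 7, 42)))  # 2024-01-01 09:05:00
```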
How to get average value for each hourly increment that is split into 5 minute intervals
You didn't specify the type for interval_time, so I'm assuming a string; you can parse it out with a CASE expression like this:
SELECT interval_date,
       CASE WHEN SUBSTRING(interval_time, 4, 2) = '00' THEN interval_time
            WHEN SUBSTRING(interval_time, 1, 2) = '23' THEN '00:00'
            ELSE FORMAT(CONVERT(int, SUBSTRING(interval_time, 1, 2)) + 1, '00') + ':00'
       END AS interval_time,
       AVG(power)
FROM mytable
WHERE on_status = 'Y'
GROUP BY interval_date,
         CASE WHEN SUBSTRING(interval_time, 4, 2) = '00' THEN interval_time
              WHEN SUBSTRING(interval_time, 1, 2) = '23' THEN '00:00'
              ELSE FORMAT(CONVERT(int, SUBSTRING(interval_time, 1, 2)) + 1, '00') + ':00'
         END
Note that to get your target of 5.17, I had to comment out the on_status = 'Y' filter.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b604e4fe6696465aac75676e69b92a47
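The CASE expression maps each 'HH:MM' interval to the hour boundary that closes it, wrapping the 23:xx intervals around to midnight. The same mapping can be sketched in Python (the function name is illustrative):

```python
def closing_hour(interval_time: str) -> str:
    """Map an 'HH:MM' interval label to the hour label used in the query above."""
    hh, mm = interval_time[:2], interval_time[3:5]
    if mm == "00":   # already on an hour boundary: keep it
        return interval_time
    if hh == "23":   # 23:05 .. 23:55 wrap around to midnight
        return "00:00"
    return f"{int(hh) + 1:02d}:00"  # otherwise round up to the next hour

print(closing_hour("14:25"))  # 15:00
```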
Aggregate time series data, get average without NA or 0s in R
The following uses cut to turn column V1 into a grouping variable with 5-minute intervals, then summarizes with a custom function that computes means without NAs or zero values. I have split this function over two code lines to make it more readable, but it could simply be
f <- function(x) mean(x[x != 0], na.rm = TRUE)
The date-time column V1 is first coerced to class "POSIXct".
library(data.table)
f <- function(x) {
  y <- x[x != 0]
  mean(y, na.rm = TRUE)
}
df[, V1 := as.POSIXct(V1)]
df[, V1 := cut(V1, "5 mins")]
df[, lapply(.SD, f), by = V1]
# V1 423 470 473 626
#1: 2018-01-01 00:00:00 25.33333 28.00000 50.83333 31
#2: 2018-01-01 00:05:00 30.25722 24.00000 49.25000 21
#3: 2018-01-01 00:10:00 30.42409 21.50000 35.40000 75
#4: 2018-01-01 00:15:00 26.41851 25.16667 36.83333 48
A one-liner could be
df[, lapply(.SD, f), by = cut(as.POSIXct(V1), "5 mins")]
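A comparable pandas sketch (sample values and the x423 column name are invented): mask zeros to NaN, then resample to 5-minute means, which skip NaN by default, just like mean(..., na.rm = TRUE):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute series with zeros and NAs sprinkled in.
idx = pd.date_range("2018-01-01 00:00", periods=10, freq="min")
df = pd.DataFrame({
    "V1": idx,
    "x423": [25, 0, np.nan, 26, 25, 30, 0, 30, np.nan, 31],
})

# Replace zeros with NaN so they are excluded, then take 5-minute means.
out = (df.set_index("V1")["x423"]
         .mask(lambda s: s == 0)
         .resample("5min")
         .mean())
print(out)
```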
Group DataFrame in 5-minute intervals
You can use df.resample to do aggregation based on a date/time variable. You'll need a datetime index, and you can specify that while reading the csv file:
df = pd.read_csv("filename.csv", parse_dates = [["DATE", "TIME"]], index_col=0)
This will result in a dataframe with an index where date and time are combined:
df.head()
Out[7]:
OPEN HIGH LOW CLOSE VOLUME
DATE_TIME
1997-02-03 09:04:00 3046.0 3048.5 3046.0 3047.5 505
1997-02-03 09:05:00 3047.0 3048.0 3046.0 3047.0 162
1997-02-03 09:06:00 3047.5 3048.0 3047.0 3047.5 98
1997-02-03 09:07:00 3047.5 3047.5 3047.0 3047.5 228
1997-02-03 09:08:00 3048.0 3048.0 3047.5 3048.0 136
After that you can use resample to get the sum, mean, etc. of those five minute intervals.
df.resample("5T").mean()
Out[8]:
OPEN HIGH LOW CLOSE VOLUME
DATE_TIME
1997-02-03 09:00:00 3046.0 3048.5 3046.0 3047.5 505.0
1997-02-03 09:05:00 3047.6 3047.9 3046.8 3047.3 159.6
1997-02-03 09:10:00 3045.6 3045.9 3044.8 3045.0 110.2
1997-02-03 09:15:00 3043.6 3044.0 3042.8 3043.2 69.2
1997-02-03 09:20:00 3044.7 3045.2 3044.5 3045.0 65.8
1997-02-03 09:25:00 3043.8 3044.0 3043.5 3043.7 59.0
1997-02-03 09:30:00 3044.6 3045.0 3044.3 3044.6 56.0
1997-02-03 09:35:00 3044.5 3044.5 3043.5 3044.5 44.0
(T is the pandas alias for minute frequency; see the pandas documentation on offset aliases for the other units.)
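Note that .mean() is not the right aggregation for every OHLC column; a hedged sketch using .agg with per-column functions instead (sample values invented, following the column layout shown above):

```python
import pandas as pd

# Minimal 1-minute bars on a datetime index.
idx = pd.date_range("1997-02-03 09:04", periods=6, freq="min")
df = pd.DataFrame({
    "OPEN":   [3046.0, 3047.0, 3047.5, 3047.5, 3048.0, 3047.0],
    "HIGH":   [3048.5, 3048.0, 3048.0, 3047.5, 3048.0, 3047.5],
    "LOW":    [3046.0, 3046.0, 3047.0, 3047.0, 3047.5, 3046.5],
    "CLOSE":  [3047.5, 3047.0, 3047.5, 3047.5, 3048.0, 3047.0],
    "VOLUME": [505, 162, 98, 228, 136, 100],
}, index=idx)

# First open, max high, min low, last close, summed volume per 5-minute bar.
bars = df.resample("5min").agg(
    {"OPEN": "first", "HIGH": "max", "LOW": "min", "CLOSE": "last", "VOLUME": "sum"}
)
print(bars)
```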
Aggregate 10 minute interval data to hourly
This worked perfectly:
df.resample('60T').mean()
aggregate column values at 5 min intervals and create a new dataframe
First make sure the timestamp column is a date-time column. You can skip this line if it is already in that format.
df1$timestamp <- as.POSIXct(df1$timestamp)
xts has some nice functions for working with time series, especially rolling functions and time-aggregating functions. In this case period.apply can help out.
library(xts)
# create xts object. Be sure to exclude the timestamp column otherwise you end up with a character matrix.
df1_xts <- as.xts(df1[, -4], order.by = df1$timestamp)
# sum per 5 minute intervals
df1_xts_summed <- period.apply(df1_xts, endpoints(df1_xts, on = "minutes", k = 5), colSums)
# count rows per 5 minute interval and add to data
df1_xts_summed$nrows <- period.apply(df1_xts$T1, endpoints(df1_xts, on = "minutes", k = 5), nrow)
df1_xts_summed
T1 T2 T3 nrows
2015-11-05 10:24:00 90.31 89.03 26.37 2
2015-11-05 10:29:16 136.24 133.60 39.56 3
2015-11-05 10:32:00 45.96 44.85 13.23 1
2015-11-05 10:37:00 90.88 89.18 26.78 2
If you want it all back into a data.frame:
df_final <- data.frame(timestamp = index(df1_xts_summed), coredata(df1_xts_summed))
df_final
timestamp T1 T2 T3 nrows
1 2015-11-05 10:24:00 90.31 89.03 26.37 2
2 2015-11-05 10:29:16 136.24 133.60 39.56 3
3 2015-11-05 10:32:00 45.96 44.85 13.23 1
4 2015-11-05 10:37:00 90.88 89.18 26.78 2
Edit: if you want everything rounded to 5 minutes, with those as the timestamps, you need to do the following.
The first step is to replace the timestamps with the 5-minute intervals, taking into account the starting minutes of the timestamps. For this I use ceiling_date from the lubridate package and add the difference between the ceiling of the first timestamp and the first timestamp itself. This returns the last value of each interval. (If you want the start of the interval instead, use floor_date.)
df1$timestamp <- lubridate::ceiling_date(df1$timestamp, "5 mins") + difftime(lubridate::ceiling_date(first(df1$timestamp), "5 mins"), first(df1$timestamp), unit = "secs")
Next the same xts code as before which returns the same data, but the timestamp is now the last value of the 5 minute intervals.
df1_xts <- as.xts(df1[, -4], order.by = df1$timestamp)
ep <- endpoints(df1_xts, on = "minutes", k = 5)
df1_xts_summed <- period.apply(df1_xts, ep, colSums)
df1_xts_summed$nrows <- period.apply(df1_xts$T1, ep, nrow)
df_final <- data.frame(timestamp = index(df1_xts_summed), coredata(df1_xts_summed))
df_final
timestamp T1 T2 T3 nrows
1 2015-11-05 10:27:00 90.31 89.03 26.37 2
2 2015-11-05 10:32:00 136.24 133.60 39.56 3
3 2015-11-05 10:37:00 45.96 44.85 13.23 1
4 2015-11-05 10:42:00 90.88 89.18 26.78 2
data:
df1 <- structure(list(T1 = c(45.37, 44.94, 45.32, 45.46, 45.46, 45.96,
45.52, 45.36), T2 = c(44.48, 44.55, 44.44, 44.51, 44.65, 44.85,
44.56, 44.62), T3 = c(13, 13.37, 13.09, 13.29, 13.18, 13.23,
13.53, 13.25), timestamp = c("2015-11-05 10:23:00", "2015-11-05 10:24:00",
"2015-11-05 10:27:00", "2015-11-05 10:28:00", "2015-11-05 10:29:16",
"2015-11-05 10:32:00", "2015-11-05 10:36:00", "2015-11-05 10:37:00"
)), class = "data.frame", row.names = c(NA, -8L))
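The per-interval sums and row counts from the xts example can be sketched in pandas too, using the same eight rows. One difference to note: pandas labels each bin by its start, while endpoints picks the last observation in each interval, so the timestamps differ even though the sums match:

```python
import pandas as pd

# Same eight rows as the R data above.
df1 = pd.DataFrame({
    "T1": [45.37, 44.94, 45.32, 45.46, 45.46, 45.96, 45.52, 45.36],
    "T2": [44.48, 44.55, 44.44, 44.51, 44.65, 44.85, 44.56, 44.62],
    "T3": [13.00, 13.37, 13.09, 13.29, 13.18, 13.23, 13.53, 13.25],
    "timestamp": pd.to_datetime([
        "2015-11-05 10:23:00", "2015-11-05 10:24:00", "2015-11-05 10:27:00",
        "2015-11-05 10:28:00", "2015-11-05 10:29:16", "2015-11-05 10:32:00",
        "2015-11-05 10:36:00", "2015-11-05 10:37:00",
    ]),
})

# Sum each sensor column per 5-minute bin, then count the rows in each bin.
sums = df1.resample("5min", on="timestamp")[["T1", "T2", "T3"]].sum()
sums["nrows"] = df1.resample("5min", on="timestamp").size()
print(sums)
```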
Converting minutes data to hourly data
Use Grouper to aggregate by hours, grouping together with the Unique_id column and summing:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'Unique_id']).sum().reset_index()
print(df)
datetime Unique_id Value
0 2018-01-28 00:00:00 105714 1285
1 2018-01-28 00:00:00 206714 888
2 2018-01-28 23:00:00 105714 225
3 2018-01-28 23:00:00 206714 410