How to Average Time Intervals

How to average values over time intervals in MySQL

One way is to extract the date and hour parts from the timestamp and group by the result.

select DATE_ADD(date(fecha), INTERVAL EXTRACT(HOUR FROM fecha) HOUR) as FECHA_DATE_HOUR,
       avg(Valor_Dispositivo) as Valor_Dispositivo
from Telegramas
group by date(fecha), EXTRACT(HOUR FROM fecha);

Result:

+---------------------+-------------------+
| FECHA_DATE_HOUR     | Valor_Dispositivo |
+---------------------+-------------------+
| 14.12.2017 11:00:00 |            4.3333 |
| 14.12.2017 12:00:00 |            5.0000 |
+---------------------+-------------------+
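
Equivalently, you can format the timestamp down to the hour and group on that expression directly. A minimal sketch against the same Telegramas table; the '%Y-%m-%d %H:00:00' format string is an assumption, so adapt it if you prefer the dotted display format shown above:

select DATE_FORMAT(fecha, '%Y-%m-%d %H:00:00') as FECHA_DATE_HOUR,
       avg(Valor_Dispositivo) as Valor_Dispositivo
from Telegramas
group by FECHA_DATE_HOUR;  -- MySQL allows grouping by a select alias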


How to calculate average time between intervals in SQL?

Assuming your table is named "MyTable" (Is it really named Transaction!?) and that you want the difference in minutes:

SELECT CustomerID,
       SUM(timeSinceLastTransaction) / COUNT(*) AS avgMinutes  -- integer division; multiply by 1.0 for decimals
FROM ( SELECT *,
              DATEDIFF(MINUTE,
                       ( SELECT TOP 1 t2.DataTime
                         FROM MyTable t2
                         WHERE t2.DataTime < t1.DataTime
                           AND t2.CustomerId = t1.CustomerId
                         ORDER BY t2.DataTime DESC
                       ),
                       t1.DataTime
              ) AS timeSinceLastTransaction
       FROM MyTable t1
     ) AS IndividualTimes
GROUP BY CustomerID;  -- required because CustomerID appears next to aggregates

The inner SELECT TOP 1 is a correlated subquery: for each row it looks up the same customer's immediately preceding transaction time (NULL for a customer's first transaction).
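
On SQL Server 2012 or later, the same per-row lookup can be written with the LAG() window function, which usually outperforms a correlated subquery. A sketch under the same assumed table and column names:

SELECT CustomerId,
       AVG(CAST(timeSinceLastTransaction AS FLOAT)) AS avgMinutes
FROM ( SELECT CustomerId,
              DATEDIFF(MINUTE,
                       LAG(DataTime) OVER (PARTITION BY CustomerId
                                           ORDER BY DataTime),
                       DataTime) AS timeSinceLastTransaction  -- NULL for each customer's first row
       FROM MyTable
     ) AS IndividualTimes
GROUP BY CustomerId;

Note that AVG() skips the NULL produced for each customer's first transaction, so it divides by the number of gaps rather than the number of rows, which is slightly different from the SUM(...) / COUNT(*) above.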

Calculate average time between an array of dates

The average interval is the time elapsed between the first and last dates, divided by n-1, the number of intervals. That is going to be the most efficient approach.

This works because the average is equal to the sum of the intervals divided by the number of intervals. But the sum of all the intervals is equal to the difference between the first and last date.
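
Spelled out: with the sorted dates t1 <= t2 <= ... <= tn, the intermediate terms cancel (a telescoping sum):

((t2 - t1) + (t3 - t2) + ... + (tn - t(n-1))) / (n - 1) = (tn - t1) / (n - 1)

For the sample data below (18:06, 19:06, 21:06), that is (21:06 - 18:06) / 2 = 1.5 hours.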

Assuming your date strings are already in order, just grab the first and last, calculate the difference and divide.

let dateStrings = ["2019-02-18T18:06:30.523", "2019-02-18T19:06:30.523", "2019-02-18T21:06:30.523"]

let dateFormatter = DateFormatter()
dateFormatter.dateFormat = "yyyy-MM-dd'T'HH:mm:ss.SSS"
dateFormatter.locale = Locale(identifier: "en_US_POSIX")
dateFormatter.timeZone = TimeZone(secondsFromGMT: 0) // I’m going to assume it’s GMT; what is it really?

guard dateStrings.count > 1,
    let lastDateString = dateStrings.last,
    let lastDate = dateFormatter.date(from: lastDateString),
    let firstDateString = dateStrings.first,
    let firstDate = dateFormatter.date(from: firstDateString) else { return }

let average = lastDate.timeIntervalSince(firstDate) / Double(dateStrings.count - 1)

That’s in seconds. If you’d like a nice string format and don’t care about milliseconds, the DateComponentsFormatter is convenient for localized strings:

let dateComponentsFormatter = DateComponentsFormatter()
dateComponentsFormatter.allowedUnits = [.hour, .minute, .second]
dateComponentsFormatter.unitsStyle = .full
let string = dateComponentsFormatter.string(from: average)

That produces:

"1 hour, 30 minutes"


Or you can, less efficiently, build the dates array:

let dateStrings = ["2019-02-18T18:06:30.523", "2019-02-18T19:06:30.523", "2019-02-18T21:06:30.523"]

guard dateStrings.count > 1 else { return }

let dates = dateStrings.map { dateFormatter.date(from: $0)! }

Then you could build an array of intervals between those dates:

var intervals: [TimeInterval] = []
for index in 1 ..< dates.count {
    intervals.append(dates[index].timeIntervalSince(dates[index - 1]))
}

And then average them:

let average = intervals.reduce(0.0, +) / Double(intervals.count)

And format to taste:

let dateComponentsFormatter = DateComponentsFormatter()
dateComponentsFormatter.allowedUnits = [.hour, .minute, .second]
dateComponentsFormatter.unitsStyle = .full
let string = dateComponentsFormatter.string(from: average)

How to average data over a specific time period, recording the ending time

TL;DR

import pandas as pd

from datetime import datetime, timedelta

x = [['2:30:01', '5'],
     ['2:30:02', '9'],
     ['2:30:03', '450'],
     ['2:30:04', '7'],
     ['2:30:05', '10'],
     ['2:30:06', '300']]

df = pd.DataFrame(x, columns=['time', 's'])
df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
df['s'] = df['s'].astype(int)

df_new = pd.DataFrame([{'start_time': interval_start.strftime("%H:%M:%S"),
                        'end_time': (interval_start + timedelta(0, 2)).strftime("%H:%M:%S"),
                        's': sum(rows['s']) / len(rows['s'])}
                       for interval_start, rows in
                       df.set_index('time').resample('3s', offset="1s")])

[out]:

  start_time  end_time           s
0   02:30:01  02:30:03  154.666667
1   02:30:04  02:30:06  105.666667

In Long

First, it's easier to manipulate times if you convert the strings to datetime objects (or if you are Dr. Who =)):

df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))

The heavy lifting is done by:

df.set_index('time').resample('3s', offset="1s")

DataFrame.resample(...) acts like a GROUP BY over time: '3s' buckets the rows into 3-second bins, and offset="1s" shifts the bin edges by one second so the first bin starts at 02:30:01. Each bin collects all the data points that fall within its 3-second interval.

And this converts the datetime object to the original string format of your timestamp:

interval_start.strftime("%H:%M:%S")

And to get the end time of the interval (timedelta(0, 2) means days=0, seconds=2, i.e. the last second that still falls inside a 3-second bin):

interval_start + timedelta(0, 2)

How to calculate average interval time

Please find below the code for MSSQL:

CREATE TABLE customer_data (
    customer_id      BIGINT,
    date             DATE,
    time             TIME,
    answer           VARCHAR(100),
    missed_call_type VARCHAR(100)
);

INSERT INTO customer_data
VALUES
(101, '2018/8/3', '12:13:00', 'no', 'employee'),
(102, '2018/8/3', '12:15:00', 'no', 'customer'),
(103, '2018/8/3', '12:20:00', 'no', 'employee'),
(102, '2018/8/3', '15:15:00', 'no', 'customer'),
(101, '2018/8/3', '18:15:00', 'no', 'employee'),
(105, '2018/8/3', '18:18:00', 'no', 'customer'),
(102, '2018/8/3', '19:18:00', 'no', 'employee')

select cd.customer_id, answer, missed_call_type,
       CAST(CAST(cd.date as VARCHAR(10)) + ' ' + CAST(cd.time as VARCHAR(10)) as datetime) as date,
       ROW_NUMBER() OVER (PARTITION BY cd.customer_id ORDER BY date desc, time desc) as ranks
INTO #temP
from customer_data cd
order by cd.customer_Id, ranks;

select AVG(DATEDIFF(MINUTE, x1.date, x2.date)) as avg_mins
from #temP x1
INNER JOIN #temP x2 ON x1.customer_id = x2.customer_id
WHERE x2.ranks = (x1.ranks-1)
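
If you prefer to avoid the temp table, both steps can be folded into one statement with a CTE; a sketch against the same customer_data table:

WITH ranked AS (
    SELECT customer_id,
           CAST(CAST(date AS VARCHAR(10)) + ' ' + CAST(time AS VARCHAR(10)) AS DATETIME) AS dt,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY date DESC, time DESC) AS ranks
    FROM customer_data
)
SELECT AVG(DATEDIFF(MINUTE, x1.dt, x2.dt)) AS avg_mins
FROM ranked x1
INNER JOIN ranked x2
    ON  x2.customer_id = x1.customer_id
    AND x2.ranks = x1.ranks - 1;  -- x2 is the next (more recent) call per customer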

How to get average values for time intervals in Postgres

DB design

While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:

ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time; -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;

If date and time are not actual date and time data types, use to_timestamp(). Related:

  • Calculating Cumulative Sum in PostgreSQL
  • How to convert "string" to "timestamp without time zone"

Query

Then the query is a bit simpler:

SELECT *
FROM  (
   SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   GROUP  BY 1
   ) grid
CROSS  JOIN LATERAL (
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn = grid.sn
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) avg;


Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.

Join to rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL in this case.

The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too, like including all time slots in the given time frame, or only time slots with actual values. All are possible; I had to pick one interpretation.
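
For instance, to get only time slots with actual values, a plain GROUP BY on the truncated timestamp is enough. A minimal sketch, assuming the single ts column from the design change above and Postgres 14 or later for date_bin():

SELECT date_bin(interval '5 min', ts, timestamp '2018-01-01') AS slot
     , round(avg(vin1)::numeric, 2) AS vin1_av  -- cast to numeric for round()
     , round(avg(vin2)::numeric, 2) AS vin2_av
     , round(avg(vin3)::numeric, 2) AS vin3_av
FROM   tbl
WHERE  sn = '4as11111111'
AND    ts >= '2018-01-01'
AND    ts <  '2018-01-02'
GROUP  BY 1
ORDER  BY 1;

date_bin() snaps each timestamp down to the start of its 5-minute bin, so empty slots simply never appear.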

Index

At least have this multicolumn index:

CREATE INDEX foo_idx ON tbl (sn, ts);

Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans - if some preconditions are met and especially if table rows are much wider than in the demo.
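
Spelled out, the covering variant might look like this (the index name is arbitrary):

CREATE INDEX foo_covering_idx ON tbl (sn, ts, vin1, vin2, vin3);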

Closely related:

  • Slow LEFT JOIN on CTE with time intervals
  • Best way to count records by arbitrary time intervals in Rails+Postgres

Based on your original table

As requested and clarified in the comments, and later updated again in the question to include the columns mac and loc, I assume you want separate averages per (mac, loc).

date and time are still separate columns, the vin* columns are type float, and time slots without rows are excluded.

The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:

SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
     , t.vin1_av, t.vin2_av, t.vin3_av
FROM  (SELECT text '4as11111111') sn(sn)           -- provide sn here once
CROSS  JOIN LATERAL (
   SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= '2018-01-01 0:0'            -- provide time frame here
   AND    date+time <  '2018-01-02 0:0'
   ) grid
CROSS  JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS  JOIN LATERAL (
   SELECT mac, loc
        , round(avg(vin1)::numeric, 2) AS vin1_av  -- cast to numeric for round()
        , round(avg(vin2)::numeric, 2) AS vin2_av  -- but rounding is optional
        , round(avg(vin3)::numeric, 2) AS vin3_av
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= ts.ts
   AND    date+time <  ts.ts + interval '5 min'
   GROUP  BY mac, loc
   HAVING count(*) > 0                             -- exclude empty slots
   ) t;

Create a multicolumn expression index to support this:

CREATE INDEX bar_idx ON tbl (sn, (date+time));


But I would much rather use timestamp all along.


