How to get the average of values for time intervals in MySQL
One way could be to extract the date and hour parts from the timestamp and group by the result.
select DATE_ADD(date(fecha), INTERVAL EXTRACT(HOUR FROM fecha) HOUR) as FECHA_DATE_HOUR,
       avg(Valor_Dispositivo) as Valor_Dispositivo
from Telegramas
group by date(fecha), EXTRACT(HOUR FROM fecha);
Result:
+---------------------+-------------------+
| FECHA_DATE_HOUR     | Valor_Dispositivo |
+---------------------+-------------------+
| 14.12.2017 11:00:00 |            4.3333 |
| 14.12.2017 12:00:00 |            5.0000 |
+---------------------+-------------------+
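The same truncate-to-hour-then-average logic can be sketched in Python; the rows here are made up to match the result above:

```python
from datetime import datetime
from collections import defaultdict

# Hypothetical rows mirroring the Telegramas table: (fecha, Valor_Dispositivo),
# with values chosen so the hourly averages match the result shown above.
rows = [
    (datetime(2017, 12, 14, 11, 5), 4),
    (datetime(2017, 12, 14, 11, 40), 5),
    (datetime(2017, 12, 14, 11, 55), 4),
    (datetime(2017, 12, 14, 12, 10), 5),
]

# Truncate each timestamp to the hour -- the analogue of
# DATE_ADD(date(fecha), INTERVAL EXTRACT(HOUR FROM fecha) HOUR)
buckets = defaultdict(list)
for fecha, valor in rows:
    buckets[fecha.replace(minute=0, second=0, microsecond=0)].append(valor)

averages = {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}
```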
How to calculate average time between intervals in SQL?
Assuming your table is named "MyTable" (Is it really named Transaction!?) and that you want the difference in minutes:
SELECT CustomerID,
       AVG(timeSinceLastTransaction) AS avgMinutes -- AVG ignores the NULL from each customer's first transaction
FROM ( SELECT *,
              DATEDIFF(MINUTE,
                       ( SELECT TOP 1 t2.DataTime
                         FROM MyTable t2
                         WHERE t2.DataTime < t1.DataTime
                           AND t2.CustomerId = t1.CustomerId
                         ORDER BY t2.DataTime DESC
                       ),
                       t1.DataTime
              ) AS timeSinceLastTransaction
       FROM MyTable t1
     ) AS IndividualTimes
GROUP BY CustomerID
This is a correlated subquery.
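What the correlated subquery does, namely finding each row's immediately preceding transaction for the same customer and diffing the timestamps, can be sketched in Python with hypothetical data:

```python
from datetime import datetime

# Hypothetical transactions: (customer_id, timestamp)
transactions = [
    (1, datetime(2024, 1, 1, 10, 0)),
    (1, datetime(2024, 1, 1, 10, 30)),
    (1, datetime(2024, 1, 1, 11, 30)),
    (2, datetime(2024, 1, 1, 9, 0)),
    (2, datetime(2024, 1, 1, 9, 20)),
]

def avg_gap_minutes(rows, customer_id):
    # Sorting and diffing consecutive timestamps is equivalent to the
    # per-row "latest earlier transaction" lookup in the SQL above.
    times = sorted(t for c, t in rows if c == customer_id)
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else None
```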
Calculate average time between an array of dates
The average interval is the time elapsed between the first and last dates, divided by n - 1, the number of intervals. That’s going to be the most efficient approach.
This works because the average is equal to the sum of the intervals divided by the number of intervals. But the sum of all the intervals is equal to the difference between the first and last date.
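This telescoping is easy to verify numerically; a quick Python check with three hypothetical timestamps one and two hours apart:

```python
from datetime import datetime

dates = [datetime(2019, 2, 18, 18, 6, 30),
         datetime(2019, 2, 18, 19, 6, 30),
         datetime(2019, 2, 18, 21, 6, 30)]

# Average as the mean of the pairwise gaps...
gaps = [(b - a).total_seconds() for a, b in zip(dates, dates[1:])]
avg_from_gaps = sum(gaps) / len(gaps)

# ...and as (last - first) / (n - 1): the gaps telescope, so both agree.
avg_direct = (dates[-1] - dates[0]).total_seconds() / (len(dates) - 1)
```

Both give 5400 seconds, i.e. the "1 hour, 30 minutes" result shown below.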
Assuming your date strings are already in order, just grab the first and last, calculate the difference and divide.
let dateStrings = ["2019-02-18T18:06:30.523", "2019-02-18T19:06:30.523", "2019-02-18T21:06:30.523"]
let dateFormatter = DateFormatter()
dateFormatter.dateFormat = "yyyy-MM-dd'T'HH:mm:ss.SSS"
dateFormatter.locale = Locale(identifier: "en_US_POSIX")
dateFormatter.timeZone = TimeZone(secondsFromGMT: 0) // I’m going to assume it’s GMT; what is it really?
guard dateStrings.count > 1,
    let lastDateString = dateStrings.last,
    let lastDate = dateFormatter.date(from: lastDateString),
    let firstDateString = dateStrings.first,
    let firstDate = dateFormatter.date(from: firstDateString) else { return }
let average = lastDate.timeIntervalSince(firstDate) / Double(dateStrings.count - 1)
That’s in seconds. If you’d like a nice string format and don’t care about milliseconds, DateComponentsFormatter is convenient for localized strings:
let dateComponentsFormatter = DateComponentsFormatter()
dateComponentsFormatter.allowedUnits = [.hour, .minute, .second]
dateComponentsFormatter.unitsStyle = .full
let string = dateComponentsFormatter.string(from: average)
That produces:
"1 hour, 30 minutes"
Or you can, less efficiently, build the dates array:
let dateStrings = ["2019-02-18T18:06:30.523", "2019-02-18T19:06:30.523", "2019-02-18T21:06:30.523"]
guard dateStrings.count > 1 else { return }
let dates = dateStrings.map { dateFormatter.date(from: $0)! }
Then you could build an array of intervals between those dates:
var intervals: [TimeInterval] = []
for index in 1 ..< dates.count {
    intervals.append(dates[index].timeIntervalSince(dates[index-1]))
}
And then average them:
let average = intervals.reduce(0.0, +) / Double(intervals.count)
And format to taste:
let dateComponentsFormatter = DateComponentsFormatter()
dateComponentsFormatter.allowedUnits = [.hour, .minute, .second]
dateComponentsFormatter.unitsStyle = .full
let string = dateComponentsFormatter.string(from: average)
How to average data over a specific time period, recording the ending time
TL;DR
import pandas as pd
from datetime import datetime, timedelta
x = [['2:30:01', '5'],
     ['2:30:02', '9'],
     ['2:30:03', '450'],
     ['2:30:04', '7'],
     ['2:30:05', '10'],
     ['2:30:06', '300']]

df = pd.DataFrame(x, columns=['time', 's'])
df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
df['s'] = df['s'].astype(int)

df_new = pd.DataFrame([{'start_time': interval_start.strftime("%H:%M:%S"),
                        'end_time': (interval_start + timedelta(0, 2)).strftime("%H:%M:%S"),
                        's': sum(rows['s']) / len(rows['s'])}
                       for interval_start, rows in
                       df.set_index('time').resample('3s', offset="1s")])
[out]:
  start_time  end_time           s
0   02:30:01  02:30:03  154.666667
1   02:30:04  02:30:06  105.666667
In Long
First it's easier to manipulate time if you convert the string time type to datetime objects (or if you are Dr. Who =)):
df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
The heavy lifting is done by:
df.set_index('time').resample('3s', offset="1s")
DataFrame.resample(...) acts like a GROUP BY: '3s' groups the rows into 3-second bins, and offset="1s" shifts the bin edges by one second, thus grouping all your data points within each 3-second interval.
And this converts the datetime object to the original string format of your timestamp:
interval_start.strftime("%H:%M:%S")
And to get the end time of the interval:
interval_start + timedelta(0, 2)
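Assuming the same df, a shorter equivalent lets resample compute the mean directly; the bin label is the interval start:

```python
import pandas as pd

# Same hypothetical data as above, built directly with pandas
df = pd.DataFrame({'time': pd.to_datetime(['2:30:01', '2:30:02', '2:30:03',
                                           '2:30:04', '2:30:05', '2:30:06'],
                                          format='%H:%M:%S'),
                   's': [5, 9, 450, 7, 10, 300]})

# 3-second bins whose edges start at :01, as in resample('3s', offset='1s') above
means = df.set_index('time')['s'].resample('3s', offset='1s').mean()
```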
How to calculate average interval time
Please find below the code for MS SQL Server:
CREATE TABLE customer_data (customer_id BIGINT, date DATE, time time, answer VARCHAR(100), missed_call_type VARCHAR(100));
INSERT INTO customer_data
VALUES
(101, '2018/8/3', '12:13:00', 'no', 'employee'),
(102, '2018/8/3', '12:15:00', 'no', 'customer'),
(103, '2018/8/3', '12:20:00', 'no', 'employee'),
(102, '2018/8/3', '15:15:00', 'no', 'customer'),
(101, '2018/8/3', '18:15:00', 'no', 'employee'),
(105, '2018/8/3', '18:18:00', 'no', 'customer'),
(102, '2018/8/3', '19:18:00', 'no', 'employee');
select cd.customer_id, answer, missed_call_type,
       CAST(CAST(cd.date as VARCHAR(10)) + ' ' + CAST(cd.time as VARCHAR(10)) as datetime) as date,
       ROW_NUMBER() OVER (PARTITION BY cd.customer_id ORDER BY date desc, time desc) as ranks
INTO #temP
from customer_data cd
order by cd.customer_Id, ranks;
select AVG(DATEDIFF(MINUTE, x1.date, x2.date)) as avg_mins
from #temP x1
INNER JOIN #temP x2 ON x1.customer_id = x2.customer_id
WHERE x2.ranks = (x1.ranks-1)
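The ranking plus self-join pairs each call with the same customer's next call; a pandas sketch of the same computation over the rows inserted above (note the SQL AVG over integer minutes truncates, while this keeps the fraction):

```python
import pandas as pd

# The same rows as the INSERT above, with date and time combined into one column
df = pd.DataFrame({
    'customer_id': [101, 102, 103, 102, 101, 105, 102],
    'ts': pd.to_datetime(['2018-08-03 12:13', '2018-08-03 12:15',
                          '2018-08-03 12:20', '2018-08-03 15:15',
                          '2018-08-03 18:15', '2018-08-03 18:18',
                          '2018-08-03 19:18']),
})

# diff() within each customer is the pandas analogue of the ranks self-join:
# it pairs every call with that customer's previous call.
gaps = df.sort_values('ts').groupby('customer_id')['ts'].diff().dropna()
avg_mins = gaps.dt.total_seconds().mean() / 60
```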
How to get average values for time intervals in Postgres
DB design
While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:
ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time; -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;
If date and time are not actual date and time data types, use to_timestamp(). Related:
- Calculating Cumulative Sum in PostgreSQL
- How to convert "string" to "timestamp without time zone"
Query
Then the query is a bit simpler:
SELECT *
FROM  (
   SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   GROUP  BY 1
   ) grid
CROSS  JOIN LATERAL (
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn = grid.sn
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) avg;
Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.
Join to the rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found; the averages default to NULL in that case.
The result includes all time slots between the first and last qualifying rows in the given time frame. Various other result compositions would make sense, too, like including all time slots in the given time frame, or only time slots with actual values. All are possible; I had to pick one interpretation.
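The grid-plus-LATERAL pattern in miniature, as a Python sketch with hypothetical readings: build the 5-minute slot starts, then average whatever falls into each slot, with empty slots yielding None (the NULL case described above):

```python
from datetime import datetime, timedelta

# Hypothetical readings: (ts, vin1)
readings = [
    (datetime(2018, 1, 1, 0, 1), 10.0),
    (datetime(2018, 1, 1, 0, 3), 20.0),
    (datetime(2018, 1, 1, 0, 12), 30.0),
]

step = timedelta(minutes=5)
start = min(t for t, _ in readings)
end = max(t for t, _ in readings)

# Grid of slot starts, like generate_series(min(ts), max(ts), '5 min')
grid = []
t = start
while t <= end:
    grid.append(t)
    t += step

# LATERAL-style per-slot average; slots with no rows get None (SQL NULL)
averages = []
for slot in grid:
    vals = [v for t, v in readings if slot <= t < slot + step]
    averages.append((slot, sum(vals) / len(vals) if vals else None))
```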
Index
At least have this multicolumn index:
CREATE INDEX foo_idx ON tbl (sn, ts);
Or create it on (sn, ts, vin1, vin2, vin3) to allow index-only scans, if some preconditions are met and especially if table rows are much wider than in the demo.
Closely related:
- Slow LEFT JOIN on CTE with time intervals
- Best way to count records by arbitrary time intervals in Rails+Postgres
Based on your original table
As requested and clarified in the comments, and per the later update to the question adding the columns mac and loc, I assume you want separate averages per (mac, loc).
date and time are still separate columns, the vin* columns are type float, and time slots without rows are excluded.
The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:
SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
     , t.vin1_av, t.vin2_av, t.vin3_av
FROM  (SELECT text '4as11111111') sn(sn)          -- provide sn here once
CROSS  JOIN LATERAL (
   SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= '2018-01-01 0:0'           -- provide time frame here
   AND    date+time <  '2018-01-02 0:0'
   ) grid
CROSS  JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS  JOIN LATERAL (
   SELECT mac, loc
        , round(avg(vin1)::numeric, 2) AS vin1_av -- cast to numeric for round()
        , round(avg(vin2)::numeric, 2) AS vin2_av -- but rounding is optional
        , round(avg(vin3)::numeric, 2) AS vin3_av
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= ts.ts
   AND    date+time <  ts.ts + interval '5 min'
   GROUP  BY mac, loc
   HAVING count(*) > 0                            -- exclude empty slots
   ) t;
Create a multicolumn expression index to support this:
CREATE INDEX bar_idx ON tbl (sn, (date+time));
But I would much rather use a single timestamp column all along.