Query Records and Group by a Block of Time

If you're certain these runs are contiguous and don't overlap, you should be able to use the Id field to break up your groups. Look for Id values that are exactly 1 apart AND DateCreated values that are more than some threshold apart. From your data, it looks like records within a run are entered within at most a minute of each other, so a safe threshold could be a minute or more.
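
For concreteness, here's a minimal sketch of the kind of table assumed throughout this answer (the table and column names come from the queries below; the sample rows are invented for illustration):

CREATE TABLE MyReportTable (
    Id          INT PRIMARY KEY,
    DateCreated DATETIME
);

-- Two runs: Ids 1-3 arrive seconds apart, then a long gap before Ids 4-5
INSERT INTO MyReportTable (Id, DateCreated) VALUES
    (1, '2018-05-04 09:00:00'),
    (2, '2018-05-04 09:00:20'),
    (3, '2018-05-04 09:00:45'),
    (4, '2018-05-04 11:30:00'),
    (5, '2018-05-04 11:30:30');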

This would get you your start times:

SELECT mrtB.Id, mrtB.DateCreated
FROM MyReportTable AS mrtA
INNER JOIN MyReportTable AS mrtB
    ON (mrtA.Id + 1) = mrtB.Id  -- pair each record with the one right after it
WHERE DATEDIFF(mi, mrtA.DateCreated, mrtB.DateCreated) >= 1  -- keep pairs at least a minute apart

I'll call that DataRunStarts
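
If your database supports views, one way to save that query under this name looks something like this (a sketch; the same pattern applies to the other named queries below):

CREATE VIEW DataRunStarts AS
SELECT mrtB.Id, mrtB.DateCreated
FROM MyReportTable AS mrtA
INNER JOIN MyReportTable AS mrtB
    ON (mrtA.Id + 1) = mrtB.Id
WHERE DATEDIFF(mi, mrtA.DateCreated, mrtB.DateCreated) >= 1;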

Now you can use that to get info about where the groups start and end:

SELECT drsA.Id AS StartID, drsA.DateCreated, MIN(drsB.Id) AS ExcludedEndId
FROM DataRunStarts AS drsA
INNER JOIN DataRunStarts AS drsB
    ON drsB.Id > drsA.Id  -- for each start, look at every later start
GROUP BY drsA.Id, drsA.DateCreated

I'll call that DataRunGroups. I called the last field "ExcludedEndId" because the Id it holds is only used to define the (exclusive) end boundary for the set of Ids that will be pulled.

Now we can use DataRunGroups and MyReportTable to get the counts:

SELECT DataRunGroups.StartID, COUNT(MyReportTable.Id) AS CountOfRecords
FROM DataRunGroups
INNER JOIN MyReportTable
    ON MyReportTable.Id >= DataRunGroups.StartID
   AND MyReportTable.Id < DataRunGroups.ExcludedEndId
GROUP BY DataRunGroups.StartID;

I'll call that DataRunCounts

Now we can put DataRunGroups and DataRunCounts together to get start times and counts.

SELECT DataRunGroups.DateCreated, DataRunCounts.CountOfRecords
FROM DataRunGroups
INNER JOIN DataRunCounts
    ON DataRunGroups.StartID = DataRunCounts.StartID;
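
If you'd rather not save the intermediate queries, the whole chain can be collapsed into one statement; here's a sketch using CTEs (SQL Server-style syntax assumed; it skips the named DataRunCounts step by counting directly):

WITH DataRunStarts AS (
    SELECT mrtB.Id, mrtB.DateCreated
    FROM MyReportTable AS mrtA
    INNER JOIN MyReportTable AS mrtB
        ON (mrtA.Id + 1) = mrtB.Id
    WHERE DATEDIFF(mi, mrtA.DateCreated, mrtB.DateCreated) >= 1
),
DataRunGroups AS (
    SELECT drsA.Id AS StartID, drsA.DateCreated, MIN(drsB.Id) AS ExcludedEndId
    FROM DataRunStarts AS drsA
    INNER JOIN DataRunStarts AS drsB
        ON drsB.Id > drsA.Id
    GROUP BY drsA.Id, drsA.DateCreated
)
SELECT drg.DateCreated, COUNT(mrt.Id) AS CountOfRecords
FROM DataRunGroups AS drg
INNER JOIN MyReportTable AS mrt
    ON mrt.Id >= drg.StartID AND mrt.Id < drg.ExcludedEndId
GROUP BY drg.StartID, drg.DateCreated;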

Depending on your setup, you may need to do all of this in one query (as in the CTE sketch above), but you get the idea. Also, the very first and very last runs wouldn't be included, because there'd be no start Id to go by for the very first run, and no end Id to go by for the very last run. To include those, you would make queries for just those two ranges and union them together, along with the old DataRunGroups query, to create a new DataRunGroups. The other queries that use DataRunGroups would then work just as described above.
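
Here's a rough sketch of that union (untested; it assumes DataRunStarts is non-empty, i.e. at least two runs exist):

-- First run: starts at the very first record, ends at the first detected run start
SELECT mrt.Id AS StartID, mrt.DateCreated,
       (SELECT MIN(Id) FROM DataRunStarts) AS ExcludedEndId
FROM MyReportTable AS mrt
WHERE mrt.Id = (SELECT MIN(Id) FROM MyReportTable)

UNION ALL

-- Middle runs: the old DataRunGroups query
SELECT drsA.Id, drsA.DateCreated, MIN(drsB.Id)
FROM DataRunStarts AS drsA
INNER JOIN DataRunStarts AS drsB ON drsB.Id > drsA.Id
GROUP BY drsA.Id, drsA.DateCreated

UNION ALL

-- Last run: starts at the last detected run start, ends just past the last record
SELECT drs.Id, drs.DateCreated,
       (SELECT MAX(Id) + 1 FROM MyReportTable)
FROM DataRunStarts AS drs
WHERE drs.Id = (SELECT MAX(Id) FROM DataRunStarts);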

How can I group time by hour or by 10 minutes?

I finally got it done with:

GROUP BY
DATEPART(YEAR, DT.[Date]),
DATEPART(MONTH, DT.[Date]),
DATEPART(DAY, DT.[Date]),
DATEPART(HOUR, DT.[Date]),
(DATEPART(MINUTE, DT.[Date]) / 10)
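
To turn that into a complete statement, here's a sketch (the table name MyDataTable and the COUNT(*) aggregate are my assumptions):

SELECT
    DATEPART(YEAR, DT.[Date])  AS [Year],
    DATEPART(MONTH, DT.[Date]) AS [Month],
    DATEPART(DAY, DT.[Date])   AS [Day],
    DATEPART(HOUR, DT.[Date])  AS [Hour],
    (DATEPART(MINUTE, DT.[Date]) / 10) AS [TenMinBlock],  -- integer division: 0 through 5
    COUNT(*) AS [Rows]
FROM MyDataTable AS DT
GROUP BY
    DATEPART(YEAR, DT.[Date]),
    DATEPART(MONTH, DT.[Date]),
    DATEPART(DAY, DT.[Date]),
    DATEPART(HOUR, DT.[Date]),
    (DATEPART(MINUTE, DT.[Date]) / 10)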

Group MySQL query by 15-minute intervals

SELECT FLOOR(UNIX_TIMESTAMP(`timestamp`) / (15 * 60)) AS timekey
FROM `table`  -- TABLE is a reserved word in MySQL, hence the backticks
GROUP BY timekey;
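
To display each bucket as a readable time instead of a raw key, you can multiply the key back into epoch seconds (a sketch, using the same column and table names as above):

SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`timestamp`) / (15 * 60)) * (15 * 60)) AS interval_start,
       COUNT(*) AS records
FROM `table`
GROUP BY interval_start;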

Group timestamped records by 5-, 10-, or 15-minute blocks

You should use a GROUP BY with:

floor(extract('epoch' from dt) / 300)

to have your data grouped in 5-minute intervals; 300 is the number of seconds in 5 minutes. So if you want 10 minutes, you'd divide by 600, and if you want 1 hour, by 3600.

If you want your intervals to begin at :00, :05, :10, use floor(). If you want them to end at :00, :05, :10, use ceil().
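
For example, a quick check in PostgreSQL (the output shown assumes a UTC session time zone):

SELECT to_timestamp(floor(extract('epoch' from ts) / 300) * 300) AS bucket_floor,
       to_timestamp(ceil(extract('epoch' from ts) / 300) * 300)  AS bucket_ceil
FROM (SELECT timestamp '2018-05-04 17:23:00' AS ts) AS t;
-- bucket_floor: 2018-05-04 17:20:00, bucket_ceil: 2018-05-04 17:25:00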

In the SELECT clause, you should re-transform the Unix epoch used in the GROUP BY into a timestamp, using:

to_timestamp(floor((extract('epoch' from dt) / 300)) * 300)  as ts

It's not clear whether you want all the "block" results in the same query; I assumed yes, since you may want a candlestick graph. I have also deduced a plausible aggregate function (MIN, MAX, AVG, SUM) for each column from its name. You might have to adapt this.

Here we go:

SELECT '5 minutes' as block,
       to_timestamp(floor(extract('epoch' from dt) / 300) * 300) as ts,
       round(AVG(open), 4) as avg_open,
       round(MAX(high), 4) as max_high,
       round(MIN(low), 4) as min_low,
       round(AVG(close), 4) as avg_close,
       SUM(vol) as sum_vol
FROM mytable
GROUP BY floor(extract('epoch' from dt) / 300)

UNION ALL

SELECT '10 minutes' as block,
       to_timestamp(floor(extract('epoch' from dt) / 600) * 600) as ts,
       round(AVG(open), 4) as avg_open,
       round(MAX(high), 4) as max_high,
       round(MIN(low), 4) as min_low,
       round(AVG(close), 4) as avg_close,
       SUM(vol) as sum_vol
FROM mytable
GROUP BY floor(extract('epoch' from dt) / 600)

UNION ALL

SELECT '1 hour' as block,
       to_timestamp(floor(extract('epoch' from dt) / 3600) * 3600) as ts,
       round(AVG(open), 4) as avg_open,
       round(MAX(high), 4) as max_high,
       round(MIN(low), 4) as min_low,
       round(AVG(close), 4) as avg_close,
       SUM(vol) as sum_vol
FROM mytable
GROUP BY floor(extract('epoch' from dt) / 3600)

Results:

block       ts                   avg_open  max_high  min_low  avg_close  sum_vol
5 minutes   04.05.2018 17:30:00  171       171,3     170,9    171        42817
5 minutes   04.05.2018 17:25:00  170,8625  171       170,75   170,85     142711
10 minutes  04.05.2018 17:20:00  170,8625  171       170,75   170,85     142711
10 minutes  04.05.2018 17:30:00  171       171,3     170,9    171        42817
1 hour      04.05.2018 17:00:00  170,89    171,3     170,75   170,88     185528

Test it on REXTESTER

Get top 1 row of each group

;WITH cte AS
(
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
    FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect two entries per day, this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead.
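
A sketch of that variant, using the same table and columns as above:

;WITH cte AS
(
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rnk
    FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rnk = 1  -- ties on the latest DateCreated are all kept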

As for normalised or not, it depends whether you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or else drop this status history table.
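
A rough sketch of such a trigger in SQL Server; the parent table Documents(ID, CurrentStatus) and the Status column on the log table are assumptions here:

CREATE TRIGGER trg_SyncLatestStatus
ON DocumentStatusLogs
AFTER INSERT
AS
BEGIN
    -- Push the newest inserted status per document up to the (assumed) parent table
    UPDATE d
    SET d.CurrentStatus = i.Status
    FROM Documents AS d
    INNER JOIN (
        SELECT DocumentID, Status,
               ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
        FROM inserted
    ) AS i
        ON i.DocumentID = d.ID AND i.rn = 1;
END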

Using ORDER BY and GROUP BY together

One way to do this that correctly uses group by:

select l.*
from table l
inner join (
    select m_id, max(timestamp) as latest
    from table
    group by m_id
) r
    on l.timestamp = r.latest and l.m_id = r.m_id
order by timestamp desc

How this works:

  • selects the latest timestamp for each distinct m_id in the subquery
  • only selects rows from table that match a row from the subquery (this operation -- where a join is performed, but no columns are selected from the second table, it's just used as a filter -- is known as a "semijoin" in case you were curious)
  • orders the rows
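
One caveat worth adding: if several rows share the same maximum timestamp for an m_id, the join returns all of them. A sketch that breaks such ties, assuming the table also has a unique id column (my assumption):

select l.*
from table l
inner join (
    select m_id, max(timestamp) as latest
    from table
    group by m_id
) r
    on l.timestamp = r.latest and l.m_id = r.m_id
-- among ties on (m_id, timestamp), keep only the row with the highest id
where l.id = (
    select max(l2.id)
    from table l2
    where l2.m_id = l.m_id and l2.timestamp = l.timestamp
)
order by timestamp desc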

