How to Narrow Down Perf.Data to a Time Sub Interval

How do you group by any time-based interval?

I think you are overcomplicating things.

You can use GROUP BY DATEDIFF(MINUTE, '2017-01-01', TheDateTime) / 30 to group by every 30 minutes. Of course, the date I've chosen is just a random date; you can choose, if you want, the first (or last) date in your sample data.

And you can also use this technique to get any interval of any date part - just change the keyword MINUTE to whatever date part you want to use, and the interval 30 to whatever interval you want.

Consider the following sample data:

;WITH CTE AS
(
    -- One row per minute, from 00:00 to 01:00 (61 rows in total)
    SELECT CAST('2017-01-01T00:00:00' AS datetime) AS TheDateTime, 0 AS rn
    UNION ALL
    SELECT DATEADD(MINUTE, 1, TheDateTime), rn + 1
    FROM CTE
    WHERE rn < 60
)
SELECT TheDateTime, rn INTO #T
FROM CTE
OPTION(MAXRECURSION 0)

#T now contains the following data:

TheDateTime              rn
2017-01-01 00:00:00.000  0
2017-01-01 00:01:00.000  1
2017-01-01 00:02:00.000  2
2017-01-01 00:03:00.000  3
...
2017-01-01 00:59:00.000  59
2017-01-01 01:00:00.000  60

To get the maximum rn grouped by 30 minutes, you just need this:

SELECT DATEDIFF(MINUTE, '2017-01-01', TheDateTime) / 30 AS interval, MAX(rn) AS max_rn
FROM #T
GROUP BY DATEDIFF(MINUTE, '2017-01-01', TheDateTime) / 30

Results:

interval  max_rn
0         29
1         59
2         60
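
As noted above, changing the divisor changes the bucket width. A minimal variation on the same #T data, using 15-minute buckets (with this sample it returns five groups, with max_rn values 14, 29, 44, 59 and 60):

SELECT DATEDIFF(MINUTE, '2017-01-01', TheDateTime) / 15 AS interval, MAX(rn) AS max_rn
FROM #T
GROUP BY DATEDIFF(MINUTE, '2017-01-01', TheDateTime) / 15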

How to improve the performance of timescaledb getting last timestamp

The database has to go to the sub-indexes of each chunk and find the latest timestamp for timeseries_id=x. The database correctly uses the index (as you can see from the explain): it does an index scan, not a full scan, of each sub-index in each of the chunks. So it does >1000 index scans. No chunks can be pruned because the planner can't know which chunks contain entries for that specific timeseries_id.
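
For reference, a typical "latest timestamp for one series" query has roughly this shape (the table and column names here are assumptions, not taken from the question):

SELECT "time"
FROM timeseries
WHERE timeseries_id = 1234
ORDER BY "time" DESC
LIMIT 1;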

And you have 1300 chunks for only 66m records -> ~50k rows per chunk. That's too few rows per chunk. The Timescale docs have the following recommendation:

The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks if using space partitions) fit into memory. As such, we typically recommend setting the interval so that these chunk(s) comprise no more than 25% of main memory.

https://docs.timescale.com/latest/using-timescaledb/hypertables#best-practices

Reducing the number of chunks will significantly improve the query performance.
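
For example, assuming TimescaleDB 2.x and a hypertable named timeseries (a placeholder name), you could widen the interval used for newly created chunks; existing chunks keep their old boundaries unless the data is migrated:

SELECT set_chunk_time_interval('timeseries', INTERVAL '7 days');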

Additionally, you might gain even more query performance if you use TimescaleDB compression, which reduces the number of chunks that need to be scanned even further; you could segment by timeseries_id (https://docs.timescale.com/latest/api#compression). Or you could define a continuous aggregate that holds the last item per timeseries_id (https://docs.timescale.com/latest/api#continuous-aggregates). Both are sketched below.
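
A rough sketch of both, assuming TimescaleDB 2.x and a hypertable named timeseries with columns (timeseries_id, "timestamp", value) - all placeholder names:

-- Compression, segmented by timeseries_id so each series is stored together:
ALTER TABLE timeseries SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'timeseries_id'
);
SELECT add_compression_policy('timeseries', INTERVAL '7 days');

-- Continuous aggregate holding the latest timestamp per series and day:
CREATE MATERIALIZED VIEW latest_per_series
WITH (timescaledb.continuous) AS
SELECT timeseries_id,
       time_bucket(INTERVAL '1 day', "timestamp") AS bucket,
       max("timestamp") AS last_ts
FROM timeseries
GROUP BY timeseries_id, time_bucket(INTERVAL '1 day', "timestamp")
WITH NO DATA;

-- The last timestamp per series is then a cheap query over the aggregate:
SELECT timeseries_id, max(last_ts)
FROM latest_per_series
GROUP BY timeseries_id;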

Reduce table records based on minimum time difference

Should be a simple LAG() to grab the previous timestamp and check the diff. I will say your column [timestamp] is an odd data type; what about different days? Is there a separate column for the date?

Return Records >30 Minutes from Previous Record

WITH cte_DeltaSinceLastView AS (
    SELECT *
        /*Grab previous record for each user_id/entity_id combo*/
        ,PrevTimestamp = LAG([timestamp]) OVER (PARTITION BY [user_id],[entity_id] ORDER BY [timestamp])
    FROM YourTable
)
SELECT *,MinutesSinceLastView = DATEDIFF(minute,PrevTimestamp,[timestamp])
FROM cte_DeltaSinceLastView
WHERE DATEDIFF(minute,PrevTimestamp,[timestamp]) > 30 /*Over 30 minutes since last view*/
   OR PrevTimestamp IS NULL /*First view has no previous timestamp to compare against*/
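
To try the query without a real table, you could swap YourTable in the CTE for an inline derived table; the rows below are hypothetical. With this data, row 1 (no previous view) and row 3 (35 minutes after row 2) come back:

/*Hypothetical sample data; substitute for YourTable in the CTE above*/
SELECT *
FROM (VALUES
    (1, 101, 5001, CAST('2021-03-01 09:00' AS datetime2)),
    (2, 101, 5001, CAST('2021-03-01 09:10' AS datetime2)),
    (3, 101, 5001, CAST('2021-03-01 09:45' AS datetime2))
) AS A(ID, [user_id], [entity_id], [timestamp])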

In TimescaleDB how to add retention policy for size instead of time interval?

A retention policy drops entire chunks, and chunks are sized by time interval, so it makes no sense to define the policy by size rather than time. The policy drops a chunk once the entire chunk is older than the given interval; so if the chunk size is 7 days and the retention policy is 3 days, the oldest dropped data will be 10 days old (the dropped chunk contains data from 10 to 3 days old). Chunks are represented internally by tables, so dropping a chunk is dropping a table, which is the most efficient way to delete data in PostgreSQL. Deleting row by row is much more expensive than dropping or truncating a table, and doesn't free space until VACUUM is run.
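
For completeness, a minimal time-based retention policy looks like this (assuming TimescaleDB 2.x; the hypertable name metrics is a placeholder):

-- Chunks whose entire time range is older than 3 days are dropped as whole tables.
SELECT add_retention_policy('metrics', INTERVAL '3 days');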

TimescaleDB expects that you know your application load well and can correctly estimate the desired size as a time interval.

The time dimension column is not required to have a time type; it can be a number. What matters is that the time dimension column increases over time, and that it is clear how to use it in queries and how to define the chunk size. So it is possible to use a counter as the time dimension column and increment it for each row by 1, or by the row size. Notice that synchronizing the counter can become a bottleneck.
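
A sketch of such a counter-based dimension, with made-up table and column names:

CREATE TABLE events (
    row_counter bigint NOT NULL,  -- application-maintained, strictly increasing
    payload     text
);
-- chunk_time_interval is in counter units here: 1,000,000 rows per chunk.
SELECT create_hypertable('events', 'row_counter', chunk_time_interval => 1000000);
-- Policies on an integer dimension also need an integer_now function so
-- TimescaleDB knows the "current" counter value:
-- SELECT set_integer_now_func('events', 'current_counter_fn');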

It is also possible to write a user-defined action, in which you define your own action to be executed on a regular basis as a custom policy.

Summary of three possible solutions:

  1. Give a good estimate of the chunk size, which is the way TimescaleDB expects to be used.
  2. Define a numerical time dimension column with a counter-like implementation.
  3. Write a custom policy using a user-defined action (sketched below).
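
A sketch of option 3, assuming TimescaleDB 2.x; the procedure, hypertable name and threshold are all placeholders. It drops the oldest chunk while the hypertable exceeds a byte budget:

CREATE OR REPLACE PROCEDURE enforce_size_cap(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
DECLARE
    max_bytes bigint := (config->>'max_bytes')::bigint;
    ht        text   := config->>'hypertable';
    cutoff    timestamptz;
BEGIN
    WHILE hypertable_size(ht::regclass) > max_bytes LOOP
        -- End of the oldest chunk's time range; dropping everything older
        -- than it removes exactly that chunk.
        SELECT range_end INTO cutoff
        FROM timescaledb_information.chunks
        WHERE hypertable_name = ht
        ORDER BY range_start
        LIMIT 1;
        EXIT WHEN cutoff IS NULL;
        PERFORM drop_chunks(ht::regclass, older_than => cutoff);
    END LOOP;
END $$;

-- Run the action hourly with a 10 GiB cap:
SELECT add_job('enforce_size_cap', INTERVAL '1 hour',
               config => '{"hypertable": "metrics", "max_bytes": 10737418240}');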

