Truncate Timestamp to Arbitrary Intervals


Consider this demo, which reduces timestamps to a resolution of 15 minutes and aggregates the resulting duplicates:

WITH tbl(id, ts) AS ( VALUES
(1::int, '2012-10-04 00:00:00'::timestamp)
,(2, '2012-10-04 18:23:01')
,(3, '2012-10-04 18:30:00')
,(4, '2012-10-04 18:52:33')
,(5, '2012-10-04 18:55:01')
,(6, '2012-10-04 18:59:59')
,(7, '2012-10-05 11:01:01')
)
SELECT to_timestamp((extract(epoch FROM ts)::bigint / 900)*900)::timestamp
AS lower_bound
, to_timestamp(avg(extract(epoch FROM ts)))::timestamp AS avg_ts
, count(*) AS ct
FROM tbl
GROUP BY 1
ORDER BY 1;

Result:

     lower_bound     |       avg_ts        | ct
---------------------+---------------------+----
 2012-10-04 00:00:00 | 2012-10-04 00:00:00 |  1
 2012-10-04 18:15:00 | 2012-10-04 18:23:01 |  1
 2012-10-04 18:30:00 | 2012-10-04 18:30:00 |  1
 2012-10-04 18:45:00 | 2012-10-04 18:55:51 |  3
 2012-10-05 11:00:00 | 2012-10-05 11:01:01 |  1

The trick is to extract the Unix epoch, as @Michael already posted. Integer division then lumps the values together in buckets of the chosen resolution, because fractional digits are truncated.

I divide by 900, because 15 minutes = 900 seconds.

Multiply by the same number to get the resulting lower_bound.
Convert the unix epoch back to a timestamp with to_timestamp().
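
The same divide-and-multiply arithmetic can be checked outside the database. This is a minimal Python sketch, not part of the original answer. It assumes naive timestamps that are interpreted as UTC, and the helper name floor_to_interval is made up for illustration:

```python
from datetime import datetime, timezone

def floor_to_interval(ts: datetime, seconds: int = 900) -> datetime:
    """Truncate a naive timestamp to the lower bound of its bucket."""
    # Assumption: treat the naive value as UTC so the epoch is unambiguous.
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    # Integer division drops the remainder; multiplying restores the scale.
    lower = epoch // seconds * seconds
    return datetime.fromtimestamp(lower, tz=timezone.utc).replace(tzinfo=None)

print(floor_to_interval(datetime(2012, 10, 4, 18, 52, 33)))  # 2012-10-04 18:45:00
```

Rows 4, 5 and 6 of the demo all map to 18:45:00 this way, which is why they collapse into one group with ct = 3.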

This works great for intervals that can be represented without fractional digits in the decimal system. For even more versatility use the often overlooked function width_bucket() like I demonstrate in this recent, closely related answer. More explanation, links and an sqlfiddle demo over there.
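
To give a feel for what width_bucket() does, here is a rough Python model of the ascending-bounds case as described in the PostgreSQL manual. It is only a sketch for illustration, not the server's implementation, and it ignores floating-point edge cases exactly on bucket boundaries:

```python
def width_bucket(operand: float, low: float, high: float, count: int) -> int:
    """Bucket number of operand in `count` equal-width buckets over [low, high)."""
    if operand < low:
        return 0              # below the range
    if operand >= high:
        return count + 1      # at or above the upper bound
    # Scale the offset into [0, count) and shift to 1-based bucket numbers.
    return int((operand - low) * count / (high - low)) + 1

# Split one hour (3600 seconds) into 7 buckets, a width that is not a whole
# number of seconds, and place a sample taken 52 min 33 s past the hour.
print(width_bucket(52 * 60 + 33, 0, 3600, 7))  # 7
```

Because the bucket width is derived from the bounds and the bucket count, the interval does not have to be a whole number of seconds.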

postgresql date_trunc to arbitrary precision?

There is no built-in function that does what you want, but as the PostgreSQL wiki suggests, you can define one yourself:

CREATE OR REPLACE FUNCTION round_time_10m(TIMESTAMP WITH TIME ZONE) 
RETURNS TIMESTAMP WITH TIME ZONE AS $$
SELECT date_trunc('hour', $1) + INTERVAL '10 min' * ROUND(date_part('minute', $1) / 10.0)
$$ LANGUAGE SQL;

More generally, to round to the nearest $2 minutes:

CREATE OR REPLACE FUNCTION round_time_nm(TIMESTAMP WITH TIME ZONE, INTEGER) 
RETURNS TIMESTAMP WITH TIME ZONE AS $$
SELECT date_trunc('hour', $1) + ($2 || ' min')::INTERVAL * ROUND(date_part('minute', $1) / $2)
$$ LANGUAGE SQL;
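
A small Python sketch of the same idea may help to see what the function does near the end of an hour. It mirrors the SQL above (truncate to the hour, then add a rounded number of n-minute steps). It is an illustration only, and it uses Python's round(), which resolves exact halves to the even multiple:

```python
from datetime import datetime, timedelta

def round_time_nm(ts: datetime, n: int) -> datetime:
    """Round ts to the nearest n minutes, looking only at the minute field."""
    hour_start = ts.replace(minute=0, second=0, microsecond=0)
    # round() yields an int; seconds are ignored, just like date_part('minute').
    steps = round(ts.minute / n)
    return hour_start + timedelta(minutes=n) * steps

print(round_time_nm(datetime(2022, 1, 21, 22, 57, 14), 10))  # 2022-01-21 23:00:00
```

Minute 57 rounds to 6 steps of 10 minutes, so the result rolls over into the next hour instead of producing an invalid minute value.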

Extract 30 minutes from timestamp and group it by 30 mins time interval -PGSQL

You can change the column on which you're aggregating to use the minute too:

select
count(*) as logged_users,
CONCAT(EXTRACT(hour from login_time::timestamp), '-', CASE WHEN EXTRACT(minute from login_time::timestamp) < 30 THEN 0 ELSE 30 END) as HalfHour
from loginhistory
where login_time::date = '2021-04-21'
group by HalfHour
order by HalfHour;

calculating average with grouping based on time intervals

Simple and fast solution for this particular example:

SELECT date_trunc('minute', ts) AS minute
, sum(speed)/6 AS avg_speed
FROM speed_table AS t
WHERE ts >= '2014-06-21 0:0'
AND ts < '2014-06-22 0:0' -- exclude dangling corner case
AND condition2 = 'something'
GROUP BY 1
ORDER BY 1;

You need to factor in missing rows as "0 speed". Since a minute has 6 samples, just sum and divide by 6. Missing rows evaluate to 0 implicitly.

This returns no row for minutes with no rows at all. avg_speed for missing result rows is 0.
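
To see why sum(speed)/6 differs from a plain average, consider a minute in which only three of the six expected samples exist. The numbers below are invented for illustration:

```python
from statistics import mean

EXPECTED_SAMPLES = 6  # one sample every 10 seconds

# Hypothetical minute: only three of the six rows are present.
present = [30.0, 60.0, 90.0]

avg_present = mean(present)                      # ignores the missing rows
avg_with_gaps = sum(present) / EXPECTED_SAMPLES  # counts missing rows as 0

print(avg_present, avg_with_gaps)  # 60.0 30.0
```

Dividing by the fixed sample count is what makes the missing rows count as "0 speed" without having to generate them.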

General query for arbitrary intervals

Works for any interval listed in the manual for date_trunc():

SELECT date_trunc('minute', g.ts) AS ts_start
, avg(COALESCE(speed, 0)) AS avg_speed
FROM (SELECT generate_series('2014-06-21 0:0'::timestamp
, '2014-06-22 0:0'::timestamp
, '10 sec'::interval) AS ts) g
LEFT JOIN speed_table t USING (ts)
WHERE (t.condition2 = 'something' OR
t.condition2 IS NULL) -- depends on actual condition!
AND g.ts <> '2014-06-22 0:0'::timestamp -- exclude dangling corner case
GROUP BY 1
ORDER BY 1;

The problematic part is the additional unknown condition. You would need to define that. And decide whether missing rows supplied by generate_series should pass the test or not (which can be tricky!).

I let them pass in my example (along with all other rows with NULL values).
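
The grid-and-left-join idea can be sketched in Python as follows. This is a simplified model under stated assumptions: samples is a hypothetical dict mapping timestamps to speeds, the extra condition2 filter is left out, and the function name is made up:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def avg_speed_per_minute(samples, start, stop, step=timedelta(seconds=10)):
    """Average speed per minute over a full grid; missing samples count as 0."""
    buckets = defaultdict(list)
    ts = start
    while ts < stop:  # half-open range: the dangling end point is excluded
        minute = ts.replace(second=0, microsecond=0)
        buckets[minute].append(samples.get(ts, 0))  # like COALESCE(speed, 0)
        ts += step
    return {minute: sum(v) / len(v) for minute, v in buckets.items()}

start = datetime(2014, 6, 21, 0, 0)
samples = {start: 60, start + timedelta(seconds=10): 60}
result = avg_speed_per_minute(samples, start, start + timedelta(minutes=2))
print(result[start])  # 20.0
```

Unlike the simple query, every minute of the range appears in the result, including minutes with no samples at all.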

Compare:

PostgreSQL: running count of rows for a query 'by minute'

Arbitrary intervals:

Truncate timestamp to arbitrary intervals

For completely arbitrary intervals consider @Clodoaldo's math based on epoch values or use the often overlooked function width_bucket(). Example:

Aggregating (x,y) coordinate point clouds in PostgreSQL

custom DATE_TRUNC timeframes

A little painful, but you can do:

select (date_trunc('day', user.created_at) + 
floor(extract(hour from user.created_at) / 3) * interval '3 hour'
)
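
The same computation in a short Python sketch, for checking individual values. The helper name trunc_to_hours is made up, and the width is assumed to divide 24 evenly:

```python
from datetime import datetime

def trunc_to_hours(ts: datetime, width: int = 3) -> datetime:
    """Truncate ts to the start of its width-hour block within the day."""
    # Integer division by the width, then multiplying back, floors the hour.
    block_hour = ts.hour // width * width
    return ts.replace(hour=block_hour, minute=0, second=0, microsecond=0)

print(trunc_to_hours(datetime(2021, 4, 21, 14, 37, 5)))  # 2021-04-21 12:00:00
```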

How to round timestamp to 10 minutes in Spark 3.0?

Convert the timestamp to seconds with the unix_timestamp function, divide by 600 (10 minutes), round the result of the division, and multiply by 600 again:

val df = Seq(
("2022-01-21 22:11:11"),
("2022-01-21 22:04:04"),
("2022-01-21 22:19:34"),
("2022-01-21 22:57:14")
).toDF("my_col").withColumn("my_col", to_timestamp($"my_col"))

df.withColumn(
"my_col_rounded",
from_unixtime(round(unix_timestamp($"my_col") / 600) * 600)
).show

//+-------------------+-------------------+
//|my_col             |my_col_rounded     |
//+-------------------+-------------------+
//|2022-01-21 22:11:11|2022-01-21 22:10:00|
//|2022-01-21 22:04:04|2022-01-21 22:00:00|
//|2022-01-21 22:19:34|2022-01-21 22:20:00|
//|2022-01-21 22:57:14|2022-01-21 23:00:00|
//+-------------------+-------------------+
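
The rounding arithmetic itself is independent of Spark. As a quick cross-check of the output above, here is a Python sketch that rounds epoch seconds to the nearest multiple of 600 by adding half an interval before the integer division. It assumes the timestamps are UTC (Spark uses the session time zone instead), and the helper name is invented:

```python
from datetime import datetime, timezone

def round_to_interval(ts: datetime, seconds: int = 600) -> datetime:
    """Round a naive timestamp (assumed UTC) to the nearest multiple of `seconds`."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    # Adding half an interval turns the floor division into round-half-up.
    rounded = (epoch + seconds // 2) // seconds * seconds
    return datetime.fromtimestamp(rounded, tz=timezone.utc).replace(tzinfo=None)

print(round_to_interval(datetime(2022, 1, 21, 22, 57, 14)))  # 2022-01-21 23:00:00
```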

You can also truncate the original timestamp to the hour, round the minutes to the nearest 10, and add them to the truncated timestamp as an interval:

df.withColumn(
"my_col_rounded",
date_trunc("hour", $"my_col") + format_string(
"interval %s minute",
expr("round(extract(MINUTE FROM my_col)/10.0)*10")
).cast("interval")
)

How to round to nearest X minutes with PL/pgSQL?

Instead of adding or subtracting

_minutes * interval '1 minute'

you should be subtracting

(_minutes % _nearest) * interval '1 minute'

or adding

(_nearest - (_minutes % _nearest)) * interval '1 minute'
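
Put together, the two cases amount to choosing a signed adjustment in minutes. The Python sketch below illustrates that choice. The function name is made up, and sending an exact half upward is an assumption, not something the answer specifies:

```python
def minutes_adjustment(minutes: int, nearest: int) -> int:
    """Signed number of minutes to add so `minutes` lands on a multiple of `nearest`."""
    remainder = minutes % nearest
    if remainder * 2 < nearest:
        return -remainder        # closer to the lower multiple: subtract the remainder
    return nearest - remainder   # closer to the upper multiple (or a tie): add the shortfall

print(minutes_adjustment(11, 10), minutes_adjustment(57, 10))  # -1 3
```

Multiplying the returned value by interval '1 minute' and adding it to the timestamp gives the rounded result in either case.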


