Perform This Hours of Operation Query in Postgresql

Table layout

Re-design the table to store opening hours (hours of operation) as a set of tsrange (range of timestamp without time zone) values. Requires Postgres 9.2 or later.

Pick a random week to stage your opening hours. I like the week:

1996-01-01 (Monday) to 1996-01-07 (Sunday)

That's the most recent leap year where Jan 1st conveniently happens to be a Monday. But it can be any random week for this case. Just be consistent.
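
To double-check the choice (optional):

SELECT extract(isodow FROM date '1996-01-01');  -- 1 = Monday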

Install the additional module btree_gist first:

CREATE EXTENSION btree_gist;

See:

  • Equivalent to exclusion constraint composed of integer and range

Then create the table like this:

CREATE TABLE hoo (
  hoo_id   serial PRIMARY KEY
, shop_id  int NOT NULL     -- REFERENCES shop(shop_id) -- reference to shop
, hours    tsrange NOT NULL
, CONSTRAINT hoo_no_overlap       EXCLUDE USING gist (shop_id WITH =, hours WITH &&)
, CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours))
, CONSTRAINT hoo_standard_week    CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]')
);

The one column hours replaces all of your columns:

opens_on, closes_on, opens_at, closes_at

For instance, hours of operation from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:

'[1996-01-03 18:30, 1996-01-04 05:00]'

The exclusion constraint hoo_no_overlap prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support our queries. See the chapter "Index and Performance" below for a discussion of indexing strategies.
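
For instance, with illustrative values (the error text is roughly what Postgres raises):

INSERT INTO hoo (shop_id, hours) VALUES (1, '[1996-01-03 18:30, 1996-01-04 05:00]');
INSERT INTO hoo (shop_id, hours) VALUES (1, '[1996-01-04 04:00, 1996-01-04 08:00]');
-- ERROR:  conflicting key value violates exclusion constraint "hoo_no_overlap"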

The check constraint hoo_bounds_inclusive enforces inclusive boundaries for your ranges, with two noteworthy consequences:

  • A point in time falling exactly on the lower or upper boundary is always included.
  • Adjacent entries for the same shop are effectively disallowed. With inclusive bounds, those would "overlap" and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row instead. Except when they wrap around Sunday midnight, in which case they must be split into two rows. The function f_hoo_hours() below takes care of this.

The check constraint hoo_standard_week enforces the outer bounds of the staging week using the "range is contained by" operator <@.
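
Both checks can be observed directly with the range operators; a quick illustration with literal values from the staging week:

-- adjacent ranges with inclusive bounds count as overlapping under &&
SELECT tsrange('1996-01-01 08:00', '1996-01-01 12:00', '[]')
    && tsrange('1996-01-01 12:00', '1996-01-01 18:00', '[]');  -- true

-- hoo_standard_week accepts only ranges contained in the staging week (<@)
SELECT tsrange('1996-01-03 18:30', '1996-01-04 05:00', '[]')
    <@ tsrange('1996-01-01', '1996-01-08', '[]');               -- true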

With inclusive bounds, you have to observe a corner case where the time wraps around at Sunday midnight:

'1996-01-01 00:00+0' = '1996-01-08 00:00+0'
Mon 00:00 = Sun 24:00 (= next Mon 00:00)

You have to search for both timestamps at once. Here is a related case with exclusive upper bound that wouldn't exhibit this shortcoming:

  • Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL

Function f_hoo_time(timestamptz)

To "normalize" any given timestamp with time zone:

CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
RETURNS timestamp
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT timestamp '1996-01-01' + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC'))
$func$;

PARALLEL SAFE only for Postgres 9.6 or later.

The function takes a timestamptz and returns a timestamp. It adds the elapsed interval of the respective week ($1 - date_trunc('week', $1), in UTC time) to the starting point of our staging week. (date + interval produces timestamp.)
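
For illustration, assuming the function above, a timestamp falling on a Wednesday maps to the Wednesday of the staging week:

SELECT f_hoo_time(timestamptz '2016-01-13 18:30+00');
-- → 1996-01-03 18:30:00   (Wednesday of the staging week)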

Function f_hoo_hours(timestamptz, timestamptz)

To normalize ranges and split those crossing Mon 00:00. This function takes any interval (as two timestamptz) and produces one or two normalized tsrange values. It covers any legal input and disallows the rest:

CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz)
  RETURNS TABLE (hoo_hours tsrange)
  LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE COST 500 ROWS 1 AS
$func$
DECLARE
   ts_from timestamp := f_hoo_time(_from);
   ts_to   timestamp := f_hoo_time(_to);
BEGIN
   -- sanity checks (optional)
   IF _to <= _from THEN
      RAISE EXCEPTION '%', '_to must be later than _from!';
   ELSIF _to > _from + interval '1 week' THEN
      RAISE EXCEPTION '%', 'Interval cannot span more than a week!';
   END IF;

   IF ts_from > ts_to THEN  -- split range at Mon 00:00
      RETURN QUERY
      VALUES (tsrange('1996-01-01', ts_to, '[]'))
           , (tsrange(ts_from, '1996-01-08', '[]'));
   ELSE                     -- simple case: range in standard week
      hoo_hours := tsrange(ts_from, ts_to, '[]');
      RETURN NEXT;
   END IF;

   RETURN;
END
$func$;

To INSERT a single input row:

INSERT INTO hoo(shop_id, hours)
SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');

For any number of input rows:

INSERT INTO hoo(shop_id, hours)
SELECT id, f_hoo_hours(f, t)
FROM (
VALUES (7, timestamptz '2016-01-11 00:00+0', timestamptz '2016-01-11 08:00+0')
, (8, '2016-01-11 00:00+1', '2016-01-11 08:00+1')
) t(id, f, t);

Each input row can produce two rows in the table if its range needs splitting at Mon 00:00 UTC.
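
For example, a span from Saturday evening to Monday morning UTC wraps around Sunday midnight and is split into two rows (illustrative values; output per the staging week above):

SELECT * FROM f_hoo_hours('2016-01-16 22:00+0', '2016-01-18 06:00+0');
--  ["1996-01-01 00:00:00","1996-01-01 06:00:00"]
--  ["1996-01-06 22:00:00","1996-01-08 00:00:00"]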

Query

With the adjusted design, your whole big, complex, expensive query can be replaced with ... this:

SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now());

The query is backed by said GiST index and fast, even for big tables.
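
If stored entries can touch the Sunday-midnight corner case described above, here is a hedged sketch for searching both timestamps at once (the second branch can only ever match at Mon 00:00 UTC):

SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now())
   OR  hours @> f_hoo_time(now()) + interval '7 days';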

db<>fiddle here (with more examples)

Old sqlfiddle

If you want to calculate total opening hours (per shop), here is a recipe (plus a minimal sketch below):

  • Calculate working hours between 2 dates in PostgreSQL
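
A minimal sketch for this particular design, summing the length of each stored range per shop:

SELECT shop_id, sum(upper(hours) - lower(hours)) AS total_open
FROM   hoo
GROUP  BY shop_id;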

Index and Performance

The containment operator for range types can be supported with a GiST or SP-GiST index. Either can be used to implement an exclusion constraint, but only GiST supports multicolumn indexes:

Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

And the order of index columns matters:

A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.

So we have conflicting interests here. For big tables, there will be many more distinct values for shop_id than for hours.

  • A GiST index with leading shop_id is faster to write and to enforce the exclusion constraint.
  • But we are searching hours in our query. Having that column first would be better.
  • If we need to look up shop_id in other queries, a plain btree index is much faster for that.
  • To top it off, I found an SP-GiST index on just hours to be fastest for the query.

Benchmark

New test with Postgres 12 on an old laptop.
My script to generate dummy data:

INSERT INTO hoo(shop_id, hours)
SELECT id
, f_hoo_hours(((date '1996-01-01' + d) + interval '4h' + interval '15 min' * trunc(32 * random())) AT TIME ZONE 'UTC'
, ((date '1996-01-01' + d) + interval '12h' + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC')
FROM generate_series(1, 30000) id
JOIN generate_series(0, 6) d ON random() > .33;

Results in ~ 141k randomly generated rows, ~ 30k distinct shop_id, ~ 12k distinct hours. Table size 8 MB.

I dropped and recreated the exclusion constraint:

ALTER TABLE hoo
DROP CONSTRAINT hoo_no_overlap
, ADD CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&); -- 3.5 sec; index 8 MB

ALTER TABLE hoo
DROP CONSTRAINT hoo_no_overlap
, ADD CONSTRAINT hoo_no_overlap EXCLUDE USING gist (hours WITH &&, shop_id WITH =); -- 13.6 sec; index 12 MB

shop_id first is ~ 4x faster for this distribution.

In addition, I tested two more indexes for read performance:

CREATE INDEX hoo_hours_gist_idx   on hoo USING gist (hours);
CREATE INDEX hoo_hours_spgist_idx on hoo USING spgist (hours); -- !!

After VACUUM FULL ANALYZE hoo;, I ran two queries (sketched after the list below):

  • Q1: late night, finding only 35 rows
  • Q2: in the afternoon, finding 4547 rows.
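
The exact timestamps used are not given above; hypothetically, the two queries had this shape:

SELECT * FROM hoo WHERE hours @> f_hoo_time(timestamptz '2020-03-16 03:00+0');  -- Q1: late night
SELECT * FROM hoo WHERE hours @> f_hoo_time(timestamptz '2020-03-16 15:00+0');  -- Q2: afternoon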

Results

Got an index-only scan for each (except for "no index", of course):

index                   idx size   Q1        Q2
------------------------------------------------------
no index                           38.5 ms   38.5 ms
gist (shop_id, hours)     8 MB     17.5 ms   18.4 ms
gist (hours, shop_id)    12 MB      0.6 ms    3.4 ms
gist (hours)             11 MB      0.3 ms    3.1 ms
spgist (hours)            9 MB      0.7 ms    1.8 ms   -- !

  • SP-GiST and GiST are on par for queries finding few results (GiST is even faster for very few).
  • SP-GiST scales better with a growing number of results, and is smaller, too.

If you read a lot more than you write (typical use case), keep the exclusion constraint as suggested at the outset and create an additional SP-GiST index to optimize read performance.

Postgres Query execution time

Use \timing as explained by "How can I time SQL-queries using psql?".

See also the manual for psql.

If you want server-side execution times that don't include the time to transfer the result to the client, you can set log_min_duration_statement = 0 in the configuration, then SET client_min_messages = log so you get the log info in the console.
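
A minimal sketch of that setup in a psql session (assuming superuser rights; both settings can also go into postgresql.conf):

SET log_min_duration_statement = 0;   -- log the duration of every completed statement
SET client_min_messages = log;        -- deliver LOG-level messages to this client
SELECT count(*) FROM pg_class;        -- the duration now shows up as a LOG line in the console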

You can also use EXPLAIN ANALYZE to get detailed execution timings. There's some timing overhead for this unless you use EXPLAIN (ANALYZE TRUE, TIMING FALSE), which is only available in newer versions and disables per-node timing, giving only an aggregate execution time instead.

PgBadger, especially when combined with the auto_explain module, can provide useful aggregate statistics from log analysis.

Finally, there's pg_stat_statements, which can collect handy aggregate information on the running system.

Get execution time of PostgreSQL query

There are various ways to measure execution time, each has pros and cons. But whatever you do, some degree of the observer effect applies. I.e., measuring itself may distort the result.

1. EXPLAIN ANALYZE

You can prepend EXPLAIN ANALYZE, which reports the whole query plan with estimated costs plus actually measured times. The query is actually executed (with all side effects, if any!). Works for all DML commands and some others. See:

  • EXPLAIN ANALYZE not working with ALTER TABLE

To check whether my adapted version of your query is, in fact, faster:

EXPLAIN ANALYZE
SELECT DISTINCT b.name
FROM   born_on b
WHERE  date '2012-01-30' - b.dob <= (
          SELECT max(d1.dod - b1.dob)
          FROM   born_on b1
          JOIN   died_on d1 USING (name)  -- name must be unique!
          )
AND    NOT EXISTS (
          SELECT FROM died_on d2
          WHERE  d2.name = b.name
          );

Execute a couple of times to get more comparable times with warm cache. Several options are available to adjust the level of detail.

If you are mainly interested in total execution time, make it:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)

Mostly, TIMING matters - the manual:

TIMING

Include actual startup time and time spent in each node in the output.
The overhead of repeatedly reading the system clock can slow down the
query significantly on some systems, so it may be useful to set this
parameter to FALSE when only actual row counts, and not exact times,
are needed. Run time of the entire statement is always measured, even
when node-level timing is turned off with this option. [...]

EXPLAIN ANALYZE measures on the server, using server time from the server OS, excluding network latency. But EXPLAIN adds some overhead to also output the query plan.

2. psql with \timing

Or use \timing in psql. Like Peter demonstrates.

The manual:

\timing [ on | off ]

With a parameter, turns displaying of how long each SQL statement
takes on or off. Without a parameter, toggles the display between on
and off. The display is in milliseconds; intervals longer than 1
second are also shown in minutes:seconds format, with hours and days
fields added if needed.

Important difference: psql measures on the client using local time from the local OS, so the time includes network latency. This can be a negligible difference or huge depending on connection and volume of returned data.

3. Enable log_duration

This has probably the least overhead per measurement and produces the least distorted timings. But it's a little heavy-handed as you have to be superuser, have to adjust the server configuration, cannot just target the execution of a single query, and you have to read the server logs (unless you redirect to stdout).

The manual:

log_duration (boolean)

Causes the duration of every completed statement to be logged. The
default is off. Only superusers can change this setting.

For clients using extended query protocol, durations of the Parse,
Bind, and Execute steps are logged independently.

There are related settings like log_min_duration_statement.
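
For instance, assuming superuser rights (session-level only):

SET log_duration = on;                   -- log the duration of every statement
-- or, including the statement text, only for statements slower than 100 ms:
SET log_min_duration_statement = 100;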

4. Precise manual measurement with clock_timestamp()

The manual:

clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command.

filiprem provided a great way to get execution times for ad-hoc queries as exact as possible. On modern hardware, timing overhead should be insignificant but depending on the host OS it can vary wildly. Find out with the server application pg_test_timing.

Otherwise, you can mostly filter out the overhead like this:

DO
$do$
DECLARE
   _timing1  timestamptz;
   _start_ts timestamptz;
   _end_ts   timestamptz;
   _overhead numeric;  -- in ms
   _timing   numeric;  -- in ms
BEGIN
   _timing1  := clock_timestamp();
   _start_ts := clock_timestamp();
   _end_ts   := clock_timestamp();
   -- take minimum duration as conservative estimate
   _overhead := 1000 * extract(epoch FROM LEAST(_start_ts - _timing1
                                              , _end_ts   - _start_ts));

   _start_ts := clock_timestamp();
   PERFORM 1;  -- your query here, replacing the outer SELECT with PERFORM
   _end_ts   := clock_timestamp();

   -- RAISE NOTICE 'Timing overhead in ms = %', _overhead;
   RAISE NOTICE 'Execution time in ms = %'
       , 1000 * extract(epoch FROM _end_ts - _start_ts) - _overhead;
END
$do$;

Take the time repeatedly (doing the bare minimum with 3 timestamps here) and pick the minimum interval as a conservative estimate for timing overhead. Also, executing the function clock_timestamp() a couple of times should warm it up (in case that matters for your OS).

After measuring the execution time of the payload query, subtract that estimated overhead to get closer to the actual time.

Of course, it's more meaningful for cheap queries to loop 100000 times or execute it on a table with 100000 rows if you can, to make distracting noise insignificant.
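
A minimal sketch of such a loop, assuming a cheap payload query (PERFORM replaces the outer SELECT, as above):

DO
$do$
DECLARE
   _start_ts timestamptz := clock_timestamp();
BEGIN
   FOR i IN 1 .. 100000 LOOP
      PERFORM 1;  -- your cheap query here
   END LOOP;
   RAISE NOTICE 'Avg execution time in ms = %'
       , 1000 * extract(epoch FROM clock_timestamp() - _start_ts) / 100000;
END
$do$;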

How to extract hour from query in postgres

The following should work:

select extract(hour from observationtime) from smartvakt_device_report

Measure the time it takes to execute a PostgreSQL query

Coming from an MSSQL background myself and now more often working in Postgres I feel your pain =)

The "trouble" with Postgres is that it supports only 'basic' SQL commands (SELECT, INSERT, UPDATE, CREATE, ALTER, etc...) but the moment you want to add logic (IF THEN, WHILE, variables, etc.) you need to switch to pl/pgsql which you can only use inside functions (AFAIK). From a TSQL POV there are quite some limitations and in fact, some things suddenly don't work anymore (or need to be done differently.. e.g. SELECT * INTO TEMPORARY TABLE tempTable FROM someTable will not work but CREATE TABLE tempTable AS SELECT * FROM someTable will)

Something I learned the hard way, too, is that CURRENT_TIMESTAMP (or now()) will return the same value within a transaction. And since everything inside a function runs inside a transaction, this means you have to use clock_timestamp().
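
A quick way to see the difference inside an explicit transaction:

BEGIN;
SELECT now(), clock_timestamp();
SELECT pg_sleep(1);
SELECT now(), clock_timestamp();  -- now() is unchanged, clock_timestamp() is ~1 s later
COMMIT;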

Anyway, to answer your question, I think this should get you going:

CREATE OR REPLACE FUNCTION fn_test (nbrOfIterations int)
  RETURNS TABLE (iterations int, totalTime interval, secondsPerIteration int)
AS $$
DECLARE
   i         int;
   startTime TIMESTAMP;
   endTime   TIMESTAMP;
   dummy     text;
BEGIN
   i := 1;
   startTime := clock_timestamp();

   WHILE (i <= nbrOfIterations) LOOP

      -- your query here
      -- (note: make sure to not return anything or you'll get an error)

      -- example:
      SELECT pg_sleep INTO dummy FROM pg_sleep(1);

      i := i + 1;

   END LOOP;

   endTime := clock_timestamp();

   iterations          := nbrOfIterations;
   totalTime           := (endTime - startTime);
   secondsPerIteration := (EXTRACT(EPOCH FROM endTime) - EXTRACT(EPOCH FROM startTime)) / iterations;

   RETURN NEXT;

END;
$$ LANGUAGE plpgsql;


SELECT * FROM fn_test(5);

how to use \timing in postgres

You can use \timing only with the command line client psql, since this is a psql command.

It is a switch that turns execution time reporting on and off:

test=> \timing
Timing is on.
test=> SELECT 42;
┌──────────┐
│ ?column? │
├──────────┤
│ 42 │
└──────────┘
(1 row)

Time: 0.745 ms
test=> \timing
Timing is off.

PostgreSQL PL/pgSQL : query stored within a table (opening hours)

I'm not fully satisfied with my answer below, but it works the way I want, not the MySQL low-tech way.
My work below is based on How to execute a string result of a stored procedure in postgres.

If it can helps, here it is:

-- push messages to debug, to 'RAISE' useful things
SET client_min_messages TO DEBUG;
\set VERBOSITY terse

-- must return a SETOF to evaluate my test (see RETURN QUERY EXECUTE below)
-- so here is a dirty simple [temporary] table.
CREATE TEMP TABLE stupid_bool_table (opened BOOLEAN);
INSERT INTO stupid_bool_table VALUES (true), (false);

CREATE OR REPLACE FUNCTION grab_worker_test_opening_hour(shopName text)
  RETURNS SETOF stupid_bool_table AS
$BODY$
DECLARE
   -- $Id: batch_workers.psql,v 1.15 2018/07/25 08:08:49 calyopea Exp $
   openhour text;
BEGIN

   -- TODO: materialized view refreshed each hour or half-hour OR clever query
   SELECT INTO openhour description
   FROM   shop_flat_table
   WHERE  shop_id IN (SELECT id FROM workers WHERE shop = shopName)
   AND    flat_txt = 'openhour';

   IF (NOT FOUND) THEN

      RAISE DEBUG 'opening_hour for % is null', shopName;
      RETURN QUERY EXECUTE 'SELECT opened FROM stupid_bool_table WHERE opened=true';  -- by DEFAULT
      -- RAISE EXCEPTION 'cant be here';  -- could be!

   ELSE

      RAISE DEBUG 'opening_hour for % is % (before replace)', shopName, openhour;

      openhour := REPLACE(openhour, 'dow',  extract(dow  from NOW())::text);
      openhour := REPLACE(openhour, 'hour', extract(hour from NOW())::text);

      RAISE DEBUG 'opening_hour for % is % (after replace)', shopName, openhour;

      RETURN QUERY EXECUTE 'SELECT opened FROM stupid_bool_table WHERE opened=' || openhour;

   END IF;

END;
$BODY$
LANGUAGE plpgsql VOLATILE  -- uses now() and reads tables, so not IMMUTABLE
COST 100;

So now, with the data:

shop  | opening_hours
------+------------------------------------------------------
ShopA | ((dow between 1 and 5) and (hour between 9 and 16))
ShopB | ((dow between 1 and 5) and (hour between 9 and 17))

SELECT * FROM grab_worker_test_opening_hour('ShopB');
psql:batch_workers.psql:124: DEBUG: opening_hour for ShopB is ((dow between 1 and 5) and (hour between 9 and 17)) OR (dow in (6,7)) (before replace)
psql:batch_workers.psql:124: DEBUG: opening_hour for ShopB is ((3 between 1 and 5) and (17 between 9 and 17)) OR (3 in (6,7)) (after replace)
 opened
--------
 t
(1 row)

(and opened = f for ShopA at the same time: 2018-07-25 17:15:00, ISO time).

Query to find out the exact last hour data in Postgresql

The main issue is your GROUP BY clause. time is unique across all rows, so grouping by time (or by time combined with data) puts every row in its own group. The group needs to specify the common value that all items in the group share. We can do that by truncating the time to the hour, then grouping by that.

The following shows the averages for just the last full hour:

SELECT date_trunc('hour', time) as Hour, avg(data) as Avg
FROM exercise
WHERE time >= date_trunc('hour', now()) - INTERVAL '1 hour'
AND time < (date_trunc('hour', now()))
GROUP BY date_trunc('hour', time);

For all hours except the current partial hour just use this filter:

WHERE time < (date_trunc('hour', now()))
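
Applied to the full query, that could look like this (a sketch, using the same table and columns as above):

SELECT date_trunc('hour', time) as Hour, avg(data) as Avg
FROM exercise
WHERE time < date_trunc('hour', now())
GROUP BY date_trunc('hour', time)
ORDER BY 1;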

UPDATE:

The data example from the OP has changed significantly.
The following fiddle demonstrates a fictitious example dataset that is used for the explanation that follows:


https://www.db-fiddle.com/f/rqZ5rsgKFJBKufPsxmFzFX/3

Query #1

Show the averages of the data values for each hour of the day. We are using DATE_TRUNC to group the readings and compute the average; this implementation will consider the following arbitrary groupings:

  • Group 01:00:

    01:00, 01:01, 01:59
  • Group 02:00:

    02:00, 02:01, 02:59

SELECT date_trunc('hour', time) as Hour, avg(data) as Avg
FROM exercise
GROUP BY date_trunc('hour', time);

