Refresh Materialized View Periodically Postgres

refresh materialized view periodically postgres

the best way is to execute periodically a script that does the task:

the script is:

#!/bin/sh
psql -U user_name -d database_instance_name -c 'refresh materialized view view_name'

and add an entry in the crontab like:

@hourly  /full_path/script_name.sh

How can I ensure that a materialized view is always up to date?

I'll need to invoke REFRESH MATERIALIZED VIEW on each change to the tables involved, right?

Yes, PostgreSQL by itself will never call it automatically, you need to do it some way.

How should I go about doing this?

Many ways to achieve this. Before giving some examples, keep in mind that REFRESH MATERIALIZED VIEW command does block the view in AccessExclusive mode, so while it is working, you can't even do SELECT on the table.

Although, if you are in version 9.4 or newer, you can give it the CONCURRENTLY option:

REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;

This will acquire an ExclusiveLock, and will not block SELECT queries, but may have a bigger overhead (depends on the amount of data changed, if few rows have changed, then it might be faster). Although you still can't run two REFRESH commands concurrently.

Refresh manually

It is an option to consider. Specially in cases of data loading or batch updates (e.g. a system that only loads tons of information/data after long periods of time) it is common to have operations at end to modify or process the data, so you can simple include a REFRESH operation in the end of it.

Scheduling the REFRESH operation

The first and widely used option is to use some scheduling system to invoke the refresh, for instance, you could configure the like in a cron job:

*/30 * * * * psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv"

And then your materialized view will be refreshed at each 30 minutes.

Considerations

This option is really good, specially with CONCURRENTLY option, but only if you can accept the data not being 100% up to date all the time. Keep in mind, that even with or without CONCURRENTLY, the REFRESH command does need to run the entire query, so you have to take the time needed to run the inner query before considering the time to schedule the REFRESH.

Refreshing with a trigger

Another option is to call the REFRESH MATERIALIZED VIEW in a trigger function, like this:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;
RETURN NULL;
END;
$$;

Then, in any table that involves changes on the view, you do:

CREATE TRIGGER tg_refresh_my_mv AFTER INSERT OR UPDATE OR DELETE
ON table_name
FOR EACH STATEMENT EXECUTE PROCEDURE tg_refresh_my_mv();

Considerations

It has some critical pitfalls for performance and concurrency:

  1. Any INSERT/UPDATE/DELETE operation will have to execute the query (which is possible slow if you are considering MV);
  2. Even with CONCURRENTLY, one REFRESH still blocks another one, so any INSERT/UPDATE/DELETE on the involved tables will be serialized.

The only situation I can think that as a good idea is if the changes are really rare.

Refresh using LISTEN/NOTIFY

The problem with the previous option is that it is synchronous and impose a big overhead at each operation. To ameliorate that, you can use a trigger like before, but that only calls a NOTIFY operation:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
NOTIFY refresh_mv, 'my_mv';
RETURN NULL;
END;
$$;

So then you can build an application that keep connected and uses LISTEN operation to identify the need to call REFRESH. One nice project that you can use to test this is pgsidekick, with this project you can use shell script to do LISTEN, so you can schedule the REFRESH as:

pglisten --listen=refresh_mv --print0 | xargs -0 -n1 -I? psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY ?;"

Or use pglater (also inside pgsidekick) to make sure you don't call REFRESH very often. For example, you can use the following trigger to make it REFRESH, but within 1 minute (60 seconds):

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
NOTIFY refresh_mv, '60 REFRESH MATERIALIZED VIEW CONCURRENLTY my_mv';
RETURN NULL;
END;
$$;

So it will not call REFRESH in less the 60 seconds apart, and also if you NOTIFY many times in less than 60 seconds, the REFRESH will be triggered only once.

Considerations

As the cron option, this one also is good only if you can bare with a little stale data, but this has the advantage that the REFRESH is called only when really needed, so you have less overhead, and also the data is updated more closer to when needed.

OBS: I haven't really tried the codes and examples yet, so if someone finds a mistake, typo or tries it and works (or not), please let me know.

Refresh a materialized view automatically using a rule or notify

PostgreSQL 9.4 added REFRESH CONCURRENTLY to Materialized Views.

This may be what you're looking for when you describe trying to setup an asynchronous update of the materialized view.

Users selecting from the materialized view will see incorrect data until the refresh finishes, but in many scenarios that use a materialized view, this is an acceptable tradeoff.

Use a statement level trigger that watches the underlying tables for any changes and then refreshes the materialized view concurrently.

Check last refreshed time for materialized view

I don't think there is anything built in the system that provides this as of 9.3.4. When I need to provide the date of last refresh I add a column called 'last_refresh' to the select query in the materialized view since data in the materialized view won't change until it is refreshed.

I also prefer this for security reasons as you may not want to give the sql user access to the system tables, if the information is being stored there.

Depending if you need the time, you can use either:

  1. CURRENT_DATE
  2. now()

Just date:

CREATE MATERIALIZED VIEW mv_address AS 
SELECT *, CURRENT_DATE AS last_refresh FROM address;

With date and time:

CREATE MATERIALIZED VIEW mv_address AS 
SELECT *, now() AS last_refresh FROM address;

Update 2017-02-17:

PostgreSQL version 9.4+ now includes CONCURRENTLY option. If you use REFRESH MATERIALIZED VIEW CONCURRENTLY option be aware of what @Smudge indicated in the comments. This would really only be an issue for large and frequently updated data sets. If your data set is small or infrequently updated then you should be fine.

How to refresh a materialized view as soon as the day gets over in postgres

As the documentation states:

https://www.postgresql.org/docs/current/sql-creatematerializedview.html

CREATE MATERIALIZED VIEW defines a materialized view of a query. The query is executed and used to populate the view at the time the command is issued (unless WITH NO DATA is used) and may be refreshed later using REFRESH MATERIALIZED VIEW.

So in order to refresh a Materialized View you will need:

https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html

REFRESH MATERIALIZED VIEW

In order to to this periodically you will need some kind of cron implementation.

A Postgresql "inhouse" solution could be: https://github.com/citusdata/pg_cron

Postgres Materialized View Auto Refresh

As of now, it's not available, you can use the following command to refresh it whenever you make changes to the base tables:
REFRESH MATERIALIZED VIEW MY_NAME

Postgres: How to auto refresh materialized views per interval

It will have to be done in an external process. As of Postgres 9.3 doesn't offer a way to have the system refresh materialized views itself on a scheduled basis.

Postgres Materialized view refresh failure

REFRESH MATERIALIZED VIEW respects transactional guarantees just like other SQL statements. So yes, if the statement fails, the materialized view will be unchanged.

Dokku + Postgres : How to refresh a materialized view?

I couldn't find a command to do so. I ended up calling a shell script in cron, that triggers a psql function.

crontab :

0 0 * * * /home/scripts/refreshMaterializedViews.sh

refreshMaterializedViews.sh :

echo 'select RefreshAllMaterializedViews();' | dokku postgres:connect my-postgres-db

Where RefreshAllMaterializedViews() is a psql function to refresh all materialized views at once. I'll add it here for simplicity, but it's already all over S.O.

Psql function :

CREATE OR REPLACE FUNCTION RefreshAllMaterializedViews(schema_arg TEXT DEFAULT 'public')
RETURNS INT AS $$
DECLARE
r RECORD;
BEGIN
RAISE NOTICE 'Refreshing materialized view in schema %', schema_arg;
FOR r IN SELECT matviewname FROM pg_matviews WHERE schemaname = schema_arg
LOOP
RAISE NOTICE 'Refreshing %.%', schema_arg, r.matviewname;
EXECUTE 'REFRESH MATERIALIZED VIEW ' || schema_arg || '.' || r.matviewname;
END LOOP;

RETURN 1;
END
$$ LANGUAGE plpgsql;

Note that this function doesn't support 'concurrently'. Which means it will lock your Materialized views during the refresh. For other solutions, please refer to this S.O question.



Related Topics



Leave a reply



Submit