How to Create a Cron Job to Run an Postgres SQL Function

Postgres with cron job doesn't work on get request

A test case:

SELECT cron.schedule('test', '* * * * *', $$SELECT 1$$);
schedule
----------
4
--Where 4 is job number

--In Postgres log
2022-03-14 11:18:00.001 PDT [63241] LOG: cron job 4 starting: SELECT 1
2022-03-14 11:18:00.007 PDT [63241] LOG: cron job 4 completed: 1 row
2022-03-14 11:19:00.002 PDT [63241] LOG: cron job 4 starting: SELECT 1
2022-03-14 11:19:00.008 PDT [63241] LOG: cron job 4 completed: 1 row
2022-03-14 11:20:00.002 PDT [63241] LOG: cron job 4 starting: SELECT 1
2022-03-14 11:20:00.010 PDT [63241] LOG: cron job 4 completed: 1 row

Nothing shows up in psql, which I assume is the console you are referring to.

So:

create table cron_test(fld_1 int, ts timestamptz  default now());

SELECT cron.schedule('test', '* * * * *', $$insert into cron_test(fld_1) select 1$$);
schedule
----------
4

--Postgres log
2022-03-14 11:22:00.001 PDT [63241] LOG: cron job 4 starting: insert into cron_test(fld_1) select 1
2022-03-14 11:22:00.008 PDT [63241] LOG: cron job 4 COMMAND completed: INSERT 0 1 1
2022-03-14 11:23:00.002 PDT [63241] LOG: cron job 4 starting: insert into cron_test(fld_1) select 1
2022-03-14 11:23:00.012 PDT [63241] LOG: cron job 4 COMMAND completed: INSERT 0 1 1
2022-03-14 11:24:00.001 PDT [63241] LOG: cron job 4 starting: insert into cron_test(fld_1) select 1
2022-03-14 11:24:00.006 PDT [63241] LOG: cron job 4 COMMAND completed: INSERT 0 1 1
2022-03-14 11:25:00.003 PDT [63241] LOG: cron job 4 starting: insert into cron_test(fld_1) select 1
2022-03-14 11:25:00.010 PDT [63241] LOG: cron job 4 COMMAND completed: INSERT 0 1 1


select * from cron_test ;
fld_1 | ts
-------+-------------------------------
1 | 2022-03-14 11:22:00.007153-07
1 | 2022-03-14 11:23:00.010206-07
1 | 2022-03-14 11:24:00.004967-07
1 | 2022-03-14 11:25:00.008177-07

If you want to see the results of a command you will need to put them somewhere you can retrieve from. The issue being that pg_cron opens a separate libpq connection to run the job. This means you will not see the results in the console you started the job in.

How to run Postgres pg_cron Job AFTER another Job?

Preface: how I understand problem

I hope that I understand the problem described by OP.

If I was wrong then it makes everything below invalid.

I suppose that it's about periodic night tasks heavy in CPU and/or IO.

E.g:

  • there are tasks A-C for archiving data
  • maybe task D-E for rebuilding aggregates / refreshing mat views
  • and finally task F that runs reindexing/analyze on whole DB

So it makes sense to run task F only after tasks A-E are finished.

Every task is needed to be run just once in a period of time:

  • once in a day or hour or week or only during weekends in a night time
  • it's better not to run in a time when server is under load

Does it fits with OP requirement - IDK.

For the sake of simplicity let's presume that each task runs only once in a night. It's easy to extend for other periods/requirements.

Data-driven approach

1. Add log table

E.g.

CREATE TABLE job_log (
log_id bigint,
job_name text,
log_date timestamptz
)

Tasks A-E

On start

For each job function do check:

IF  EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskA' # TaskB-TaskE for each functiont
AND log_date::DATE = NOW()::DATE # check that function already executed this night
) OR EXISTS(
SELECT 1 FROM pg_stat_activity
WHERE
query like 'SELECT * FROM jobA_function();' # check that job not executing right now
) THEN RETURN;
END IF;

It's possible that other conditions could be added: look for amount of connections, existence of locks and so on.

This way it will be guaranteed that function will not be executed more frequently than needed.

On finish

INSERT INTO job_log
SELECT
(SELECT MAX(log_id) FROM job_log) + 1 # or use sequences/other autoincrements
,'TaskA'
,NOW()

Cronjob schedule

The meaning of it becames different.

Now it's: "try to initiate execution of task".

It's safe to schedule it for every hour between a chosen period or even more frequently.

Cronjob cannot know if the server is under load or not, are there locks on a table or maybe somebody started execution of task manually.

Job function could be more smart in that.

Task F

Same as above but check on start looks for completion of other tasks.

E.g.

IF NOT EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskA'
AND log_date::DATE = NOW()::DATE
) OR NOT EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskB'
AND log_date::DATE = NOW()::DATE
)
.... # checks for completions of other tasks
OR EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskF' # TaskB-TaskE for each functiont
AND log_date::DATE = NOW()::DATE # check that function already executed this night
) OR EXISTS(
SELECT 1 FROM pg_stat_activity
WHERE
query like 'SELECT * FROM jobF_function();' # check that job not executing right now
) THEN RETURN;

On completion

Write to job_log the same as other functions.

UPDATE. Cronjob schedule

Create multiple schedule in cronjob.

E.g.

Let's say tasks A-E will run approximately 10-15 minutes.

And it's possible that one or two of them could work for 30-45-60 minutes.

Create a schedule for task F to attempt start every 5 minutes.

How that will work:

  • attempt 1: task A finished, other still working -> exit
  • attempt 2: task A-C finished -> exit
  • attempt 3: tasks A-E finished -> start task F
  • attempt 4: tasks A-E finished but in pg_stat_activity there is an executing task F -> exit
  • attempt 5: tasks A-E finished, pg_stat_activity is empty but in logs we see that task F already executed -> no need to work -> exit
  • ... all other attempts will be the same till next night

Summary

It's easy extend this approach for any requirements:

  • another periodicity
  • or make it unperiodic at all. E.g. make a table with trigger and start execution on change
  • dependencies of any depth and/or "fuzzy" dependencies
  • ... literally everything

Conception remains the same:

  • cronjob schedule means "try to run"
  • decision to run or not is data-driven

I would be glad to hear criticism of any kind - who knows maybe I'm overlooking something.

SQL: Store and query `cron` field

First of all you don't need to query every second because the cron has only a one minute resolution.

Next, comparing a cron scheduler expression to a timestamp is not a trivial task.
I'm not aware of any PostgreSQL module that would be able to parse the cron expressions.

There are two options, either you write your own function to do the comparison, or else you use a external library in the programming language you are using to do the comparison outside of the Database.

Here you will find an example implementation of such a function for Oracle that could easily be ported to PostgreSQL: SQL Query to convert cron expression to date/time format

It is incomplete because it doesn't handle complex expressions like */5 or 5,10,15 for individual fields of the cron expression but this is where I would start.

PostgreSQL script execution every night

I just placed an entry in my crontab:

*/3 * * * * /scripts/matviewsRefresh.sh 

This calls the script every three minutes, you can tune that.
And inside matviewsRefresh.sh:

  echo 'select matview_refresh_all();' | su - postgres -c "psql MYDBNAME"

Of course, matview_refresh_all is a pl/pgsql function that loops over all my materialized views and refresh the old ones (I added an auxiliary table that records the time of last refresh for each mview, and each one has a different refresh frequency)

Crontab for Postgresql failing with command not found

crontab has it's own PATH. If you want it to find psql you need to add PATH to top of crontab:

PATH=/usr/local/bin, etc

or give the full path to psql in your script.



Related Topics



Leave a reply



Submit