Why Can Pl/Pgsql Functions Have Side Effect, While SQL Functions Can'T

Why can PL/pgSQL functions have side effect, while SQL functions can't?

You bolded the key sentence in the manual yourself:

The entire body of a SQL function is parsed before any of it is executed.

Also read about The Parser Stage in the manual.

It consists of two major parts: the parser and the transformation process. Quoting the manual:

the transformation process takes the tree handed back by the parser as
input and does the semantic interpretation needed to understand which
tables, functions, and operators are referenced by the query.

If an SQL function contains these commands:

CREATE TABLE foo (...);
INSERT INTO foo VALUES(...);

Both statements are planned at virtually the same time (based on the same snapshot of the system catalogs). Hence, the INSERT cannot see the table "foo" presumably created with the previous CREATE command. That creates one of the following problems:

If there is no other table named "foo" in your search_patch (yet), Postgres complains when trying to create the function:
```
ERROR:  relation "foo" does not exist
```
If another table named "foo" already exists in your search_patch (and you don't use conflicting column names), Postgres will plan the INSERT based on that pre-existing table. Typically that results in an error at execution time, if any values cause conflicts in the (wrong!) table. Or, with some bad luck, it might even write to that table without error message! Very sneaky bug.

That cannot happen with a PL/pgSQL function, because it treats SQL commands like prepared statements, planned and executed sequentially. So each statement can see objects created in previous statements.

Consequently, statements that are never visited are never even planned - unlike with SQL functions. And the execution plan for statements can be cached within the same session - also unlike SQL functions. Read details about plan caching in PL/pgSQL functions in the manual here.

Each approach has advantages for some use cases. Further reading:

Difference between language sql and language plpgsql in PostgreSQL functions

In PostgreSQL, do PL/pgSQL and SQL have different CREATE FUNCTION commands?

The cited definition of CREATE FUNCTION is common for all languages:

lang_name

The name of the language that the function is implemented in. It can be sql, c, internal, or the name of a user-defined procedural language, e.g. plpgsql.

Make faulty PL/pgSQL functions fail during creation (not on first call)

You can't. Not with current PL/pgSQL. It does not check identifiers at compile time.

Call the function once to check. Do it in a transaction and roll back to avoid side effects:

BEGIN;
SELECT foo();
ROLLBACK;

There have been attempts to provide a stricter variant, mostly by Pavel Stehule, but I don't know how mature they are.

PLPSM (announced May 2011)
plpgsql lint
plpgsql_check (anounced Dec. 2013)

How to use a temp sequence within a Postgresql function

Answer to question

The reason is that SQL functions (LANGUAGE sql) are parsed and planned as one. All objects used must exist before the function runs.

You can switch to PL/pgSQL, (LANGUAGE plpgsql) which plans each statement on demand. There you can create objects and use them in the next command.

See:

Why can PL/pgSQL functions have side effect, while SQL functions can't?

Since you are not returning anything, consider a PROCEDURE. (FUNCTION works, too.)

CREATE OR REPLACE PROCEDURE reindex_ids(IN bigint)
  LANGUAGE plpgsql AS
$proc$
BEGIN
   IF EXISTS ( SELECT FROM pg_catalog.pg_class
               WHERE  relname = 'id_seq_temp'
               AND    relnamespace = pg_my_temp_schema()
               AND    relkind = 'S') THEN
      ALTER SEQUENCE id_seq_temp RESTART;
   ELSE
      CREATE TEMP SEQUENCE id_seq_temp;
   END IF;

    UPDATE things SET id = id + 2000 WHERE group_id = $1;
    UPDATE things SET id = nextval('id_seq_temp') WHERE group_id = $1;
END
$proc$;

Call:

CALL reindex_ids(123);

This creates your temp sequence if it does not exist already.

If the sequence exists, it is reset. (Remember that temporary objects live for the duration of a session.)

In the unlikely event that some other object occupies the name, an exception is raised.

Alternative solutions

Solution 1

This usually works:

UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = $1     -- your input here
   ) t1
WHERE  t.pk_id = t1.pk_id;

And only updates each row once, so half the cost.

Replace pk_id with your PRIMARY KEY column, or any UNIQUE NOT NULL (combination of) column(s).

The trick is that the UPDATE typically processes rows according to the sort order of the subquery in the FROM clause. Updating in ascending order should never hit a duplicate key violation.

And the ORDER BY clause of the window function row_number() imposes that sort order on the resulting set. That's an undocumented implementation detail, so you might want to add an explicit ORDER BY to the subquery. But since the behavior of UPDATE is undocumented anyway, it still depends on an implementation detail.

You can wrap that into a plain SQL function.

Solution 2

Consider not doing what you are doing at all. Gaps in sequential numbers are typically expected and not a problem. Just live with it. See:

Serial numbers per group of rows for compound key

Difference between language sql and language plpgsql in PostgreSQL functions

SQL functions

... are the better choice:

For simple scalar queries. Not much to plan, better save any overhead.
For single (or very few) calls per session. Nothing to gain from plan caching via prepared statements that PL/pgSQL has to offer. See below.
If they are typically called in the context of bigger queries and are simple enough to be inlined.
For lack of experience with any procedural language like PL/pgSQL. Many know SQL well and that's about all you need for SQL functions. Few can say the same about PL/pgSQL. (Though it's rather simple.)
A bit shorter code. No block overhead.

PL/pgSQL functions

... are the better choice:

When you need any procedural elements or variables that are not available in SQL functions, obviously.
For any kind of dynamic SQL, where you build and EXECUTE statements dynamically. Special care is needed to avoid SQL injection. More details:
- Postgres functions vs prepared queries
When you have computations that can be reused in several places and a CTE can't be stretched for the purpose. In an SQL function you don't have variables and would be forced to compute repeatedly or write to a table. This related answer on dba.SE has side-by-side code examples for solving the same problem using an SQL function / a plpgsql function / a query with CTEs:
- How to pass a parameter into a function

Assignments are somewhat more expensive than in other procedural languages. Adapt a programming style that doesn't use more assignments than necessary.

When a function cannot be inlined and is called repeatedly. Unlike with SQL functions, query plans can be cached for all SQL statements inside a PL/pgSQL functions; they are treated like prepared statements, the plan is cached for repeated calls within the same session (if Postgres expects the cached (generic) plan to perform better than re-planning every time. That's the reason why PL/pgSQL functions are typically faster after the first couple of calls in such cases.
Here is a thread on pgsql-performance discussing some of these items:
- Re: pl/pgsql functions outperforming sql ones?
When you need to trap errors.
For trigger functions.
When including DDL statements changing objects or altering system catalogs in any way relevant to subsequent commands - because all statements in SQL functions are parsed at once while PL/pgSQL functions plan and execute each statement sequentially (like a prepared statement). See:
- Why can PL/pgSQL functions have side effect, while SQL functions can't?

Also consider:

PostgreSQL Stored Procedure Performance

To actually return from a PL/pgSQL function, you could write:

CREATE FUNCTION f2(istr varchar)
  RETURNS text AS
$func$
BEGIN
   RETURN 'hello! ';  -- defaults to type text anyway
END
$func$ LANGUAGE plpgsql;

There are other ways:

Can I make a plpgsql function return an integer without using a variable?
The manual on "Returning From a Function"

Can I create and access a table in the same SQL function?

I think so it is not possible - and minimally it should not by possible in future versions. SQL functions are similar to views, and then references to database object should be valid in function's creating time.

There is not any workaround - if you need temp table, use PLpgSQL, or try to write your code without temp table (it can be much better).

PL/pgSQL perform vs execute

PERFORM is plpgsql command used for calls of void functions. PLpgSQL is careful about useless SELECT statements - the SELECT without INTO clause is not allowed. But sometimes you need to call a function and you don't need to store result (or functions has no result). The function in SQL is called with SELECT statement. But it is not possible in PLpgSQL - so the command PERFORM was introduced.

CREATE OR REPLACE FUNCTION foo()
RETURNS void AS $$
BEGIN
  RAISE NOTICE 'Hello from void function';
END;
$$ LANGUAGE plpgsql;

-- direct call from SQL
SELECT foo();

-- in PLpgSQL
DO $$
BEGIN
  SELECT foo(); -- is not allowed
  PERFORM foo(); -- is ok
END;
$$;

The PERFORM statements execute a parameter and forgot result.

Your example perform 'create table foo as (select 1)';

is same like SELECT 'create table foo as (select 1)'. It returns a string "create table foo as (select 1)" and this string is discarded.

The EXECUTE statement evaluate a expression to get string. In next step this string is executed.

So EXECUTE 'create table ' || some_var || '(a int)'; has two steps

evaluate expression 'create table ' || some_var || '(a int)'
if some_var is mytab for example, then execute a command create table mytab(a int)

The PERFORM statement is used for function calls, when functions are not used in assignment statement. The EXECUTE is used for evaluation of dynamic SQL - when a form of SQL command is known in runtime.

What is the difference between prepared statements and SQL or PL/pgSQL functions, in terms of their purpose?

All three "work the same" in that they execute the simple SQL statement:

INSERT INTO foo VALUES (3, 'ben');

The prepared statement is only good for a single prepared SQL statement (as the name suggests). And only DML commands. The manual:

Any SELECT, INSERT, UPDATE, DELETE, or VALUES statement.

The function can contain any number of statements. DML and DDL. Only SQL for SQL functions. Plus some non-SQL procedural elements in PL/pgSQL.

The prepared statement is only visible inside the same session and gone at the end of the session, while the function persists and is visible to all - still only usable for those with the EXECUTE privilege.

The prepared statement is encumbered with the least overhead. (Not much difference.)

The SQL function is the only one of the three that cannot save the query plan (by itself). Read details about plan caching in PL/pgSQL functions in the manual here.

The SQL function is also the only one that could be inlined when used within a bigger query. (Not with an INSERT, though.)

A rather comprehensive list of differences between SQL and PL/pgSQL functions:

Difference between language sql and language plpgsql in PostgreSQL functions

Starting with Postgres 11 there are also SQL procedures:

When to use stored procedure / user-defined function?

PostgreSQL functions returning void

(I'm no expert in this source code. You've been warned.)

The source is online here. I've omitted the filenames; you can search for the function
names to find their definitions. I left the line numbers (usually) because it's easier to cut and paste, and different line numbers will mean the source has changed.

The short story is that some "void" returns are probably empty cstrings (empty null-terminated strings), and others are null pointers.

Here are the parts of the source that look relevant.

00228 /*
00229  * void_out     - output routine for pseudo-type VOID.
00230  *
00231  * We allow this so that "SELECT function_returning_void(...)" works.
00232  */
00233 Datum
00234 void_out(PG_FUNCTION_ARGS)
00235 {
00236     PG_RETURN_CSTRING(pstrdup(""));
00237 }

00251 /*
00252  * void_send    - binary output routine for pseudo-type VOID.
00253  *
00254  * We allow this so that "SELECT function_returning_void(...)" works
00255  * even when binary output is requested.
00256  */
00257 Datum
00258 void_send(PG_FUNCTION_ARGS)
00259 {
00260     StringInfoData buf;
00261 
00262     /* send an empty string */
00263     pq_begintypsend(&buf);
00264     PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
00265 }

We also have

00285 /* To return a NULL do this: */
00286 #define PG_RETURN_NULL()  \
00287     do { fcinfo->isnull = true; return (Datum) 0; } while (0)
00288 
00289 /* A few internal functions return void (which is not the same as NULL!) */
00290 #define PG_RETURN_VOID()     return (Datum) 0

So it makes sense to me that a user-defined function that returns through PG_RETURN_VOID() would not test equivalent to one that returns through void_out() or void_send(). I don't yet know why that is, but I have to stop and get some sleep.

postgresql functions - execute 'create table...' - unexpected results

Would work like this:

CREATE OR REPLACE FUNCTION qwert(_tbl text, cols text[])
  RETURNS numeric AS
$func$
BEGIN

EXECUTE format('
     DROP TABLE IF EXISTS %1$I;
     CREATE TEMPORARY TABLE %1$I AS 
     SELECT %2$s AS col_sum FROM src;'
   ,_tbl
   ,(SELECT string_agg(quote_ident(i), ' + ') FROM unnest(cols) i)
    );

RETURN 1;  -- still unclear? Add yourself ...
END
$func$ LANGUAGE PLPGSQL;

Call:

SELECT qwert('t1', ARRAY['x1','x2']);

Or:

SELECT qwert('t1', '{x1,x2}');

format() requires Postgres 9.1 or later.

I use a text parameters for the temp table name and an array of text for the column names, then build the expression with a combination of unnest(), quote_ident() and string_agg(). Don't forget to name the column (col_sum in my ex.).

Details about sanitizing values for use as identifiers in this related answer on dba.SE. You can pass any number of columns this way.

Why Can Pl/Pgsql Functions Have Side Effect, While SQL Functions Can'T