How to Create a Temporary Function in Postgresql

Create function with temporary tables that return a select query using these temp tables

The most probable error this could raise in Postgres:

ERROR: column "foo" specified more than once

Meaning, there is at least one more column name (other than id which is folded to one instance with the USING clause) included in both tables. This would not raise an exception in a plain SQL SELECT which tolerates duplicate output column names. But you cannot create a table with duplicate names.

The problem also applies for Greenplum (like you later declared), which is not Postgres. It was forked from PostgreSQL in 2005 and developed separately. The current Postgres manual hardly applies at all any more. Look to the Greenplum documentation.

And psql is just the standard PostgreSQL interactive terminal program. Obviously you are using the one shipped with PostgreSQL 8.2.15, but the RDBMS is still Greenplum, not Postgres.

Syntax fix (for Postgres, like you first tagged, still relevant):

CREATE OR REPLACE FUNCTION myfunction()
RETURNS TABLE (column_a text, column_b text) AS
$func$
BEGIN
CREATE TEMPORARY TABLE raw_data ON COMMIT DROP AS
SELECT d.column_a, d2.column_b -- explicit SELECT list avoids duplicate column names
FROM dummy_data d
JOIN dummy_data_2 d2 using (id);

RETURN QUERY
SELECT DISTINCT column_a, column_b
FROM raw_data
LIMIT 100;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;

The example wouldn't need a temp table - unless you access the temp table after the function call in the same transaction (ON COMMIT DROP). Else, a plain SQL function is better in every way. Syntax for Postgres and Greenplum:

CREATE OR REPLACE FUNCTION myfunction(OUT column_a text, OUT column_b text)
RETURNS SETOF record AS
$func$
SELECT DISTINCT d.column_a, d2.column_b
FROM dummy_data d
JOIN dummy_data_2 d2 using (id)
LIMIT 100;
$func$ LANGUAGE plpgsql SECURITY DEFINER;

Not least, it should also work for Greenplum.

The only remaining reason for this function is SECURITY DEFINER. Else you could just use the simple SQL statement (possibly as prepared statement) instead.

RETURN QUERY was added to PL/pgSQL with version 8.3 in 2008, some years after the fork of Greenplum. Might explain your error msg:

ERROR: RETURN cannot have a parameter in function returning set;
use RETURN NEXT at or near "QUERY" Position: 237

Aside: LIMIT without ORDER BY produces arbitrary results. I assume you are aware of that.


If for some reason you actually need temp tables and cannot upgrade to Greenplum 5.0 like A. Scherbaum suggested, you can still make it work in Greenplum 4.3.x (like in Postgres 8.2). Use a FOR loop in combination with RETURN NEXT.

Examples:

  • plpgsql error "RETURN NEXT cannot have a parameter in function with OUT parameters" in table-returning function
  • How to use `RETURN NEXT`in PL/pgSQL correctly?
  • Use of custom return types in a FOR loop in plpgsql

How to run plpgsql without creating a function?

An anonymous code block returns void. However, you can use a trick with a temporary table, e.g.

CREATE TEMP TABLE IF NOT EXISTS trace (name_short text, name_long text);

DO
$body$
DECLARE
v_name_short VARCHAR;
BEGIN

v_name_short := 'test Account 1';

INSERT INTO trace
SELECT
a.name_short,
a.name_long
FROM enterprise.account a
WHERE
CASE WHEN v_name_short IS NOT NULL THEN
LOWER(a.name_short) = LOWER(v_name_short)
ELSE
1 = 1
END;
END;
$body$
LANGUAGE 'plpgsql';

SELECT * FROM trace;
-- DROP TABLE trace;

With EXPLAIN ANALYSE you can analyse only a single plain sql query, not a function, a do block nor a script. So you can try:

EXPLAIN ANALYSE
SELECT
a.name_short,
a.name_long
FROM enterprise.account a
WHERE
CASE WHEN 'test Account 1' IS NOT NULL THEN
LOWER(a.name_short) = LOWER('test Account 1')
ELSE
1 = 1
END;

Note that in this case you cannot use the variable beacuse it won't be recognized by the planner, use the literal instead.

Stored function with temporary table in postgresql

AFAIK one can't declare tables as variables in postgres. What you can do is create one in your funcion body and use it thourough (or even outside of function). Beware though as temporary tables aren't dropped until the end of the session or commit.

The way to go is to use RETURN NEXT or RETURN QUERY

As for the function result type I always found RETURNS TABLE to be more readable.

edit:
Your cons_id array is innecessary, just iterate the values returned by select.
Also you can have multiple return query statements in a single function to append result of the query to the result returned by function.

In your case:

CREATE OR REPLACE FUNCTION getPumps(status varchar) 
RETURNS TABLE (objectid INTEGER,pump_id INTEGER,pump_serial_id INTEGER....)
AS
$$
BEGIN
FOR i in SELECT consumer_id FROM db_consumer_pump_details LOOP

RETURN QUERY(
SELECT objectid,pump_id,pump_serial_id,repdate,pumpmake,db_consumer_pump_details.status,db_consumer.consumer_name,db_consumer.wenexa_id,db_consumer.rr_no FROM db_consumer_pump_details INNER JOIN db_consumer ON db_consumer.consumer_id=db_consumer_pump_details.consumer_id
WHERE db_consumer_pump_details.consumer_id=i AND db_consumer_pump_details.status=$1
ORDER BY db_consumer_pump_details.consumer_id,pump_id,createddate DESC LIMIT 2
);
END LOOP;
END;
$$

edit2:

You probably want to take a look at this solution for groupwise-k-maximum problem as that's exactly what you're dealing with here.

How to use a temp sequence within a Postgresql function


Answer to question

The reason is that SQL functions (LANGUAGE sql) are parsed and planned as one. All objects used must exist before the function runs.

You can switch to PL/pgSQL, (LANGUAGE plpgsql) which plans each statement on demand. There you can create objects and use them in the next command.

See:

  • Why can PL/pgSQL functions have side effect, while SQL functions can't?

Since you are not returning anything, consider a PROCEDURE. (FUNCTION works, too.)

CREATE OR REPLACE PROCEDURE reindex_ids(IN bigint)
LANGUAGE plpgsql AS
$proc$
BEGIN
IF EXISTS ( SELECT FROM pg_catalog.pg_class
WHERE relname = 'id_seq_temp'
AND relnamespace = pg_my_temp_schema()
AND relkind = 'S') THEN
ALTER SEQUENCE id_seq_temp RESTART;
ELSE
CREATE TEMP SEQUENCE id_seq_temp;
END IF;

UPDATE things SET id = id + 2000 WHERE group_id = $1;
UPDATE things SET id = nextval('id_seq_temp') WHERE group_id = $1;
END
$proc$;

Call:

CALL reindex_ids(123);

This creates your temp sequence if it does not exist already.

If the sequence exists, it is reset. (Remember that temporary objects live for the duration of a session.)

In the unlikely event that some other object occupies the name, an exception is raised.

Alternative solutions

Solution 1

This usually works:

UPDATE things t
SET id = t1.new_id
FROM (
SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
FROM things
WHERE group_id = $1 -- your input here
) t1
WHERE t.pk_id = t1.pk_id;

And only updates each row once, so half the cost.

Replace pk_id with your PRIMARY KEY column, or any UNIQUE NOT NULL (combination of) column(s).

The trick is that the UPDATE typically processes rows according to the sort order of the subquery in the FROM clause. Updating in ascending order should never hit a duplicate key violation.

And the ORDER BY clause of the window function row_number() imposes that sort order on the resulting set. That's an undocumented implementation detail, so you might want to add an explicit ORDER BY to the subquery. But since the behavior of UPDATE is undocumented anyway, it still depends on an implementation detail.

You can wrap that into a plain SQL function.

Solution 2

Consider not doing what you are doing at all. Gaps in sequential numbers are typically expected and not a problem. Just live with it. See:

  • Serial numbers per group of rows for compound key


Related Topics



Leave a reply



Submit