Refactor a Pl/Pgsql Function to Return the Output of Various Select Queries

Refactor a PL/pgSQL function to return the output of various SELECT queries

Dynamic SQL and RETURN type

(I saved the best for last, keep reading!)

You want to execute dynamic SQL. In principal, that's simple in plpgsql with the help of EXECUTE. You don't need a cursor. In fact, most of the time you are better off without explicit cursors.

The problem you run into: you want to return records of yet undefined type. A function needs to declare its return type in the RETURNS clause (or with OUT or INOUT parameters). In your case you would have to fall back to anonymous records, because number, names and types of returned columns vary. Like:

CREATE FUNCTION data_of(integer)
RETURNS SETOF record AS ...

However, this is not particularly useful. You have to provide a column definition list with every call. Like:

SELECT * FROM data_of(17)
AS foo (colum_name1 integer
, colum_name2 text
, colum_name3 real);

But how would you even do this, when you don't know the columns beforehand?

You could use less structured document data types like json, jsonb, hstore or xml. See:

  • How to store a data table in database?

But, for the purpose of this question, let's assume you want to return individual, correctly typed and named columns as much as possible.

Simple solution with fixed return type

The column datahora seems to be a given, I'll assume data type timestamp and that there are always two more columns with varying name and data type.

Names we'll abandon in favor of generic names in the return type.

Types we'll abandon, too, and cast all to text since every data type can be cast to text.

CREATE OR REPLACE FUNCTION data_of(_id integer)
RETURNS TABLE (datahora timestamp, col2 text, col3 text)
LANGUAGE plpgsql AS
$func$
DECLARE
_sensors text := 'col1::text, col2::text'; -- cast each col to text
_type text := 'foo';
BEGIN
RETURN QUERY EXECUTE '
SELECT datahora, ' || _sensors || '
FROM ' || quote_ident(_type) || '
WHERE id = $1
ORDER BY datahora'
USING _id;
END
$func$;

The variables _sensors and _type could be input parameters instead.

Note the RETURNS TABLE clause.

Note the use of RETURN QUERY EXECUTE. That is one of the more elegant ways to return rows from a dynamic query.

I use a name for the function parameter, just to make the USING clause of RETURN QUERY EXECUTE less confusing. $1 in the SQL-string does not refer to the function parameter but to the value passed with the USING clause. (Both happen to be $1 in their respective scope in this simple example.)

Note the example value for _sensors: each column is cast to type text.

This kind of code is very vulnerable to SQL injection. I use quote_ident() to protect against it. Lumping together a couple of column names in the variable _sensors prevents the use of quote_ident() (and is typically a bad idea!). Ensure that no bad stuff can be in there some other way, for instance by individually running the column names through quote_ident() instead. A VARIADIC parameter comes to mind ...

Simpler since PostgreSQL 9.1

With version 9.1 or later you can use format() to further simplify:

RETURN QUERY EXECUTE format('
SELECT datahora, %s -- identifier passed as unescaped string
FROM %I -- assuming the name is provided by user
WHERE id = $1
ORDER BY datahora'
,_sensors, _type)
USING _id;

Again, individual column names could be escaped properly and would be the clean way.

Variable number of columns sharing the same type

After your question updates it looks like your return type has

  • a variable number of columns
  • but all columns of the same type double precision (alias float8)

Use an ARRAY type in this case to nest a variable number of values. Additionally, I return an array with column names:

CREATE OR REPLACE FUNCTION data_of(_id integer)
RETURNS TABLE (datahora timestamp, names text[], values float8[])
LANGUAGE plpgsql AS
$func$
DECLARE
_sensors text := 'col1, col2, col3'; -- plain list of column names
_type text := 'foo';
BEGIN
RETURN QUERY EXECUTE format('
SELECT datahora
, string_to_array($1) -- AS names
, ARRAY[%s] -- AS values
FROM %s
WHERE id = $2
ORDER BY datahora'
, _sensors, _type)
USING _sensors, _id;
END
$func$;


Various complete table types

To actually return all columns of a table, there is a simple, powerful solution using a polymorphic type:

CREATE OR REPLACE FUNCTION data_of(_tbl_type anyelement, _id int)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format('
SELECT *
FROM %s -- pg_typeof returns regtype, quoted automatically
WHERE id = $1
ORDER BY datahora'
, pg_typeof(_tbl_type))
USING _id;
END
$func$;

Call (important!):

SELECT * FROM data_of(NULL::pcdmet, 17);

Replace pcdmet in the call with any other table name.

How does this work?

anyelement is a pseudo data type, a polymorphic type, a placeholder for any non-array data type. All occurrences of anyelement in the function evaluate to the same type provided at run time. By supplying a value of a defined type as argument to the function, we implicitly define the return type.

PostgreSQL automatically defines a row type (a composite data type) for every table created, so there is a well defined type for every table. This includes temporary tables, which is convenient for ad-hoc use.

Any type can be NULL. Hand in a NULL value, cast to the table type: NULL::pcdmet.

Now the function returns a well-defined row type and we can use SELECT * FROM data_of() to decompose the row and get individual columns.

pg_typeof(_tbl_type) returns the name of the table as object identifier type regtype. When automatically converted to text, identifiers are automatically double-quoted and schema-qualified if needed, defending against SQL injection automatically. This can even deal with schema-qualified table-names where quote_ident() would fail. See:

  • Table name as a PostgreSQL function parameter

How to return result of a SELECT inside a function in PostgreSQL?

Use RETURN QUERY:

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text -- also visible as OUT param in function body
, cnt bigint
, ratio bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, count(*) AS cnt -- column alias only visible in this query
, (count(*) * 100) / _max_tokens -- I added parentheses
FROM (
SELECT t.txt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
LIMIT _max_tokens
) t
GROUP BY t.txt
ORDER BY cnt DESC; -- potential ambiguity
END
$func$;

Call:

SELECT * FROM word_frequency(123);

Defining the return type explicitly is much more practical than returning a generic record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.

Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.

But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:

  1. Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example:
    • Select first row in each GROUP BY group?
  2. Repeat the expression ORDER BY count(*).
  3. (Not required here.) Set the configuration parameter plpgsql.variable_conflict or use the special command #variable_conflict error | use_variable | use_column in the function. See:
    • Naming conflict between function parameter and result of JOIN with USING clause

Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use txt and cnt in my examples, you may want more explicit names.

Added a missing ; and corrected a syntax error in the header. (_max_tokens int), not (int maxTokens) - data type after name.

While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric or a floating point type. See below.

Alternative

This is what I think your query should actually look like (calculating a relative share per token):

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text
, abs_cnt bigint
, relative_share numeric)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt, t.cnt
, round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) -- AS relative_share
FROM (
SELECT t.txt, count(*) AS cnt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
GROUP BY t.txt
ORDER BY cnt DESC
LIMIT _max_tokens
) t
ORDER BY t.cnt DESC;
END
$func$;

The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).

A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).

round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, thus we deal with a numeric number automatically and everything just falls into place.

A PL/pgSQL function with varying return type (and varying inner query)

I have two comments to the first solution.

First, use or %I or quote_ident() in format() function, not both. Compare:

with q(s) as (
values ('abba'), ('ABBA')
)
select
quote_ident(s) ok1,
format('%I', s) ok2,
format('%I', quote_ident(s)) bad_idea
from q;

ok1 | ok2 | bad_idea
--------+--------+------------
abba | abba | abba
"ABBA" | "ABBA" | """ABBA"""
(2 rows)

Second, you do not need two functions:

create or replace function my_select(into_table text, tbl text, order_by text default null)
returns void as $function$
declare
q text;
begin
q := 'select * from ' || quote_ident(tbl);
if order_by is not null then
q := q || ' order by ' || order_by;
end if;
execute format($$
create temp table %I
on commit drop
as %s
$$, into_table, q);
end
$function$ language plpgsql;

begin;
select my_select('my_tmp', 'my_data1', 'id');
select * from my_tmp;
commit;

BEGIN
my_select
-----------

(1 row)

id | val
----+-----
1 | a
2 | c
3 | d
4 | b
(4 rows)

COMMIT

In this particular case, the second solution is better.
A temporary table is not particularly expensive, but still unnecessary.
The cost will be the more important the more data in the table.
If you have a good alternative to create a temporary table, use it.
Besides, the need to include the function call and the select query in a transaction can be a bit cumbersome in some cases.

The second solution is smart and is ideally suited to the task at hand.

How to return diferent table data based on an ID passed to an SQL function

This is generally impossible with SQL functions. Even with a polymorphic return type, the actual return type must be determined at call time. But all statements in an SQL function are planned before the function is executed. So you'd always end up with an error message for one of the SELECT statements returning data that doesn't fit the return type.

The same can be done with dynamic SQL in a PL/pgSQL function - with some trickery:

CREATE OR REPLACE FUNCTION f_demo(_tabletype anyelement, _id int)
RETURNS SETOF anyelement LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
format('SELECT * FROM %s WHERE id = $1', pg_typeof(_tabletype))
USING _id;
END
$func$;

Call (important!):

SELECT * FROM f_demo(null::schemaZ.Table_1, 1);

The "trick" is to cast a null value to the desired table type, thereby defining the return type and choosing from which table to select. Detailed explanation:

  • Refactor a PL/pgSQL function to return the output of various SELECT queries

Take this as proof of concept. Typically, there are better (safer, less confusing, more performant) solutions ...

Related:

  • Difference between language sql and language plpgsql in PostgreSQL functions

returning a query result from dynamic query

In your case smth like:

t=# CREATE OR REPLACE FUNCTION data_of(_tbl_type anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE format('
SELECT *
FROM %s -- pg_typeof returns regtype, quoted automatically
WHERE true /*or some filter in additional arguments or so */
ORDER BY true /*or some filter in additional arguments or so */'
, pg_typeof(_tbl_type))
;
END
$func$ LANGUAGE plpgsql;
CREATE FUNCTION

would work:

t=# create table city(i serial, cn text);
CREATE TABLE
t=# insert into city(cn) values ('Moscow'), ('Barcelona'),('Visaginas');
INSERT 0 3
t=# SELECT * FROM data_of(NULL::city);
i | cn
---+-----------
1 | Moscow
2 | Barcelona
3 | Visaginas
(3 rows)

all credits to Erwin with his https://stackoverflow.com/a/11751557/5315974 which is one obligatory reading

How to define a function returning one of multiple data types?

SQL is strictly typed. The simple solution is to return one text field. Every value can be cast to text. (You might add a second return column holding the type name as meta information.)

Alternatively, return four typed fields (varchar, int, uuid, timestamp), and only fill one of them, depending on the result.

A polymorphic function would only be an option if you can tell the return type at call time and pass typed input accordingly.

Related:

  • Refactor a PL/pgSQL function to return the output of various SELECT queries


Related Topics



Leave a reply



Submit