Refactor a PL/pgSQL function to return the output of various SELECT queries
Dynamic SQL and RETURN
type
(I saved the best for last, keep reading!)
You want to execute dynamic SQL. In principal, that's simple in plpgsql with the help of EXECUTE
. You don't need a cursor. In fact, most of the time you are better off without explicit cursors.
The problem you run into: you want to return records of yet undefined type. A function needs to declare its return type in the RETURNS
clause (or with OUT
or INOUT
parameters). In your case you would have to fall back to anonymous records, because number, names and types of returned columns vary. Like:
CREATE FUNCTION data_of(integer)
RETURNS SETOF record AS ...
However, this is not particularly useful. You have to provide a column definition list with every call. Like:
SELECT * FROM data_of(17)
AS foo (colum_name1 integer
, colum_name2 text
, colum_name3 real);
But how would you even do this, when you don't know the columns beforehand?
You could use less structured document data types like json
, jsonb
, hstore
or xml
. See:
- How to store a data table in database?
But, for the purpose of this question, let's assume you want to return individual, correctly typed and named columns as much as possible.
Simple solution with fixed return type
The column datahora
seems to be a given, I'll assume data type timestamp
and that there are always two more columns with varying name and data type.
Names we'll abandon in favor of generic names in the return type.
Types we'll abandon, too, and cast all to text
since every data type can be cast to text
.
CREATE OR REPLACE FUNCTION data_of(_id integer)
RETURNS TABLE (datahora timestamp, col2 text, col3 text)
LANGUAGE plpgsql AS
$func$
DECLARE
_sensors text := 'col1::text, col2::text'; -- cast each col to text
_type text := 'foo';
BEGIN
RETURN QUERY EXECUTE '
SELECT datahora, ' || _sensors || '
FROM ' || quote_ident(_type) || '
WHERE id = $1
ORDER BY datahora'
USING _id;
END
$func$;
The variables _sensors
and _type
could be input parameters instead.
Note the RETURNS TABLE
clause.
Note the use of RETURN QUERY EXECUTE
. That is one of the more elegant ways to return rows from a dynamic query.
I use a name for the function parameter, just to make the USING
clause of RETURN QUERY EXECUTE
less confusing. $1
in the SQL-string does not refer to the function parameter but to the value passed with the USING
clause. (Both happen to be $1
in their respective scope in this simple example.)
Note the example value for _sensors
: each column is cast to type text
.
This kind of code is very vulnerable to SQL injection. I use quote_ident()
to protect against it. Lumping together a couple of column names in the variable _sensors
prevents the use of quote_ident()
(and is typically a bad idea!). Ensure that no bad stuff can be in there some other way, for instance by individually running the column names through quote_ident()
instead. A VARIADIC
parameter comes to mind ...
Simpler since PostgreSQL 9.1
With version 9.1 or later you can use format()
to further simplify:
RETURN QUERY EXECUTE format('
SELECT datahora, %s -- identifier passed as unescaped string
FROM %I -- assuming the name is provided by user
WHERE id = $1
ORDER BY datahora'
,_sensors, _type)
USING _id;
Again, individual column names could be escaped properly and would be the clean way.
Variable number of columns sharing the same type
After your question updates it looks like your return type has
- a variable number of columns
- but all columns of the same type
double precision
(aliasfloat8
)
Use an ARRAY
type in this case to nest a variable number of values. Additionally, I return an array with column names:
CREATE OR REPLACE FUNCTION data_of(_id integer)
RETURNS TABLE (datahora timestamp, names text[], values float8[])
LANGUAGE plpgsql AS
$func$
DECLARE
_sensors text := 'col1, col2, col3'; -- plain list of column names
_type text := 'foo';
BEGIN
RETURN QUERY EXECUTE format('
SELECT datahora
, string_to_array($1) -- AS names
, ARRAY[%s] -- AS values
FROM %s
WHERE id = $2
ORDER BY datahora'
, _sensors, _type)
USING _sensors, _id;
END
$func$;
Various complete table types
To actually return all columns of a table, there is a simple, powerful solution using a polymorphic type:
CREATE OR REPLACE FUNCTION data_of(_tbl_type anyelement, _id int)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format('
SELECT *
FROM %s -- pg_typeof returns regtype, quoted automatically
WHERE id = $1
ORDER BY datahora'
, pg_typeof(_tbl_type))
USING _id;
END
$func$;
Call (important!):
SELECT * FROM data_of(NULL::pcdmet, 17);
Replace pcdmet
in the call with any other table name.
How does this work?
anyelement
is a pseudo data type, a polymorphic type, a placeholder for any non-array data type. All occurrences of anyelement
in the function evaluate to the same type provided at run time. By supplying a value of a defined type as argument to the function, we implicitly define the return type.
PostgreSQL automatically defines a row type (a composite data type) for every table created, so there is a well defined type for every table. This includes temporary tables, which is convenient for ad-hoc use.
Any type can be NULL
. Hand in a NULL
value, cast to the table type: NULL::pcdmet
.
Now the function returns a well-defined row type and we can use SELECT * FROM data_of()
to decompose the row and get individual columns.
pg_typeof(_tbl_type)
returns the name of the table as object identifier type regtype
. When automatically converted to text
, identifiers are automatically double-quoted and schema-qualified if needed, defending against SQL injection automatically. This can even deal with schema-qualified table-names where quote_ident()
would fail. See:
- Table name as a PostgreSQL function parameter
How to return result of a SELECT inside a function in PostgreSQL?
Use RETURN QUERY
:
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text -- also visible as OUT param in function body
, cnt bigint
, ratio bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, count(*) AS cnt -- column alias only visible in this query
, (count(*) * 100) / _max_tokens -- I added parentheses
FROM (
SELECT t.txt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
LIMIT _max_tokens
) t
GROUP BY t.txt
ORDER BY cnt DESC; -- potential ambiguity
END
$func$;
Call:
SELECT * FROM word_frequency(123);
Defining the return type explicitly is much more practical than returning a generic record
. This way you don't have to provide a column definition list with every function call. RETURNS TABLE
is one way to do that. There are others. Data types of OUT
parameters have to match exactly what is returned by the query.
Choose names for OUT
parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.
But note the potential naming conflict between the OUT
parameter cnt
and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...
) Postgres uses the column alias over the OUT
parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:
- Use the ordinal position of the item in the SELECT list:
ORDER BY 2 DESC
. Example:- Select first row in each GROUP BY group?
- Repeat the expression
ORDER BY count(*)
. - (Not required here.) Set the configuration parameter
plpgsql.variable_conflict
or use the special command#variable_conflict error | use_variable | use_column
in the function. See:- Naming conflict between function parameter and result of JOIN with USING clause
Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use txt
and cnt
in my examples, you may want more explicit names.
Added a missing ;
and corrected a syntax error in the header. (_max_tokens int)
, not (int maxTokens)
- data type after name.
While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric
or a floating point type. See below.
Alternative
This is what I think your query should actually look like (calculating a relative share per token):
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text
, abs_cnt bigint
, relative_share numeric)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt, t.cnt
, round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) -- AS relative_share
FROM (
SELECT t.txt, count(*) AS cnt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
GROUP BY t.txt
ORDER BY cnt DESC
LIMIT _max_tokens
) t
ORDER BY t.cnt DESC;
END
$func$;
The expression sum(t.cnt) OVER ()
is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).
A final explicit RETURN
statement is not required (but allowed) when working with OUT
parameters or RETURNS TABLE
(which makes implicit use of OUT
parameters).
round()
with two parameters only works for numeric
types. count()
in the subquery produces a bigint
result and a sum()
over this bigint
produces a numeric
result, thus we deal with a numeric
number automatically and everything just falls into place.
A PL/pgSQL function with varying return type (and varying inner query)
I have two comments to the first solution.
First, use or %I
or quote_ident()
in format()
function, not both. Compare:
with q(s) as (
values ('abba'), ('ABBA')
)
select
quote_ident(s) ok1,
format('%I', s) ok2,
format('%I', quote_ident(s)) bad_idea
from q;
ok1 | ok2 | bad_idea
--------+--------+------------
abba | abba | abba
"ABBA" | "ABBA" | """ABBA"""
(2 rows)
Second, you do not need two functions:
create or replace function my_select(into_table text, tbl text, order_by text default null)
returns void as $function$
declare
q text;
begin
q := 'select * from ' || quote_ident(tbl);
if order_by is not null then
q := q || ' order by ' || order_by;
end if;
execute format($$
create temp table %I
on commit drop
as %s
$$, into_table, q);
end
$function$ language plpgsql;
begin;
select my_select('my_tmp', 'my_data1', 'id');
select * from my_tmp;
commit;
BEGIN
my_select
-----------
(1 row)
id | val
----+-----
1 | a
2 | c
3 | d
4 | b
(4 rows)
COMMIT
In this particular case, the second solution is better.
A temporary table is not particularly expensive, but still unnecessary.
The cost will be the more important the more data in the table.
If you have a good alternative to create a temporary table, use it.
Besides, the need to include the function call and the select query in a transaction can be a bit cumbersome in some cases.
The second solution is smart and is ideally suited to the task at hand.
How to return diferent table data based on an ID passed to an SQL function
This is generally impossible with SQL functions. Even with a polymorphic return type, the actual return type must be determined at call time. But all statements in an SQL function are planned before the function is executed. So you'd always end up with an error message for one of the SELECT
statements returning data that doesn't fit the return type.
The same can be done with dynamic SQL in a PL/pgSQL function - with some trickery:
CREATE OR REPLACE FUNCTION f_demo(_tabletype anyelement, _id int)
RETURNS SETOF anyelement LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
format('SELECT * FROM %s WHERE id = $1', pg_typeof(_tabletype))
USING _id;
END
$func$;
Call (important!):
SELECT * FROM f_demo(null::schemaZ.Table_1, 1);
The "trick" is to cast a null
value to the desired table type, thereby defining the return type and choosing from which table to select. Detailed explanation:
- Refactor a PL/pgSQL function to return the output of various SELECT queries
Take this as proof of concept. Typically, there are better (safer, less confusing, more performant) solutions ...
Related:
- Difference between language sql and language plpgsql in PostgreSQL functions
returning a query result from dynamic query
In your case smth like:
t=# CREATE OR REPLACE FUNCTION data_of(_tbl_type anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE format('
SELECT *
FROM %s -- pg_typeof returns regtype, quoted automatically
WHERE true /*or some filter in additional arguments or so */
ORDER BY true /*or some filter in additional arguments or so */'
, pg_typeof(_tbl_type))
;
END
$func$ LANGUAGE plpgsql;
CREATE FUNCTION
would work:
t=# create table city(i serial, cn text);
CREATE TABLE
t=# insert into city(cn) values ('Moscow'), ('Barcelona'),('Visaginas');
INSERT 0 3
t=# SELECT * FROM data_of(NULL::city);
i | cn
---+-----------
1 | Moscow
2 | Barcelona
3 | Visaginas
(3 rows)
all credits to Erwin with his https://stackoverflow.com/a/11751557/5315974 which is one obligatory reading
How to define a function returning one of multiple data types?
SQL is strictly typed. The simple solution is to return one text
field. Every value can be cast to text
. (You might add a second return column holding the type name as meta information.)
Alternatively, return four typed fields (varchar, int, uuid, timestamp)
, and only fill one of them, depending on the result.
A polymorphic function would only be an option if you can tell the return type at call time and pass typed input accordingly.
Related:
- Refactor a PL/pgSQL function to return the output of various SELECT queries
Related Topics
Split Function Equivalent in T-Sql
Passing a Varchar Full of Comma Delimited Values to a SQL Server in Function
Oracle SQL Query: Retrieve Latest Values Per Group Based on Time
Get Top Results For Each Group (In Oracle)
Get List of All Tables in Oracle
How to Use an Insert Statement'S Output Clause to Get the Identity Value
How to Use the 'As' Keyword to Alias a Table in Oracle
Ordering by Specific Field Value First
Insert Text With Single Quotes in Postgresql
Get a List of Dates Between Two Dates
Postgresql Group_Concat Equivalent
How to List All Foreign Keys Referencing a Given Table in SQL Server
MySQL - How to Unpivot Columns to Rows
How to See the Raw SQL Queries Django Is Running
How to Combine Date from One Field With Time from Another Field - Ms SQL Server