Dynamic Alternative to Pivot With Case and Group By


If you have not installed the additional module tablefunc, run this command once per database:

CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

SELECT * FROM crosstab(
'SELECT bar, 1 AS cat, feh
FROM tbl_org
ORDER BY bar, feh')
AS ct (bar text, val1 int, val2 int, val3 int); -- more columns?

The special difficulty here is that there is no category (cat) in the base table. For the basic 1-parameter form, we can just provide a dummy column with a dummy value to serve as the category. The value is ignored anyway.

This is one of the rare cases where the second parameter of the crosstab() function is not needed: by the definition of this problem, NULL values can only appear in dangling columns to the right. And the order can be determined by the value.
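The following minimal sketch shows why the missing values must dangle to the right for the 1-parameter form. It contrasts both forms on a row whose middle category is missing. The names gap_demo, gap_1param and gap_2param are hypothetical and exist only for this demo. It also assumes you are allowed to install tablefunc.

```sql
-- Assumption: you may install the tablefunc extension in this database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical demo data: row 'X' has values for categories 'a' and 'c', none for 'b'.
CREATE TEMP TABLE gap_demo (row_name text, cat text, val int);
INSERT INTO gap_demo VALUES ('X', 'a', 1), ('X', 'c', 3);

-- 1-parameter form: values are filled in left to right, the category is ignored,
-- so the value belonging to 'c' ends up in column b.
CREATE TEMP TABLE gap_1param AS
SELECT * FROM crosstab(
   'SELECT row_name, cat, val FROM gap_demo ORDER BY 1, 2')
   AS ct (row_name text, a int, b int, c int);

-- 2-parameter form: values are matched to columns by category,
-- so the value belonging to 'c' ends up in column c and b stays NULL.
CREATE TEMP TABLE gap_2param AS
SELECT * FROM crosstab(
   'SELECT row_name, cat, val FROM gap_demo ORDER BY 1'
 , $$VALUES ('a'), ('b'), ('c')$$)
   AS ct (row_name text, a int, b int, c int);
```

In gap_1param, the 3 lands in column b. In gap_2param, it lands in column c.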

If we had an actual category column with names determining the order of values in the result, we'd need the 2-parameter form of crosstab(). Here I synthesize a category column with the window function row_number() to base crosstab() on:

SELECT * FROM crosstab(
$$
SELECT bar, val, feh
FROM (
SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
FROM tbl_org
) x
ORDER BY 1, 2
$$
, $$VALUES ('val1'), ('val2'), ('val3')$$ -- more columns?
) AS ct (bar text, val1 int, val2 int, val3 int); -- more columns?

The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers:

Basics:

Read this first if you are not familiar with the crosstab() function!

  • PostgreSQL Crosstab Query

Advanced:

  • Pivot on Multiple Columns using Tablefunc
  • Merge a table and a change log into a view in PostgreSQL

Proper test setup

That's how you should provide a test case to begin with:

CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
INSERT INTO tbl_org (id, feh, bar) VALUES
(1, 10, 'A')
, (2, 20, 'A')
, (3, 3, 'B')
, (4, 4, 'B')
, (5, 5, 'C')
, (6, 6, 'D')
, (7, 7, 'D')
, (8, 8, 'D');

Dynamic crosstab?

Not very dynamic yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with PL/pgSQL. But there are ways around it, with some limitations.

So as not to further complicate the rest, I demonstrate with a simpler test case:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
('A', 'val1', 10)
, ('A', 'val2', 20)
, ('B', 'val1', 3)
, ('B', 'val2', 4)
, ('C', 'val1', 5)
, ('D', 'val3', 8)
, ('D', 'val1', 6)
, ('D', 'val2', 7);

Call:

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int, val3 int);

Returns:

 row_name | val1 | val2 | val3
----------+------+------+------
 A        |   10 |   20 |
 B        |    3 |    4 |
 C        |    5 |      |
 D        |    6 |    7 |    8

Built-in feature of tablefunc module

The tablefunc module provides a simple infrastructure for generic crosstab() calls without providing a column definition list. It includes a number of functions written in C (typically very fast):

crosstabN()

crosstab2(), crosstab3() and crosstab4() are predefined. One minor point: they require and return all text, so we need to cast our integer values. But it simplifies the call:

SELECT * FROM crosstab4('SELECT row_name, attrib, val::text  -- cast!
FROM tbl ORDER BY 1,2');

Result:

 row_name | category_1 | category_2 | category_3 | category_4
----------+------------+------------+------------+------------
 A        | 10         | 20         |            |
 B        | 3          | 4          |            |
 C        | 5          |            |            |
 D        | 6          | 7          | 8          |
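crosstab4() returns text, so you may want to cast the category_N columns back to integers in the outer query. Below is a minimal sketch of that. It uses a hypothetical table tbl4_demo that mirrors the test data above, minus rows B and C.

```sql
-- Assumption: you may install the tablefunc extension in this database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical copy of the simple test table used above.
CREATE TEMP TABLE tbl4_demo (row_name text, attrib text, val int);
INSERT INTO tbl4_demo VALUES
  ('A', 'val1', 10), ('A', 'val2', 20)
, ('D', 'val1', 6), ('D', 'val2', 7), ('D', 'val3', 8);

-- crosstab4() takes and returns text: cast the value to text going in,
-- and cast the category_N columns back to int coming out.
CREATE TEMP TABLE tbl4_pivot AS
SELECT row_name
     , category_1::int AS val1
     , category_2::int AS val2
     , category_3::int AS val3
FROM   crosstab4('SELECT row_name, attrib, val::text FROM tbl4_demo ORDER BY 1, 2');
```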

Custom crosstab() function

For more columns or other data types, we create our own composite type and function (once).

Type:

CREATE TYPE tablefunc_crosstab_int_5 AS (
row_name text, val1 int, val2 int, val3 int, val4 int, val5 int);

Function:

CREATE OR REPLACE FUNCTION crosstab_int_5(text)
RETURNS SETOF tablefunc_crosstab_int_5
AS '$libdir/tablefunc', 'crosstab' LANGUAGE c STABLE STRICT;

Call:

SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val   -- no cast!
FROM tbl ORDER BY 1,2');

Result:

 row_name | val1 | val2 | val3 | val4 | val5
----------+------+------+------+------+------
 A        |   10 |   20 |      |      |
 B        |    3 |    4 |      |      |
 C        |    5 |      |      |      |
 D        |    6 |    7 |    8 |      |

One polymorphic, dynamic function for all

This goes beyond what's covered by the tablefunc module.

To make the return type dynamic, I use a polymorphic type with a technique detailed in this related answer:

  • Refactor a PL/pgSQL function to return the output of various SELECT queries

1-parameter form:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _rowtype anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
(SELECT format('SELECT * FROM crosstab(%L) t(%s)'
, _qry
, string_agg(quote_ident(attname) || ' ' || atttypid::regtype
, ', ' ORDER BY attnum))
FROM pg_attribute
WHERE attrelid = pg_typeof(_rowtype)::text::regclass
AND attnum > 0
AND NOT attisdropped);
END
$func$ LANGUAGE plpgsql;

Overload with this variant for the 2-parameter form:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _cat_qry text, _rowtype anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
(SELECT format('SELECT * FROM crosstab(%L, %L) t(%s)'
, _qry, _cat_qry
, string_agg(quote_ident(attname) || ' ' || atttypid::regtype
, ', ' ORDER BY attnum))
FROM pg_attribute
WHERE attrelid = pg_typeof(_rowtype)::text::regclass
AND attnum > 0
AND NOT attisdropped);
END
$func$ LANGUAGE plpgsql;

pg_typeof(_rowtype)::text::regclass: Every user-defined composite type has a corresponding entry in the system catalog pg_class, so its attributes (columns) are listed in pg_attribute. The fast lane to get there: cast the registered type (regtype) to text and cast this text to regclass.
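You can run the catalog lookup inside crosstab_n() on its own. This sketch uses a hypothetical composite type demo_rowtype, which it creates permanently and then drops again. It stores the generated column definition list in a temp table so you can inspect it. For this type it should read `row_name text, val1 integer, val2 integer`, since regtype prints int as integer.

```sql
-- Hypothetical composite type, standing in for tablefunc_crosstab_int_3 and friends.
CREATE TYPE demo_rowtype AS (row_name text, val1 int, val2 int);

-- Same lookup as in crosstab_n(): regtype -> text -> regclass finds the pg_class entry
-- behind the composite type; its pg_attribute rows yield the column definition list.
CREATE TEMP TABLE demo_coldef AS
SELECT string_agg(quote_ident(attname) || ' ' || atttypid::regtype, ', ' ORDER BY attnum) AS coldef
FROM   pg_attribute
WHERE  attrelid = pg_typeof(NULL::demo_rowtype)::text::regclass
AND    attnum > 0
AND    NOT attisdropped;

-- Clean up the permanent demo type; demo_coldef only holds text, so nothing depends on it.
DROP TYPE demo_rowtype;
```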

Create composite types once:

You need to define every return type you are going to use once:

CREATE TYPE tablefunc_crosstab_int_3 AS (
row_name text, val1 int, val2 int, val3 int);

CREATE TYPE tablefunc_crosstab_int_4 AS (
row_name text, val1 int, val2 int, val3 int, val4 int);

...

For ad-hoc calls, you can also just create a temporary table to the same (temporary) effect:

CREATE TEMP TABLE temp_xtype7 (
row_name text, x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int);

Or use the type of an existing table, view or materialized view if available.

Call

Using the row types above:

1-parameter form (no missing values):

SELECT * FROM crosstab_n(
'SELECT row_name, attrib, val FROM tbl ORDER BY 1,2'
, NULL::tablefunc_crosstab_int_3);

2-parameter form (some values can be missing):

SELECT * FROM crosstab_n(
'SELECT row_name, attrib, val FROM tbl ORDER BY 1'
, $$VALUES ('val1'), ('val2'), ('val3')$$
, NULL::tablefunc_crosstab_int_3);

This one function works for all return types, while the crosstabN() framework provided by the tablefunc module needs a separate function for each.

If you have named your types in sequence as demonstrated above, you only have to replace the number at the end of the type name. To find the maximum number of categories in the base table:

SELECT max(count(*)) OVER () FROM tbl  -- returns 3
GROUP BY row_name
LIMIT 1;
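The window-function trick above can also be written with a plain subquery, which may be easier to read. This sketch uses a hypothetical table cat_demo. It also builds the matching type name, assuming the tablefunc_crosstab_int_N naming scheme used above.

```sql
-- Hypothetical sample data with at most 3 categories per row_name.
CREATE TEMP TABLE cat_demo (row_name text, attrib text, val int);
INSERT INTO cat_demo VALUES
  ('A', 'val1', 10), ('A', 'val2', 20)
, ('C', 'val1', 5)
, ('D', 'val1', 6), ('D', 'val2', 7), ('D', 'val3', 8);

-- Count categories per row_name, then take the maximum;
-- also derive the type name under the assumed naming scheme.
CREATE TEMP TABLE cat_max AS
SELECT max(n) AS max_cat
     , format('tablefunc_crosstab_int_%s', max(n)) AS type_name
FROM  (SELECT count(*) AS n FROM cat_demo GROUP BY row_name) sub;
```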

That's about as dynamic as this gets if you want individual columns. Arrays as demonstrated by @Clodoaldo, a simple text representation, or the result wrapped in a document type like json or hstore can work for any number of categories dynamically.
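Here is a minimal sketch of the document-type and array alternatives. It runs on a hypothetical table doc_demo shaped like tbl. Each row_name gets one jsonb object and one array, however many categories there are.

```sql
-- Hypothetical sample data shaped like tbl.
CREATE TEMP TABLE doc_demo (row_name text, attrib text, val int);
INSERT INTO doc_demo VALUES
  ('A', 'val1', 10), ('A', 'val2', 20)
, ('D', 'val1', 6), ('D', 'val2', 7), ('D', 'val3', 8);

-- One row per row_name; the categories live inside a jsonb object and an array,
-- so the result type does not depend on the number of categories.
CREATE TEMP TABLE doc_pivot AS
SELECT row_name
     , jsonb_object_agg(attrib, val)  AS vals_json  -- keys = category names
     , array_agg(val ORDER BY attrib) AS vals_arr   -- positions = sorted categories
FROM   doc_demo
GROUP  BY row_name;
```

The array form drops the category names and relies on their sort order. As text, 'val10' would sort before 'val2'. The jsonb form keeps the names as keys.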

Disclaimer:

It's always potentially dangerous when user input is converted to code. Make sure this cannot be used for SQL injection. Don't accept input from untrusted users (directly).

Call for original question:

SELECT * FROM crosstab_n('SELECT bar, 1, feh FROM tbl_org ORDER BY 1,2'
, NULL::tablefunc_crosstab_int_3);

Dynamic Pivot Grouping by Case (Unpivot)

To get the result you want, you don't need dynamic SQL at all; a simple GROUP BY with grouping sets does the job.

WITH t AS (
SELECT
[ProcessName] =
CASE
WHEN ProcessPages >= 1 AND ProcessPages <= 5 THEN '1p'
WHEN ProcessPages >= 6 AND ProcessPages <= 10 THEN '2p'
WHEN ProcessPages >= 11 AND ProcessPages <= 16 THEN '3p'
WHEN ProcessPages >= 17 AND ProcessPages <= 50 THEN '4p'
WHEN ProcessPages > 50 THEN '5p'
END
, ProcessClass
FROM temp1
)
SELECT
ProcessName = CASE WHEN GROUPING(ProcessName) = 0 THEN ProcessName ELSE 'Total' END
, With_y = COUNT(CASE WHEN [ProcessClass] = 'Y' THEN ProcessClass END)
, Without_Y = COUNT(CASE WHEN [ProcessClass] <> 'Y' THEN ProcessClass END)
FROM t
GROUP BY GROUPING SETS (ProcessName, ());

-- use the next line for versions <2008:
-- GROUP BY ProcessName WITH ROLLUP;

To avoid repeating the CASE expression, I used a common table expression.
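The query above is T-SQL (SQL Server). PostgreSQL 9.5 or later also supports GROUPING SETS and GROUPING(), so a roughly equivalent sketch could look like this. The sample rows in temp1 are made up, and the aggregate FILTER clause replaces the COUNT(CASE ...) expressions.

```sql
-- Hypothetical sample data; column names follow the question (unquoted, so folded to lower case).
CREATE TEMP TABLE temp1 (ProcessPages int, ProcessClass text);
INSERT INTO temp1 VALUES (3, 'Y'), (4, 'N'), (8, 'Y'), (60, 'Y');

CREATE TEMP TABLE process_summary AS
WITH t AS (
   SELECT CASE
            WHEN ProcessPages BETWEEN  1 AND  5 THEN '1p'
            WHEN ProcessPages BETWEEN  6 AND 10 THEN '2p'
            WHEN ProcessPages BETWEEN 11 AND 16 THEN '3p'
            WHEN ProcessPages BETWEEN 17 AND 50 THEN '4p'
            WHEN ProcessPages > 50             THEN '5p'
          END AS ProcessName
        , ProcessClass
   FROM   temp1
   )
-- The output alias differs from the CTE column to keep GROUP BY unambiguous.
SELECT CASE WHEN GROUPING(ProcessName) = 0 THEN ProcessName ELSE 'Total' END AS process_name
     , count(*) FILTER (WHERE ProcessClass =  'Y') AS with_y
     , count(*) FILTER (WHERE ProcessClass <> 'Y') AS without_y
FROM   t
GROUP  BY GROUPING SETS (ProcessName, ());  -- one row per ProcessName plus a grand total
```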

Sample SQL Fiddle

Pivot table using crosstab and count

1. Static solution with a limited list of marking values:

SELECT year
, TO_CHAR( creation_date, 'Month') AS month
, COUNT(*) FILTER (WHERE marking = 'Delivered') AS Delivered
, COUNT(*) FILTER (WHERE marking = 'Not delivered') AS "Not delivered"
, COUNT(*) FILTER (WHERE marking = 'Not Received') AS "Not Received"
FROM invoices
GROUP BY 1,2;

2. Fully dynamic solution with a large list of marking values:

This proposal is an alternative to the crosstab solutions proposed in A and B.

The solution proposed here only requires a dedicated composite type, which can be created dynamically; beyond that, it relies on the jsonb type and standard functions:

Starting from your query, which counts the number of rows per year, month and marking value:

  • Using the jsonb_object_agg function, the resulting rows are first aggregated by year and month into jsonb objects whose keys correspond to the marking values and whose values correspond to the counts.
  • The resulting jsonb objects are then converted into records using the jsonb_populate_record function and the dedicated composite type.
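Here is a minimal sketch of those two steps in isolation, using a hand-written composite type. The type marking_demo and the sample counts are hypothetical. The sketch also shows what happens with a stale type. A key without a matching column ('Lost') is silently ignored. A column without a matching key ("Not Received") comes back NULL.

```sql
-- Hypothetical composite type, as create_composite_type() might have generated it earlier.
CREATE TYPE marking_demo AS ("Delivered" bigint, "Not Received" bigint);

-- Counts for one (year, month) group: aggregate into a jsonb object,
-- then map its keys onto the columns of the composite type.
CREATE TEMP TABLE marking_record AS
SELECT (jsonb_populate_record(NULL::marking_demo, jsonb_object_agg(marking, n))).*
FROM  (VALUES ('Delivered', 5::bigint), ('Lost', 1::bigint)) AS c(marking, n);

-- Clean up the permanent demo type; marking_record has plain bigint columns.
DROP TYPE marking_demo;
```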

First, we dynamically create a composite type that corresponds to the ordered list of marking values:

CREATE OR REPLACE PROCEDURE create_composite_type() LANGUAGE plpgsql AS $$
DECLARE
column_list text ;
BEGIN
SELECT string_agg(DISTINCT quote_ident(marking) || ' bigint', ',' ORDER BY quote_ident(marking) || ' bigint' ASC)
INTO column_list
FROM invoices ;

EXECUTE 'DROP TYPE IF EXISTS composite_type' ;
EXECUTE 'CREATE TYPE composite_type AS (' || column_list || ')' ;
END ;
$$ ;

CALL create_composite_type() ;

Then the expected result is provided by the following query:

SELECT a.year
, TO_CHAR(a.year_month, 'Month') AS month
, (jsonb_populate_record( null :: composite_type
, jsonb_object_agg(a.marking, a.count)
)
).*
FROM
( SELECT year
, date_trunc('month', creation_date) AS year_month
, marking
, count(*) AS count
FROM invoices AS v
GROUP BY 1,2,3
) AS a
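GROUP BY a.year, a.year_month
ORDER BY a.year, a.year_month;  -- chronological; ORDER BY month would sort the month names alphabetically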

Obviously, if the list of marking values may vary over time, you have to call the create_composite_type() procedure again just before executing the query. If you don't update composite_type, the query still works (no error!), but some old marking values may be obsolete (no longer used), and some new marking values may be missing from the query result (not displayed as columns).

See the full demo in dbfiddle.

Dynamic Pivot in Oracle's SQL

You can't put a non-constant string in the IN clause of the PIVOT clause.

You can use PIVOT XML for that.

From the documentation:

subquery: A subquery is used only in conjunction with the XML keyword. When you specify a subquery, all values found by the subquery are used for pivoting.

It should look like this:

select xmlserialize(content t.B_XML) from t_aa
pivot xml(
sum(A) for B in(any)
) t;

You can also have a subquery instead of the ANY keyword:

select xmlserialize(content t.B_XML) from t_aa
pivot xml(
sum(A) for B in (select cl from t_bb)
) t;

Here is a sqlfiddle demo

Dynamic pivot query using PostgreSQL 9.3

SELECT *
FROM crosstab (
'SELECT ProductNumber, ProductName, Salescountry, SalesQuantity
FROM product
ORDER BY 1'
, $$SELECT unnest('{US,UK,UAE1}'::varchar[])$$
) AS ct (
"ProductNumber" varchar
, "ProductName" varchar
, "US" int
, "UK" int
, "UAE1" int);

Detailed explanation:

  • PostgreSQL Crosstab Query
  • Pivot on Multiple Columns using Tablefunc

Completely dynamic query for varying number of distinct Salescountry?

  • Dynamic alternative to pivot with CASE and GROUP BY

Query with dynamic target columns

Basically you want a pivot table or a cross tabulation. The additional module tablefunc provides the functionality you need. If you are not familiar with it, read this first:

  • PostgreSQL Crosstab Query

The special difficulty of your case: you first need a query joining the tables to produce the right input:

SELECT p.name, f.name, text 'x' AS marker -- required, logically redundant column
FROM people p
LEFT JOIN people_fruits pf ON pf.person_id = p.id -- LEFT JOIN !
LEFT JOIN fruits f ON f.id = pf.fruit_id
ORDER BY p.id, f.id; -- seems to be the desired sort order

LEFT [OUTER] JOIN, so you don't lose people without fruits.

Use it in a crosstab() function taking two parameters like this:

SELECT * FROM crosstab(
$$SELECT p.name, f.name, text 'x'
FROM people p
LEFT JOIN people_fruits pf ON pf.person_id = p.id
LEFT JOIN fruits f ON f.id = pf.fruit_id
ORDER BY p.id$$
,$$VALUES ('bananna'), ('orange'), ('pear'), ('apple'), ('grape')$$)
AS ct (name text, bananna text, orange text, pear text, apple text, grape text);

The order of fruits in the target column list has to match the order of fruits in the 2nd parameter (ordered by id in your case).

Missing fruits get a NULL value.

However, this is not dynamic yet. A completely dynamic result is not possible with plain SQL, which requires the result columns to be known at call time. One way or the other, you need two round trips to the DB server: you can let Postgres build the crosstab query dynamically and then execute it in the next step.
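Here is a minimal sketch of those two steps, with hypothetical names (sales_demo, generated_sql, pivot_result). Step 1 builds the statement text from the distinct categories. Step 2 executes it. The DO block stands in for the second round trip a client would make, and it writes the result into a temp table because a DO block cannot return rows.

```sql
-- Assumption: you may install the tablefunc extension in this database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical sample data.
CREATE TEMP TABLE sales_demo (product text, country text, qty int);
INSERT INTO sales_demo VALUES ('P1', 'US', 10), ('P1', 'UK', 5), ('P2', 'UAE', 7);

-- Step 1: generate the crosstab statement from the distinct categories in the data.
-- Column list and category query are both sorted by country, so they line up.
CREATE TEMP TABLE generated_sql AS
SELECT format('SELECT * FROM crosstab(%L, %L) AS ct (product text, %s)'
            , 'SELECT product, country, qty FROM sales_demo ORDER BY 1'
            , 'SELECT DISTINCT country FROM sales_demo ORDER BY 1'
            , string_agg(quote_ident(country) || ' int', ', ' ORDER BY country)) AS stmt
FROM  (SELECT DISTINCT country FROM sales_demo) c;

-- Step 2: execute the generated text (a second round trip in a real client).
DO $$
BEGIN
   EXECUTE 'CREATE TEMP TABLE pivot_result AS ' || (SELECT stmt FROM generated_sql);
END
$$;
```

Building the query from data is where the injection warning above applies: quote_ident() and %L quote the pieces, but only use this on trusted tables.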

Related answers with code examples:

  • Execute a dynamic crosstab query
  • Dynamic alternative to pivot with CASE and GROUP BY

An alternative would be to return an array or a document type (json, xml, ...) that contains a dynamic list of elements.
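For the array variant, here is a sketch on hypothetical stand-ins for the question's tables (demo_people, demo_fruits, demo_people_fruits):

```sql
-- Hypothetical stand-ins for people / fruits / people_fruits.
CREATE TEMP TABLE demo_people (id int, name text);
CREATE TEMP TABLE demo_fruits (id int, name text);
CREATE TEMP TABLE demo_people_fruits (person_id int, fruit_id int);
INSERT INTO demo_people VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO demo_fruits VALUES (1, 'orange'), (2, 'pear'), (3, 'apple');
INSERT INTO demo_people_fruits VALUES (1, 3), (1, 2);

-- One row per person with a variable-length array of fruit names, ordered by fruit id.
-- LEFT JOIN keeps people without fruits; FILTER turns their "no fruit" row into NULL.
CREATE TEMP TABLE fruit_lists AS
SELECT p.name
     , array_agg(f.name ORDER BY f.id) FILTER (WHERE f.id IS NOT NULL) AS fruits
FROM   demo_people p
LEFT   JOIN demo_people_fruits pf ON pf.person_id = p.id
LEFT   JOIN demo_fruits f ON f.id = pf.fruit_id
GROUP  BY p.id, p.name;
```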

Creating dynamic columns from table data

For more than a few domains, use crosstab() to make the query shorter and faster.

  • PostgreSQL Crosstab Query

A completely dynamic query, returning a dynamic number of columns based on data in your table is not possible, because SQL is strictly typed. Whatever you try, you'll end up needing two steps. Step 1: generate the query, step 2: execute it.

  • Execute a dynamic crosstab query

Or you return something more flexible instead of table columns, like an array or a document type like json. Details:

  • Dynamic alternative to pivot with CASE and GROUP BY
  • Refactor a PL/pgSQL function to return the output of various SELECT queries

PostgreSQL 9.3: Dynamic pivot table

You can do this with crosstab() from the additional module tablefunc:

SELECT b
, COALESCE(a1, 0) AS "A1"
, COALESCE(a2, 0) AS "A2"
, COALESCE(a3, 0) AS "A3"
, ... -- all the way up to "A30"
FROM crosstab(
'SELECT colb, cola, 1 AS val FROM matrix
ORDER BY 1,2'
, $$SELECT 'A'::text || g FROM generate_series(1,30) g$$
) AS t (b text
, a1 int, a2 int, a3 int, a4 int, a5 int, a6 int
, a7 int, a8 int, a9 int, a10 int, a11 int, a12 int
, a13 int, a14 int, a15 int, a16 int, a17 int, a18 int
, a19 int, a20 int, a21 int, a22 int, a23 int, a24 int
, a25 int, a26 int, a27 int, a28 int, a29 int, a30 int);

If NULL instead of 0 works, too, it can be just SELECT * in the outer query.

Detailed explanation:

  • PostgreSQL Crosstab Query

The special "difficulty" here: no actual "value". So add 1 AS val as last column.
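Here is a scaled-down sketch of the same query with only three categories (A1 to A3), on a hypothetical table matrix_demo, so it can run as is:

```sql
-- Assumption: you may install the tablefunc extension in this database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical miniature of the matrix table: only A1..A3 instead of A1..A30.
CREATE TEMP TABLE matrix_demo (cola text, colb text);
INSERT INTO matrix_demo VALUES ('A1', 'B1'), ('A3', 'B1'), ('A2', 'B2');

-- No value column exists, so the source query supplies a constant 1 as "val";
-- COALESCE turns missing combinations into 0.
CREATE TEMP TABLE matrix_pivot AS
SELECT b
     , COALESCE(a1, 0) AS a1
     , COALESCE(a2, 0) AS a2
     , COALESCE(a3, 0) AS a3
FROM   crosstab(
   'SELECT colb, cola, 1 AS val FROM matrix_demo ORDER BY 1, 2'
 , $$SELECT 'A'::text || g FROM generate_series(1, 3) g$$
   ) AS t (b text, a1 int, a2 int, a3 int);
```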

Unknown number of categories

A completely dynamic query (with unknown result type) is not possible in a single query. You need two queries. First build a statement like the above dynamically, then execute it. Details:

  • Selecting multiple max() values using a single SQL statement

  • PostgreSQL convert columns to rows? Transpose?

  • Dynamically generate columns for crosstab in PostgreSQL

  • Dynamic alternative to pivot with CASE and GROUP BY

Too many categories

If you exceed the maximum number of columns (1600), a classic crosstab is impossible, because the result cannot be represented with individual columns. (Also, human eyes would hardly be able to read a table with that many columns.)

Arrays or document types like hstore or jsonb are the alternative. Here is a solution with arrays:

SELECT colb, array_agg(cola) AS colas
FROM (
SELECT colb, right(colb, -1)::int AS sortb
, CASE WHEN m.cola IS NULL THEN 0 ELSE 1 END AS cola
FROM (SELECT DISTINCT colb FROM matrix) b
CROSS JOIN (SELECT DISTINCT cola FROM matrix) a
LEFT JOIN matrix m USING (colb, cola)
ORDER BY sortb, right(cola, -1)::int
) sub
GROUP BY 1, sortb
ORDER BY sortb;

  • Build the complete grid of values with:

        (SELECT DISTINCT colb FROM matrix) b
        CROSS JOIN (SELECT DISTINCT cola FROM matrix) a

  • LEFT JOIN existing combinations, order by the numeric part of the name and aggregate into arrays.

      • right(colb, -1)::int trims the leading character from 'A3' and casts the remaining digits to integer so we get a proper sort order.
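Below is a variant of the array query that states the sort order inside array_agg() instead of relying on the ORDER BY of a subquery. The subquery approach usually works in PostgreSQL, but the SQL standard does not guarantee it. matrix_demo2 is a hypothetical miniature of matrix. Its 'A10' row shows why sorting on the numeric part matters.

```sql
-- Hypothetical miniature of matrix; as text, 'A10' would sort before 'A2'.
CREATE TEMP TABLE matrix_demo2 (cola text, colb text);
INSERT INTO matrix_demo2 VALUES ('A1', 'B1'), ('A10', 'B1'), ('A2', 'B2');

-- Complete grid via CROSS JOIN, existing combinations via LEFT JOIN,
-- 1/0 flags aggregated in numeric order of the cola suffix.
CREATE TEMP TABLE matrix_arrays AS
SELECT b.colb
     , array_agg((m.cola IS NOT NULL)::int ORDER BY right(a.cola, -1)::int) AS colas
FROM  (SELECT DISTINCT colb FROM matrix_demo2) b
CROSS  JOIN (SELECT DISTINCT cola FROM matrix_demo2) a
LEFT   JOIN matrix_demo2 m ON m.colb = b.colb AND m.cola = a.cola
GROUP  BY b.colb
ORDER  BY right(b.colb, -1)::int;
```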

Basic matrix

If you just want a matrix of 0 and 1, with 1 where x = y, this can be had cheaper:

SELECT x, array_agg((x = y)::int) AS y_arr
FROM generate_series(1,10) x
, generate_series(1,10) y
GROUP BY 1
ORDER BY 1;

SQL Fiddle building on the one you provided in the comments.

Note that sqlfiddle.com currently has a bug that kills the display of array values. So I cast to text there to work around it.
