Turning Arbitrarily Many Rows into Columns in PostgreSQL

The main problem with pivot tables in Postgres (and other RDBMSs) is that the structure of a query result (the number and names of its columns) cannot vary depending on the selected data. One possible solution is to dynamically create a view whose structure is defined by the data. The example function creates a view based on the table example_table:
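The underlying table is not shown in the original; here is a minimal sketch of a schema that fits the function below and the sample output further down (the column names lbl, name and value are assumptions inferred from the code):

```sql
-- Assumed schema, inferred from the function body and the sample output.
create table example_table (
    lbl   int,    -- row label of the pivoted result
    name  text,   -- becomes a column name in the view
    value text    -- becomes the cell value
);

insert into example_table (lbl, name, value) values
    (1, 'num', '1'), (1, 'colour', 'Red'),   (1, 'percentage', '25.0'),
    (2, 'num', '2'), (2, 'colour', 'Green'), (2, 'percentage', '50.0'),
    (3, 'num', '3'), (3, 'colour', 'Blue'),  (3, 'percentage', '75.0');
```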

create or replace function create_pivot_view()
returns void language plpgsql as $$
declare
    list text;
begin
    select string_agg(format('jdata->>%1$L "%1$s"', name), ', ')
    from (
        select distinct name
        from example_table
        ) sub
    into list;

    execute format($f$
        drop view if exists example_pivot_view;
        create view example_pivot_view as
        select lbl, %s
        from (
            select lbl, json_object_agg(name, value) jdata
            from example_table
            group by 1
            order by 1
            ) sub
        $f$, list);
end $$;

Use the function after the table is modified (maybe in a trigger) and query the created view:

select create_pivot_view();

select *
from example_pivot_view;

 lbl | num | colour | percentage
-----+-----+--------+------------
   1 | 1   | Red    | 25.0
   2 | 2   | Green  | 50.0
   3 | 3   | Blue   | 75.0
(3 rows)

Test it in db<>fiddle.

Note that it is only necessary to recreate the view (i.e. call the function) after a new name is added to the table or an existing name is removed from it. As long as the set of distinct names doesn't change, you can query the view without recreating it. If the set is modified frequently, creating a temporary view would be a better option.
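As a sketch of the trigger approach mentioned above (the function and trigger names are illustrative; a statement-level trigger avoids rebuilding the view once per affected row):

```sql
-- Illustrative sketch: rebuild the pivot view whenever example_table changes.
-- An AFTER ... FOR EACH STATEMENT trigger fires once per statement, not per row.
create or replace function refresh_pivot_view() returns trigger
language plpgsql as $$
begin
    perform create_pivot_view();
    return null;  -- return value is ignored for statement-level AFTER triggers
end $$;

create trigger example_table_refresh
after insert or update or delete on example_table
for each statement execute function refresh_pivot_view();
```

`execute function` requires PostgreSQL 11 or later; on older versions use `execute procedure`.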

You may also be interested in Flatten aggregated key/value pairs from a JSONB field?

Combine multiple rows into multiple columns in PostgreSQL

You need to group by item_code to get a single row per group, then use an aggregate like max() to pick the single non-null value for each group:

select item_code,
       max(case name_type when 'full'  then name_value end) as full_name,
       max(case name_type when 'short' then name_value end) as short_name,
       max(case name_type when 'half'  then name_value end) as half_name
from t
group by 1;
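The question's table is not reproduced here; assuming a layout like the following, the query folds the up-to-three name_type rows per item_code into one row (the sample values are illustrative):

```sql
-- Assumed layout of table t (not shown in the original question).
create table t (item_code int, name_type text, name_value text);

insert into t values
    (1, 'full',  'Wolfgang Amadeus Mozart'),
    (1, 'short', 'W. Mozart'),
    (1, 'half',  'Wolfgang M.');

-- The grouped query then returns a single row per item_code:
--   item_code | full_name | short_name | half_name
```

Each CASE expression yields the name_value for exactly one name_type and NULL otherwise, and max() ignores NULLs, so it simply picks the one non-null value per group.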

Transform Postgres rows data into columns based on condition

Below is the query to create the respective table and insert some data.

begin;
create table trans_custom_fields(id text, _value text, transid integer);
insert into trans_custom_fields values ('ACCOUNT_HOLDER_NAME', 'Manoj Sharma', 1);
insert into trans_custom_fields values ('ACCOUNT_NUMBER', '113565TTE44656', 1);
insert into trans_custom_fields values ('RECIPT_NUMBER', '24324.', 1);
insert into trans_custom_fields values ('ACCOUNT_HOLDER_NAME', 'Another User', 2);
insert into trans_custom_fields values ('ACCOUNT_NUMBER', '35546656TRFG23', 2);
insert into trans_custom_fields values ('RECIPT_NUMBER', '24324686', 2);
commit;

Now I want to transform this data, and here I am going to use the crosstab feature of Postgres.

SELECT *
FROM crosstab(
    'SELECT transid, id, _value
     FROM trans_custom_fields
     ORDER BY 1, 2'
) AS ct (transid int, ACCOUNT_HOLDER_NAME text, ACCOUNT_NUMBER text, RECIPT_NUMBER text);
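Note that crosstab() lives in the additional module tablefunc, which must be installed once per database before the query will run:

```sql
-- tablefunc ships with PostgreSQL but is not installed by default.
create extension if not exists tablefunc;
```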

Thanks to the crosstab example for helping me understand and write my own answer to my question, and thanks to @mark, who also provided queries and a resolution; this one simply fits better for now.

Convert multiple rows into one row with multiple columns

You can use CASE expressions for simple queries.

Or use the crosstab() function of the tablefunc module for more complex cases and better performance.

You can find examples for both cases under this related question:

PostgreSQL Crosstab Query

Combining multiple rows into one

This would be simpler with a simpler schema:

  • No domain types (what's the purpose?)
  • Add an actual PK to table medium
  • Rather use a surrogate PK (a serial column) instead of the multicolumn PK and FK over two domain types.

    Or at least use the same (simpler) column name for columns with the same content: just alpha_id instead of m_alphaID and w_alphaID etc.

That aside, here are solutions for your setup as is:
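The original schema is not reproduced here; for orientation, the queries below assume tables shaped roughly like this (with the domain types simplified to plain text/int columns):

```sql
-- Rough sketch of the assumed schema (domain types replaced by plain columns).
create table works (
    w_alphaID text,
    w_numID   int,
    w_title   text          -- name of the work
);

create table medium (
    m_alphaID text,         -- references works(w_alphaID)
    m_numID   int,          -- references works(w_numID)
    m_title   text          -- material used
);
```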

True crosstab() solution

There are several specific difficulties for your crosstab() query:

  • No single column that can serve as row_name.
  • Multiple extra columns.
  • No category column.
  • No defined order for values (so I use arbitrary order instead).

Basics (read this first!):

  • PostgreSQL Crosstab Query

For your special case:

  • Pivot on Multiple Columns using Tablefunc
  • Dynamic alternative to pivot with CASE and GROUP BY

Solution:

SELECT alphaid, numid, name_of_work, material_1, material_2, material_3
FROM crosstab(
    'SELECT rn, w.alphaid, w.numid, w.name_of_work
          , row_number() OVER (PARTITION BY rn) AS mat_nr  -- order undefined!
          , m_title AS material_used
     FROM  (
        SELECT w_alphaID AS alphaid, w_numID AS numid, w_title AS name_of_work
             , row_number() OVER (ORDER BY w_alphaID, w_numID) AS rn
        FROM   works
        ) w
     JOIN   medium m ON  w.alphaid = m.m_alphaID
                     AND w.numid = m.m_numID
     ORDER  BY rn, mat_nr'
   , 'VALUES (1), (2), (3)'  -- add more ...
   )
AS ct (
     rn bigint, alphaid text, numid int, name_of_work text
   , material_1 text, material_2 text, material_3 text  -- add more ...
   );

Poor man's crosstab with standard SQL

If the additional module tablefunc cannot be installed or if top performance is not important, this simpler query does the same, slower:

SELECT w_alphaid AS alphaid, w_numid AS numid, w_title AS name_of_work
     , arr[1] AS material_used_1
     , arr[2] AS material_used_2
     , arr[3] AS material_used_3  -- add more?
FROM   works w
LEFT   JOIN (
   SELECT m_alphaid, m_numid, array_agg(m_title::text) AS arr
   FROM   medium
   GROUP  BY m_alphaid, m_numid
   ) m ON  w.w_alphaid = m.m_alphaid
       AND w.w_numid = m.m_numid;

  • The cast to text (or varchar ...) is necessary because there is no predefined array type for your custom domain. Alternatively you could define the missing array type.

  • One subtle difference to the above: using LEFT JOIN here instead of just JOIN to preserve rows from works that have no related materials in medium at all.

  • Since you return the whole table, it's cheaper to aggregate rows in medium before you join. For a small selection it might be cheaper to join first and then aggregate. Related:

    • GROUP or DISTINCT after JOIN returns duplicates

How to use PL/pgSQL to construct a table with dynamic columns

You can dynamically create a view. The idea and the solution for a simpler case than yours is described in this answer. Please read it before continuing.

We will use the following query to create a view:

with all_locations(location) as (
    select distinct location_a
    from locations
    union
    select distinct location_b
    from locations
)
select location_a as location,
       json_object_agg(location_b, count order by location_b) as data
from (
    select a.location as location_a, b.location as location_b, count(l.*)
    from all_locations a
    cross join all_locations b
    left join locations l on location_a = a.location and location_b = b.location
    group by 1, 2
    ) s
group by 1
order by 1;

Results:

 location |                                    data
----------+----------------------------------------------------------------------------
 Atlanta  | { "Atlanta" : 0, "London" : 1, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 London   | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 New York | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 0, "Tokyo" : 1 }
 Sydney   | { "Atlanta" : 0, "London" : 1, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 Tokyo    | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 2, "Tokyo" : 0 }
(5 rows)
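The source table is not shown in the original; for reference, sample data consistent with the result above would look like this (each row records one connection from location_a to location_b):

```sql
-- Assumed sample data that reproduces the counts shown above.
create table locations (location_a text, location_b text);

insert into locations (location_a, location_b) values
    ('Atlanta',  'London'),
    ('New York', 'Tokyo'),
    ('Sydney',   'London'),
    ('Tokyo',    'Sydney'),
    ('Tokyo',    'Sydney');   -- Tokyo -> Sydney occurs twice, hence the count of 2
```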

The list of cities is used twice inside the function, so it is stored in the array cities. Note that you can replace the first query in the function with a simpler one; it simply produces an ordered list of distinct cities.

create or replace function create_locations_view()
returns void language plpgsql as $$
declare
    cities text[];
    list text;
begin
    -- fill array with all cities in alphabetical order
    select array_agg(location_a order by location_a)
    from (
        select distinct location_a
        from locations
        union
        select distinct location_b
        from locations
        ) s
    into cities;

    -- construct list of columns to use in select list
    select string_agg(format($s$data->>'%1$s' "%1$s"$s$, city), ', ')
    from unnest(cities) city
    into list;

    -- create view from select based on the above list
    execute format($ex$
        drop view if exists locations_view;
        create view locations_view as
        select location, %1$s
        from (
            select location_a as location, json_object_agg(location_b, count order by location_b) as data
            from (
                select a.location as location_a, b.location as location_b, count(l.*)
                from unnest(%2$L::text[]) a(location)
                cross join unnest(%2$L::text[]) b(location)
                left join locations l on location_a = a.location and location_b = b.location
                group by 1, 2
                ) s
            group by 1
            ) s
        order by 1
        $ex$, list, cities);
end $$;

Use the function and select data from the created view:

select create_locations_view();
select * from locations_view;

 location | Atlanta | London | New York | Sydney | Tokyo
----------+---------+--------+----------+--------+-------
 Atlanta  |       0 |      1 |        0 |      0 |     0
 London   |       0 |      0 |        0 |      0 |     0
 New York |       0 |      0 |        0 |      0 |     1
 Sydney   |       0 |      1 |        0 |      0 |     0
 Tokyo    |       0 |      0 |        0 |      2 |     0
(5 rows)

I have used this method several times, but I have no experience with really large data sets, so I cannot guarantee that it is efficient.


