Db2 Comma Separated Output by Groups

DB2 query, group result and allow different values to be a comma seperated value

You are describing listagg():

select name, address,
listagg(item, ',') within group (order by item) as items,
date
from t
group by name, address, date;

DB2 : Find distinct from a comma separated values

DB2 offers a way to tokens strings using XML. Using this, you can split the string into tokens and then use listagg(distinct):

select v.id, listagg(distinct tokens.token, ',')
from (values (1, 'A,B,C'), (2, 'A,D,A,C,B'), (3, 'B,A,C,C,D')) v(id, val),
xmltable('for $id in tokenize($s, ",") return {string($id)}' passing v.val as "s"
columns seq for ordinality, token varchar(20) path '.'
) tokens
group by v.id;

Here is a db<>fiddle.

Note: I strongly recommend that you fix the data model. Storing multiple values in a string is bad way to store data in any database.

DB2 concatenate unique column values into one row, comma seperated

This is rather complicated, so I will show all my work:

Table definitions

create table parts
(part_number Varchar(64),
load_date Date,
total_qty Dec(5,0));
create table locations
(part_number Varchar(64),
condition Char(1),
location Char(3),
qty Dec(5,0));
insert into parts
values ('m-123', '1994-01-02', 32),
('1234Cf', '2001-08-09', 3),
('wf9-2', '2016-04-21', 14);
insert into locations
values ('m-123', 'U', 'A02', 2),
('1234Cf', 'S', 'A02', 3),
('m-123', 'U', 'B01', 1),
('wf9-2', 'S', 'A06', 7),
('m-123', 'S', 'A18', 29),
('wf9-2', 'U', 'F16', 7);

The query:

with -- CTE's
-- This collects locations into a comma seperated list
tmp (part_number, condition, location, csv, level) as (
select part_number, condition, min(location),
cast(min(location) as varchar(128)), 1
from locations
group by part_number, condition
union all
select a.part_number, a.condition, b.location,
a.csv || ',' || b.location, a.level + 1
from tmp a
join locations b using (part_number, condition)
where a.csv not like '%' || b.location || '%'
and b.location > a.location),
-- This chooses the correct csv list, and adds quantity for the condition
tmp2 (part_number, condition, csv, qty) as (
select t.part_number, t.condition, t.csv,
(select sum(qty) qty
from locations
where part_number = t.part_number
and condition = t.condition)
from tmp t
where level = (select max(level)
from tmp
where part_number = t.part_number
and condition = t.condition))
-- This is the final select that combines the parts file with
-- the second stage CTE and arranges things horizontally by condition
select p.part_number, p.load_date,
(select sum(qty)
from locations
where part_number = p.part_number) as total_qty,
coalesce(u.csv, '') as u_loc,
coalesce(u.qty, 0) as uqty,
coalesce(s.csv, '') as s_loc,
coalesce(s.qty, 0) as sqty
from parts p
left outer join tmp2 u
on u.part_number = p.part_number and u.condition = 'U'
left outer join tmp2 s
on s.part_number = p.part_number and s.condition = 'S'
order by p.load_date;



EDIT I have had to add some extra bits in here to support more than two locations for a part/condition, and I have made the column naming in the CTEs more consistent. Ok, so let me explain this a bit, there are 3 parts to this quety, 2 CTEs and the query, you can see the three parts are separated by comments. The first CTE is a recursive CTE. It's purpose is to produce the comma separated location list. You should be able to run the select by itself to see just what it does. tmp is the table name, part_number, condition, csv, and level are the column names. A recursive CTE needs a SELECT to prime the CTE and a UNION ALL with a SELECT that fills in the next details. In this case the priming SELECT retrieves a part number, a condition, and the first location (alphabetically) for that combination. level is set to 1. If you run just the priming select, you will get:

part_number  condition  location  csv  level
----------- --------- -------- --- -----
1234Cf S A01 A02 1
m-123 S A18 A18 1
m-123 U A02 A02 1
wf9-2 U F16 F16 1
wf9-2 S A06 A06 1

Note one line per part/condition. The remainder of the recursive CTE will fill in the remaining locations in csv, but it will actually add additional records so we need to filter the results here and later. So records are processed as they are added. The first rows listed above are joined with the location file
on part_number and condition. Note in the priming select I have a cast of the second min(location) to a varchar(128). This leaves room for the CSV column to expand. Without this, it will still expand, but not enough to hold more than 2 locations.

The second select in the recursive CTE concatenates a comma and the next location to the end of CSV. The specific bit that does this is a.csv || ',' || b.location. It also increments the level column. This helps us keep track of where we are in the query. Eventually, the row with the highest level is the one we want to use. We also have a way to end the recursion, and some filters to reduce the number of rows added to the temporary result set. If we have 2 locations, A02 and B02, left unchecked, we will get the following rows: A02, A02,A02, A02,B02, A02,A02,A02, A02,B02,A02, A02,A02,B02, A02,B02,B02, ... ad infinitum. The anti-duplication filter where a.csv not like '%' || b.location || '%' is sufficient for two locations to end the recursion, and minimize rows, like above, for locations A02 and B02, with the anti-duplication filter, we will get rows A02, and A02,B02. Note that none of the other results from the first example with duplicate locations are returned. Adding a third location C02 will yield, with anti-duplication filter, the following rows: A02, A02,B02, A02,C02, A02,B02,C02, A02,C02,B02. No duplicates here, but we do have redundant rows, and as you add locations, it gets worse. This is where we need a way to detect these redundant rows. Since we are starting with the lowest location number, we can always make sure that locations added to CSV are greater than the previously added location. To do that all we need to do is include a column in the result set that indicates which column was added (we could interrogate CSV, but that is harder). This is why we need the location column in tmp. Then we can write filter b.location > a.location. In the above 3 location example, this filter prevents row A02,C02,B02 leaving just a single row with all three locations. Adding more than three locations to the locations file will cause the number of rows to expand even more in TMP, but for each part and condition, there will only be one row with all locations, and it will contain all locations in ascending order.

The second CTE does two things. First, it filters TMP to drop all but the rows containing all locations for a given part/condition. Second, it accumulates the total quantity for each part/condition.

The bit that performs the filtering is in the where clause:

where level = (select max(level)
from tmp
where part_number = t.part_number
and condition = t.condition))

Pretty straight forward. The bit that accumulates the total quantity for a part/condition is also an easy to understand sub-query:

(select sum(qty) qty
from locations
where part_number = t.part_number
and condition = t.condition)

The final piece of this monster query is the main select. It joins the parts file with the results of the second CTE to form the ultimate result set:

select p.part_number, p.load_date, 
(select sum(qty) from locations where part_number = p.part_number) as total_qty,
coalesce(u.csv, '') as u_loc, coalesce(u.qty, 0) as uqty,
coalesce(s.csv, '') as s_loc, coalesce(s.qty, 0) as sqty
from parts p
left outer join tmp2 u
on u.part_number = p.part_number and u.condition = 'U'
left outer join tmp2 s
on s.part_number = p.part_number and s.condition = 'S'
order by p.load_date

Bits of note are the subquery to retrieve the total quantity from the locations table. You could use the tqty field in parts, but that can get out of sync with the actual quantities in the locations table. In addition there are two left outer joins with tmp2, one for condition U, and another for condition S. These construct the horizontal array of Location/Quantity in the result row. The last thing is the coalesce functions. These give null values (when a result from an outer join is missing) a default value.

End of EDIT


The final result is:

part_number  load_date   tqty  u_loc    uqty  s_loc  sqty
----------- ---------- ---- ------- ---- ----- ----
m-123 1994-01-02 32 A02,B01 3 A18 29
1234Cf 2001-08-09 3 0 A02 3
wf9-2 2016-04-21 14 F16 7 A06 7

Note XMLAGG and XMLSERIALIZE became available at DB2 for i v7.1 and LISTAGG became available at DB2 for i v7.2. Most recent version as of 8/9/2017 is v7.3. As you are on v5r4, it is likely you will need not only a software, but also a hardware upgrade to get current.



Related Topics



Leave a reply



Submit