Produce Distinct Values in String_Agg

Produce DISTINCT values in STRING_AGG

Here is one way to do it.

Since you want the distinct counts as well, it can be done simply by grouping the rows twice. The first GROUP BY will remove duplicates, the second GROUP BY will produce the final result.

WITH
Sitings
AS
(
SELECT * FROM (VALUES
(1, 'Florida', 'Orlando', 'bird'),
(2, 'Florida', 'Orlando', 'dog'),
(3, 'Arizona', 'Phoenix', 'bird'),
(4, 'Arizona', 'Phoenix', 'dog'),
(5, 'Arizona', 'Phoenix', 'bird'),
(6, 'Arizona', 'Phoenix', 'bird'),
(7, 'Arizona', 'Phoenix', 'bird'),
(8, 'Arizona', 'Flagstaff', 'dog')
) F (ID, State, City, Siting)
)
,CTE_Animals
AS
(
SELECT
State, City, Siting
FROM Sitings
GROUP BY State, City, Siting
)
SELECT
State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
State
,City
;

Result

+---------+-----------+--------------+----------+
| State | City | # Of Sitings | Animals |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff | 1 | dog |
| Arizona | Phoenix | 2 | bird,dog |
| Florida | Orlando | 2 | bird,dog |
+---------+-----------+--------------+----------+

If you are still getting an error message about exceeding 8000 characters, then cast the values to varchar(max) before STRING_AGG.

Something like

STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals

Get unique values using STRING_AGG in SQL Server

Use the DISTINCT keyword in a subquery to remove duplicates before combining the results: SQL Fiddle

SELECT 
ProjectID
,STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS
NewField
from (
select distinct ProjectId, newId.value
FROM [dbo].[Data] WITH(NOLOCK)
CROSS APPLY STRING_SPLIT([bID],';') AS newID
WHERE newID.value IN ( 'O95833' , 'Q96NY7-2' )
) x
GROUP BY ProjectID
ORDER BY ProjectID

How to use DISTINCT with string_agg() and to_timestamp()?

DISTINCT is neither a function nor an operator but an SQL construct or syntax element. Can be added as leading keyword to the whole SELECT list or within most aggregate functions.

Add it to the SELECT list (consisting of a single column in your case) in a subselect where you can also cheaply add ORDER BY. Should yield best performance:

SELECT string_agg(to_char(the_date, 'DD-MM-YYYY'), ',') AS the_dates
FROM (
SELECT DISTINCT to_timestamp(from_date / 1000)::date AS the_date
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650
ORDER BY the_date -- assuming this is the order you want
) sub;

First generate dates (multiple distinct values may result in the same date!).

Then the DISTINCT step (or GROUP BY).

(While being at it, optionally add ORDER BY.)

Finally aggregate.

An index on (enterprise_id) or better (enterprise_id, from_date) should greatly improve performance.

Ideally, timestamps are stored as type timestamp to begin with. Or timestamptz. See:

  • Ignoring time zones altogether in Rails and PostgreSQL

DISTINCT ON is a Postgres-specific extension of standard SQL DISTINCT functionality. See:

  • Select first row in each GROUP BY group?

Alternatively, you could also add DISTINCT(and ORDER BY) to the aggregate function string_agg() directly:

SELECT string_agg(DISTINCT to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY'), ',' ORDER BY to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY')) AS the_dates
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650

But that would be ugly, hard to read and maintain, and more expensive. (Test with EXPLAIN ANALYZE).

SQL Server; How to incorporate unique values from STRING_AGG?

Just put it in a subquery with DISTINCT

SELECT
#fact1.dim1Key,
#fact1.factvalue1,
#fact1.groupKey,
#dim1.attributeTwo,
#dim1.attributeThree,
ISNULL(#dim2.attributeOne, '<missing>')
FROM #fact1
JOIN #dim1 ON #dim1.dim1key = #fact1.dim1key
CROSS APPLY (
SELECT
attributeOne = STRING_AGG(ISNULL(d2.attributeOne, '<missing>'), ', ') WITHIN GROUP (ORDER BY d2.attributeOne)
FROM (
SELECT DISTINCT
#dim2.attributeOne
FROM #bridge b
JOIN #dim2 ON #dim2.dim2key = b.dim2key
WHERE b.groupKey = #fact1.groupKey
) d2
) #dim2

Distinct inside String_Agg

Your GROUP BY should have the unaggregated columns in the SELECT. STRING_AGG() is an aggregation function. So you probably want:

select f.FagID, FagNavn, STRING_AGG(l.LaererID, ',') as Lærer
from Fag$ f join
Lektion$ l
on f.FagID = l.FagID
group by f.FagID, FagNavn;

If you still duplicates (which seems unlikely with an id, then you need to use a subquery of some sort to remove the duplicates.

Find and concatenate all & distinct values in one query

I used one column to illustrate the solution but you got the idea:

select id,
STRING_AGG(FirstEvent,';') as FirstEvent
, STRING_AGG(case when rw = 1 then FirstEvent else null end,';') as allevents
from (
select * , row_number() over (partition by id,firstevent order by id) rw
from xx
) t
group by t.id

Getting a distinct comma-delimited string in T-SQL

Like this:

with q as
(
SELECT * FROM (VALUES
(1, 'Alice', 'Alice'),
(2, 'Robert', 'Robert'),
(3, 'Robert', 'Bob'),
(4, 'Richard', 'Rich'),
(5, 'Richard', 'Rick'),
(6, 'Richard', 'Dick')
) tmp (ID, RealName, NickName)
)
select RealName,
string_agg(case when RealName <> NickName then NickName else null end, ', ') AKA
from q
group by RealName

outputs

RealName AKA
-------- ---------------------
Alice NULL
Richard Rick, Rich, Dick
Robert Bob


Related Topics



Leave a reply



Submit