Get Unique Values Using String_Agg in SQL Server

Get unique values using STRING_AGG in SQL Server

Use the DISTINCT keyword in a subquery to remove duplicates before combining the results:

SELECT
    ProjectID
    ,STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS NewField
FROM (
    SELECT DISTINCT ProjectID, newID.value
    FROM [dbo].[Data] WITH (NOLOCK)
    CROSS APPLY STRING_SPLIT([bID], ';') AS newID
    WHERE newID.value IN ('O95833', 'Q96NY7-2')
) x
GROUP BY ProjectID
ORDER BY ProjectID;
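To try the pattern without the original table, here is a minimal sketch that uses a table variable in place of [dbo].[Data]. The @Data name and the sample rows are made up for illustration. It assumes SQL Server 2017 or later, since STRING_AGG is not available before that.

```sql
-- Hypothetical sample data standing in for [dbo].[Data].
DECLARE @Data TABLE (ProjectID int, bID varchar(100));

INSERT INTO @Data (ProjectID, bID)
VALUES (1, 'O95833;Q96NY7-2;O95833'),  -- O95833 repeated within one row
       (1, 'O95833'),                  -- and repeated again in a second row
       (2, 'Q96NY7-2;P12345');         -- P12345 is dropped by the WHERE filter

SELECT
    ProjectID
    ,STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS NewField
FROM (
    -- DISTINCT collapses repeated (ProjectID, value) pairs before aggregation.
    SELECT DISTINCT d.ProjectID, s.value
    FROM @Data AS d
    CROSS APPLY STRING_SPLIT(d.bID, ';') AS s
    WHERE s.value IN ('O95833', 'Q96NY7-2')
) x
GROUP BY ProjectID
ORDER BY ProjectID;
```

With these sample rows, project 1 should come back as O95833,Q96NY7-2 and project 2 as Q96NY7-2, each value listed once.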

Produce DISTINCT values in STRING_AGG

Here is one way to do it.

Since you want the distinct counts as well, it can be done simply by grouping the rows twice. The first GROUP BY will remove duplicates, the second GROUP BY will produce the final result.

WITH Sitings AS
(
    SELECT * FROM (VALUES
        (1, 'Florida', 'Orlando', 'bird'),
        (2, 'Florida', 'Orlando', 'dog'),
        (3, 'Arizona', 'Phoenix', 'bird'),
        (4, 'Arizona', 'Phoenix', 'dog'),
        (5, 'Arizona', 'Phoenix', 'bird'),
        (6, 'Arizona', 'Phoenix', 'bird'),
        (7, 'Arizona', 'Phoenix', 'bird'),
        (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
)
,CTE_Animals AS
(
    SELECT
        State, City, Siting
    FROM Sitings
    GROUP BY State, City, Siting
)
SELECT
    State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting, ',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
    State
    ,City
;

Result

+---------+-----------+--------------+----------+
| State   | City      | # Of Sitings | Animals  |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff |            1 | dog      |
| Arizona | Phoenix   |            2 | bird,dog |
| Florida | Orlando   |            2 | bird,dog |
+---------+-----------+--------------+----------+

If you are still getting an error message about exceeding 8000 characters, then cast the values to varchar(max) before STRING_AGG.

Something like

STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals

SQL: How to get distinct values out of string_Agg() function?

You need to put it in a subquery and group again. Note that DISTINCT is not a function: it acts over the whole result set and is the same as grouping by all columns.

SELECT
    p.ID
    , string_agg(p.Code, ',') AS Code
    , p.[Year]
FROM (
    -- Inner GROUP BY removes duplicate (ID, Year, Code) combinations.
    SELECT
        P.ID
        , PIT.Code AS Code
        , year(PT.Date) AS [Year]
    FROM fact.PreT PT
    INNER JOIN dim.ProdIType PIT
        ON PIT.ProdITypeSKey = PT.ProdITypeSKey
    INNER JOIN dim.Proudct P
        ON P.ProductSKey = PT.ProductSKey
    WHERE P.ID = '15'
    GROUP BY P.ID, year(PT.Date), PIT.Code
) p
-- The outer query can only see the derived table p, not the inner alias PT.
GROUP BY p.ID, p.[Year];
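The claim that DISTINCT behaves like grouping by every selected column can be checked on a small sample. This is only a sketch: the @Sample table variable and its rows are hypothetical and are not part of the schema above.

```sql
-- Hypothetical rows; ('15', 'A', 2020) appears twice.
DECLARE @Sample TABLE (ID varchar(10), Code varchar(10), [Year] int);

INSERT INTO @Sample (ID, Code, [Year])
VALUES ('15', 'A', 2020),
       ('15', 'A', 2020),
       ('15', 'B', 2020);

-- Variant 1: DISTINCT applies to the whole select list.
SELECT DISTINCT ID, Code, [Year]
FROM @Sample;

-- Variant 2: grouping by every selected column gives the same rows.
SELECT ID, Code, [Year]
FROM @Sample
GROUP BY ID, Code, [Year];
```

Both statements should return the same two rows, which is why either form works as the deduplication step inside the subquery.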

Get unique value using STRING_AGG in SQL Server 2017

You can first get the unique values and then apply the string aggregation, as below:

;WITH CTE_UniqueValues AS
(
    -- One row per (Reported_Name, Entry) pair, keeping the highest ID.
    SELECT Reported_Name, Entry, MAX(ID) AS ID
    FROM Table1
    GROUP BY Reported_Name, Entry
)
SELECT T1.REPORTED_NAME, STRING_AGG(CAST(T1.ENTRY AS NVARCHAR(MAX)), ',') AS Average_Str
FROM CTE_UniqueValues T1
INNER JOIN Table2 T2 ON T1.ID = T2.ProdID
WHERE T1.ENTRY LIKE '%[A-Za-z]%'
GROUP BY T1.REPORTED_NAME
ORDER BY T1.REPORTED_NAME;

SQL Server; How to incorporate unique values from STRING_AGG?

Just put it in a subquery with DISTINCT:

SELECT
    #fact1.dim1Key,
    #fact1.factvalue1,
    #fact1.groupKey,
    #dim1.attributeTwo,
    #dim1.attributeThree,
    ISNULL(#dim2.attributeOne, '<missing>')
FROM #fact1
JOIN #dim1 ON #dim1.dim1key = #fact1.dim1key
CROSS APPLY (
    SELECT
        attributeOne = STRING_AGG(ISNULL(d2.attributeOne, '<missing>'), ', ') WITHIN GROUP (ORDER BY d2.attributeOne)
    FROM (
        SELECT DISTINCT
            #dim2.attributeOne
        FROM #bridge b
        JOIN #dim2 ON #dim2.dim2key = b.dim2key
        WHERE b.groupKey = #fact1.groupKey
    ) d2
) #dim2

How to use DISTINCT with string_agg() and to_timestamp()?

DISTINCT is neither a function nor an operator but an SQL construct or syntax element. It can be added as a leading keyword to the whole SELECT list or within most aggregate functions.

Add it to the SELECT list (consisting of a single column in your case) in a subselect, where you can also cheaply add ORDER BY. This should yield the best performance:

SELECT string_agg(to_char(the_date, 'DD-MM-YYYY'), ',') AS the_dates
FROM (
    SELECT DISTINCT to_timestamp(from_date / 1000)::date AS the_date
    FROM trn_day_bookkeeping_income_expense
    WHERE enterprise_id = 5134650
    ORDER BY the_date  -- assuming this is the order you want
) sub;

First generate dates (multiple distinct values may result in the same date!).

Then the DISTINCT step (or GROUP BY).

(While you are at it, optionally add ORDER BY.)

Finally aggregate.

An index on (enterprise_id) or better (enterprise_id, from_date) should greatly improve performance.
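As a sketch, the two-column index could be created like this in PostgreSQL. The index name is a hypothetical choice; the table and column names are taken from the query above.

```sql
-- Index name is made up for illustration; pick one that fits your naming scheme.
CREATE INDEX trn_day_bookkeeping_enterprise_from_date_idx
    ON trn_day_bookkeeping_income_expense (enterprise_id, from_date);
```

Putting enterprise_id first matches the equality filter in the WHERE clause, and including from_date may let the planner read the values it needs from the index alone.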

Ideally, timestamps are stored as type timestamp to begin with. Or timestamptz. See:

  • Ignoring time zones altogether in Rails and PostgreSQL

DISTINCT ON is a Postgres-specific extension of standard SQL DISTINCT functionality. See:

  • Select first row in each GROUP BY group?

Alternatively, you could also add DISTINCT (and ORDER BY) to the aggregate function string_agg() directly:

SELECT string_agg(DISTINCT to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY'), ','
                  ORDER BY to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY')) AS the_dates
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650;

But that would be ugly, hard to read and maintain, and more expensive. (Test with EXPLAIN ANALYZE).
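To compare the two variants on your own data, each statement can be prefixed with EXPLAIN ANALYZE. A sketch for the subquery variant, using the query from above unchanged:

```sql
-- EXPLAIN ANALYZE actually executes the statement and reports the real
-- plan with timings, so run it against representative data.
EXPLAIN ANALYZE
SELECT string_agg(to_char(the_date, 'DD-MM-YYYY'), ',') AS the_dates
FROM (
    SELECT DISTINCT to_timestamp(from_date / 1000)::date AS the_date
    FROM trn_day_bookkeeping_income_expense
    WHERE enterprise_id = 5134650
    ORDER BY the_date
) sub;
```

Running the same prefix on the string_agg(DISTINCT ...) variant gives a second plan to compare. The timings depend on data volume and available indexes, so neither result is guaranteed in advance.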


