String_Agg Not Behaving as Expected

STRING_AGG not behaving as expected

Yes, this is a Bug (tm), present in all versions of SQL Server 2017 (as of writing). It's fixed in Azure SQL Server and 2019 RC1. Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.

One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP by y

SQL Server STRING_AGG function sorting is not working as expected

Firstly, I do agree that the behaviour you're getting shouldn't be happening, however, Stack Overflow isn't for reporting bugs with applications. For SQL Server, that should be done in their Azure Feedback portal.

As for resolving the issue, removing the redundant DISTINCT from your COUNT causes the problem to disappear. To implement a DISTINCT (either in a SELECT DISTINCT or a COUNT(DISTINCT {expression})) SQL Server needs to first sort the results as then it can easily remove any values that have the same sort position. As a result that sort is being expressed in your STRING_AGG expressions, even though they have an explicit ORDER BY clause.

The reason I say your DISTINCT is redundant is because at that point in the query there will be no duplicate values of Manager for a given value of ClCode. This is because you already grouped on both Manager and ClCode in the subquery. If you run that query alone, you'll see that Manager doesn't have any duplicates:

WITH tbl AS
    (SELECT Id,
            ClCode,
            Manager,
            ChangeDate
     FROM (VALUES (1, '000005', 'Cierra Vega', '2017-10-05'),
                  (2, '000005', 'Alden Cantrell', '2017-11-29'),
                  (3, '000005', 'Alden Cantrell', '2017-11-30'),
                  (4, '000005', 'Kierra Gentry', '2018-09-05'),
                  (5, '000005', 'Kierra Gentry', '2018-09-12'),
                  (6, '000005', 'Pierre Cox', '2018-11-06'),
                  (7, '000005', 'Thomas Crane', '2019-09-11'),
                  (8, '000005', 'Thomas Crane', '2019-10-01'),
                  (9, '000005', 'Miranda Shaffer', '2020-04-27'),
                  (10, '000360', 'Bradyn Kramer', '2017-10-06')) t (Id, ClCode, Manager, ChangeDate) )
SELECT x.ClCode,
       x.[Manager],
       MIN(x.ChangeDate) AS [MinChangeDate]
FROM tbl x
GROUP BY x.ClCode,
         x.[Manager];

As such, the DISTINCT in the COUNT is just added overhead for the instance, as it's not required (SQL Server has already sorted the data for the GROUP BY so why ask it to sort it again?). If you Are using a DISTINCT in a query you've already aggregated, then you very likely don't need it.

Why is my ORDER BY in STRING_AGG not always working?

This appears to be a bug in the optimizer.

The optimizer, having realized that the join is a self-join, is transforming it into a window aggregate. It can do this despite STRING_AGG not being available as a window aggregate. The rule is called GenGbApplySimple, and allows a self-join to be converted to a window aggregate. There is nothing specifically wrong with this so far.

Plan

PasteThePlan

The problem is that the aggregation is over the wrong value. It is aggregating the outer value rather than the inner one.

If you give the two references different aliases, then a careful examination of the query plan reveals the bug.

STRING_AGG([dbo].[HashTable].[Hash] as [HT1].[Hash],'')
WITHIN GROUP (ORDER BY [HT2].[Hash])

The other issue is that the aggregates used with that rule (e.g. MIN, MAX, AVG) don't have a WITHIN GROUP ordering to satisfy, so the replacement plan doesn't account for it. It seems likely that STRING_AGG was not intended to work the GbApply rules, or work would be needed to make it compatible (honouring the sort request).

As you can see below, the Sort only orders by the correlation column GroupIdentifier, not by the Hash column used in the WITHIN GROUP.

<OrderBy>
  <OrderByColumn Ascending="1">
    <ColumnReference
      Database="[...]"
      Schema="[dbo]"
      Table="[HashTable]"
      Alias="[HT1]"
      Column="GroupIdentifier">
    </ColumnReference>
  </OrderByColumn>
</OrderBy>

If you are a sysadmin, you can turn this rule off for the query, by using the following undocumented OPTION.

OPTION (QUERYRULEOFF GenGbApplySimple)

As a workaround, one option to prevent this optimization being applied is to use a grouped OUTER APPLY

UPDATE HT1
SET GroupHashList = C.HashList
OUTPUT inserted.*
FROM HashTable AS HT1
OUTER APPLY
(
    SELECT
        HashList =
            STRING_AGG(HT2.[Hash], ';')
                WITHIN GROUP (ORDER BY HT2.[Hash] ASC)
    FROM HashTable AS HT2
    WHERE HT2.GroupIdentifier = HT1.GroupIdentifier
) C;

This gets you a pretty straightforward self-join with a Stream Aggregate.

db<>fiddle

I strongly suggest you file this as a bug with Microsoft.

You could also leave feedback, but that does not typically lead to a specific response.

As an aside, you should follow the aliasing rules suggested by Conor Cunningham when writing multi-table UPDATE statements:

The non-ANSI FROM clause (which you are using here) has specific binding behaviors that may or may not be what you expect. I will suggest you start by aliasing the 3 references to hashtable to be different and then make sure you are explicitly refering to the one you want. It may be (I am guessing) that it is binding to a different one than you think and providing you an undesired output as a result.

order by in string_agg does not seems to work

Your line_no is varchar, as you can typically notice

'1' < '14' < '16 < '17' < '2'

So, just simply parse the varchar into int solve the problem.

select recid, 
STRING_AGG(DefaultDimension, '-')  WITHIN GROUP (ORDER BY CAST(line_no AS int) ASC) DefaultDimension,
STRING_AGG(DefaultDimensionName, '-') WITHIN GROUP (ORDER BY CAST(line_no AS int)ASC) DefaultDimensionName 
from #tmp
group by recid

T-SQL STRING_AGG problems dunno if bad writing or just not working

I don't think you want to group by InventoryId if that's what you're concatenating... Try this:

Edit, you need to remove columns that are different from row-to-row.

SELECT p.FirstName [Spelers Voornaam]
    ,p.LastName [Spelers Achternaam]
    ,pa.FamilyName [Familie's Groeps Naam]
    ,string_agg (i.InventoryId, ',') as [In Inventory]
FROM Player AS p
LEFT JOIN PlayerAvatar AS pa ON p.PlayerId = pa.PlayerId
LEFT JOIN Avatar AS Av ON pa.AvatarId = Av.AvatarId
LEFT JOIN Avatar AS a ON pa.AvatarId = a.AvatarId
LEFT JOIN Inventory as i on  i.InventoryId = pa.InventoryId
LEFT JOIN Item as it on it.ItemId = i.ItemId
GROUP BY p.FirstName, p.LastName, pa.AvatarName, pa.FamilyName, av.Type

Or you can aggregate those columns too.

SELECT p.FirstName [Spelers Voornaam]
    ,p.LastName [Spelers Achternaam]
    ,string_agg(pa.AvatarName,',') [Spelers Avatarnaam]
    ,pa.FamilyName [Familie's Groeps Naam]
    ,string_agg(Av.Type,',') [Avatar's Type]
    ,string_agg (i.InventoryId, ',') as [In Inventory]

FROM Player AS p
LEFT JOIN PlayerAvatar AS pa ON p.PlayerId = pa.PlayerId
LEFT JOIN Avatar AS Av ON pa.AvatarId = Av.AvatarId
LEFT JOIN Avatar AS a ON pa.AvatarId = a.AvatarId
LEFT JOIN Inventory as i on  i.InventoryId = pa.InventoryId
LEFT JOIN Item as it on it.ItemId = i.ItemId
GROUP BY p.FirstName, p.LastName, pa.FamilyName,

STRING_AGG working on compatibility level 140

This behavior is documented at the very last line of the remarks section:

STRING_AGG is available in any compatibility level.

Which means that if you are running an SQL Server version that supports string_agg -

SQL Server 2017 (or higher)
Azure SQL Database
Azure Synapse Analytics (SQL DW)

the string_agg built in function will work regardless of the compatibility level that set to the specific database you're working with.

SQL Server - STRING_AGG separator as conditional expression

You are almost there. Just reverse the order and use stuff and you can eliminate the need for a cte and most of the string functions:

SELECT STUFF(
           STRING_AGG(
               (IIF(i % 2 = 0, ' OR ', ' AND '))+c
           , '') WITHIN GROUP (ORDER BY i)
           , 1, 5, '') AS r
FROM t;

Results: a OR b AND c OR d AND e

db<>fiddle demo

Since the first row i % 2 equals 1, you know the string_agg result will always start with and: and a or b...
Then all you do is remove the first 5 chars from that using stuff and you're home free.

I've also taken the liberty to replace the CASE expression with the shorter IIF

Update

Well, in the case the selected separator is not known in advance, I couldn't come up with a single query solution, but I still think I found a simpler solution than you've posted - separating my initial solution to a cte with the string_agg and a select from it with the stuff, while determining the length of the delimiter by repeating the condition:

WITH CTE AS
(
SELECT MIN(i) As firstI,
       STRING_AGG(
               (IIF(i % 2 = 0, ' OR ', ' AND '))+c
           , '') WITHIN GROUP (ORDER BY i)
       AS r
FROM t
)

SELECT STUFF(r, 1, IIF(firstI % 2 = 0, 4, 5), '') AS r
FROM CTE;

db<>fiddle demo #2

Use String_AGG to query with condition in SQL?

I finally found the most accurate answer to my question. Thanks all!
The best solution is:

ALTER PROC FindPolicyByService
    @codes varchar(200)
AS 
BEGIN
   SELECT p.ID AS PolicyID,
       p.Code AS PolicyCode,
       p.Name AS PolicyName,
       STRING_AGG(s.Code, ',') AS ServiceCode
    FROM dbo.DXBusinessPolicy_Policy AS p
    JOIN dbo.DXBusinessPolicy_PolicyService AS ps ON p.ID = ps.PolicyID
    JOIN dbo.DXBusinessPolicy_Service AS s ON ps.ServiceID = s.ID
    WHERE p.ID IN 
    (
        SELECT subps.PolicyID
        FROM dbo.DXBusinessPolicy_PolicyService AS subps
        JOIN dbo.DXBusinessPolicy_Service AS subs ON subps.ServiceID = subs.ID
        WHERE subs.Code = @ServiceCode
    )
    GROUP by p.ID, p.Code, p.Name
END

Or the orther solution:

ALTER PROC FindPolicyByService
    @ServiceCode varchar(200)
AS 
BEGIN
    SELECT DIstinct policy.ID AS PolicyID,
           policy.Code AS PolicyCode,
           policy.Name AS PolicyName,
           (SELECT STRING_AGG(tempService.Code, ',')  FROM dbo.DXBusinessPolicy_Policy tempPolicy
                 JOIN dbo.DXBusinessPolicy_PolicyService tempPolicyService
                    ON tempPolicy.ID = tempPolicyService.PolicyID 
                 JOIN dbo.DXBusinessPolicy_Service tempService
                    ON tempPolicyService.ServiceID = tempService.ID
            WHERE policyservice.PolicyID = PolicyID) AS ServiceCode
    FROM dbo.DXBusinessPolicy_Policy policy
     JOIN dbo.DXBusinessPolicy_PolicyService policyservice
        ON policy.ID = policyservice.PolicyID 
     JOIN dbo.DXBusinessPolicy_Service service
        ON policyservice.ServiceID = service.ID AND service.Code = @ServiceCode
    GROUP BY
        policy.ID,
        policyservice.PolicyID,
        policy.Code,
        policy.Name
END

It all gave me the same result I expected:

PolicyCode	PolicyName	Services
COMBO.2103001	[Giá nền] T9/2020 #1	INT,IPTV
INT.2103001	Chính sách 2	INT

SQL how to prevent duplicates in STRING_AGG when joining multiple tables

One approach in this sort of situation is to STRING_AGG each of the constituent tables (or sets of tables) first; then, LEFT JOIN those contrived tables onto the main table. This sidesteps the problem of multiplication that can occur during consecutive LEFT JOINs.

In your case, try something like this:

SELECT
       shows.*,
       show_genre_names.genre_names,
       show_actors.actor_ids,
       show_actors.actor_names
FROM
       shows
       LEFT JOIN 
              ( -- one row per show_id
              SELECT
                     sg.show_id,
                     STRING_AGG(g.name, ', ') AS genre_names
              FROM
                     show_genres sg
                     JOIN genres g ON g.id = sg.genre_id
              GROUP BY
                     sg.show_id
              ) show_genre_names
              ON shows.id = show_genre_names.show_id 
       LEFT JOIN
              ( -- one row per show_id
              SELECT
                     sc.show_id,
                     STRING_AGG(a.id, ', ') AS actor_ids,
                     STRING_AGG(a.name, ', ') AS actor_names
              FROM
                     show_characters sc
                     JOIN actors a ON a.id = sc.actor_id
              GROUP BY
                     sc.show_id
              ) show_actors
              ON shows.id = show_actors.show_id
WHERE
       shows.id = 1390
;

You can solve this in other ways, too, but understanding this technique will be helpful in your SQL journey.

String_Agg Not Behaving as Expected