STRING_AGG not behaving as expected
Yes, this is a Bug (tm), present in all versions of SQL Server 2017 (as of writing). It's fixed in Azure SQL Server and 2019 RC1. Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>)
identical as long as x
matches, no matter what <separator>
is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x
does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
SQL Server STRING_AGG function sorting is not working as expected
Firstly, I do agree that the behaviour you're getting shouldn't be happening, however, Stack Overflow isn't for reporting bugs with applications. For SQL Server, that should be done in their Azure Feedback portal.
As for resolving the issue, removing the redundant DISTINCT
from your COUNT
causes the problem to disappear. To implement a DISTINCT
(either in a SELECT DISTINCT
or a COUNT(DISTINCT {expression})
) SQL Server needs to first sort the results as then it can easily remove any values that have the same sort position. As a result that sort is being expressed in your STRING_AGG
expressions, even though they have an explicit ORDER BY
clause.
The reason I say your DISTINCT
is redundant is because at that point in the query there will be no duplicate values of Manager
for a given value of ClCode
. This is because you already grouped on both Manager
and ClCode
in the subquery. If you run that query alone, you'll see that Manager
doesn't have any duplicates:
WITH tbl AS
(SELECT Id,
ClCode,
Manager,
ChangeDate
FROM (VALUES (1, '000005', 'Cierra Vega', '2017-10-05'),
(2, '000005', 'Alden Cantrell', '2017-11-29'),
(3, '000005', 'Alden Cantrell', '2017-11-30'),
(4, '000005', 'Kierra Gentry', '2018-09-05'),
(5, '000005', 'Kierra Gentry', '2018-09-12'),
(6, '000005', 'Pierre Cox', '2018-11-06'),
(7, '000005', 'Thomas Crane', '2019-09-11'),
(8, '000005', 'Thomas Crane', '2019-10-01'),
(9, '000005', 'Miranda Shaffer', '2020-04-27'),
(10, '000360', 'Bradyn Kramer', '2017-10-06')) t (Id, ClCode, Manager, ChangeDate) )
SELECT x.ClCode,
x.[Manager],
MIN(x.ChangeDate) AS [MinChangeDate]
FROM tbl x
GROUP BY x.ClCode,
x.[Manager];
As such, the DISTINCT
in the COUNT
is just added overhead for the instance, as it's not required (SQL Server has already sorted the data for the GROUP BY
so why ask it to sort it again?). If you Are using a DISTINCT
in a query you've already aggregated, then you very likely don't need it.
Why is my ORDER BY in STRING_AGG not always working?
This appears to be a bug in the optimizer.
The optimizer, having realized that the join is a self-join, is transforming it into a window aggregate. It can do this despite STRING_AGG
not being available as a window aggregate. The rule is called GenGbApplySimple, and allows a self-join to be converted to a window aggregate. There is nothing specifically wrong with this so far.
PasteThePlan
The problem is that the aggregation is over the wrong value. It is aggregating the outer value rather than the inner one.
If you give the two references different aliases, then a careful examination of the query plan reveals the bug.
STRING_AGG([dbo].[HashTable].[Hash] as [HT1].[Hash],'')
WITHIN GROUP (ORDER BY [HT2].[Hash])
The other issue is that the aggregates used with that rule (e.g. MIN
, MAX
, AVG
) don't have a WITHIN GROUP
ordering to satisfy, so the replacement plan doesn't account for it. It seems likely that STRING_AGG
was not intended to work the GbApply rules, or work would be needed to make it compatible (honouring the sort request).
As you can see below, the Sort only orders by the correlation column GroupIdentifier
, not by the Hash
column used in the WITHIN GROUP
.
<OrderBy>
<OrderByColumn Ascending="1">
<ColumnReference
Database="[...]"
Schema="[dbo]"
Table="[HashTable]"
Alias="[HT1]"
Column="GroupIdentifier">
</ColumnReference>
</OrderByColumn>
</OrderBy>
If you are a sysadmin
, you can turn this rule off for the query, by using the following undocumented OPTION
.
OPTION (QUERYRULEOFF GenGbApplySimple)
As a workaround, one option to prevent this optimization being applied is to use a grouped OUTER APPLY
UPDATE HT1
SET GroupHashList = C.HashList
OUTPUT inserted.*
FROM HashTable AS HT1
OUTER APPLY
(
SELECT
HashList =
STRING_AGG(HT2.[Hash], ';')
WITHIN GROUP (ORDER BY HT2.[Hash] ASC)
FROM HashTable AS HT2
WHERE HT2.GroupIdentifier = HT1.GroupIdentifier
) C;
This gets you a pretty straightforward self-join with a Stream Aggregate.
db<>fiddle
I strongly suggest you file this as a bug with Microsoft.
You could also leave feedback, but that does not typically lead to a specific response.
As an aside, you should follow the aliasing rules suggested by Conor Cunningham when writing multi-table UPDATE
statements:
The non-ANSI FROM clause (which you are using here) has specific binding behaviors that may or may not be what you expect. I will suggest you start by aliasing the 3 references to hashtable to be different and then make sure you are explicitly refering to the one you want. It may be (I am guessing) that it is binding to a different one than you think and providing you an undesired output as a result.
order by in string_agg does not seems to work
Your line_no
is varchar
, as you can typically notice
'1' < '14' < '16 < '17' < '2'
So, just simply parse the varchar into int solve the problem.
select recid,
STRING_AGG(DefaultDimension, '-') WITHIN GROUP (ORDER BY CAST(line_no AS int) ASC) DefaultDimension,
STRING_AGG(DefaultDimensionName, '-') WITHIN GROUP (ORDER BY CAST(line_no AS int)ASC) DefaultDimensionName
from #tmp
group by recid
T-SQL STRING_AGG problems dunno if bad writing or just not working
I don't think you want to group by InventoryId if that's what you're concatenating... Try this:
Edit, you need to remove columns that are different from row-to-row.
SELECT p.FirstName [Spelers Voornaam]
,p.LastName [Spelers Achternaam]
,pa.FamilyName [Familie's Groeps Naam]
,string_agg (i.InventoryId, ',') as [In Inventory]
FROM Player AS p
LEFT JOIN PlayerAvatar AS pa ON p.PlayerId = pa.PlayerId
LEFT JOIN Avatar AS Av ON pa.AvatarId = Av.AvatarId
LEFT JOIN Avatar AS a ON pa.AvatarId = a.AvatarId
LEFT JOIN Inventory as i on i.InventoryId = pa.InventoryId
LEFT JOIN Item as it on it.ItemId = i.ItemId
GROUP BY p.FirstName, p.LastName, pa.AvatarName, pa.FamilyName, av.Type
Or you can aggregate those columns too.
SELECT p.FirstName [Spelers Voornaam]
,p.LastName [Spelers Achternaam]
,string_agg(pa.AvatarName,',') [Spelers Avatarnaam]
,pa.FamilyName [Familie's Groeps Naam]
,string_agg(Av.Type,',') [Avatar's Type]
,string_agg (i.InventoryId, ',') as [In Inventory]
FROM Player AS p
LEFT JOIN PlayerAvatar AS pa ON p.PlayerId = pa.PlayerId
LEFT JOIN Avatar AS Av ON pa.AvatarId = Av.AvatarId
LEFT JOIN Avatar AS a ON pa.AvatarId = a.AvatarId
LEFT JOIN Inventory as i on i.InventoryId = pa.InventoryId
LEFT JOIN Item as it on it.ItemId = i.ItemId
GROUP BY p.FirstName, p.LastName, pa.FamilyName,
STRING_AGG working on compatibility level 140
This behavior is documented at the very last line of the remarks section:
STRING_AGG is available in any compatibility level.
Which means that if you are running an SQL Server version that supports string_agg
-
- SQL Server 2017 (or higher)
- Azure SQL Database
- Azure Synapse Analytics (SQL DW)
the string_agg
built in function will work regardless of the compatibility level that set to the specific database you're working with.
SQL Server - STRING_AGG separator as conditional expression
You are almost there. Just reverse the order and use stuff and you can eliminate the need for a cte and most of the string functions:
SELECT STUFF(
STRING_AGG(
(IIF(i % 2 = 0, ' OR ', ' AND '))+c
, '') WITHIN GROUP (ORDER BY i)
, 1, 5, '') AS r
FROM t;
Results: a OR b AND c OR d AND e
db<>fiddle demo
Since the first row i % 2
equals 1
, you know the string_agg
result will always start with and
: and a or b...
Then all you do is remove the first 5 chars from that using stuff
and you're home free.
I've also taken the liberty to replace the CASE
expression with the shorter IIF
Update
Well, in the case the selected separator is not known in advance, I couldn't come up with a single query solution, but I still think I found a simpler solution than you've posted - separating my initial solution to a cte with the string_agg
and a select from it with the stuff
, while determining the length of the delimiter by repeating the condition:
WITH CTE AS
(
SELECT MIN(i) As firstI,
STRING_AGG(
(IIF(i % 2 = 0, ' OR ', ' AND '))+c
, '') WITHIN GROUP (ORDER BY i)
AS r
FROM t
)
SELECT STUFF(r, 1, IIF(firstI % 2 = 0, 4, 5), '') AS r
FROM CTE;
db<>fiddle demo #2
Use String_AGG to query with condition in SQL?
I finally found the most accurate answer to my question. Thanks all!
The best solution is:
ALTER PROC FindPolicyByService
@codes varchar(200)
AS
BEGIN
SELECT p.ID AS PolicyID,
p.Code AS PolicyCode,
p.Name AS PolicyName,
STRING_AGG(s.Code, ',') AS ServiceCode
FROM dbo.DXBusinessPolicy_Policy AS p
JOIN dbo.DXBusinessPolicy_PolicyService AS ps ON p.ID = ps.PolicyID
JOIN dbo.DXBusinessPolicy_Service AS s ON ps.ServiceID = s.ID
WHERE p.ID IN
(
SELECT subps.PolicyID
FROM dbo.DXBusinessPolicy_PolicyService AS subps
JOIN dbo.DXBusinessPolicy_Service AS subs ON subps.ServiceID = subs.ID
WHERE subs.Code = @ServiceCode
)
GROUP by p.ID, p.Code, p.Name
END
Or the orther solution:
ALTER PROC FindPolicyByService
@ServiceCode varchar(200)
AS
BEGIN
SELECT DIstinct policy.ID AS PolicyID,
policy.Code AS PolicyCode,
policy.Name AS PolicyName,
(SELECT STRING_AGG(tempService.Code, ',') FROM dbo.DXBusinessPolicy_Policy tempPolicy
JOIN dbo.DXBusinessPolicy_PolicyService tempPolicyService
ON tempPolicy.ID = tempPolicyService.PolicyID
JOIN dbo.DXBusinessPolicy_Service tempService
ON tempPolicyService.ServiceID = tempService.ID
WHERE policyservice.PolicyID = PolicyID) AS ServiceCode
FROM dbo.DXBusinessPolicy_Policy policy
JOIN dbo.DXBusinessPolicy_PolicyService policyservice
ON policy.ID = policyservice.PolicyID
JOIN dbo.DXBusinessPolicy_Service service
ON policyservice.ServiceID = service.ID AND service.Code = @ServiceCode
GROUP BY
policy.ID,
policyservice.PolicyID,
policy.Code,
policy.Name
END
It all gave me the same result I expected:
PolicyCode | PolicyName | Services |
---|---|---|
COMBO.2103001 | [Giá nền] T9/2020 #1 | INT,IPTV |
INT.2103001 | Chính sách 2 | INT |
SQL how to prevent duplicates in STRING_AGG when joining multiple tables
One approach in this sort of situation is to STRING_AGG each of the constituent tables (or sets of tables) first; then, LEFT JOIN those contrived tables onto the main table. This sidesteps the problem of multiplication that can occur during consecutive LEFT JOINs.
In your case, try something like this:
SELECT
shows.*,
show_genre_names.genre_names,
show_actors.actor_ids,
show_actors.actor_names
FROM
shows
LEFT JOIN
( -- one row per show_id
SELECT
sg.show_id,
STRING_AGG(g.name, ', ') AS genre_names
FROM
show_genres sg
JOIN genres g ON g.id = sg.genre_id
GROUP BY
sg.show_id
) show_genre_names
ON shows.id = show_genre_names.show_id
LEFT JOIN
( -- one row per show_id
SELECT
sc.show_id,
STRING_AGG(a.id, ', ') AS actor_ids,
STRING_AGG(a.name, ', ') AS actor_names
FROM
show_characters sc
JOIN actors a ON a.id = sc.actor_id
GROUP BY
sc.show_id
) show_actors
ON shows.id = show_actors.show_id
WHERE
shows.id = 1390
;
You can solve this in other ways, too, but understanding this technique will be helpful in your SQL journey.
Related Topics
Dividing 2 Numbers in SQL Server
Sql Server Login Disable Windows Authentication
Datareader.Getfieldtype Returned Null
How to Remove The Default Value from a Column in Oracle
Why Is There a Scan on My Clustered Index
How to Set a Default Value for One Column in SQL Based on Another Column
Datetime Query on Only Year in SQL Server
Sql Server: Do I Need to Use Go Statements Between Batches
How to Add a Running Count to Rows in a 'streak' of Consecutive Days
Looping Through Recordset with Vba
How to Handle 'Optional' Where Clause Filters in Sql
Best Way to Change Clustered Index (Pk) in SQL 2005
Difference of Create Index by Using Include Column or Not Using
How to Get Rightmost 10 Places of a String in Oracle
How to Specify an Input SQL File with Bcp
Postgresql: Table Name/Schema Confusion
How to Update an Xml Attribute Value in an Xml Variable Using T-Sql