SQL Split Column Based on 1 or More Possible Delimiter and Insert in New Table

To split and obtain a specific value, I prefer to use a user-defined function.

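Public Function SplitString(ByVal str As String, ByVal delimiter As String, ByVal count As Integer) As String
    ' Returns the count-th (1-based) element of str, or "" if it does not exist.
    Dim strArr() As String
    ' Limit the split to count + 1 parts so the wanted element is never merged with the remainder.
    strArr = Split(str, delimiter, count + 1)
    count = count - 1 'zero-based
    If UBound(strArr) >= count Then
        SplitString = strArr(count)
    End If
End Function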

After this, you can adjust your SQL to the following:

SELECT * INTO importeddata
FROM (
SELECT SplitString(column_value, ',', 1), id
FROM SourceData
WHERE SplitString(column_value, ',', 1) <> ''
UNION ALL
SELECT SplitString(column_value, ',', 2), id
FROM SourceData
WHERE SplitString(column_value, ',', 2) <> ''
UNION ALL
SELECT SplitString(column_value, ',', 3), id
FROM SourceData
WHERE SplitString(column_value, ',', 3) <> ''
) AS A
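
To make the contract of SplitString concrete outside Access, here is a minimal Python sketch of the same behaviour: a 1-based index, and an empty string when the element is missing. The name split_string is made up for this illustration and the sketch assumes n is at least 1; it is not part of the VBA above.

```python
def split_string(text: str, delimiter: str, n: int) -> str:
    """Return the n-th (1-based) element of text, or "" if there is none.

    Hypothetical Python mirror of the VBA SplitString function; assumes n >= 1.
    """
    # maxsplit=n yields at most n + 1 pieces, like Split(str, delimiter, count + 1),
    # so the n-th piece never contains the unsplit remainder.
    parts = text.split(delimiter, n)
    return parts[n - 1] if len(parts) >= n else ""


print(split_string("1,2,3,4,5,6,7", ",", 6))  # 6
print(repr(split_string("a,b", ",", 3)))      # ''
```

The empty-string result is what the WHERE ... <> '' filters in the query rely on to skip rows that have fewer elements.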

If you really want an all-SQL solution, let me demonstrate how it can be achieved, and why it is a bad plan.

For this example, I've written the following code to automatically generate the appropriate SQL expression:

Public Sub GenerateSQLSplit(str As String, Delimiter As String, Count As Integer)
    Dim i As Integer
    If Count = 1 Then
        ' InStr returns 0 (never -1) when the delimiter is not found
        Debug.Print "IIf(InStr(1, " & str & ", " & Delimiter & ") = 0, " & str & ", Left(" & str & ", InStr(1, " & str & ", " & Delimiter & ") - 1))"
    Else
        Dim strPrevious As String
        Dim strNext As String
        ' Position of the first delimiter
        strPrevious = "InStr(1, " & str & "," & Delimiter & ")"
        i = Count - 1
        ' Nest InStr calls until strPrevious locates the delimiter before the wanted element
        Do While i <> 1
            strPrevious = "InStr(" & strPrevious & " + Len(" & Delimiter & "), " & str & "," & Delimiter & ")"
            i = i - 1
        Loop
        ' Position of the delimiter after the wanted element (0 if it is the last one)
        strNext = "InStr(" & strPrevious & " + Len(" & Delimiter & "), " & str & " , " & Delimiter & ")"
        Debug.Print "IIf( " & strPrevious & "> 0, IIf(" & strNext & " < 1, Mid(" & str & ", " & strPrevious & " + Len(" & Delimiter & ")), Mid(" & str & ", " & strPrevious & " + Len(" & Delimiter & "), " & strNext & " - " & strPrevious & " - Len(" & Delimiter & "))), """") "
    End If
End Sub

Let's use the example to generate a simple split: I want the 6th element of the following string: 1,2,3,4,5,6,7

To generate the string, in the immediate window:

GenerateSQLSplit "'1,2,3,4,5,6,7'", "','", 6

Results in the following expression to return the 6th element of that string (SQL only):

IIf( InStr(InStr(InStr(InStr(InStr(1, '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',')> 0, IIf(InStr(InStr(InStr(InStr(InStr(InStr(1, '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7' , ',') < 1, Mid('1,2,3,4,5,6,7', InStr(InStr(InStr(InStr(InStr(1, '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(',')), Mid('1,2,3,4,5,6,7', InStr(InStr(InStr(InStr(InStr(1, '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), InStr(InStr(InStr(InStr(InStr(InStr(1, '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7'
,',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7' , ',') - InStr(InStr(InStr(InStr(InStr(1, '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') + Len(','), '1,2,3,4,5,6,7',',') - Len(','))), "")

Prepend SELECT to that expression and execute it as a query, and it returns 6, as expected. But now you have a totally horrid query, whereas with the UDF you would just have SELECT SplitString("1,2,3,4,5,6,7", ",", 6)

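You can, of course, use GenerateSQLSplit to create the query. It returns an empty string when the requested item is the first one past the end of the string, so you can use that to test whether an nth element exists. Be aware, though, that asking for an item further past the end makes the nested InStr calls start searching from the beginning of the string again, which can return an earlier element instead of an empty string. I do not recommend it in any case, because the query will be long, inefficient and hard to maintain.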
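
As a rough way to see what the generated expression computes without pasting it into Access, the following Python sketch models its logic step by step. The helper names instr and nth_element are invented for this illustration, and instr is only an approximation of VBA's InStr (1-based positions, 0 when nothing is found), which is the assumption the whole sketch rests on.

```python
def instr(start: int, text: str, needle: str) -> int:
    """Approximate VBA InStr: 1-based position of needle, or 0 if not found."""
    return text.find(needle, start - 1) + 1


def nth_element(text: str, delim: str, n: int) -> str:
    """Model of the expression printed by GenerateSQLSplit (hypothetical helper)."""
    if n == 1:
        first = instr(1, text, delim)
        return text if first == 0 else text[: first - 1]
    # prev: position of the delimiter just before the wanted element
    prev = instr(1, text, delim)
    for _ in range(n - 2):
        prev = instr(prev + len(delim), text, delim)
    if prev <= 0:
        return ""
    start = prev + len(delim)  # 1-based start of the wanted element
    nxt = instr(start, text, delim)
    if nxt < 1:
        return text[start - 1:]       # Mid(str, start)
    return text[start - 1 : nxt - 1]  # Mid(str, start, nxt - prev - Len(delim))


print(nth_element("1,2,3,4,5,6,7", ",", 6))        # 6
print(repr(nth_element("1,2,3,4,5,6,7", ",", 8)))  # ''
print(nth_element("a,b", ",", 4))                  # b
```

The last call shows a limitation of the nested approach in this model: once an inner search returns 0, the next search starts at position 1 again, so an index two or more past the end can yield an earlier element rather than an empty string.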

Split column value using delimiter and insert into different table

So your first issue is that assigning a column to a variable like that only pulls the value from one row of the table. If you want to loop through each row of the column as you're currently doing, you need to use a CURSOR or some similar construct to visit each row.

The second issue you have is that your while statement is incorrect. You're missing the last value from it because you stop when there are no more commas in your datastring. There are no more commas at the last value, so it skips the last value. There's a way around this using your current method, but I would recommend an alternative method of splitting your string.

DECLARE @myTable TABLE (datastring VARCHAR(4000));
INSERT @myTable(datastring)
VALUES ('abc,def,gh,i,jkl'),('mnop,qr,stu,v,wxyz');

DECLARE @valueList VARCHAR(4000) = '', @i INT = 1, @pos INT = 0, @len INT = 0, @value VARCHAR(100) = '';
IF OBJECT_ID('tempTable', 'U') IS NOT NULL DROP TABLE tempTable;
CREATE TABLE tempTable (OriginalDataString VARCHAR(4000), stringVal VARCHAR(255), valNum INT);

DECLARE curs CURSOR FOR
SELECT datastring
FROM @myTable;

OPEN curs;
FETCH NEXT FROM curs INTO @valueList;
WHILE @@FETCH_STATUS = 0 BEGIN
SELECT @pos = 0, @len = 0, @i = 1;
WHILE 1 = 1
BEGIN
SET @len = ISNULL(NULLIF(CHARINDEX(',', @valueList, @pos+1), 0) - @pos, LEN(@valueList));
SET @value = SUBSTRING(@valueList, @pos, @len);

INSERT tempTable(OriginalDataString, stringVal, valNum)
VALUES (@valueList, @value, @i);

IF CHARINDEX(',', @valueList, @pos+1) = 0
BREAK;

SET @pos = CHARINDEX(',', @valueList, @pos+@len) + 1;
SET @i += 1;
END
FETCH NEXT FROM curs INTO @valueList;
END
CLOSE curs;
DEALLOCATE curs;

SELECT MAX(CASE WHEN valNum = 1 THEN stringVal END) val1
, MAX(CASE WHEN valNum = 2 THEN stringVal END) val2
, MAX(CASE WHEN valNum = 3 THEN stringVal END) val3
, MAX(CASE WHEN valNum = 4 THEN stringVal END) val4
, MAX(CASE WHEN valNum = 5 THEN stringVal END) val5
FROM tempTable
GROUP BY OriginalDataString;

This uses a cursor to fetch each datastring from your table into a variable, loops through that string to extract each value (the loop breaks when it reaches the end of the string), and then selects val1, val2, val3, val4, val5 from the resulting table.
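
The difference between stopping when no delimiter remains and emitting a value before checking can be sketched in a few lines of Python. Both function names are invented here; the first is only a guess at the shape of the flawed loop described earlier (the original question's code is not shown), and the second follows the order of operations in the WHILE 1 = 1 loop above.

```python
def split_until_no_delimiter(text: str, delimiter: str = ",") -> list:
    """Flawed pattern: loop only while a delimiter remains, so the tail is lost."""
    values = []
    while delimiter in text:
        head, text = text.split(delimiter, 1)
        values.append(head)
    return values


def split_then_check(text: str, delimiter: str = ",") -> list:
    """Emit a value first, then stop if no further delimiter was found."""
    values = []
    pos = 0
    while True:
        nxt = text.find(delimiter, pos)
        values.append(text[pos:] if nxt == -1 else text[pos:nxt])
        if nxt == -1:
            break
        pos = nxt + len(delimiter)
    return values


print(split_until_no_delimiter("abc,def,gh,i,jkl"))  # ['abc', 'def', 'gh', 'i']
print(split_then_check("abc,def,gh,i,jkl"))          # ['abc', 'def', 'gh', 'i', 'jkl']
```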

But, rather than using a cursor and a while loop, I would recommend the much simpler use of a recursive CTE (or, even better, a built-in split function if you have one, such as STRING_SPLIT on SQL Server 2016 and later).

For example,

DECLARE @myTable TABLE (datastring VARCHAR(4000));
INSERT @myTable(datastring)
VALUES ('abc,def,gh,i,jkl'),('mnop,qr,stu,v,wxyz');

WITH CTE AS (
SELECT datastring
, SUBSTRING(datastring, 1, ISNULL(NULLIF(CHARINDEX(',', datastring), 0) - 1, LEN(datastring))) sString
, NULLIF(CHARINDEX(',', datastring), 0) cIndex
, 1 Lvl
FROM @myTable T
UNION ALL
SELECT datastring
, SUBSTRING(datastring, cIndex + 1, ISNULL(NULLIF(CHARINDEX(',', datastring, cIndex + 1), 0) - 1 - cIndex, LEN(datastring)))
, NULLIF(CHARINDEX(',', datastring, cIndex + 1), 0)
, Lvl + 1
FROM CTE
WHERE cIndex IS NOT NULL)
SELECT MAX(CASE WHEN Lvl = 1 THEN sString END) val1
, MAX(CASE WHEN Lvl = 2 THEN sString END) val2
, MAX(CASE WHEN Lvl = 3 THEN sString END) val3
, MAX(CASE WHEN Lvl = 4 THEN sString END) val4
, MAX(CASE WHEN Lvl = 5 THEN sString END) val5
--, datastring OriginalDataString
FROM CTE
GROUP BY datastring;

Split string in one column multiple delimiters to table maintaining ID

Just include the id field in your subquery:

SELECT
id
,SUBSTRING(t2.value,1,CHARINDEX(':',t2.value)-1) AS Lable
,SUBSTRING(t2.value,CHARINDEX(':',t2.value)+1,LEN(t2.value)) AS Value
FROM
(SELECT id, Cast ('<x>' + Replace(StringToSplit, '|', '</x><x>') + '</x>' AS XML) AS
RawData FROM #tmpsplit) t1
CROSS APPLY
(SELECT y.value('.','varchar(100)') as value FROM RawData.nodes('x') as f(y)) t2

Split multiple values from a string in one column, into multiple columns using SQL Server

With a bit of JSON, and assuming you have a known or maximum number of tags (JSON_VALUE and STRING_ESCAPE require SQL Server 2016 or later):

Select A.CompanyName
,A.CompanyNumber
,Tag1 = JSON_VALUE(S,'$[0]')
,Tag2 = JSON_VALUE(S,'$[1]')
,Tag3 = JSON_VALUE(S,'$[2]')
From YourTable A
Cross Apply ( values ( '["'+replace(STRING_ESCAPE(Tags,'json'),';','","')+'"]' ) ) B(S)
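
The idea behind the query is to turn the delimited string into the text of a JSON array and then read elements by index. A small Python sketch of that same transformation, under the assumption that json.dumps escaping is a fair stand-in for STRING_ESCAPE(Tags, 'json'), might look like this; tags_to_columns is a name made up for the illustration.

```python
import json


def tags_to_columns(tags: str, width: int, delimiter: str = ";") -> list:
    """Split tags into exactly `width` columns, padding with None (hypothetical helper)."""
    # Escape quotes and backslashes, then drop the outer quotes json.dumps adds
    escaped = json.dumps(tags, ensure_ascii=False)[1:-1]
    # 'a;b' becomes '["a","b"]', the same string the CROSS APPLY builds
    array_text = '["' + escaped.replace(delimiter, '","') + '"]'
    items = json.loads(array_text)
    # An index past the end gives None here, as JSON_VALUE gives NULL
    return [items[i] if i < len(items) else None for i in range(width)]


print(tags_to_columns("red;green", 3))  # ['red', 'green', None]
print(tags_to_columns('a"b;c', 2))      # ['a"b', 'c']
```

Escaping before building the array is what keeps a stray double quote in a tag from producing invalid JSON.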

How to split a comma-separated value to columns

CREATE FUNCTION [dbo].[fn_split_string_to_column] (
@string NVARCHAR(MAX),
@delimiter CHAR(1)
)
RETURNS @out_put TABLE (
[column_id] INT IDENTITY(1, 1) NOT NULL,
[value] NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @value NVARCHAR(MAX),
@pos INT = 0,
@len INT = 0

SET @string = CASE
WHEN RIGHT(@string, 1) != @delimiter
THEN @string + @delimiter
ELSE @string
END

WHILE CHARINDEX(@delimiter, @string, @pos + 1) > 0
BEGIN
SET @len = CHARINDEX(@delimiter, @string, @pos + 1) - @pos
SET @value = SUBSTRING(@string, @pos, @len)

INSERT INTO @out_put ([value])
SELECT LTRIM(RTRIM(@value)) AS [column]

SET @pos = CHARINDEX(@delimiter, @string, @pos + @len) + 1
END

RETURN
END

Split a single delimited string into a table for a given number of columns

I want to reiterate first that this smells strongly of an XY Problem, and that very likely this isn't something you should be doing in the RDBMS. That isn't to say it can't be done, but SQL Server certainly isn't the best place for it.

Secondly, as mentioned, this is impossible in a FUNCTION. A FUNCTION must be well defined, meaning the data in and the data out must be defined at the time the function is created. An object that needs to return a variable number of columns isn't well defined, as you can't define the result until the object is called.

Also, the only way to achieve what you are after is with Dynamic SQL, which you cannot use inside a FUNCTION: it requires EXEC, which inside a function can only be used against a very few system objects, and sp_executesql is not one of them.

This means we need to use a Stored Procedure to achieve this. However, you won't be able to use syntax like SELECT * FROM dbo.MyProcedure(@mystring,',',2); you'll need to execute it instead (EXEC dbo.MyProcedure @mystring, ',', 2;).

Before we get to the final dynamic solution, we need to work out how we would do it with a static value. That isn't too bad: you simply need a string splitter that is aware of ordinal position (STRING_SPLIT is not, at least before the enable_ordinal argument added in SQL Server 2022), so I am using DelimitedSplit8K_LEAD. Then you can use a bit of integer maths to assign each row both a column and a row group. Finally, we can use those values to pivot the data, using a "Cross Tab".

For a non-dynamic approach, this gets us a result like this:

DECLARE @String varchar(8000),
@Columns tinyint;

SET @String = '1,2,3,4,5,6';
SET @Columns = 3;

WITH Groupings AS(
SELECT *,
(ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) / @Columns AS RowNo,
(ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) % @Columns +1 AS ColumnNo
FROM dbo.DelimitedSplit8K_LEAD(@String,',') DS)
SELECT MAX(CASE G.ColumnNo WHEN 1 THEN G.Item END) AS Col1,
MAX(CASE G.ColumnNo WHEN 2 THEN G.Item END) AS Col2,
MAX(CASE G.ColumnNo WHEN 3 THEN G.Item END) AS Col3
FROM Groupings G
GROUP BY G.RowNo;

Of course, if you change the value of @Columns, the number of columns does not change; it's hard-coded.
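
The integer maths that assigns each item a row group and a column can be checked in isolation. The following Python sketch uses the same two expressions as the Groupings CTE, with // and % standing in for T-SQL integer division and modulo; cross_tab is a name invented for this illustration.

```python
def cross_tab(text: str, delimiter: str, columns: int) -> list:
    """Pivot a delimited string into rows of `columns` values (hypothetical helper)."""
    rows = {}
    for item_number, item in enumerate(text.split(delimiter), start=1):
        row_no = (item_number - 1) // columns        # 0, 0, 0, 1, 1, 1, ...
        column_no = (item_number - 1) % columns + 1  # 1, 2, 3, 1, 2, 3, ...
        # Missing trailing cells stay None, as MAX(CASE ...) yields NULL
        rows.setdefault(row_no, [None] * columns)[column_no - 1] = item
    return [rows[r] for r in sorted(rows)]


print(cross_tab("1,2,3,4,5,6", ",", 3))    # [['1', '2', '3'], ['4', '5', '6']]
print(cross_tab("1 2 3 4 5 6 7", " ", 5))  # [['1', '2', '3', '4', '5'], ['6', '7', None, None, None]]
```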

It's also important to note I have used a varchar(8000) not a varchar(MAX). DelimitedSplit8K_LEAD (and DelimitedSplitN4K_LEAD) do not support MAX lengths, and the article above, and the original iteration of the function (DelimitedSplit8K) explain why.

Moving on, now we need to get onto the dynamic value. I'm going to assume that you won't have a "silly" value for @Columns, and that it'll be between 1 and 100. I'm also assuming you are using a recent version of SQL Server, and thus have access to STRING_AGG; if not you'll need to use FOR XML PATH (and STUFF) to do the aggregation of the dynamic statement.

First we can use a tally with up to 100 rows to get the right number of column groups, and then (as mentioned) STRING_AGG to aggregate the dynamic part. The rest of the statement is still static. With the variables we had before, we end up with something like this to create the dynamic statement:

DECLARE @SQL nvarchar(MAX),
@CRLF nchar(2) = NCHAR(13) + NCHAR(10);
DECLARE @Delimiter nvarchar(20) = N',' + @CRLF + N'       ';

WITH Tally AS(
SELECT TOP (@Columns)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N1(N)
CROSS JOIN (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N2(N))
SELECT @SQL = N'WITH Groupings AS(' + @CRLF +
N' SELECT *,' + @CRLF +
N' (ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) / @Columns AS RowNo,' + @CRLF +
N' (ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) % @Columns +1 AS ColumnNo' + @CRLF +
N' FROM dbo.DelimitedSplit8K_LEAD(@String,'','') DS)' + @CRLF +
N'SELECT ' +
(SELECT STRING_AGG(CONCAT(N'MAX(CASE G.ColumnNo WHEN ',T.I,N' THEN G.Item END) AS ',QUOTENAME(CONCAT(N'Col',T.I))),@Delimiter) WITHIN GROUP (ORDER BY T.I)
FROM Tally T) + @CRLF +
N'FROM Groupings G' + @CRLF +
N'GROUP BY G.RowNo;'
PRINT @SQL;

And the PRINT outputs the below (which is what we had before, excellent!):

WITH Groupings AS(
SELECT *,
(ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) / @Columns AS RowNo,
(ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) % @Columns +1 AS ColumnNo
FROM dbo.DelimitedSplit8K_LEAD(@String,',') DS)
SELECT MAX(CASE G.ColumnNo WHEN 1 THEN G.Item END) AS [Col1],
MAX(CASE G.ColumnNo WHEN 2 THEN G.Item END) AS [Col2],
MAX(CASE G.ColumnNo WHEN 3 THEN G.Item END) AS [Col3]
FROM Groupings G
GROUP BY G.RowNo;
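
To see the shape of the generation step without running it on a server, here is a Python sketch that builds the same statement text for a given number of columns. It is only an illustration: build_pivot_sql and quotename are made-up names, and quotename is a simplified imitation of QUOTENAME with its default brackets.

```python
def quotename(name: str) -> str:
    """Bracket-quote an identifier, doubling any ']' (simplified QUOTENAME)."""
    return "[" + name.replace("]", "]]") + "]"


def build_pivot_sql(columns: int) -> str:
    """Return the pivot statement for `columns` output columns (hypothetical helper)."""
    if not 1 <= columns <= 100:
        raise ValueError("columns must be between 1 and 100")
    # One MAX(CASE ...) per column: the only part that varies, like the STRING_AGG above
    select_list = ",\n       ".join(
        f"MAX(CASE G.ColumnNo WHEN {i} THEN G.Item END) AS {quotename(f'Col{i}')}"
        for i in range(1, columns + 1)
    )
    lines = [
        "WITH Groupings AS(",
        "    SELECT *,",
        "           (ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) / @Columns AS RowNo,",
        "           (ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) % @Columns +1 AS ColumnNo",
        "    FROM dbo.DelimitedSplit8K_LEAD(@String,',') DS)",
        "SELECT " + select_list,
        "FROM Groupings G",
        "GROUP BY G.RowNo;",
    ]
    return "\n".join(lines)


print(build_pivot_sql(3))
```

Only the column list is generated; @String and @Columns remain parameters in the text, so the values themselves are never concatenated into the statement.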

Now we need to wrap this into a parametrised stored procedure, and also execute the dynamic statement. This gives us the following end result, using sys.sp_executesql to execute and parametrise the dynamic statement:

CREATE PROC dbo.DynamicPivot @String varchar(8000), @Delim char(1), @Columns tinyint, @SQL nvarchar(MAX) = NULL OUTPUT AS
BEGIN
IF @Columns > 100
THROW 72001, N'@Columns cannot have a value greater than 100.', 16;

DECLARE @CRLF nchar(2) = NCHAR(13) + NCHAR(10);

DECLARE @Delimiter nvarchar(20) = N',' + @CRLF + N' ';

WITH Tally AS(
SELECT TOP (@Columns)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N1(N)
CROSS JOIN (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N2(N))
SELECT @SQL = N'WITH Groupings AS(' + @CRLF +
N' SELECT *,' + @CRLF +
N' (ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) / @Columns AS RowNo,' + @CRLF +
N' (ROW_NUMBER() OVER (ORDER BY DS.ItemNumber) -1) % @Columns +1 AS ColumnNo' + @CRLF +
N' FROM dbo.DelimitedSplit8K_LEAD(@String,@Delim) DS)' + @CRLF +
N'SELECT ' +
(SELECT STRING_AGG(CONCAT(N'MAX(CASE G.ColumnNo WHEN ',T.I,N' THEN G.Item END) AS ',QUOTENAME(CONCAT(N'Col',T.I))),@Delimiter) WITHIN GROUP (ORDER BY T.I)
FROM Tally T) + @CRLF +
N'FROM Groupings G' + @CRLF +
N'GROUP BY G.RowNo;'
--PRINT @SQL;

EXEC sys.sp_executesql @SQL, N'@String varchar(8000), @Delim char(1), @Columns tinyint', @String, @Delim, @Columns;
END;
GO

And then we can execute this like below:

EXEC dbo.DynamicPivot N'1,2,3,4,5,6',',',3; --Without @SQL
GO
DECLARE @SQL nvarchar(MAX)
EXEC dbo.DynamicPivot N'1 2 3 4 5 6 7 8 9 10',' ',5,@SQL OUTPUT; --With SQL to see statement run, and different delimiter
PRINT @SQL;

As I noted as well, and as you can see from the definition of the Procedure, if you try to pivot with more than 100 columns it will error:

--Will error, too many columns
EXEC dbo.DynamicPivot N'1,2,3,4,5,6',',',101; --Without @SQL

Which returns the error:

Msg 72001, Level 16, State 16, Procedure dbo.DynamicPivot, Line 5

@Columns cannot have a value greater than 100.

Edit: Noticed that the delimiter passed to the splitter needs to be parametrised too, so I amended the SP's definition to add @Delim, added it to the dynamic SQL as well, and demonstrated it in the examples.

Split Column with delimiter into multiple columns

This is an example of how to do this:

DECLARE @tt TABLE(i INT IDENTITY,x VARCHAR(8000));
INSERT INTO @tt(x)VALUES('-9;-9;-1;-9;-9;-9;-9;-9;-1;-9;-9;-9;-9;-9;-9;-9;-9;-9;-1;-9;-9;-9;-9;-9;-9;-9;-9;-9;-1;-9;-1;-9;-9;-9;-1;-9;-9;-9;-9;-9;-9;-1;-1;-1;-1;-9;-1;-1;-9;-9;-9;-9;-1;-9;-1;-9;-9;-9;-1;-9;-1;-9;-1;-9;-9;-9;-9;-1;-9;-9;-1;-1;-9;-1;-1;0000;FFF8;-9;-9;-9;-1;-9;-1;-9;FFF6;-9;-1;-9;-1;-9;-1;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9');

SELECT
i,
val1=n.v.value('/e[1]','VARCHAR(16)'),
val2=n.v.value('/e[2]','VARCHAR(16)'),
val3=n.v.value('/e[3]','VARCHAR(16)'),
-- ... repeat for val4 .. val114
val115=n.v.value('/e[115]','VARCHAR(16)')
FROM
@tt
CROSS APPLY (
SELECT
CAST('<e>'+REPLACE(x,';','</e><e>')+'</e>' AS XML) AS itm
) AS i
CROSS APPLY i.itm.nodes('/') AS n(v);

This is some XML trickery: the column of delimited values is turned into XML in which each value is an e element. The individual elements are then retrieved by their index in the value function.

Since this is a single statement it can be used as the query in a view.
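
The same trick can be reproduced in Python to see what each step produces. This is a sketch only: split_via_xml and element_at are names invented here, a wrapping r element is added because ElementTree needs a single root (the SQL Server XML type accepts fragments), and it assumes the values contain no &, < or > characters, which would also make the CAST to XML fail.

```python
import xml.etree.ElementTree as ET


def split_via_xml(text: str, delimiter: str = ";") -> list:
    """Wrap each delimited value in an <e> element and read the elements back."""
    xml_text = "<r><e>" + text.replace(delimiter, "</e><e>") + "</e></r>"
    root = ET.fromstring(xml_text)
    # An empty <e></e> has text None; treat it as an empty string
    return [e.text or "" for e in root.findall("e")]


def element_at(values: list, index: int):
    """1-based lookup, None when missing, similar to value('/e[n]', ...) giving NULL."""
    return values[index - 1] if 1 <= index <= len(values) else None


values = split_via_xml("-9;-1;0000;FFF8")
print(values)                 # ['-9', '-1', '0000', 'FFF8']
print(element_at(values, 3))  # 0000
print(element_at(values, 9))  # None
```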


