Splitting Variable Length Delimited String Across Multiple Rows (Sql)

Splitting variable length delimited string across multiple rows (SQL)

First, let me just say that this is the reason that you shouln't have comma separated data in a field in the first place. There is no easy or efficient way to work with it.

That said, you can use a recursive query to split the string and get the numbers from it:

with split as
(
select
item = cast('' as varchar(max)),
source = cast('20,0, 5,,^24,0, 0,,^26,0, 0,,^281,0, 0,,^34,0, 2,,^48,0, 2,,^44,0, 2,,^20,0, 10,,^20,5, 5,,^379,1, 1,,^26,1, 2,,^32,0, 1,,^71,0, 2,,^' as varchar(max))
union all
select
item = substring(source, 1, charindex(',,', source)),
source = substring(source, charindex(',,', source) + 2, 10000)
from split
where source > ''
)
select substring(item, 1, charindex(',', item) -1)
from split
where item > ''

Result:

20
^24
^26
^281
^34
^48
^44
^20
^20
^379
^26
^32
^71

Split string into multiple rows and columns

I rewrote this as a set based query instead of a cursor and this whole script, including the generation of 100k test records runs in about 40 seconds on my Azure SQL DB, which is a Serverless gen 5 with only 1 vCore. Work through the script to make sure you understand it.

NB I'm dropping the tables as this is a test rig - not production code:

------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------

DROP TABLE IF EXISTS dbo.sourceTable
DROP TABLE IF EXISTS dbo.finalTable
GO

CREATE TABLE dbo.sourceTable
(
PersonId INT IDENTITY PRIMARY KEY,
LocationId INT,
StartDate DATE,
AttendanceString VARCHAR(1000)
)
GO

CREATE TABLE dbo.finalTable
(
DayDate DATE,
LocationId INT,
PersonId INT,
MorningValue CHAR(1),
AfternoonValue CHAR(1)
)
GO

-- Generate some test data
SET IDENTITY_INSERT dbo.sourceTable ON

INSERT INTO dbo.sourceTable ( PersonId, LocationId, StartDate, AttendanceString )
VALUES
( 123, 987, '2018-09-01', 'XXXXZZZZ######PPLL' ),
( 234, 678, '2018-10-01', 'PPPPLL######ZZZZXX' ),
( 567, 999, '2018-10-01', 'abcdefghijklmnopqr' )

SET IDENTITY_INSERT dbo.sourceTable OFF
GO

-- Setup END
------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
-- Test Data START
------------------------------------------------------------------------------------------------

;WITH cte AS (
SELECT 1 rn, 1 locationId, CAST( '1 Jan 2018' AS DATE ) startDate, REPLACE( NEWID(), '-', '' ) AttendanceString
UNION ALL
SELECT rn + 1, rn % 42, DATEADD( day, 1, startDate ), REPLACE( NEWID(), '-', '' )
FROM cte
WHERE rn < 100
)
INSERT INTO dbo.sourceTable ( LocationId, StartDate, AttendanceString )
SELECT LocationId, StartDate, AttendanceString
FROM cte
ORDER BY 1;
GO 1000

-- Test Data END
------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
-- Rewritten query START
------------------------------------------------------------------------------------------------

DROP TABLE IF EXISTS #tmp;

;WITH cte AS (
SELECT 1 n, 1 x, 2 y
UNION ALL
SELECT n + 1, x + 2, y + 2
FROM cte
WHERE n < 20
)
SELECT
personId,
locationId,
DATEADD( day, c.n - 1, startDate ) xdate,
SUBSTRING ( attendanceString, c.x, 1 ) a,
SUBSTRING ( attendanceString, c.y, 1 ) b

INTO #tmp

FROM dbo.sourceTable s
CROSS APPLY cte c
WHERE c.y <= LEN(attendanceString);

select *
from sourceTable
WHERE personId = 999

select *
from #tmp
WHERE personId = 999

select *
from #tmp
WHERE locationId = 999

-- Rewritten query END
------------------------------------------------------------------------------------------------

Altered version of the script for longer attendanceIds here.

Split string of variable length, variable delimiters

You can also use a string splitter. Here is an excellent one that works with your version. DelimitedSplit8K

Now we need some sample data.

declare @Something table
(
Category varchar(100)
)

insert @Something values
('Zoo - Animals - Lions')
, ('Zoo - Personnel')
, ('Zoo - Operating Costs - Power / Cooling')

Now that we have a function and sample data the code for this is quite nice and tidy.

select s.Category
, Location = max(case when x.ItemNumber = 1 then Item end)
, Category = max(case when x.ItemNumber = 2 then Item end)
, SubCategory = max(case when x.ItemNumber = 3 then Item end)
from @Something s
cross apply dbo.DelimitedSplit8K(s.Category, '-') x
group by s.Category

And this will return:

Category                                |Location|Category       |SubCategory
Zoo - Animals - Lions |Zoo |Animals |Lions
Zoo - Operating Costs - Power / Cooling |Zoo |Operating Costs|Power / Cooling
Zoo - Personnel |Zoo |Personnel |NULL

SQL split values to multiple rows

If you can create a numbers table, that contains numbers from 1 to the maximum fields to split, you could use a solution like this:

select
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1) name
from
numbers inner join tablename
on CHAR_LENGTH(tablename.name)
-CHAR_LENGTH(REPLACE(tablename.name, ',', ''))>=numbers.n-1
order by
id, n

Please see fiddle here.

If you cannot create a table, then a solution can be this:

select
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1) name
from
(select 1 n union all
select 2 union all select 3 union all
select 4 union all select 5) numbers INNER JOIN tablename
on CHAR_LENGTH(tablename.name)
-CHAR_LENGTH(REPLACE(tablename.name, ',', ''))>=numbers.n-1
order by
id, n

an example fiddle is here.

Split String into given length but do not split word

Assuming you have a table of addresses, I'd use a recursive CTE.

On each iteration, find the last possible space to break on, then start the next iteration for the character after the space.

  • take 31 characters
  • reverse them
  • find the position of the first space

Extra care to be taken for corner cases:

  • The remaining string to be searched is less than 30 characters
  • The current string being searched has no space in the first 31 characters

Using the following test data...

CREATE TABLE test (
address VARCHAR(MAX)
);

INSERT INTO
test
VALUES
('216 Apartment123 AreaArea SampleWord1 Word2 MiddleTown Upper1Location Another5 NewYork'),
('216 Apartment123 AreaArea SampleWord1 Word2 MiddleTownxx Upper1LocationUpper1LocationUpper1Location Another5 NewYork'),
('216 Apartment123 AreaArea SampleWord1 Word2 MiddleTownxx Upper1LocationUpper1LocationUpper1Location Another5 NewYork x')

;

Using the following CTE...

DECLARE @chars BIGINT = 30;

WITH
parts AS
(
SELECT
address,
LEN(address) AS length,
CAST(0 AS BIGINT) AS last_space,
CAST(1 AS BIGINT) AS next,
address AS fragment
FROM
test

UNION ALL

SELECT
parts.address,
parts.length,
last_space.pos,
parts.next + COALESCE(last_space.pos, @chars),
SUBSTRING(parts.address, parts.next, COALESCE(last_space.pos - 1, @chars))
FROM
parts
CROSS APPLY
(
SELECT
@chars + 2
-
NULLIF(
CHARINDEX(
' ',
REVERSE(
SUBSTRING(
parts.address + ' ',
parts.next,
@chars + 1
)
)
)
, 0
)
)
last_space(pos)
WHERE
parts.next <= parts.length
)

SELECT
*, len(fragment) AS chars
FROM
parts
WHERE
next > 1
ORDER BY
address,
next

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=acd11f2bc73e5036bd82498ecf14b08f

Split value into multiple rows

You are on the right track with the table of numbers. You should start by adding more rows, so it matches (or exceeds) the maximum possible number of elements in a CSV list.

Then, you can use a join condition to generate the relevant number of rows per name only

select
t.*,
substring_index(substring_index(t.name, ';', n.n), ';', -1) name
from numbers n
inner join testLocation t
on n <= length(t.name) - length(replace(t.name, ';', '')) + 1

Demo on DB Fiddle (I expanded the numbers to 8):


id | State | name | name
-: | :------ | :-------------------------------------- | :-----------
1 | Alabama | Birmingham;Huntsville;Mobile;Montgomery | Birmingham
1 | Alabama | Birmingham;Huntsville;Mobile;Montgomery | Huntsville
1 | Alabama | Birmingham;Huntsville;Mobile;Montgomery | Mobile
1 | Alabama | Birmingham;Huntsville;Mobile;Montgomery | Montgomery
2 | Florida | Tampa;Jacksonville;Destin | Tampa
2 | Florida | Tampa;Jacksonville;Destin | Jacksonville
2 | Florida | Tampa;Jacksonville;Destin | Destin

Note, that, as commented already by others, storing CSV lists in a database is bad practice and should almost always be avoided. Recommended related reading: Is storing a delimited list in a database column really that bad?

SQL server split string into columns by delimiter (dynamic length)

An xml-based solution

declare @tmp table (STRING varchar(500))

insert into @tmp
values
('AA.0.HJ')
,('AABBCC.099.0')
,('0.91.JAH21')

;WITH Splitted
AS (
SELECT STRING
,CAST('<x>' + REPLACE(STRING, '.', '</x><x>') + '</x>' AS XML) AS Parts
FROM @tmp
)
SELECT STRING
,Parts.value(N'/x[1]', 'varchar(50)') AS [First]
,Parts.value(N'/x[2]', 'varchar(50)') AS [Second]
,Parts.value(N'/x[3]', 'varchar(50)') AS [Third]
FROM Splitted;

Output:

Sample Image



Related Topics



Leave a reply



Submit