Why (And How) to Split Column Using Master..Spt_Values

Why (and how) to split column using master..spt_values?


Purpose

Why use undocumented master..spt-values

Sybase, and therefore its bastard son MS SQL, provide various features and functions for the product, that is implemented in system procedures (as opposed to the binaries like sqlserver, which are started as a service). These system procedures procedures are written in SQL code and named sp_%. Except for some secret internals, they have the same limitations and needs as any other SQL code. They are part of the Sybase ASE or SQL Server product. As such, they are not required to document it; and the internal bits cannot be reasonably labelled as "undocumented".

master..spt_values contains all the various bits and pieces that the said system procedures need, in an SQL table, to produce the various reports. The sp means system procedure; spt means tables for system procedures; and of course values is the content.

Lookup Tables

What is the (meaning of) Type = 'P'

People often describe spt_values as "de-normalised", but that is the incorrect term. The correct term is folded, or packed. It is 26 or so logical Lookup tables, each beautifully Normalised, folded into one physical table, with a Type column to differentiate the logical tables.

Now in a normal database, that would be a gross error (just look at the answers for "one lookup table or many"). But in a server catalogue, it is desirable, it replaces 26 physical tables.

  • "L" stands for for LockType Lookup; "V" stands for DeviceType Lookup (V is short for Device throughout the server); etc. Type "P2" contains bitwise ordinals, for the expansion of bits that are packed into an INT.

  • A set of consecutive numbers within known bounds, that is available in the form of an SQL table is required, in order to perform a Projection, which many of the system procedures have to do. Type "P" is a list of consecutive numbers between 0 and 2047.

  • The term Projection is used here as the technically precise meaning, the natural logical sense, not the relational algebra meaning, which is unnatural.

There is therefore just one purpose for spt_values, to contain 26 folded, otherwise separate, Reference tables, and one Projection table.

Expansion

The ordinary use of spt_values then, is as an ordinary Lookup or Reference or ENUM table. First, the Lookup values:

    SELECT *                    -- list Genders
FROM Gender

It is used in the same way that Person has a GenderCode that needs to be expanded (very expanded, these freaky days):

    SELECT  P.*,                -- list Person
G.Name -- expand GenderCode to Name
FROM Person P
JOIN Gender G
ON P.GenderCode = G.GenderCode

Eg. sp_lock produces a report of active locks, displaying lock types as string names. But master..syslocks contains lock types as numbers, it does not contain those names; and if it did, it would be a badly denormalised table ! If you execute the query (Sybase ASE code, you will have to convert):

    SELECT *                    -- list LockTypes
FROM master..spt_values
WHERE type = "L"

you will notice 66 LockType numbers and names in the Lookup table. That allows sp_lock to execute simple code like Person::Gender above:

    SELECT  spid,               -- list Active Locks
DB_NAME(dbid),
OBJECT_NAME(id, dbid),
v.name, -- expand lock name
page,
row
FROM master..syslocks L,
master..spt_values LT
WHERE L.type = LT.number --
AND type = "L" -- LockType Lookup table
ORDER by 1, 2, 3, 4, 5, 6 -- such that perusal is easy

Projection

What is the (meaning of) Type = 'P' ?

What is Projection and how is it used ?

Say, for example, instead of the active locks produced by the query above, you wanted a list of all 66 LockTypes, showing the number of active locks (or Null). You don't need a cursor, or a WHILE loop. We could Project the LockType Lookup table, through the count of active locks:

    SELECT  LT.name,            -- list LockTypes
[Count] = ( -- with count
SELECT COUNT(*)
FROM master..syslocks
WHERE type = LT.number
)
FROM master..spt_values LT
WHERE type = "L"

There are several methods, that is just one. Another method is to use a Derived Table instead of the Subquery. But you still need the Projection.

That is typically what spt_values is used for, either Expansion or Projection. Now that you know it is there, you can use it too. It is safe (in the master database) and used by virtually all the system procedures, which means the system procedures cannot run without it.

for splitting a column?

Ah, you do not understand the "Split one CSV column into multiple rows" code.

  • Forget about spt_values for a moment, and examine that code again. It just needs a list of consecutive numbers, so that in can step through the list of values in the CSV column, byte by byte. The code is activated only for each byte that is a comma, or end-of-string.

  • Where to get a set of consecutive numbers in the form of an SQL table, rather than CREATing one from scratch and INSERTing into it? Why, master..spt_values of course. If you know it is there.

  • (You can learn a bit about the internals of ASE or SQL Server, just by reading the code of the system stored procedures.)

  • Note that any CSV field in one column is a gross Normalisation error, it breaks 2NF (contains repeating values) and 1NF (not atomic). Note, that is not packed or folded, it is a repeating group, it is un-normalised. One of the many negative consequences of such a gross error is, instead of using simple SQL to navigate the repeating group as rows, one has to use complex code to determine and extract the content of the un-normalised CSV field. Here spt_values P provides a vector for that complex code, making it easier.

What is the benefit of it?

I think I have answered that. If you did not have it, every system procedure that requires a list of Numbers would have to CREATE a temp table; and INSERT the rows into it; before running its code. Of course, not having to perform those steps, makes the system procedures much faster.

Now, when you need to perform a Projection, eg. calendar dates in the future, or whatever, you can use spt_values, instead of having to create your own temp table each time (or create your own private permanent table and maintain it).

What is the purpose of system table master..spt_values and what are the meanings of its values?

The spt_values table is not mentioned in the the SQL Server documentation but it goes back to the Sybase days and there is some extremely minimal documentation in the Sybase online docs that can be summed up in this comment:

To see how it is used, execute sp_helptext and look at the text for
one of the system procedures that references it.

In other words, read the code and work it out for yourself.

If you poke around the system stored procedures and examine the table data itself, it's fairly clear that the table is used to translate codes into readable strings (among other things). Or as the Sybase documentation linked above puts it, "to convert internal system values [...] into human-readable format"

The table is also sometimes used in code snippets in MSDN blogs - but never in formal documentation - usually as a convenient source of a list of numbers. But as discussed elsewhere, creating your own source of numbers is a safer and more reliable solution than using an undocumented system table. There is even a Connect request to provide a 'proper', documented number table in SQL Server itself.

Anyway, the table is entirely undocumented so there is no significant value in knowing anything about it, at least from a practical standpoint: the next servicepack or upgrade might change it completely. Intellectual curiosity is something else, of course :-)

What does `from master..spt_values` mean

That is the syntax for specifying a databasename, schema, and table name. You can use this syntax to specify a database different from the database you are currently connected to. In your example, master is the database name, the schema name is unspecified (so it becomes the default dbo schema) and spt_values is the table name.

For clarity, the above could also be: Select ... from master.dbo.spt_values

Need to split column into rows and columns

Try it along this:

Hint: There are some "normal" commas in your sample data.
I suspected these as wrong and used semicolons.
If this is wrong, you might use a general REPLACE() to use ";" instead of ",".

Create a declared table to simulate your issue

DECLARE @tbl TABLE(ID INT, cst VARCHAR(1000));
INSERT INTO @tbl(ID,cst)
VALUES(1,'string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8;string9; 65')
,(2,'string10;-3;string11;string12;56;string13;6;string14;string15;9');

--The query (for almost any version of SQL-Server, find v2017+ as UPDATE below)

WITH cte AS
(
SELECT t.ID
,B.Nr
,A.Casted.value('(/x[sql:column("B.Nr")]/text())[1]','varchar(max)') AS ValueAtPosition
,(B.Nr-1) % 5 AS Position
,(B.Nr-1)/5 AS GroupingKey
FROM @tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.cst,';','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY(SELECT TOP(A.Casted.value('count(x)','int')) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM master..spt_values) B(Nr)
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;

The idea in short:

  • we use APPLY to add your string casted to XML to the result set. This will help to split the string ("a;b;c" => <x>a</x><x>b</x><x>c</x>)
  • We use another APPLY to create a tally on the fly with a computed TOP-clause. It will return as many virtual rows as there are elements in the XML
  • We use sql:column() to grab each element's value by its position and some simple maths to create a grouping key and a running number from 0 to 4 and so on.
  • We use GROUP BY together with MAX(CASE...) to place the values in the fitting column (old-fashioned pivot or conditional aggregation).

Hint: If you want this fully generically, with a number of columns not knwon in advance. You cannot use any kind of function or ad-hoc query. You would rather need some kind of dynamic statement creation together with EXEC within a stored procedure.
to be honest: This might be a case of XY-problem. Such approaches are the wrong idea - at least in almost all situations I can think of.

UPDATE for SQL-Server 2017+

You are on v2017, this allows for JSON, which is a bit faster in position safe string splitting. Try this:

    SELECT t.ID
,A.*
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A

The general idea is the same. We transform a string to a JSON-array ("a,b,c" => ["a","b","c"]) and read it with APPLY OPENJSON().
You can perform the same maths at the "key" column and do the rest as above.

Just because it is ready here, this is the full query for v2017+

WITH cte AS
(
SELECT t.ID
,A.[key]+1 AS Nr
,A.[value] AS ValueAtPosition
,A.[key] % 5 AS Position
,A.[key]/5 AS GroupingKey
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;

Split records based on multiple separator

This is way easier in Postgres. You first need to split the string by ; to create the rows, then use string_to_array() to split up the second level

select field, 
elements[1] as folder,
elements[2] as value
from (
select t.field, "values", string_to_array(x.elements, '»') as elements
from the_table t
cross join unnest(string_to_array(t."values", ';')) as x(elements)
) x;

Or if you need to preserve the first level of "splitting":

select field, 
splitfields,
split_part(splitfields, '»', 1) as folder,
split_part(splitfields, '»', 2) as value
from (
select t.field, "values", x.splitfields
from the_table t
cross join unnest(string_to_array(t."values", ';')) as x(splitfields)
) x;

Online example: https://rextester.com/IOW62752

Split a row on 2 or more rows depending on a column

You can use recursive CTE:

WITH RCTE AS 
(
SELECT
ordernumber, qty, articlenumber, qty AS L
FROM Table1

UNION ALL

SELECT
ordernumber, 1, articlenumber, L - 1 AS L
FROM RCTE
WHERE L>0
)
SELECT ordernumber,qty, articlenumber
FROM RCTE WHERE qty = 1

SQLFiddleDEMO

EDIT:
Based on Marek Grzenkowicz's answer and MatBailie's comment, whole new idea:

WITH CTE_Nums AS 
(
SELECT MAX(qty) n FROM dbo.Table1
UNION ALL
SELECT n-1 FROM CTE_Nums
WHERE n>1
)
SELECT ordernumber ,
1 AS qty,
articlenumber
FROM dbo.Table1 t1
INNER JOIN CTE_Nums n ON t1.qty >= n.n

Generating number from 1 to max(qty) and join table on it.

SQLFiddle DEMO

Split Column with delimiter into multiple columns

This is an example on how to do this:

DECLARE @tt TABLE(i INT IDENTITY,x VARCHAR(8000));
INSERT INTO @tt(x)VALUES('-9;-9;-1;-9;-9;-9;-9;-9;-1;-9;-9;-9;-9;-9;-9;-9;-9;-9;-1;-9;-9;-9;-9;-9;-9;-9;-9;-9;-1;-9;-1;-9;-9;-9;-1;-9;-9;-9;-9;-9;-9;-1;-1;-1;-1;-9;-1;-1;-9;-9;-9;-9;-1;-9;-1;-9;-9;-9;-1;-9;-1;-9;-1;-9;-9;-9;-9;-1;-9;-9;-1;-1;-9;-1;-1;0000;FFF8;-9;-9;-9;-1;-9;-1;-9;FFF6;-9;-1;-9;-1;-9;-1;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9;-9');

SELECT
i,
val1=n.v.value('/e[1]','VARCHAR(16)'),
val2=n.v.value('/e[2]','VARCHAR(16)'),
val3=n.v.value('/e[3]','VARCHAR(16)'),
-- ... repeat for val4 .. val114
val115=n.v.value('/e[115]','VARCHAR(16)')
FROM
@tt
CROSS APPLY (
SELECT
CAST('<e>'+REPLACE(x,';','</e><e>')+'</e>' AS XML) AS itm
) AS i
CROSS APPLY i.itm.nodes('/') AS n(v);

This is some XML trickery, by making the column with delimited values a XML where each value is an e element. The individual elements are then retrieved using the index in the value function.

Since this is a single statement it can be used as the query in a view.

Split the string '1/10/2/20/3/30/4/40' into two columns check body for more details

Next time please state your SQL-Server's version...

The following solution works with almost any version, if you are at v2016+ there might be better approaches. Try this out:

DECLARE @TEXT VARCHAR(60) = '1/10/2/20/3/30/4/40';

WITH Casted(ToXml) AS
(
SELECT CAST('<x>' + REPLACE(@TEXT,'/','</x><x>') + '</x>' AS XML)
)
,TallyOddNumbers(OddNumber) AS
(
SELECT TOP((SELECT ToXml.value('count(/x)','int')/2 FROM Casted)) (ROW_NUMBER() OVER(ORDER BY (SELECT NULL)))*2-1 FROM master..spt_values
)
SELECT ToXml.value('/x[sql:column("OddNumber")][1]','int') AS Col1
,ToXml.value('/x[sql:column("OddNumber")+1][1]','int') AS Col1
FROM Casted
CROSS JOIN TallyOddNumbers;

The idea in short:

  • The first cte "Casted" will create an XML from your string.
  • The second cte "TallyOddNumbers" will use ROW_NUMBER() against master..spt_values (just any larger set is okay here) to create a set of running odd numbers
  • The computed TOP clause will use XML.value() with count() to find the needed number of element pairs.
  • The final query will use the tally to grab the values by their positions.

Hint: You can use the code above simply together with INSERT in order to get the set in a persistant table.

UPDATE for version 2016+

In your own answer you are using string_split() (which is not a good idea due to the non-guaranteed sort order). But this points to a version of v2016+.

You might try this alternativ approach

DECLARE @TEXT VARCHAR(60) = '1/10/2/20/3/30/4/40';


WITH ToJson(j) AS(SELECT CONCAT('[',REPLACE(@TEXT,'/',','),']'))
SELECT p.Col1
,p.Col2
FROM
(
SELECT A.[key] /2 AS GroupIndex
,CONCAT('Col',A.[key]%2 +1) AS ColumnName
,A.[value] AS TheValue
FROM ToJson
CROSS APPLY OPENJSON(j) A
) t
PIVOT
(
MAX(TheValue) FOR ColumnName IN(Col1,Col2)
) p;

The idea in short:

  • The cte will transform your CSV-list to a JSON array
  • We use OPENJSON to read this array
  • the key is the elements position. Together with %2 (modulo operator) we get a computed, alternating column name
  • Finally we can use PIVOT to spread your values into Col1 and Col2.
  • The A.[key] /2 will use a trick with integer divisions to create one number per group (otherwise you would see just to last row).


Related Topics



Leave a reply



Submit