How to Split a Comma-Separated Value to Columns

How to split a comma-separated value to columns

CREATE FUNCTION [dbo].[fn_split_string_to_column] (
    @string NVARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @out_put TABLE (
    [column_id] INT IDENTITY(1, 1) NOT NULL,
    [value] NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @value NVARCHAR(MAX),
            @pos INT = 1,  -- start of the current token (1-based)
            @next INT      -- position of the next delimiter

    -- Append a trailing delimiter so the last token is terminated too
    SET @string = CASE
            WHEN RIGHT(@string, 1) != @delimiter
                THEN @string + @delimiter
            ELSE @string
        END

    SET @next = CHARINDEX(@delimiter, @string, @pos)

    WHILE @next > 0
    BEGIN
        -- Take the text between @pos and the delimiter; empty tokens are kept
        SET @value = SUBSTRING(@string, @pos, @next - @pos)

        INSERT INTO @out_put ([value])
        SELECT LTRIM(RTRIM(@value)) AS [column]

        SET @pos = @next + 1
        SET @next = CHARINDEX(@delimiter, @string, @pos)
    END

    RETURN
END

Split comma-separated values into multiple columns

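Your sample data may not need any splitting. You want to place each value in a column based on whether it appears in the string, which is simpler than splitting the data. The query below works for your sample data because every value is a single character; with multi-digit values, charindex('1', ...) would also match inside '10' or '21'.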

declare @Something table
(
Combined_Column varchar(10)
)

insert @Something values
('1,2,3')
, ('2')
, ('1,3')
, ('1,2,3,4')
, ('1,3,4')
, ('1')
, ('4')

select *
, col1 = case when charindex('1', s.Combined_Column) > 0 then 1 end
, col2 = case when charindex('2', s.Combined_Column) > 0 then 2 end
, col3 = case when charindex('3', s.Combined_Column) > 0 then 3 end
, col4 = case when charindex('4', s.Combined_Column) > 0 then 4 end
from @Something s

How to split the comma-separated value into columns

First, create a function to split the values:

create function [dbo].[udf_splitstring] (@tokens varchar(max),
@delimiter varchar(5))
returns @split table (
token varchar(200) not null )
as
begin

declare @list xml

select @list = cast('<a>'
+ replace(@tokens, @delimiter, '</a><a>')
+ '</a>' as xml)

insert into @split
(token)
select ltrim(t.value('.', 'varchar(200)')) as data
from @list.nodes('/a') as x(t)

return

end

Then pivot the tokens into columns with conditional aggregation:

SELECT
max(CASE WHEN TOKEN='CLAR' THEN TOKEN END) 'NAME1' ,
max(CASE WHEN TOKEN='ALWIN' THEN TOKEN END) 'NAME2',
max(CASE WHEN TOKEN='ANTONY' THEN TOKEN END) 'NAME3',
max(CASE WHEN TOKEN='RINU' THEN TOKEN END) 'NAME4',
max(CASE WHEN TOKEN='DAMI' THEN TOKEN END) 'NAME5',
max(CASE WHEN TOKEN='PRINCE' THEN TOKEN END) 'NAME6'
FROM #Table1 as t1
CROSS APPLY [dbo].UDF_SPLITSTRING(name,',') as t2

Output:

NAME1   NAME2   NAME3   NAME4   NAME5   NAME6
clar    alwin   antony  rinu    dami    prince

How to split comma separated text into columns on pandas dataframe?

You can do this without a pivot.

Create the dataframe.

import pandas as pd
import io

s = '''Data
a,b,c
a,c,d
d,e
a,e
a,b,c,d,e'''

df = pd.read_csv(io.StringIO(s), sep=r"\s+")

We can use pandas.Series.str.split with the expand argument set to True, then call value_counts on each row by applying it with axis=1.

Finally, fill the missing values with zero using fillna and convert the data to integers with astype(int).

df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)

# Output
   a  b  c  d  e
0  1  1  1  0  0
1  1  0  1  1  0
2  0  0  0  1  1
3  1  0  0  0  1
4  1  1  1  1  1

And then merge it with the original column.

new = df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)
pd.concat([df, new], axis = 1)

# Output
        Data  a  b  c  d  e
0      a,b,c  1  1  1  0  0
1      a,c,d  1  0  1  1  0
2        d,e  0  0  0  1  1
3        a,e  1  0  0  0  1
4  a,b,c,d,e  1  1  1  1  1
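
As a possible shortcut, pandas also provides Series.str.get_dummies, which splits on a separator and builds the indicator columns in one call. The following is a minimal sketch that reuses the sample data from above; the variable names indicators and result are only illustrative.

```python
import io

import pandas as pd

# Same sample data as above: a single column named "Data".
s = """Data
a,b,c
a,c,d
d,e
a,e
a,b,c,d,e"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Split each string on "," and build one 0/1 indicator column per distinct token.
indicators = df["Data"].str.get_dummies(sep=",")

# Attach the indicator columns to the original column.
result = pd.concat([df, indicators], axis=1)
print(result)
```

Unlike the value_counts version, this only records whether a token is present, so a token repeated within one row would still give 1 rather than a count.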

How to split comma separated strings in a column into different columns if they're not of same length using python or pandas in jupyter notebook

We can use a regular expression to find all key-value pairs in each row of column_A, map each row's list of pairs to a dictionary to create records, and then construct a dataframe from those records.

pd.DataFrame(map(dict, df['column_A'].str.findall(r'\s*([^:,]+):\s*([^,]+)')))

        Garbage Organics          Recycle   Junk
0       Tissues     Milk       Cardboards    NaN
1  Paper Towels     Eggs            Glass  Feces
2          cups      NaN  Plastic bottles    NaN

Here is an alternate approach in case you don't want to use regular expression patterns

df['column_A'].str.split(', ').explode()\
.str.split(': ', expand=True)\
.set_index(0, append=True)[1].unstack()
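
To try the regular expression approach end to end, here is a small self-contained sketch. The input rows are an assumption: the question's data is not shown here, so they were chosen to be consistent with the output above. The names pairs and wide are also just illustrative.

```python
import pandas as pd

# Hypothetical input: assumed rows, chosen to match the output shown above.
df = pd.DataFrame({
    "column_A": [
        "Garbage: Tissues, Organics: Milk, Recycle: Cardboards",
        "Garbage: Paper Towels, Organics: Eggs, Recycle: Glass, Junk: Feces",
        "Garbage: cups, Recycle: Plastic bottles",
    ]
})

# Each row becomes a list of (key, value) tuples ...
pairs = df["column_A"].str.findall(r"\s*([^:,]+):\s*([^,]+)")

# ... which dict() turns into one record per row; keys missing from a row become NaN.
wide = pd.DataFrame([dict(p) for p in pairs])
print(wide)
```

Rows may have different numbers of pairs; any key that a row lacks simply shows up as NaN in that row.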

How to split a comma separated value to columns together other columns

You can try the following approach, which uses the LEFT(), RIGHT(), LEN() and CHARINDEX() functions:

Table:

CREATE TABLE Data (
AccountID varchar(7),
GEO varchar(50)
)
INSERT INTO Data
(AccountID, GEO)
VALUES
('CT-2000', '9.9582925,-84.19607')

Statement:

SELECT 
AccountID,
LEFT(GEO, CHARINDEX(',', GEO) - 1) AS Lat,
RIGHT(GEO, LEN(GEO) - CHARINDEX(',', GEO)) AS Long
FROM Data

Result:

AccountID   Lat         Long
CT-2000     9.9582925   -84.19607
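
For comparison, if the same data were loaded into pandas instead, a roughly equivalent split could look like the sketch below. This swaps in pandas for the T-SQL functions; the dataframe name data is hypothetical and the row mirrors the sample table above.

```python
import pandas as pd

# Same sample row as the SQL table above.
data = pd.DataFrame({"AccountID": ["CT-2000"], "GEO": ["9.9582925,-84.19607"]})

# Split on the first comma only (n=1), mirroring LEFT/RIGHT around CHARINDEX.
data[["Lat", "Long"]] = data["GEO"].str.split(",", n=1, expand=True)

print(data[["AccountID", "Lat", "Long"]])
```

Both columns remain strings here, just as in the SQL result; convert them explicitly if numeric values are needed.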

Split comma separated values into target table with fixed number of columns

It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.

While stuck with your current situation ...

For known small maximum number of elements

A simple solution without trickery or recursion will do:

SELECT id, 1 AS rnk
, split_part(csv, ', ', 1) AS c1
, split_part(csv, ', ', 2) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 4) AS c4
, split_part(csv, ', ', 5) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows

UNION ALL
SELECT id, 2
, split_part(csv, ', ', 6)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 10)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows

-- three more blocks to cover a maximum "around 20"

ORDER BY id, rnk;

Here, id is the primary key of the original table.

This assumes ', ' as the separator; adapt it to your data as needed.

Related:

  • Split comma separated column data into additional columns

For unknown number of elements

There are various ways. One is to use regexp_replace() to replace every fifth separator before unnesting ...

-- for any number of elements
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 1) AS c1
, split_part(c.csv5, ', ', 2) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 4) AS c4
, split_part(c.csv5, ', ', 5) AS c5
FROM tbl t
, unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER BY t.id, c.rnk;

This assumes that the chosen separator ; never appears in your strings. (Just like , can never appear.)

The regular expression pattern is the key: '((?:.*?,){4}.*?),'

(?:) ... “non-capturing” set of parentheses

() ... “capturing” set of parentheses

*? ... non-greedy quantifier

{4} ... sequence of exactly 4 matches

The replacement '\1;' contains the back-reference \1.

'g' as fourth function parameter is required for repeated replacement.
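
To see what the pattern does in isolation, here is a small sketch using Python's re module rather than PostgreSQL's regex engine. The sample string is hypothetical, and Python's re.sub replaces all matches by default, which plays the role of the 'g' flag.

```python
import re

# Hypothetical sample string using ', ' as the separator, as assumed above.
csv = "a, b, c, d, e, f, g, h, i, j, k"

# Same pattern as in the query: capture five elements, then match the fifth comma.
pattern = r"((?:.*?,){4}.*?),"

# Every fifth comma is replaced by ';' via the back-reference \1.
marked = re.sub(pattern, r"\1;", csv)

# Split into groups of up to five, then split each group into its elements.
groups = [group.split(", ") for group in marked.split("; ")]

print(marked)
print(groups)
```

The last group holds whatever is left over, which corresponds to a final row where the trailing split_part() calls return empty strings.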

Further reading:

  • PostgreSQL & regexp_split_to_array + unnest
  • Apply `trim()` and `regexp_replace()` on text array
  • PostgreSQL unnest() with element number

Other ways to solve this include a recursive CTE or a set-returning function ...

Fill from right to left

(Like you added in How to put values starting from the right side into columns?)

Simply count the element positions down instead of up:

SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 5) AS c1
, split_part(c.csv5, ', ', 4) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 2) AS c4
, split_part(c.csv5, ', ', 1) AS c5
FROM ...



