Split Comma Separated Column Data into Additional Columns

Split comma separated column data into additional columns

If the number of fields in the CSV is constant then you could do something like this:

select a[1], a[2], a[3], a[4]
from (
select regexp_split_to_array('a,b,c,d', ',')
) as dt(a)

For example:

=> select a[1], a[2], a[3], a[4] from (select regexp_split_to_array('a,b,c,d', ',')) as dt(a);
a | a | a | a
---+---+---+---
a | b | c | d
(1 row)

If the number of fields in the CSV is not constant then you could get the maximum number of fields with something like this:

select max(array_length(regexp_split_to_array(csv, ','), 1))
from your_table

and then build the appropriate a[1], a[2], ..., a[M] column list for your query. So if the above gave you a max of 6, you'd use this:

select a[1], a[2], a[3], a[4], a[5], a[6]
from (
select regexp_split_to_array(csv, ',')
from your_table
) as dt(a)

You could combine those two queries into a function if you wanted.

For example, give this data (that's a NULL in the last row):

=> select * from csvs;
csv
-------------
1,2,3
1,2,3,4
1,2,3,4,5,6

(4 rows)

=> select max(array_length(regexp_split_to_array(csv, ','), 1)) from csvs;
max
-----
6
(1 row)

=> select a[1], a[2], a[3], a[4], a[5], a[6] from (select regexp_split_to_array(csv, ',') from csvs) as dt(a);
a | a | a | a | a | a
---+---+---+---+---+---
1 | 2 | 3 | | |
1 | 2 | 3 | 4 | |
1 | 2 | 3 | 4 | 5 | 6
| | | | |
(4 rows)

Since your delimiter is a simple fixed string, you could also use string_to_array instead of regexp_split_to_array:

select ...
from (
select string_to_array(csv, ',')
from csvs
) as dt(a);

Thanks to Michael for the reminder about this function.

You really should redesign your database schema to avoid the CSV column if at all possible. You should be using an array column or a separate table instead.

Split comma separated values into target table with fixed number of columns

It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.

While stuck with your current situation ...

For known small maximum number of elements

A simple solution without trickery or recursion will do:

SELECT id, 1 AS rnk
, split_part(csv, ', ', 1) AS c1
, split_part(csv, ', ', 2) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 4) AS c4
, split_part(csv, ', ', 5) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows

UNION ALL
SELECT id, 2
, split_part(csv, ', ', 6)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 10)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows

-- three more blocks to cover a maximum "around 20"

ORDER BY id, rnk;

db<>fiddle here

id being the PK of the original table.

This assumes ', ' as separator, obviously.

You can adapt easily.

Related:

  • Split comma separated column data into additional columns

For unknown number of elements

Various ways. One way use regexp_replace() to replace every fifth separator before unnesting ...

-- for any number of elements
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 1) AS c1
, split_part(c.csv5, ', ', 2) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 4) AS c4
, split_part(c.csv5, ', ', 5) AS c5
FROM tbl t
, unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER BY t.id, c.rnk;

db<>fiddle here

This assumes that the chosen separator ; never appears in your strings. (Just like , can never appear.)

The regular expression pattern is the key: '((?:.*?,){4}.*?),'

(?:) ... “non-capturing” set of parentheses

() ... “capturing” set of parentheses

*? ... non-greedy quantifier

{4}? ... sequence of exactly 4 matches

The replacement '\1;' contains the back-reference \1.

'g' as fourth function parameter is required for repeated replacement.

Further reading:

  • PostgreSQL & regexp_split_to_array + unnest
  • Apply `trim()` and `regexp_replace()` on text array
  • PostgreSQL unnest() with element number

Other ways to solve this include a recursive CTE or a set-returning function ...

Fill from right to left

(Like you added in How to put values starting from the right side into columns?)

Simply count down numbers like:

SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 5) AS c1
, split_part(c.csv5, ', ', 4) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 2) AS c4
, split_part(c.csv5, ', ', 1) AS c5
FROM ...

db<>fiddle here

How to split a comma-separated value to columns

CREATE FUNCTION [dbo].[fn_split_string_to_column] (
@string NVARCHAR(MAX),
@delimiter CHAR(1)
)
RETURNS @out_put TABLE (
[column_id] INT IDENTITY(1, 1) NOT NULL,
[value] NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @value NVARCHAR(MAX),
@pos INT = 0,
@len INT = 0

SET @string = CASE
WHEN RIGHT(@string, 1) != @delimiter
THEN @string + @delimiter
ELSE @string
END

WHILE CHARINDEX(@delimiter, @string, @pos + 1) > 0
BEGIN
SET @len = CHARINDEX(@delimiter, @string, @pos + 1) - @pos
SET @value = SUBSTRING(@string, @pos, @len)

INSERT INTO @out_put ([value])
SELECT LTRIM(RTRIM(@value)) AS [column]

SET @pos = CHARINDEX(@delimiter, @string, @pos + @len) + 1
END

RETURN
END

Split Comma Separated values into multiple column

Your sample data may not need any splitting. You want to move the data to a column based on the value it finds. You can do this a bit simpler than splitting the data. This works just fine for your sample data.

declare @Something table
(
Combined_Column varchar(10)
)

insert @Something values
('1,2,3')
, ('2')
, ('1,3')
, ('1,2,3,4')
, ('1,3,4')
, ('1')
, ('4')

select *
, col1 = case when charindex('1', s.Combined_Column) > 0 then 1 end
, col2 = case when charindex('2', s.Combined_Column) > 0 then 2 end
, col3 = case when charindex('3', s.Combined_Column) > 0 then 3 end
, col4 = case when charindex('4', s.Combined_Column) > 0 then 4 end
from @Something s

How to split comma separated strings in a column into different columns if they're not of same length using python or pandas in jupyter notebook

We can use a regular expression pattern to find all the matching key-value pairs from each row of column_A , then map the list of pairs from each row to dictionary in order to create records then construct a dataframe from these records

pd.DataFrame(map(dict, df['column_A'].str.findall(r'\s*([^:,]+):\s*([^,]+)')))

See the online regex demo

        Garbage Organics          Recycle   Junk
0 Tissues Milk Cardboards NaN
1 Paper Towels Eggs Glass Feces
2 cups NaN Plastic bottles NaN

Here is an alternate approach in case you don't want to use regular expression patterns

df['column_A'].str.split(', ').explode()\
.str.split(': ', expand=True)\
.set_index(0, append=True)[1].unstack()

Split data frame string column into multiple columns (comma separated characters)

One option using str_split, unnest_longer and table

subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")

df1 <- data.frame(subject, letters)


library(tidyverse)

df1 %>%
mutate(letters = str_split(letters, ', ')) %>%
unnest_longer(letters) %>%
table
#> letters
#> subject a b f g l m z
#> 1 1 1 1 1 0 0 0
#> 2 0 1 0 1 1 1 0
#> 3 0 0 0 1 0 1 1

Created on 2022-02-10 by the reprex package (v2.0.0)


Seeing some of the other answers, separate_rows is a better solution here

df1 %>%
separate_rows(letters) %>%
table

How to Split a column into two by comma delimiter, and put a value without comma in second column and not in first?

We can try using str.extract here:

df["Location"] = df["Origin"].str.extract(r'(.*),')
df["Country"] = df["Origin"].str.extract(r'(\w+(?: \w+)*)$')


Related Topics



Leave a reply



Submit