Split comma separated column data into additional columns
If the number of fields in the CSV is constant then you could do something like this:
select a[1], a[2], a[3], a[4]
from (
select regexp_split_to_array('a,b,c,d', ',')
) as dt(a)
For example:
=> select a[1], a[2], a[3], a[4] from (select regexp_split_to_array('a,b,c,d', ',')) as dt(a);
a | a | a | a
---+---+---+---
a | b | c | d
(1 row)
If the number of fields in the CSV is not constant then you could get the maximum number of fields with something like this:
select max(array_length(regexp_split_to_array(csv, ','), 1))
from your_table
and then build the appropriate a[1], a[2], ..., a[M]
column list for your query. So if the above gave you a max of 6, you'd use this:
select a[1], a[2], a[3], a[4], a[5], a[6]
from (
select regexp_split_to_array(csv, ',')
from your_table
) as dt(a)
You could combine those two queries into a function if you wanted.
For example, give this data (that's a NULL in the last row):
=> select * from csvs;
csv
-------------
1,2,3
1,2,3,4
1,2,3,4,5,6
(4 rows)
=> select max(array_length(regexp_split_to_array(csv, ','), 1)) from csvs;
max
-----
6
(1 row)
=> select a[1], a[2], a[3], a[4], a[5], a[6] from (select regexp_split_to_array(csv, ',') from csvs) as dt(a);
a | a | a | a | a | a
---+---+---+---+---+---
1 | 2 | 3 | | |
1 | 2 | 3 | 4 | |
1 | 2 | 3 | 4 | 5 | 6
| | | | |
(4 rows)
Since your delimiter is a simple fixed string, you could also use string_to_array
instead of regexp_split_to_array
:
select ...
from (
select string_to_array(csv, ',')
from csvs
) as dt(a);
Thanks to Michael for the reminder about this function.
You really should redesign your database schema to avoid the CSV column if at all possible. You should be using an array column or a separate table instead.
Split comma separated values into target table with fixed number of columns
It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.
While stuck with your current situation ...
For known small maximum number of elements
A simple solution without trickery or recursion will do:
SELECT id, 1 AS rnk
, split_part(csv, ', ', 1) AS c1
, split_part(csv, ', ', 2) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 4) AS c4
, split_part(csv, ', ', 5) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows
UNION ALL
SELECT id, 2
, split_part(csv, ', ', 6)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 10)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows
-- three more blocks to cover a maximum "around 20"
ORDER BY id, rnk;
db<>fiddle here
id
being the PK of the original table.
This assumes ', ' as separator, obviously.
You can adapt easily.
Related:
- Split comma separated column data into additional columns
For unknown number of elements
Various ways. One way use regexp_replace()
to replace every fifth separator before unnesting ...
-- for any number of elements
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 1) AS c1
, split_part(c.csv5, ', ', 2) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 4) AS c4
, split_part(c.csv5, ', ', 5) AS c5
FROM tbl t
, unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER BY t.id, c.rnk;
db<>fiddle here
This assumes that the chosen separator ;
never appears in your strings. (Just like ,
can never appear.)
The regular expression pattern is the key: '((?:.*?,){4}.*?),'
(?:)
... “non-capturing” set of parentheses()
... “capturing” set of parentheses*?
... non-greedy quantifier{4}?
... sequence of exactly 4 matches
The replacement '\1;'
contains the back-reference \1
.
'g'
as fourth function parameter is required for repeated replacement.
Further reading:
- PostgreSQL & regexp_split_to_array + unnest
- Apply `trim()` and `regexp_replace()` on text array
- PostgreSQL unnest() with element number
Other ways to solve this include a recursive CTE or a set-returning function ...
Fill from right to left
(Like you added in How to put values starting from the right side into columns?)
Simply count down numbers like:
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 5) AS c1
, split_part(c.csv5, ', ', 4) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 2) AS c4
, split_part(c.csv5, ', ', 1) AS c5
FROM ...
db<>fiddle here
How to split a comma-separated value to columns
CREATE FUNCTION [dbo].[fn_split_string_to_column] (
@string NVARCHAR(MAX),
@delimiter CHAR(1)
)
RETURNS @out_put TABLE (
[column_id] INT IDENTITY(1, 1) NOT NULL,
[value] NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @value NVARCHAR(MAX),
@pos INT = 0,
@len INT = 0
SET @string = CASE
WHEN RIGHT(@string, 1) != @delimiter
THEN @string + @delimiter
ELSE @string
END
WHILE CHARINDEX(@delimiter, @string, @pos + 1) > 0
BEGIN
SET @len = CHARINDEX(@delimiter, @string, @pos + 1) - @pos
SET @value = SUBSTRING(@string, @pos, @len)
INSERT INTO @out_put ([value])
SELECT LTRIM(RTRIM(@value)) AS [column]
SET @pos = CHARINDEX(@delimiter, @string, @pos + @len) + 1
END
RETURN
END
Split Comma Separated values into multiple column
Your sample data may not need any splitting. You want to move the data to a column based on the value it finds. You can do this a bit simpler than splitting the data. This works just fine for your sample data.
declare @Something table
(
Combined_Column varchar(10)
)
insert @Something values
('1,2,3')
, ('2')
, ('1,3')
, ('1,2,3,4')
, ('1,3,4')
, ('1')
, ('4')
select *
, col1 = case when charindex('1', s.Combined_Column) > 0 then 1 end
, col2 = case when charindex('2', s.Combined_Column) > 0 then 2 end
, col3 = case when charindex('3', s.Combined_Column) > 0 then 3 end
, col4 = case when charindex('4', s.Combined_Column) > 0 then 4 end
from @Something s
How to split comma separated strings in a column into different columns if they're not of same length using python or pandas in jupyter notebook
We can use a regular expression pattern to find all the matching key-value pairs from each row of column_A
, then map
the list of pairs from each row to dictionary in order to create records then construct a dataframe from these records
pd.DataFrame(map(dict, df['column_A'].str.findall(r'\s*([^:,]+):\s*([^,]+)')))
See the online regex demo
Garbage Organics Recycle Junk
0 Tissues Milk Cardboards NaN
1 Paper Towels Eggs Glass Feces
2 cups NaN Plastic bottles NaN
Here is an alternate approach in case you don't want to use regular expression patterns
df['column_A'].str.split(', ').explode()\
.str.split(': ', expand=True)\
.set_index(0, append=True)[1].unstack()
Split data frame string column into multiple columns (comma separated characters)
One option using str_split
, unnest_longer
and table
subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")
df1 <- data.frame(subject, letters)
library(tidyverse)
df1 %>%
mutate(letters = str_split(letters, ', ')) %>%
unnest_longer(letters) %>%
table
#> letters
#> subject a b f g l m z
#> 1 1 1 1 1 0 0 0
#> 2 0 1 0 1 1 1 0
#> 3 0 0 0 1 0 1 1
Created on 2022-02-10 by the reprex package (v2.0.0)
Seeing some of the other answers, separate_rows
is a better solution here
df1 %>%
separate_rows(letters) %>%
table
How to Split a column into two by comma delimiter, and put a value without comma in second column and not in first?
We can try using str.extract
here:
df["Location"] = df["Origin"].str.extract(r'(.*),')
df["Country"] = df["Origin"].str.extract(r'(\w+(?: \w+)*)$')
Related Topics
Conversion Failed When Converting Date And/Or Time from Character String While Inserting Datetime
How Rownum Works in Pagination Query
Best Way to Select Random Rows Postgresql
How to Populate a Table With a Range of Dates
How to Use Case Statement in a Join Condition
Postgresql "Column Does Not Exist" But It Actually Does
Difference Between Scope_Identity(), Identity(), @@Identity, and Ident_Current()
Any Downsides of Using Data Type "Text" For Storing Strings
SQL Left Join VS Multiple Tables on from Line
How to Delete from Multiple Tables in MySQL
How to List All Foreign Keys Referencing a Given Table in SQL Server
Error: Tcp Provider: Error Code 0X2746. During the SQL Setup in Linux Through Terminal