Trying to split one column to multiple columns using Snowflake sql
It seems the issue is related to handling the "(s)" parts:
with XYZ as (
select 'Attachments: 1, Circuit: North America, Language: English, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 500' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: Biotechnology Newsline [National], Language(s): English, Words: 600' as SALES_ORDER_ITEM_DESCRIPTION
)
SELECT
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language[()s]*:\\s([a-zA-Z0-9\\, ]+)(,\\s|$)', 1, 1, 'e') AS "Language"
FROM XYZ;
+-----------------------------------+-----------------+
| Circuit | Language |
+-----------------------------------+-----------------+
| North America | English |
| North America | English,Spanish |
| Biotechnology Newsline [National] | English |
+-----------------------------------+-----------------+
Snowflake - how to split a single field (VARIANT) into multiple columns
Since that's stored as a string (per the question title) you can use PARSE_JSON to turn it into a variant and refer to the field names:
with X as
(
select parse_json($$
{
"FIELD1_CD": "M1",
"FIELD1_CD_DESCR": "MEDICAL1",
"FIELD2_SRC_CD": "M2",
"FIELD2_SRC_CD_DESCR": "MEDICAL2",
"FIELD3_CD": "M3",
"FIELD3_CD_DESCR": "MEDICAL3",
"FIELD4_SRC_CD": "M4",
"FIELD4_SRC_CD_DESCR": "MEDICAL4",
"FIELD5_CD": "MN5",
"FIELD5_CD_DESCR": "STATE5",
"FIELD6_ST_SRC_CD": "MN6",
"FIELD6_ST_SRC_CD_DESCR": "STATE6"
}
$$) as JSON
)
select JSON:FIELD1_CD::string as FIELD1_CD
,JSON:FIELD1_CD_DESCR::string as FIELD2_DESCR
from X
If that column is defined as a string, you should consider changing the column type to variant. This will internally columnarize the JSON fields for better performance, and there will be no parsing required using the PARSE_JSON function.
Edit: From the question update, since it's already a variant you can refer to it as shown in the final select statement here:
create or replace table T1(V variant);
insert into T1
select parse_json($$
{
"FIELD1_CD": "M1",
"FIELD1_CD_DESCR": "MEDICAL1",
"FIELD2_SRC_CD": "M2",
"FIELD2_SRC_CD_DESCR": "MEDICAL2",
"FIELD3_CD": "M3",
"FIELD3_CD_DESCR": "MEDICAL3",
"FIELD4_SRC_CD": "M4",
"FIELD4_SRC_CD_DESCR": "MEDICAL4",
"FIELD5_CD": "MN5",
"FIELD5_CD_DESCR": "STATE5",
"FIELD6_ST_SRC_CD": "MN6",
"FIELD6_ST_SRC_CD_DESCR": "STATE6"
}
$$) as JSON
;
select V:FIELD1_CD::string as FIELD1_CD
,V:FIELD1_CD_DESCR::string as FIELD1_CD_DESCR
from T1
Snowflake split string into columns
Assuming the maximum number of columns is known in advance STRTOK_TO_ARRAY could be used:
SELECT STRTOK_TO_ARRAY(DELIVERY_ID, '|') AS arr
,arr[0]::string AS delivery_item_0
,arr[1]::string AS delivery_item_1
,arr[2]::string AS delivery_item_2
,arr[3]::string AS delivery_item_3
--,...
FROM tab
Split the column to multiple column based on the three diff categories--new scenario
It seems things are getting complicated. To handle this new format, you need to do some changes on how to extract languages.
Instead of using one simple regexp_substr, I used two. One is for handling "Language(s): xxxx" and the other one is to handle "Language: x,y,z" (including single language), and combine them using NVL.
with XYZ as (
select 'Attachments: 1, Circuit: North America, Language: English, Language: Dutch, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language: English, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language: English, Language: Dutch, Language: German, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
)
SELECT
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
NVL( regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language\\(s\\):\\s([a-zA-Z0-9\\, ]+)(,\\s|$)', 1, 1, 'e'),
REGEXP_REPLACE( regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, '(Language:.*), Words', 1, 1, 'e'), '[ ]*Language: ' )) AS "LanguageString"
FROM XYZ;
+---------------+----------------------+
| Circuit | LanguageString |
+---------------+----------------------+
| North America | English,Dutch |
| North America | English |
| North America | English,Spanish |
| North America | English,Dutch,German |
+---------------+----------------------+
Splitting comma separated values to multiple columns
You are looking for split_part
SPLIT_PART(<string>, <delimiter>, <partNumber>)
Just be careful with column names though. Some of them could be reserved keywords in SQL and might throw an error
with your_table as
(select 'date,time,size,r_version,r_arch,r_os,package,version,country,ip_id' as col)
select split_part(col,',',1) as date,
split_part(col,',',2) as time,
split_part(col,',',3) as size,
split_part(col,',',4) as r_version,
split_part(col,',',5) as r_arch,
split_part(col,',',6) as r_os,
split_part(col,',',7) as package,
split_part(col,',',8) as version,
split_part(col,',',9) as country,
split_part(col,',',10) as ip_id
from your_table;
Split Address Column value using regular expression in snowflake sql
You can use REGEXP_SUBSTR function to extract the subgroups from the matched expression:
select
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',1 ) StreetNum,
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',2 ) StreetName,
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',3 ) UnitType,
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',4 ) Unit
from values
('616 NE CHERY DR UNIT A1008'),
('740 NE 3RD ST # 3-1999'),
('13456 SW HAKS BTPD ST APT 1052'),
('460 MAIN ST BUILDING C STE 480') tmp (address);
+-----------+--------------------+----------+--------+
| STREETNUM | STREETNAME | UNITTYPE | UNIT |
+-----------+--------------------+----------+--------+
| 616 | NE CHERY DR | UNIT | A1008 |
| 740 | NE 3RD ST | # | 3-1999 |
| 13456 | SW HAKS BTPD ST | APT | 1052 |
| 460 | MAIN ST BUILDING C | STE | 480 |
+-----------+--------------------+----------+--------+
REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html
Related Topics
Find Most Common Elements in Array with a Group By
How to Perform the Same Aggregation on Every Column, Without Listing the Columns
Create Users Dynamic Names and Assgin Roles
Find Only Capital Letters in Word Through in SQL Server Query
Oracle (11.2.0.1.0) - Recursive Cte with a Date Expression
Grant Access to Just One Schema in Postgresql
Date Calculation with Parameter in Ssis Is Not Giving the Correct Result
Conditions in Left Join (Outer Join) VS Inner Join
From Keyword Not Found Where Expected (Oracle SQL)
Select * from Table or Select Id,Field1, Field2, Field3 from Table - Best Practice
Odd Inner Join Syntax and Encapsulation
How to Use a Calculated Column by Another Calculated Column
Why Is My Case Expression Non-Deterministic
St_Hexagongrid Geom Vector to Find All Points