Trying to Split One Column to Multiple Columns Using Snowflake SQL

Trying to split one column to multiple columns using Snowflake sql

It seems the issue is related to handling the "(s)" parts:

with XYZ as (
select 'Attachments: 1, Circuit: North America, Language: English, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 500' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: Biotechnology Newsline [National], Language(s): English, Words: 600' as SALES_ORDER_ITEM_DESCRIPTION
)
SELECT
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language[()s]*:\\s([a-zA-Z0-9\\, ]+)(,\\s|$)', 1, 1, 'e') AS "Language"
FROM XYZ;

+-----------------------------------+-----------------+
| Circuit | Language |
+-----------------------------------+-----------------+
| North America | English |
| North America | English,Spanish |
| Biotechnology Newsline [National] | English |
+-----------------------------------+-----------------+

Snowflake - how to split a single field (VARIANT) into multiple columns

Since that's stored as a string (per the question title) you can use PARSE_JSON to turn it into a variant and refer to the field names:

with X as
(
select parse_json($$
{
"FIELD1_CD": "M1",
"FIELD1_CD_DESCR": "MEDICAL1",
"FIELD2_SRC_CD": "M2",
"FIELD2_SRC_CD_DESCR": "MEDICAL2",
"FIELD3_CD": "M3",
"FIELD3_CD_DESCR": "MEDICAL3",
"FIELD4_SRC_CD": "M4",
"FIELD4_SRC_CD_DESCR": "MEDICAL4",
"FIELD5_CD": "MN5",
"FIELD5_CD_DESCR": "STATE5",
"FIELD6_ST_SRC_CD": "MN6",
"FIELD6_ST_SRC_CD_DESCR": "STATE6"
}
$$) as JSON
)
select JSON:FIELD1_CD::string as FIELD1_CD
,JSON:FIELD1_CD_DESCR::string as FIELD2_DESCR
from X

If that column is defined as a string, you should consider changing the column type to variant. This will internally columnarize the JSON fields for better performance, and there will be no parsing required using the PARSE_JSON function.

Edit: From the question update, since it's already a variant you can refer to it as shown in the final select statement here:

create or replace table T1(V variant);

insert into T1
select parse_json($$
{
"FIELD1_CD": "M1",
"FIELD1_CD_DESCR": "MEDICAL1",
"FIELD2_SRC_CD": "M2",
"FIELD2_SRC_CD_DESCR": "MEDICAL2",
"FIELD3_CD": "M3",
"FIELD3_CD_DESCR": "MEDICAL3",
"FIELD4_SRC_CD": "M4",
"FIELD4_SRC_CD_DESCR": "MEDICAL4",
"FIELD5_CD": "MN5",
"FIELD5_CD_DESCR": "STATE5",
"FIELD6_ST_SRC_CD": "MN6",
"FIELD6_ST_SRC_CD_DESCR": "STATE6"
}
$$) as JSON
;

select V:FIELD1_CD::string as FIELD1_CD
,V:FIELD1_CD_DESCR::string as FIELD1_CD_DESCR
from T1

Snowflake split string into columns

Assuming the maximum number of columns is known in advance STRTOK_TO_ARRAY could be used:

SELECT STRTOK_TO_ARRAY(DELIVERY_ID, '|') AS arr
,arr[0]::string AS delivery_item_0
,arr[1]::string AS delivery_item_1
,arr[2]::string AS delivery_item_2
,arr[3]::string AS delivery_item_3
--,...
FROM tab

Split the column to multiple column based on the three diff categories--new scenario

It seems things are getting complicated. To handle this new format, you need to do some changes on how to extract languages.

Instead of using one simple regexp_substr, I used two. One is for handling "Language(s): xxxx" and the other one is to handle "Language: x,y,z" (including single language), and combine them using NVL.

with XYZ as (
select 'Attachments: 1, Circuit: North America, Language: English, Language: Dutch, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language: English, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language: English, Language: Dutch, Language: German, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
)
SELECT
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
NVL( regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language\\(s\\):\\s([a-zA-Z0-9\\, ]+)(,\\s|$)', 1, 1, 'e'),
REGEXP_REPLACE( regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, '(Language:.*), Words', 1, 1, 'e'), '[ ]*Language: ' )) AS "LanguageString"
FROM XYZ;

+---------------+----------------------+
| Circuit | LanguageString |
+---------------+----------------------+
| North America | English,Dutch |
| North America | English |
| North America | English,Spanish |
| North America | English,Dutch,German |
+---------------+----------------------+

Splitting comma separated values to multiple columns

You are looking for split_part

SPLIT_PART(<string>, <delimiter>, <partNumber>)

Just be careful with column names though. Some of them could be reserved keywords in SQL and might throw an error

with your_table as 
(select 'date,time,size,r_version,r_arch,r_os,package,version,country,ip_id' as col)

select split_part(col,',',1) as date,
split_part(col,',',2) as time,
split_part(col,',',3) as size,
split_part(col,',',4) as r_version,
split_part(col,',',5) as r_arch,
split_part(col,',',6) as r_os,
split_part(col,',',7) as package,
split_part(col,',',8) as version,
split_part(col,',',9) as country,
split_part(col,',',10) as ip_id
from your_table;

Split Address Column value using regular expression in snowflake sql

You can use REGEXP_SUBSTR function to extract the subgroups from the matched expression:

select 
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',1 ) StreetNum,
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',2 ) StreetName,
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',3 ) UnitType,
REGEXP_SUBSTR( address, '([0-9]+) (.*) (APT|#|UNIT|NBR|STE|SUITE|BLDG|BUILDING) (.*)',1,1,'e',4 ) Unit
from values
('616 NE CHERY DR UNIT A1008'),
('740 NE 3RD ST # 3-1999'),
('13456 SW HAKS BTPD ST APT 1052'),
('460 MAIN ST BUILDING C STE 480') tmp (address);

+-----------+--------------------+----------+--------+
| STREETNUM | STREETNAME | UNITTYPE | UNIT |
+-----------+--------------------+----------+--------+
| 616 | NE CHERY DR | UNIT | A1008 |
| 740 | NE 3RD ST | # | 3-1999 |
| 13456 | SW HAKS BTPD ST | APT | 1052 |
| 460 | MAIN ST BUILDING C | STE | 480 |
+-----------+--------------------+----------+--------+

REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html



Related Topics



Leave a reply



Submit