Parser for Oracle SQL
Have you considered General SQL Parser? I don't have any experience with it myself, but from browsing their website it looks like it has potential. Personally, I have rolled my own, built on the parser in Eclipse Data Tools Platform (sorry, I can't share it; it's proprietary), but now I will have to evaluate General SQL Parser, because it claims to have more coverage of Oracle SQL than my parser does.
HTML Parser in Oracle
The above is working for some values, but it is breaking, giving the error ORA-06502: PL/SQL: numeric or value error ORA-06512: at "SYS.XMLTYPE", line 272
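It will get that error if you have any NOTE values which do not have a <br> tag, because this:
SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)
will then be null. INSTR returns 0 when the tag isn't found, so the length argument becomes -1, and SUBSTR returns null for any length less than 1. xmltype() then throws that error when it is given that null.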
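To see why the missing tag matters, here is a small Python model of the two Oracle functions involved. It follows their documented rules: INSTR returns 0 when there is no match, and SUBSTR treats position 0 as 1 and returns null when the length is less than 1. This is only a sketch for reasoning about the expression, not Oracle code. The helper names and sample notes are made up for illustration.

```python
def oracle_instr(s, sub):
    """Model of INSTR(s, sub): 1-based position of the first match, 0 if absent."""
    return s.find(sub) + 1  # str.find returns -1 when not found, so this yields 0


def oracle_substr(s, position, length):
    """Model of SUBSTR(s, position, length) for position >= 0 only.

    Oracle treats position 0 as 1 and returns NULL (None here) when
    length is less than 1.
    """
    if length < 1:
        return None
    start = max(position, 1) - 1            # 1-based position -> 0-based index
    return s[start:start + length] or None  # an empty string is NULL in Oracle


def first_fragment(note):
    """Hypothetical helper modelling SUBSTR(note, 0, INSTR(note, '<br>') - 1)."""
    return oracle_substr(note, 0, oracle_instr(note, '<br>') - 1)


with_br = "<table><tr><td>a</td></tr></table><br>trailing text"  # made-up note
without_br = "<table><tr><td>a</td></tr></table>"                # made-up note

print(first_fragment(with_br))     # <table><tr><td>a</td></tr></table>
print(first_fragment(without_br))  # None, which is what xmltype() chokes on
```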
If - and it's a big if - all of the notes start with a simple table with no embedded problematic tags, and that may or may not be followed by a line break tag, then you can use a case expression to only do the substr when needed:
with tbl as
(
SELECT ROW_ID,
xmltype(
CASE
WHEN INSTR(qe.NOTE, '<br>') > 0
THEN SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)
ELSE qe.NOTE
END
) xml_data
FROM MY_Table qe
WHERE EVENT='note'
)
...
Or perhaps slightly more robustly, look for and extract a table:
with tbl as
(
SELECT ROW_ID,
xmltype(SUBSTR(qe.NOTE, INSTR(qe.NOTE, '<table>'), INSTR(qe.NOTE, '</table>') + 8 - INSTR(qe.NOTE, '<table>'))) xml_data
FROM MY_Table qe
WHERE EVENT='note'
AND INSTR(qe.NOTE, '<table>') > 0
)
...
db<>fiddle
But as you already know, this kind of approach is fraught with problems.
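The position arithmetic in that table extraction is easy to get wrong, because SUBSTR's third argument is a length, not an end position. Below is a Python model of the calculation, using a hypothetical extract_table helper and made-up notes. The length is the end of '</table>' minus the start of '<table>', plus one. That reduces to INSTR(qe.NOTE, '</table>') + 7 only when the table starts at position 1.

```python
def extract_table(note):
    """Hypothetical model of extracting '<table>...</table>' via INSTR/SUBSTR.

    SUBSTR takes a start position and a length, so the length is the
    end position of '</table>' minus the start position, plus one.
    """
    start = note.find('<table>') + 1    # INSTR(note, '<table>'), 1-based, 0 if absent
    close = note.find('</table>') + 1   # INSTR(note, '</table>'), 1-based, 0 if absent
    if start == 0 or close == 0:
        return None                     # no complete table to extract
    end = close + len('</table>') - 1   # 1-based position of the final '>'
    length = end - start + 1            # same as close + 8 - start
    return note[start - 1:start - 1 + length]


# Made-up note where the table does not start at position 1.
note = "Intro <table><tr><td>x</td></tr></table><br>more"
print(extract_table(note))  # <table><tr><td>x</td></tr></table>
```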
Parse Json using Oracle SQL
I suppose a NESTED PATH clause will do the trick:
select *
from json_table(
'{"Rownum": "1", "Name": "John", "AddressArray":["Address1", "Address2"], "TextObj":[{"mName" : "Carol","lName" : "Cena"}]}',
'$' columns (
rownr number path '$.Rownum',
name varchar2(100) path '$.Name',
mName varchar2(100) path '$.TextObj[*].mName',
lName varchar2(100) path '$.TextObj[*].lName',
nested path '$.AddressArray[*]' columns (AddressArray varchar2(100) path '$')
)
);
My output:
ROWNR | NAME | MNAME | LNAME | ADDRESSARRAY |
---|---|---|---|---|
1 | John | Carol | Cena | Address1 |
1 | John | Carol | Cena | Address2 |
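Roughly, the NESTED PATH clause works like a join between the parent row and the array. The parent columns are evaluated once and repeated for each AddressArray element. Here is a Python model of that behaviour. It assumes, as in this sample, that TextObj has a single element, so the [*] paths each return one value. json_table_rows is a hypothetical name, not an Oracle API.

```python
import json

# Same sample document as the query above.
doc = json.loads(
    '{"Rownum": "1", "Name": "John", "AddressArray": ["Address1", "Address2"],'
    ' "TextObj": [{"mName": "Carol", "lName": "Cena"}]}'
)


def json_table_rows(d):
    """Hypothetical model: parent columns once, then one row per array element."""
    parent = (
        int(d["Rownum"]),           # rownr number path '$.Rownum'
        d["Name"],                  # name path '$.Name'
        d["TextObj"][0]["mName"],   # assumes a single TextObj element
        d["TextObj"][0]["lName"],
    )
    # nested path '$.AddressArray[*]': parent columns repeated on each row
    return [parent + (address,) for address in d["AddressArray"]]


for row in json_table_rows(doc):
    print(row)
# (1, 'John', 'Carol', 'Cena', 'Address1')
# (1, 'John', 'Carol', 'Cena', 'Address2')
```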
Oracle SQL: Parse JSON in PL/SQL to table or array
From Oracle 12, you can do all the parsing in an SQL query. Your main issue is not the JSON but that your data is in delimited strings. You need to split those into lines, then split each line into values, and correlate the values with the headers:
SELECT p.*
FROM (
SELECT l.lineno,
kv.key,
kv.value
FROM table_name t
CROSS APPLY JSON_TABLE(
t.value,
'$.result.optimizationData[*]?(@.name == "unit_out")'
COLUMNS
content CLOB PATH '$.content'
) j
CROSS JOIN LATERAL (
SELECT LEVEL AS lineno,
REGEXP_SUBSTR(j.content, '.+', 1, 1 ) AS header,
REGEXP_SUBSTR(j.content, '.+', 1, LEVEL ) AS line
FROM DUAL
WHERE LEVEL > 1
CONNECT BY LEVEL <= REGEXP_COUNT(j.content, '.+')
) l
CROSS JOIN LATERAL (
SELECT CAST(REGEXP_SUBSTR(header, '[^;]+', 1, LEVEL) AS VARCHAR2(4000))
AS key,
CAST(REGEXP_SUBSTR(line, '[^;]+', 1, LEVEL) AS VARCHAR2(4000))
AS value
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(header, '[^;]+')
) kv
) pt
PIVOT (
MAX(value)
FOR key IN (
'ID_LIMIT' AS id_limit,
'CODE_UNIT' AS code_limit,
'TIME_STAMP_FROM' AS time_stamp_from,
'TIME_STAMP_TO' AS time_stamp_to,
'VARIABLE' AS variable,
'VALUE' AS value
)
) p
Which, for the sample data:
CREATE TABLE table_name (value CLOB CHECK (value IS JSON));
INSERT INTO table_name (value) VALUES (
'{
"statusCode": 200,
"isValid": true,
"errors": [],
"result": {
"optimizationData": [
{
"name": "out",
"content": "ID_LIMIT;TIME_STAMP_FROM;DIRECTION;ID_MODEL_CONSTRAINT\n1;202109222200;G;2_7_1_G\n1;202109232200;G;2_3_1_G\n2;202109222200;G;2_3_1_G\n3;202109222200;G;3_3_1_P\n"
},
{
"name": "unit_out",
"content": "ID_LIMIT;CODE_UNIT;TIME_STAMP_FROM;TIME_STAMP_TO;VARIABLE;VALUE\n1;BEL 2-02;202109222200;202109232200;RelaxationPlus;10\n1;BEL 2-05;202109222200;202109232200;RelaxationPlus;10\n2;WLO 1-01;202109222200;202109232200;RelaxationMinus;10\n"
}
]
}
}'
);
Outputs:
LINENO | ID_LIMIT | CODE_LIMIT | TIME_STAMP_FROM | TIME_STAMP_TO | VARIABLE | VALUE |
---|---|---|---|---|---|---|
2 | 1 | BEL 2-02 | 202109222200 | 202109232200 | RelaxationPlus | 10 |
3 | 1 | BEL 2-05 | 202109222200 | 202109232200 | RelaxationPlus | 10 |
4 | 2 | WLO 1-01 | 202109222200 | 202109232200 | RelaxationMinus | 10 |
Parse HTML table with Oracle
Your path is looking for a single td under the tr, but there are two, hence the "got multi-item sequence" error you're seeing. You can reference each td tag by its position, as td[1] etc. It's very reliant on the table structure being as expected, though. With this specific example you can do:
with tbl as
(
select xmltype('
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
') xml_data from dual
)
select
x.class, x.currency, x.amount, to_number(x.rate) as rate
from
tbl
cross join
xmltable('/table/tbody/tr'
passing tbl.xml_data
columns
class varchar2(10) path '@class',
currency varchar2(3) path 'td[1]/span',
amount number path 'td[1]/em',
rate varchar2(50) path 'td[2]'
) x
where
x.currency is not null
which gets:
CLASS CUR AMOUNT RATE
---------- --- ---------- ----------
gray_1 USD 1 476.16
gray_2 AUD 1 327.65
gray_9 IRR 100 1.13
gray_1 EUR 1 526.54
It won't take much variation in the HTML to break it, though. See this answer for some reasons it is fragile, and why it is generally considered unwise to try to parse HTML as XML.
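The same positional addressing can be tried outside the database. This Python sketch uses the standard library's xml.etree.ElementTree on the sample table above. ElementTree's limited XPath support includes [position] predicates. The code mirrors the XMLTABLE columns, but it is only an illustration. Real HTML that isn't well-formed XML would fail in ElementTree.fromstring, just as it fails in xmltype().

```python
import xml.etree.ElementTree as ET

# Same sample table as the query above.
html = """
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
"""

root = ET.fromstring(html.strip())  # root element is <table>

rows = []
for tr in root.findall("tbody/tr"):      # like XMLTABLE('/table/tbody/tr')
    # A bare 'td' path would match both cells in the row (the multi-item
    # sequence problem), so each cell is addressed by position instead.
    currency = tr.find("td[1]/span")     # like path 'td[1]/span'
    if currency is None:                 # like WHERE x.currency IS NOT NULL
        continue
    rows.append((
        tr.get("class"),                 # like path '@class'
        currency.text,
        int(tr.find("td[1]/em").text),   # like path 'td[1]/em'
        float(tr.find("td[2]").text),    # like path 'td[2]'
    ))

for row in rows:
    print(row)
```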
How to parse XML in Oracle
What you've shown, with balanced parentheses, works:
Select PhotoNumber, extract(xmltype(photoinfo,1), '(/DataIM/PhotoChosen/Photo/jpg)[1]') "photoinfo"
From ProfilePictures
Where photosourcetype = 10
PHOTONUMBER | photoinfo
----------- | -----------------------
42 | <jpg>my_photo.jpg</jpg>
I'm not sure what the parentheses are doing, though; it works without them too. You probably only want the actual string, not the jpg node wrapped around it, so you can add /text() to the XPath to get that.
But extract is deprecated, so it would be better to use XMLQuery:
Select PhotoNumber,
XMLQuery(
'/DataIM/PhotoChosen/Photo/jpg[1]/text()'
passing xmltype(photoinfo, 1)
returning content
).getstringval() "photoinfo"
From ProfilePictures
Where photosourcetype = 10
PHOTONUMBER | photoinfo
----------- | ------------
42 | my_photo.jpg
db<>fiddle
I've kept your quoted identifier, but I'd suggest you avoid those unless really necessary.
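The difference between selecting the node and selecting its text is easy to see with Python's xml.etree.ElementTree. The document below is a hypothetical minimal photoinfo value, reconstructed only from the XPath in the question. The real structure may have more content.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal photoinfo value, inferred from the XPath only.
photoinfo = b"<DataIM><PhotoChosen><Photo><jpg>my_photo.jpg</jpg></Photo></PhotoChosen></DataIM>"

root = ET.fromstring(photoinfo)            # root element is <DataIM>
node = root.find("PhotoChosen/Photo/jpg")  # like /DataIM/PhotoChosen/Photo/jpg[1]

print(ET.tostring(node, encoding="unicode"))  # <jpg>my_photo.jpg</jpg>  (the node)
print(node.text)                              # my_photo.jpg  (like /text())
```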
You are converting the BLOB to XML using character set identifier 1. Hopefully that is correct and what you want. You can see which character set that actually is with:
select nls_charset_name(1) from dual;
which is reported as US7ASCII on my system, and I suspect everywhere, as that's the old default.
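Here is a Python analogy for why the character set id matters. The same bytes decode correctly, fail, or decode to the wrong characters depending on which encoding you assume. The file name is hypothetical. This does not model what Oracle itself does with bytes that don't fit the declared character set. It only shows that the bytes have to be read with the encoding they were written in.

```python
# Hypothetical BLOB contents, written as UTF-8.
blob = "café.jpg".encode("utf-8")

print(blob.decode("utf-8"))       # café.jpg: the right encoding

try:
    blob.decode("ascii")          # 7-bit ASCII only covers bytes 0-127
except UnicodeDecodeError as exc:
    print("not valid ASCII:", exc.reason)  # ordinal not in range(128)

print(blob.decode("latin-1"))     # cafÃ©.jpg: decodes, but to the wrong characters
```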
Passing 0 instead uses the database character set. You need to know what character set the BLOB data uses, though. If you know that, you can look up the correct number to use with, for example:
select nls_charset_id('UTF8') from dual;
or pass it in directly in the xmltype() call:
...
passing xmltype(photoinfo, nls_charset_id('UTF8'))
...
Why is there no decent sql parser?
Good parsers are hard to write. That starts with the parser generator, which usually consumes some (E)BNF-like syntax that has its own limitations.
Error handling in parsers is a research topic of its own. This is not only about detecting errors but also about giving useful information on what could be wrong and how to fix it. Some parsers don't even offer location information ("error happened at line/column").
Next, there's SQL itself, which means "Structured Query Language", not "Standard Query Language". There is a SQL standard, several in fact, but you won't find a single database which implements any of them completely.
Oracle grudgingly offers VARCHAR, but you'd better use VARCHAR2. Some databases offer recursive/tree-like queries, and several of them use their own special syntax for this (Oracle's CONNECT BY, for example). Joining is defined pretty clearly in the standard (JOIN, LEFT JOIN, ...), but why bother if you can use (+)? On top of that, every database version adds new features to the grammar.
So while you could write a parser that can read the standard cases, writing a parser that supports all the features offered by all the databases around the globe is nearly impossible. And I'm not even talking about the bugs which you can encounter in these parsers.
One solution would be for all database vendors to publish their grammar files. But these are crown jewels (IP). So you should be happy that you can use them without having to pay a license fee per parsed character * number of CPUs.