Parser for Oracle SQL

Have you considered General SQL Parser? I have no experience with it myself, but judging by their website it has potential. Personally I rolled my own, built on the parser in Eclipse Data Tools Platform (sorry, I can't share it; it's proprietary), but I will now have to evaluate General SQL Parser because it claims to cover more of Oracle SQL than my parser does.

HTML Parser in Oracle

The above works for some values, but for others it breaks with the error ORA-06502: PL/SQL: numeric or value error ORA-06512: at "SYS.XMLTYPE", line 272

You will get that error if you have any NOTE values which do not contain a <br> tag, because this:

SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)

will then be null (INSTR returns 0 when the tag is missing, so the length passed to SUBSTR is -1), and xmltype() then throws that error.

If - and it's a big if - all of the notes start with a simple table with no embedded problematic tags, which may or may not be followed by a line break tag, then you can use a case expression to only do the substr when needed:

with tbl as
(
  SELECT ROW_ID,
    xmltype(
      CASE
        WHEN INSTR(qe.NOTE, '<br>') > 0
        THEN SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)
        ELSE qe.NOTE
      END
    ) xml_data
  FROM MY_Table qe
  WHERE EVENT='note'
)
...

Or perhaps slightly more robustly, look for and extract a table:

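with tbl as
(
  SELECT ROW_ID,
    -- the third argument of SUBSTR is a length, not an end position,
    -- so subtract the start offset of the <table> tag
    xmltype(
      SUBSTR(
        qe.NOTE,
        INSTR(qe.NOTE, '<table>'),
        INSTR(qe.NOTE, '</table>') + 8 - INSTR(qe.NOTE, '<table>')
      )
    ) xml_data
  FROM MY_Table qe
  WHERE EVENT='note'
  AND INSTR(qe.NOTE, '<table>') > 0
)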
...

But as you already know, this kind of approach is fraught with problems.

Parse Json using Oracle SQL

I suppose "nested" will do the trick

select *
from json_table(
  '{"Rownum": "1", "Name": "John", "AddressArray": ["Address1", "Address2"], "TextObj": [{"mName": "Carol", "lName": "Cena"}]}',
  '$' columns (
    rownr number path '$.Rownum',
    name varchar2(100) path '$.Name',
    mName varchar2(100) path '$.TextObj[*].mName',
    lName varchar2(100) path '$.TextObj[*].lName',
    nested path '$.AddressArray[*]' columns (AddressArray varchar2(100) path '$')
  )
);

My output:

ROWNR NAME MNAME LNAME ADDRESSARRAY
----- ---- ----- ----- ------------
    1 John Carol Cena  Address1
    1 John Carol Cena  Address2
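
As a cross-check outside the database, the same flattening can be sketched in Python. This is only an illustration of what the nested path does (one output row per array element, with the parent values repeated), not how Oracle implements it. The function name flatten is made up for this sketch, it assumes TextObj holds exactly one object as in the sample, and the sample is written as strict JSON because Python's json module rejects a trailing comma.

```python
import json

# Same document as in the query, written as strict JSON.
SAMPLE = (
    '{"Rownum": "1", "Name": "John", '
    '"AddressArray": ["Address1", "Address2"], '
    '"TextObj": [{"mName": "Carol", "lName": "Cena"}]}'
)


def flatten(document):
    """Return one tuple per AddressArray element, repeating the parent values."""
    doc = json.loads(document)
    # Assumption: exactly one TextObj entry, as in the sample data.
    person = doc["TextObj"][0]
    return [
        (int(doc["Rownum"]), doc["Name"], person["mName"], person["lName"], address)
        for address in doc["AddressArray"]
    ]


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

Each address produces its own row, which matches the two rows returned by the query.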

Oracle SQL: Parse JSON in PL/SQL to table or array

From Oracle 12, you can do all the parsing in an SQL query. Your main issue is not the JSON but that your data is held in strings, which you need to split into lines and then into values, and then correlate the values with the headers:

SELECT p.*
FROM (
  SELECT l.lineno,
         kv.key,
         kv.value
  FROM table_name t
       CROSS APPLY JSON_TABLE(
         t.value,
         '$.result.optimizationData[*]?(@.name == "unit_out")'
         COLUMNS
           content CLOB PATH '$.content'
       ) j
       CROSS JOIN LATERAL (
         SELECT LEVEL AS lineno,
                REGEXP_SUBSTR(j.content, '.+', 1, 1 ) AS header,
                REGEXP_SUBSTR(j.content, '.+', 1, LEVEL ) AS line
         FROM DUAL
         WHERE LEVEL > 1
         CONNECT BY LEVEL <= REGEXP_COUNT(j.content, '.+')
       ) l
       CROSS JOIN LATERAL (
         SELECT CAST(REGEXP_SUBSTR(header, '[^;]+', 1, LEVEL) AS VARCHAR2(4000))
                  AS key,
                CAST(REGEXP_SUBSTR(line, '[^;]+', 1, LEVEL) AS VARCHAR2(4000))
                  AS value
         FROM DUAL
         CONNECT BY LEVEL <= REGEXP_COUNT(header, '[^;]+')
       ) kv
) pt
PIVOT (
  MAX(value)
  FOR key IN (
    'ID_LIMIT' AS id_limit,
    'CODE_UNIT' AS code_limit,
    'TIME_STAMP_FROM' AS time_stamp_from,
    'TIME_STAMP_TO' AS time_stamp_to,
    'VARIABLE' AS variable,
    'VALUE' AS value
  )
) p

Which, for the sample data:

CREATE TABLE table_name (value BLOB CHECK (value IS JSON));

INSERT INTO table_name (value) VALUES (
'{
"statusCode": 200,
"isValid": true,
"errors": [],
"result": {
"optimizationData": [
{
"name": "out",
"content": "ID_LIMIT;TIME_STAMP_FROM;DIRECTION;ID_MODEL_CONSTRAINT\n1;202109222200;G;2_7_1_G\n1;202109232200;G;2_3_1_G\n2;202109222200;G;2_3_1_G\n3;202109222200;G;3_3_1_P\n"
},
{
"name": "unit_out",
"content": "ID_LIMIT;CODE_UNIT;TIME_STAMP_FROM;TIME_STAMP_TO;VARIABLE;VALUE\n1;BEL 2-02;202109222200;202109232200;RelaxationPlus;10\n1;BEL 2-05;202109222200;202109232200;RelaxationPlus;10\n2;WLO 1-01;202109222200;202109232200;RelaxationMinus;10\n"
}
]
}
}'
);

Outputs:

LINENO ID_LIMIT CODE_LIMIT TIME_STAMP_FROM TIME_STAMP_TO VARIABLE        VALUE
------ -------- ---------- --------------- ------------- --------------- -----
     2 1        BEL 2-02   202109222200    202109232200  RelaxationPlus  10
     3 1        BEL 2-05   202109222200    202109232200  RelaxationPlus  10
     4 2        WLO 1-01   202109222200    202109232200  RelaxationMinus 10
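
The split-and-correlate step can also be sketched outside the database. The Python below is only an illustration of the idea (first line is the header, every later line is paired with it by position); the function name content_to_rows is invented for this sketch, and it works on the "unit_out" content string from the sample data after JSON decoding. One difference worth knowing: str.split keeps empty fields, whereas a '[^;]+' pattern skips them, so the two approaches would diverge if a value were missing between two semicolons.

```python
# The "unit_out" content from the sample data, after JSON decoding.
CONTENT = (
    "ID_LIMIT;CODE_UNIT;TIME_STAMP_FROM;TIME_STAMP_TO;VARIABLE;VALUE\n"
    "1;BEL 2-02;202109222200;202109232200;RelaxationPlus;10\n"
    "1;BEL 2-05;202109222200;202109232200;RelaxationPlus;10\n"
    "2;WLO 1-01;202109222200;202109232200;RelaxationMinus;10\n"
)


def content_to_rows(content, sep=";"):
    """Split content into lines, then pair each line's values with the header keys."""
    lines = [line for line in content.splitlines() if line]
    header = lines[0].split(sep)
    rows = []
    # Data lines start at line number 2, mirroring LEVEL > 1 in the query.
    for lineno, line in enumerate(lines[1:], start=2):
        row = dict(zip(header, line.split(sep)))
        row["LINENO"] = lineno
        rows.append(row)
    return rows


rows = content_to_rows(CONTENT)
for row in rows:
    print(row)
```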

Parse HTML table with Oracle

Your path is looking for a td under the tr; but there are two, hence the "got multi-item sequence" error you're seeing. You can reference each td tag by its position, as td[1] etc. It's very reliant on the table structure being as expected though.

With this specific example you can do:

with tbl as
(
  select xmltype('
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
') xml_data from dual
)
select
  x.class, x.currency, x.amount, to_number(x.rate) as rate
from
  tbl
cross join
  xmltable('/table/tbody/tr'
    passing tbl.xml_data
    columns
      class varchar2(10) path '@class',
      currency varchar2(3) path 'td[1]/span',
      amount number path 'td[1]/em',
      rate varchar2(50) path 'td[2]'
  ) x
where
  x.currency is not null

which gets:

CLASS      CUR     AMOUNT       RATE
---------- --- ---------- ----------
gray_1     USD          1     476.16
gray_2     AUD          1     327.65
gray_9     IRR        100       1.13
gray_1     EUR          1     526.54

It won't take much variation in the HTML to break it, though. The approach is fragile, which is why it is generally considered unwise to try to parse HTML as XML.
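
To try the positional paths and the fragility outside the database, here is a small Python sketch using the standard library's xml.etree.ElementTree, which supports the same td[1] style of predicate. The helper names extract_rates and parses_as_xml are made up for this illustration; it mirrors the query above but is not a replacement for it.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<table>
<tbody>
<tr class="blue"><td>code</td><td>rate</td></tr>
<tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
<tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
<tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
<tr class="blue"><td>some comment</td><td>some comment</td></tr>
<tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
</tbody>
</table>
"""


def extract_rates(markup):
    """Return (class, currency, amount, rate) for rows whose first cell has a span."""
    table = ET.fromstring(markup.strip())
    rows = []
    for tr in table.findall("tbody/tr"):
        # td[1] and td[2] address the cells by position, as in the XMLTable paths.
        currency = tr.findtext("td[1]/span")
        if currency is None:
            continue  # header and comment rows have no span
        amount = int(tr.findtext("td[1]/em"))
        rate = float(tr.findtext("td[2]"))
        rows.append((tr.get("class"), currency, amount, rate))
    return rows


def parses_as_xml(markup):
    """Report whether the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


rates = extract_rates(SAMPLE)
for rate_row in rates:
    print(rate_row)

# Perfectly ordinary HTML, but not well-formed XML because <br> is never closed.
print(parses_as_xml("<table><tr><td>a<br>b</td></tr></table>"))
```

The last call shows how little it takes: a single unclosed br tag is valid HTML, yet an XML parser rejects the whole document.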

How to parse XML in Oracle

What you've shown, with balanced parentheses, works:

Select PhotoNumber, extract(xmltype(photoinfo,1), '(/DataIM/PhotoChosen/Photo/jpg)[1]') "photoinfo"
From ProfilePictures
Where photosourcetype = 10

PHOTONUMBER | photoinfo
----------- | -----------------------
         42 | <jpg>my_photo.jpg</jpg>

The parentheses make [1] apply to the whole matched sequence rather than to the jpg children of each Photo node, but with a single Photo it works without them too. You probably only want the actual string, not the jpg node and its contents, so you can add /text() to the XPath to get that.

But extract is deprecated, so it would be better to use XMLQuery:

Select PhotoNumber,
  XMLQuery(
    '/DataIM/PhotoChosen/Photo/jpg[1]/text()'
    passing xmltype(photoinfo, 1)
    returning content
  ).getstringval() "photoinfo"
From ProfilePictures
Where photosourcetype = 10

PHOTONUMBER | photoinfo
----------- | ------------
         42 | my_photo.jpg

I've kept your quoted identifier, but I'd suggest you avoid those unless really necessary.

You are converting the BLOB to XML using character set identifier 1. Hopefully that is correct and what you want. You can see which character set that actually is with:

select nls_charset_name(1) from dual;

which is reported as US7ASCII on my system, and I suspect everywhere as that's the old default.

Passing 0 instead uses the database character set. You need to know what character set the BLOB data uses though. If you know that you can look up the correct number to use with, for example:

select nls_charset_id('UTF8') from dual;

or pass it in directly in the xmltype() call:

...
passing xmltype(photoinfo, nls_charset_id('UTF8'))
...

Why is there no decent sql parser?

Good parsers are hard to write. That starts with the code generator for the parser code (which usually eats some (E)BNF-like syntax which has its own limitations).

Error handling in parsers is a research topic of its own. This is not only about detecting errors but also about giving useful information on what could be wrong and how to fix it. Some parsers don't even offer location information ("error happened at line/column").

Next, you have SQL, which means "Structured Query Language", not "Standard Query Language". There is a SQL standard, even several, but you won't find a single database which implements any of them completely.

Oracle grudgingly offers VARCHAR, but you had better use VARCHAR2. Some databases offer recursive/tree-like queries, and each of them uses its own special syntax for this. Joining is defined pretty clearly in the standard (join, left join, ...), but why bother if you can use (+)?

On top of that, for every database version, new features are added to the grammar.

So while you could write a parser that can read the standard cases, writing a parser that supports all the features offered by all the databases around the globe is nearly impossible. And I'm not even talking about the bugs which you can encounter in these parsers.

One solution would be for all database vendors to publish their grammar files. But these are crown jewels (IP). So you should be happy that you can use the databases without having to pay a license fee per parsed character * number of CPUs.


