Oracle: Loading a Large XML File

Oracle: loading a large xml file?

You can access XML files on the server via SQL. With your data in /tmp/tmp.xml, you would first declare the directory:

SQL> create directory d as '/tmp';

Directory created

You could then query your XML File directly:

SQL> SELECT XMLTYPE(bfilename('D', 'tmp.xml'), nls_charset_id('UTF8')) xml_data
       FROM dual;

XML_DATA
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<badges>
[...]

To access the fields in your file, you could use the method described in another SO answer, for example:

SQL> SELECT UserId, Name, to_timestamp(dt, 'YYYY-MM-DD"T"HH24:MI:SS.FF3') dt
       FROM (SELECT XMLTYPE(bfilename('D', 'tmp.xml'),
                            nls_charset_id('UTF8')) xml_data
               FROM dual),
            XMLTable('for $i in /badges/row
                      return $i'
                     passing xml_data
                     columns UserId NUMBER       path '@UserId',
                             Name   VARCHAR2(50) path '@Name',
                             dt     VARCHAR2(25) path '@Date');

    USERID NAME       DT
---------- ---------- ---------------------------
      3718 Teacher    2008-09-15 08:55:03.923
       994 Teacher    2008-09-15 08:55:03.957

How to load large XML file (100 MB) to an XMLType column in Oracle

If you query a large XMLType, the client may only show you the first part of the stored value.
Since the column is XMLType, Oracle guarantees that only well-formed XML is stored; so if the visible fragment looks cut off or not well-formed, the truncation is happening on the client side, not in the database.
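One quick check is to compare the serialized length of the stored document with the size of the source file. A minimal sketch; the table name foo and the columns id/document are assumptions borrowed from the insert examples further down:

-- Length (in characters) of the serialized XML stored in each row;
-- foo, id and document are assumed names, not from the original question.
SELECT id,
       DBMS_LOB.GETLENGTH(XMLSERIALIZE(DOCUMENT document AS CLOB)) AS doc_length
  FROM foo;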

You could also use the XML functions to count nodes and check that the count matches what you expect. For example, this query counts the nodes two levels below the root of the resource document stored at path /sys in the XDB repository:

select count(*)
  from xdb.path_view p,
       table(xmlsequence(extract(p.res, '/*/*'))) y
 where p.path = '/sys';
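Alternatively, you could count element occurrences directly in your own stored document. A minimal sketch, again assuming a table foo with columns id and document, and a /badges/row structure as in the earlier example (all of these names are assumptions):

-- Count the <row> elements inside the stored XML document for one row;
-- foo, id, document and the /badges/row path are assumed names.
SELECT COUNT(*) AS row_count
  FROM foo f,
       XMLTable('/badges/row' PASSING f.document) x
 WHERE f.id = 1;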

Inserting large XML document into Oracle

In SQL, you can build up the value by concatenating string literals (each at most 4000 bytes) onto EMPTY_CLOB():

INSERT INTO foo (id, document)
VALUES (1, XMLTYPE(
EMPTY_CLOB()
|| 'first 4000 bytes...'
|| 'second 4000 bytes...'
|| 'etc.'
));

In PL/SQL, the limit for a string literal or VARCHAR2 variable is 32,767 bytes:

DECLARE
v_id NUMBER := 1;
v_xml VARCHAR2(32000) := 'your 32k XML string';
BEGIN
INSERT INTO foo(id, document) VALUES (v_id, XMLTYPE(v_xml));
END;
/

Otherwise, you can use the same concatenation technique as in the SQL answer from within PL/SQL:

DECLARE
v_id NUMBER := 1;
v_xml CLOB := EMPTY_CLOB()
|| 'first 32k XML string'
|| 'second 32k XML string'
|| 'etc.';
BEGIN
INSERT INTO foo(id, document) VALUES (v_id, XMLTYPE(v_xml));
END;
/
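If the document is too large to assemble from literals at all, a further option (not from the original answers; just a sketch reusing the same foo table) is to build a temporary CLOB with DBMS_LOB and append the pieces in a loop before inserting:

DECLARE
  v_id  NUMBER := 1;
  v_xml CLOB;
  -- Placeholder chunks; in practice each piece would come from your
  -- application, a file, or wherever the XML text originates.
  TYPE t_chunks IS TABLE OF VARCHAR2(32767);
  v_chunks t_chunks := t_chunks('<badges>',
                                '<row UserId="3718" Name="Teacher"/>',
                                '</badges>');
BEGIN
  DBMS_LOB.CREATETEMPORARY(v_xml, TRUE);
  FOR i IN 1 .. v_chunks.COUNT LOOP
    DBMS_LOB.WRITEAPPEND(v_xml, LENGTH(v_chunks(i)), v_chunks(i));
  END LOOP;
  INSERT INTO foo (id, document) VALUES (v_id, XMLTYPE(v_xml));
  DBMS_LOB.FREETEMPORARY(v_xml);
  COMMIT;
END;
/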

Parsing large XML file with PL/SQL

You are reading the file line by line, but you are overwriting xmlClob with each line instead of appending to it. You could build up the CLOB by reading into a VARCHAR2 buffer and appending, but the DBMS_LOB built-in procedures can also do the whole load for you:

DECLARE
  xmlClob     CLOB;
  xmlFile     BFILE;
  x           XMLType;

  src_offset  NUMBER  := 1;
  dest_offset NUMBER  := 1;
  lang_ctx    NUMBER  := DBMS_LOB.DEFAULT_LANG_CTX;
  warning     INTEGER;
BEGIN
  xmlFile := BFILENAME('XMLPARSERADRESYCUZK', 'pokus.xml');
  DBMS_LOB.CREATETEMPORARY(xmlClob, true);
  DBMS_LOB.FILEOPEN(xmlFile, DBMS_LOB.FILE_READONLY);
  -- load the whole file into the temporary CLOB in one call
  DBMS_LOB.LOADCLOBFROMFILE(xmlClob, xmlFile, DBMS_LOB.LOBMAXSIZE, src_offset,
                            dest_offset, DBMS_LOB.DEFAULT_CSID, lang_ctx, warning);
  x := XMLType.createXML(xmlClob);
  DBMS_LOB.FILECLOSEALL();
  DBMS_LOB.FREETEMPORARY(xmlClob);
  FOR r IN (
  ...

When I use that and load your file I get the output:

CUZK Pod sídlištěm 1800/9

You probably want some error checking around the DBMS_LOB calls; this is just a simple demo.

Insert large xml file as a blob in oracle table

You may convert the xml file into a sql script that executes a suitably crafted anonymous plsql block. Running this script against the db will populate the blob.

The basic idea is to split the xml file into chunks of 2000 characters. The first chunk is inserted into the target table's blob column directly; each subsequent chunk is appended through the returned LOB locator using the dbms_lob.fragment_insert package procedure. !!! WARNING: This is not recommended practice! Better get a dba to load it for you!

Example:

  • Assumptions:

    • The target table has 2 columns, the pk and the blob.
    • The pk is 42.
    • 2000 is a sample chunk size deemed suitable. Technically, dbms_lob.fragment_insert handles up to 32767 bytes; however, other tools involved (e.g. sqlplus) may have tighter bounds on line length.
  • Code:

    declare
      l_b BLOB;
    begin
      insert into t_target ( c_pk, c_blob )
      values ( 42, utl_raw.cast_to_raw('<This literal contains the first 2000 (chunksize) characters of the xml file>') )
      returning c_blob into l_b;

      dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset 2000>'));
      dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset 4000>'));
      dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset 6000>'));
      ...
      dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains the last chunk>'));

      commit;
    end;
    /
    show err

Preparatory work

  1. You need to make sure that no single quote occurs inside your xml file.
     Otherwise the generated plsql code will contain syntax errors.

     If single quotes aren't used as attribute value delimiters, simply replace them with the numeric character reference &#39; (hex form &#x27;).

  2. Create the bulk of the anonymous plsql

Methods for inserting a separator into a file at regular intervals are presented in another SO question, the most flexible approach being outlined in one of its answers. Instead of a newline, insert the following string:

"'));\n     dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('"

The remainder of the anonymous plsql can be copied/written by hand.

Caveat

As it stands, the script will be roughly the same size as the original xml and the plsql block will contain 200k+ lines, so you will very likely run into limitations of the tools involved. However, the script can be split into an arbitrary number of smaller blocks as follows:

declare
  l_b BLOB;
begin
  -- lock the row so the LOB can be modified through its locator
  select c_blob
    into l_b
    from t_target
   where c_pk = 42
     for update;

  dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset <k>*2000>'));
  dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset (<k>+1)*2000>'));
  dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset (<k>+2)*2000>'));
  ...
  dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset (<k>+<n_k>)*2000>'));

  commit;
end;
/
show err

And once again: !!! WARNING: This is not recommended practice! Better get a dba to load it for you!

How to write large raw XML file to Oracle db using Blob object?

Use either CallableStatement.setBlob(int, InputStream) or Blob.setBinaryStream(long). Both methods let you work with InputStream or OutputStream objects and avoid creating a byte[] array for the whole document in memory. An example is shown in the Adding Large Object Type Object to Database docs.

This should work as long as the JDBC driver is smart enough not to create a byte[] for the entire blob somewhere internally.


