Oracle: loading a large xml file?
You can access XML files on the server via SQL. With your data in /tmp/tmp.xml, you would first declare the directory:
SQL> create directory d as '/tmp';
Directory created
You can then query the XML file directly:
SQL> SELECT XMLTYPE(bfilename('D', 'tmp.xml'), nls_charset_id('UTF8')) xml_data
2 FROM dual;
XML_DATA
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<badges>
[...]
To access the fields in the file, you can use the method described in another SO answer, for example:
SQL> SELECT UserId, Name, to_timestamp(dt, 'YYYY-MM-DD"T"HH24:MI:SS.FF3') dt
2 FROM (SELECT XMLTYPE(bfilename('D', 'tmp.xml'),
nls_charset_id('UTF8')) xml_data
3 FROM dual),
4 XMLTable('for $i in /badges/row
5 return $i'
6 passing xml_data
7 columns UserId NUMBER path '@UserId',
8 Name VARCHAR2(50) path '@Name',
9 dt VARCHAR2(25) path '@Date');
USERID NAME DT
---------- ---------- ---------------------------
3718 Teacher 2008-09-15 08:55:03.923
994 Teacher 2008-09-15 08:55:03.957
How to load large XML file ( 100 MB) to an XMLType column in Oracle
If you query a large XMLType, the client may only show you the first part of the stored value.
Assuming the column is XMLType, you can be sure that well-formed XML is being stored, so if the small visible chunk is not well-formed, the truncation is happening on the client side.
You can use the XML query functions to count the nodes and check that the count matches what you expect, e.g.:
select count(*)
from xdb.path_view p, table(xmlsequence(extract(p.res,'/*/*'))) y
where p.path= '/sys';
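To know what count to expect on the database side, you can count the relevant elements in the source file locally first. A minimal sketch in Python using the standard library's streaming parser (the file path and the `row` tag name follow the badges example above and are assumptions):

```python
# Count <row> elements in the badges XML locally, to compare with the
# node count reported by the database.
import os
import xml.etree.ElementTree as ET

def count_rows(path, tag="row"):
    # iterparse streams the file, so even a 100 MB XML stays cheap on memory.
    count = 0
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # discard the element once counted
    return count

if os.path.exists("/tmp/tmp.xml"):
    print(count_rows("/tmp/tmp.xml"))
```

If the local count and the database-side count disagree, the stored document (not the client display) is the thing to investigate.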
Inserting large XML document into Oracle
In SQL, concatenate 4000-byte strings onto EMPTY_CLOB():
INSERT INTO foo (id, document)
VALUES (1, XMLTYPE(
EMPTY_CLOB()
|| 'first 4000 bytes...'
|| 'second 4000 bytes...'
|| 'etc.'
));
In PL/SQL, the limit for VARCHAR2 variables is 32,767 bytes:
DECLARE
v_id NUMBER := 1;
v_xml VARCHAR2(32000) := 'your 32k XML string';
BEGIN
INSERT INTO foo(id, document) VALUES (v_id, XMLTYPE(v_xml));
END;
/
Otherwise, you can use the same technique as the SQL answer in PL/SQL:
DECLARE
v_id NUMBER := 1;
v_xml CLOB := EMPTY_CLOB()
|| 'first 32k XML string'
|| 'second 32k XML string'
|| 'etc.';
BEGIN
INSERT INTO foo(id, document) VALUES (v_id, XMLTYPE(v_xml));
END;
/
Parsing large XML file with PL/SQL
You are reading the file line by line, but overwriting your xmlClob with each line instead of appending. You could build up the CLOB by reading into a VARCHAR2 buffer and appending, but you can also use the DBMS_LOB built-in procedures to do it for you:
DECLARE
xmlClob CLOB;
xmlFile BFILE;
x XMLType;
src_offset number := 1 ;
dest_offset number := 1 ;
lang_ctx number := DBMS_LOB.DEFAULT_LANG_CTX;
warning integer;
BEGIN
xmlFile := BFILENAME('XMLPARSERADRESYCUZK', 'pokus.xml');
DBMS_LOB.CREATETEMPORARY(xmlClob, true);
DBMS_LOB.FILEOPEN(xmlFile, DBMS_LOB.FILE_READONLY);
DBMS_LOB.LOADCLOBFROMFILE(xmlClob, xmlFile, DBMS_LOB.LOBMAXSIZE, src_offset,
dest_offset, DBMS_LOB.DEFAULT_CSID, lang_ctx, warning);
x := XMLType.createXML(xmlClob);
DBMS_LOB.FILECLOSEALL();
DBMS_LOB.FREETEMPORARY(xmlClob);
FOR r IN (
...
When I use that and load your file I get the output:
CUZK Pod smdli.t.m 1800/9
You will probably want some error checking around the DBMS_LOB calls; this is just a simple demo.
Insert large xml file as a blob in oracle table
You can convert the XML file into a SQL script that executes a suitably crafted anonymous PL/SQL block. Running this script against the DB will populate the BLOB.
The basic idea is to split the XML file into chunks of 2000 characters. The first chunk is inserted into the target table's BLOB column directly; each subsequent chunk is appended using the dbms_lob.fragment_insert package procedure.
Warning: this is not recommended practice! Better to get a DBA to load it for you.
Example:
Assumptions:
- The target table has 2 columns, the pk and the blob.
- The pk is 42.
- 2000 is a sample chunk size deemed suitable. Technically, dbms_lob.fragment_insert handles up to 32767 bytes; however, other tools involved (e.g. SQL*Plus) may have tighter limits on line length.
Code:
declare
l_b BLOB;
begin
insert
into
t_target ( c_pk, c_blob )
values ( 42, utl_raw.cast_to_raw('<This literal contains the first 2000 (chunksize) characters of the xml file>') )
returning c_blob
into l_b
;
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset 2000>'));
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset 4000>'));
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset 6000>'));
...
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains the last chunk>'));
commit;
end;
/
show err
Preparatory work
Make sure that no single quote occurs inside your XML file; otherwise the generated PL/SQL code will contain syntax errors. If single quotes aren't used as attribute value delimiters, simply replace them with the numeric character entity &#x27;.
Create the bulk of the anonymous PL/SQL block
Methods for inserting data into a file at regular intervals are presented in this SO question, the most flexible approach being outlined in this answer. Instead of newlines, insert the following string:
"'));\n dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('"
The remainder of the anonymous PL/SQL block can be copied or written by hand.
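As an alternative to text-tool surgery on the XML file, the whole script can be generated programmatically. A minimal sketch in Python, assuming the example's table and column names (t_target, c_pk, c_blob), the pk value 42, and the 2000-character chunk size:

```python
# Generate the chunked anonymous PL/SQL block described above.
# Table/column names (t_target, c_pk, c_blob) follow the example;
# adjust to your schema.
CHUNK = 2000

def make_script(xml_text, pk=42, chunk=CHUNK):
    # Single quotes would break the PL/SQL string literals, so replace
    # them with the numeric character entity first.
    xml_text = xml_text.replace("'", "&#x27;")
    chunks = [xml_text[i:i + chunk] for i in range(0, len(xml_text), chunk)]
    lines = [
        "declare",
        "  l_b BLOB;",
        "begin",
        "  insert into t_target ( c_pk, c_blob )",
        f"  values ( {pk}, utl_raw.cast_to_raw('{chunks[0]}') )",
        "  returning c_blob into l_b;",
    ]
    # Every chunk after the first is appended via dbms_lob.fragment_insert.
    for c in chunks[1:]:
        lines.append(
            f"  dbms_lob.fragment_insert ( l_b, {len(c)}, "
            f"1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('{c}'));"
        )
    lines += ["  commit;", "end;", "/"]
    return "\n".join(lines)
```

Note that the entity replacement changes the byte offsets relative to the original file, which is harmless here because each chunk's length is computed after the substitution.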
Caveat
As is, the script will be roughly the same size as the original XML, and the PL/SQL block will contain 200k+ lines. You will very likely run into limitations of the tools involved. However, the script can be split into an arbitrary number of chunks as follows:
declare
l_b BLOB;
begin
select c_blob
into l_b
from t_target
where c_pk = 42
;
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset <k>*2000>'));
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset (<k>+1)*2000>'));
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset (<k>+2)*2000>'));
...
dbms_lob.fragment_insert ( l_b, 2000, 1+dbms_lob.getlength(l_b), utl_raw.cast_to_raw('<This literal contains 2000 characters starting at [0-based] offset (<k>+<n_k>)*2000>'));
end;
/
show err
And once again: this is not recommended practice! Better to get a DBA to load it for you.
How to write large raw XML file to Oracle db using Blob object?
Use either CallableStatement.setBlob(int, InputStream) or Blob.setBinaryStream(long). Both methods let you work with InputStream or OutputStream objects and avoid creating a byte[] array for the whole document in memory. An example is shown in the Adding Large Object Type Object to Database docs.
This should work as long as the JDBC driver is smart enough not to create a byte[] for the entire blob somewhere internally.