Java Parsing XML Document Gives "Content Not Allowed in Prolog." Error

Content is not allowed in prolog when parsing perfectly valid XML on GAE

The encodings declared in your XML and XSD (or DTD) are different.

XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>

Another possible cause is anything appearing before the XML declaration, i.e. you might have something like this in the buffer:

helloworld<?xml version="1.0" encoding="utf-8"?>  

or even a space or special character.
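To reproduce the error, the sketch below feeds a string with stray text before the XML declaration to the JDK's built-in DOM parser. The class and method names are illustrative. Installing a plain DefaultHandler as the error handler makes fatal errors surface as exceptions without the default handler also printing a "[Fatal Error]" line to stderr.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologDemo {
    // Returns "OK" if the string parses, otherwise the parser's error message.
    static String tryParse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them first.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return "OK";
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Well-formed: nothing precedes the XML declaration.
        System.out.println(tryParse("<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>"));
        // Stray text before the declaration triggers the prolog error.
        System.out.println(tryParse("helloworld<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>"));
    }
}
```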

There are also special characters called byte order marks (BOMs) that could be in the buffer.
Before passing the buffer to the parser, remove them like this:

String xml = "<?xml ...";
xml = xml.trim().replaceFirst("^([\\W]+)<","<");
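If a byte order mark is the only suspect, a narrower clean-up is to remove just a leading U+FEFF and whitespace instead of every leading non-word character. This is a minimal sketch; stripLeadingBom is a hypothetical helper name. Note that String.trim() alone is not enough, because it only removes characters up to U+0020 and U+FEFF is not one of them.

```java
public class BomStripper {
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: drops a leading BOM and any leading whitespace,
    // leaving the rest of the document untouched.
    static String stripLeadingBom(String xml) {
        int start = 0;
        while (start < xml.length()
                && (xml.charAt(start) == BOM || Character.isWhitespace(xml.charAt(start)))) {
            start++;
        }
        return xml.substring(start);
    }

    public static void main(String[] args) {
        String dirty = "\uFEFF<?xml version=\"1.0\"?><root/>";
        System.out.println(stripLeadingBom(dirty));
    }
}
```

Unlike the regular expression, this will not silently discard other leading punctuation, so a genuinely malformed buffer still fails loudly.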

Content is not allowed in prolog error yet nothing before XML declaration

Elaborating on what @MartinHonnen has already helpfully commented...

The error,

Content is not allowed in prolog.

arises because the XML prolog, which is everything before the root element in an XML document, contains textual content that is not allowed there. The offending text is not necessarily located before the XML declaration.

Specifically, the prolog in XML is defined in the context of an XML document:

[1] document      ::= prolog element Misc*

Note that prolog precedes element, the single root element of the XML document.

Most answers focus on the problem where there is text (visible or invisible) at the beginning of the prolog, before the XML declaration, but note that non-whitespace text cannot appear anywhere within or after the prolog either:

[22] prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq ::= S? '=' S?
[26] VersionNum ::= '1.' [0-9]+
[27] Misc ::= Comment | PI | S

In your case, the text content Test material... appears between the XML declaration (XMLDecl) and the root element (element). A comment, processing instruction, or whitespace can appear there, but not text.
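The Misc production can be checked directly against the JDK's built-in parser: after the XML declaration, a comment, a processing instruction, and whitespace are accepted, while plain text is rejected. In this sketch, "Test material" stands in for the question's text, and <?app data?> is an arbitrary example processing instruction.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologMiscDemo {
    // True if the document is well-formed, false if the parser rejects it.
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String decl = "<?xml version=\"1.0\"?>";
        // Misc items (whitespace, comment, PI) between XMLDecl and the root element.
        System.out.println(parses(decl + "\n<!-- comment -->\n<?app data?>\n<root/>"));
        // Plain text in the same position is not allowed.
        System.out.println(parses(decl + "\nTest material\n<root/>"));
    }
}
```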

Fatal Error :1:1: Content is not allowed in prolog

I'm turning my comment into an answer so it can be accepted and this question no longer remains unanswered.

The most likely cause of this is a malformed response, which includes characters before the initial <?xml …>. So please have a look at the document as transferred over HTTP, and fix this on the server side.
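One way to look at the document as it actually arrives is to dump its first few bytes in hex before any decoding happens. This is a minimal sketch with a hypothetical hexPrefix helper; in practice you would pass it the response body stream (for example, from HttpURLConnection.getInputStream()) instead of the in-memory sample used here. A UTF-8 document should start with 3C ("<"), or with EF BB BF 3C if it carries a BOM; any other bytes before the first 3C are the stray content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PrefixDump {
    // Hypothetical helper: reads up to n bytes and renders them as space-separated hex.
    static String hexPrefix(InputStream in, int n) throws IOException {
        byte[] buf = in.readNBytes(n); // Java 11+
        StringBuilder sb = new StringBuilder();
        for (byte b : buf) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample body with a UTF-8 BOM in front of the XML declaration.
        byte[] body = "\uFEFF<?xml version=\"1.0\"?><r/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(new ByteArrayInputStream(body), 8));
        // prints EF BB BF 3C 3F 78 6D 6C
    }
}
```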

org.xml.sax.SAXParseException: Content is not allowed in prolog

This is often caused by whitespace before the XML declaration, but it could be any text, like a dash or any other character. I say "often caused by whitespace" because people assume whitespace is always ignorable, but that's not the case here.


Another common cause is a UTF-8 BOM (byte order mark). A BOM is allowed before the XML declaration, but if the document is handed to the XML parser as a stream of characters rather than as a stream of bytes, the BOM arrives as an ordinary invisible character in the prolog and is rejected just like stray text.
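The difference between bytes and characters can be seen with the JDK's built-in parser: given an InputStream, it detects and skips a UTF-8 BOM itself; given a Reader, the decoded U+FEFF is simply the first character of the prolog. This is a sketch; the class name and sample document are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BomBytesVsChars {
    // Returns the root element name, or the parser's error message on failure.
    static String parse(InputSource source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
        try {
            return builder.parse(source).getDocumentElement().getTagName();
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        // Byte stream: the parser sees EF BB BF and skips the BOM.
        System.out.println(parse(new InputSource(new ByteArrayInputStream(bytes))));
        // Character stream: the UTF-8 decoder keeps U+FEFF, so the prolog contains it.
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(parse(new InputSource(reader)));
    }
}
```

The practical takeaway is to pass the raw InputStream (or a File) to the parser and let it handle the encoding, rather than decoding to characters first.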

The same can happen if schema files (.xsd) are used to validate the XML file and one of the schema files has a UTF-8 BOM.

Fatal Error :1:1: Content is not allowed in prolog. org.xml.sax.SAXParseException

new StringReader("myfile.xml") wraps the string itself, so that string must contain XML, not a filename. The parser reads the literal text myfile.xml (not the contents of the file myfile.xml) and fails immediately, because an XML document may not begin with an m character.

Change

Document doc = dBuilder.parse(new InputSource(new StringReader("myfile.xml")));

to

Document doc = dBuilder.parse(new InputSource("myfile.xml"));
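For reference, here is a self-contained sketch of both correct cases: parsing XML stored in a file and parsing XML held in a String. It uses DocumentBuilder.parse(File), which is an alternative to passing the filename as an InputSource system ID; the temporary file and class name exist only for the demonstration.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseFileOrString {
    // XML stored in a file: hand the parser the File, not a StringReader over its name.
    static Document fromFile(File file) throws Exception {
        DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return dBuilder.parse(file);
    }

    // XML held in memory: StringReader is correct here, because the string IS the XML.
    static Document fromString(String xml) throws Exception {
        DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return dBuilder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><root/>";
        Path tmp = Files.createTempFile("myfile", ".xml");
        Files.writeString(tmp, xml); // Java 11+
        System.out.println(fromFile(tmp.toFile()).getDocumentElement().getTagName());
        System.out.println(fromString(xml).getDocumentElement().getTagName());
        Files.delete(tmp);
    }
}
```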

