Make DocumentBuilder.parse ignore DTD references

A similar approach to the one suggested by @anjanb

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                // Hand the parser an empty document instead of the real DTD
                return new InputSource(new StringReader(""));
            } else {
                // null tells the parser to fall back to its default resolution
                return null;
            }
        }
    });

I found that simply returning an empty InputSource worked just as well.

non-validating DocumentBuilder trying to read DTD file

In order to ignore DTD instructions and references, you must set some more flags:

factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
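To show where those two flags fit, here is a minimal sketch that parses a document whose DOCTYPE points at a DTD that does not exist. It assumes the factory is backed by Xerces (as the JDK's built-in parser is); other implementations may reject these feature names with a ParserConfigurationException. The class and method names are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreDtdFeatures {

    // Parses XML text with a non-validating factory that is told not to load DTDs.
    public static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Xerces-specific feature names (assumption: a Xerces-based parser is in use)
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // foo.dtd is never fetched, so it does not need to exist
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo PUBLIC \"//FOO//\" \"foo.dtd\">"
                + "<foo><bar>Value</bar></foo>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Note that with the external DTD skipped, any entity that is declared only in that DTD stays undeclared, so a document that uses such an entity may still fail to parse.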

If you are building a web application, I suggest you globally disable resolving DTD entities, because it is a potential security vulnerability.

For example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>

can make your server hang or crash, as the parser tries to insert the endless content of /dev/random in place of &xxe;.
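For input you do not trust, one stricter option is to refuse any document that carries a DOCTYPE at all, rather than ignoring it. The sketch below uses the Xerces feature `disallow-doctype-decl`, which the JDK's built-in parser understands; this is a different technique from the two flags above, and the class and method names are only illustrative.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RejectDoctype {

    // Returns true if the hardened parser refuses the document.
    public static boolean isRejected(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces feature (assumption: Xerces-based parser): a DOCTYPE becomes a fatal error
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            // Thrown before any entity is resolved
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String attack = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///nonexistent\">]>"
                + "<foo>&xxe;</foo>";
        System.out.println("attack rejected: " + isRejected(attack));
        System.out.println("plain document rejected: " + isRejected("<foo/>"));
    }
}
```

This only suits applications that never need a DOCTYPE; if legitimate documents carry one, the resolver or load-external-dtd approaches are the ones to reach for.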

Ignore DTD element in a XOM Parser to avoid no File found exception

I imported SAX libraries

import java.io.File;
import java.io.PrintWriter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

Created SAX XML Reader

XMLReader xmlReader = XMLReaderFactory.createXMLReader();

Set the feature to false

    xmlReader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

Created a builder using the above XMLReader

Builder builder = new Builder(xmlReader);

Parsed it using XOM parser

nu.xom.Document doc = builder.build(fXmlFile);
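`XMLReaderFactory` has been deprecated since Java 9, so on newer JDKs it may be preferable to obtain the reader from `SAXParserFactory` instead. The sketch below does only the SAX part, without XOM, and assumes a Xerces-based parser such as the JDK default; `ReaderWithoutDtd`, `createReader` and `countElements` are hypothetical names.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderWithoutDtd {

    // Builds an XMLReader that will not try to load the external DTD.
    public static XMLReader createReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Unlike XMLReaderFactory, this factory is not namespace-aware by default
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return reader;
    }

    // Counts start tags, just to show that parsing succeeds without the DTD.
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        XMLReader reader = createReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE foo SYSTEM \"missing.dtd\"><foo><bar/></foo>";
        System.out.println(countElements(xml));
    }
}
```

The reader returned by `createReader()` should be usable in the `new Builder(xmlReader)` call in the same way as the one from `XMLReaderFactory`.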

how to disable dtd at runtime in java's xpath?

You should be able to specify your own EntityResolver, or use specific features of your parser. See here for some approaches.

A more complete example:

<?xml version="1.0"?>
<!DOCTYPE foo PUBLIC "//FOO//" "foo.dtd">
<foo>
<bar>Value</bar>
</foo>

And XPath usage:

import java.io.File;
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Main {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        builder.setEntityResolver(new EntityResolver() {

            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws SAXException, IOException {
                System.out.println("Ignoring " + publicId + ", " + systemId);
                return new InputSource(new StringReader(""));
            }
        });
        Document document = builder.parse(new File("src/foo.xml"));
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        String content = xpath.evaluate("/foo/bar/text()",
                document.getDocumentElement());
        System.out.println(content);
    }
}

Hope this helps...

Java's XML DocumentBuilder fails with parse time out?

Can you post the exact error message and the content of the response, so we can see what is in it?

To me this sounds like an issue in the network or security setup.

Perhaps the DocumentBuilder is unsuccessfully trying to access a DTD via a network socket for your XML document? If there are DTD references in the XML document, try editing them out to prove the cause.
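If editing the document is awkward, another way to test that theory is to log what the parser asks for. This is a rough sketch, not a fix: it installs an EntityResolver that records every system ID requested and answers with an empty source, so nothing is fetched over the network. The class name, method name and example URL are all made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;

public class DtdRequestLog {

    // Parses the XML and returns the system IDs the parser tried to resolve.
    public static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Empty source: the parser gets an answer without opening a connection
            return new InputSource(new StringReader(""));
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // example.invalid is a placeholder host, not a real DTD location
        String xml = "<!DOCTYPE foo SYSTEM \"http://example.invalid/slow.dtd\"><foo/>";
        System.out.println(requestedSystemIds(xml));
    }
}
```

If the timeout disappears with this resolver in place and the list contains a remote URL, the DTD download was very likely the cause.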

How to avoid reading of DTD when parsing XML file in Java?

You need to use an EntityResolver:

    myBuilder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("pdf2xml.dtd")) {
                return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
            } else {
                return null;
            }
        }
    });

When the parser asks for "pdf2xml.dtd", the entity resolver is called and returns a stub that contains only an XML declaration, so the real DTD is never read.

How to read well formed XML in Java, but skip the schema?

The reference is not for Schema, but for a DTD.

DTD files can contain more than just structural rules. They can also contain entity declarations. Parsers such as the JDK's default one load and parse a referenced DTD even when they are not validating, because it could declare entities that affect how the document is parsed and what the file contains (you could have an entity for single characters or even whole phrases of text).

If you want to avoid loading and parsing the referenced DTD, you can provide your own EntityResolver, test for the referenced DTD, and decide whether to load a local copy of the DTD file or return an empty InputSource. Returning null does not skip the DTD; it tells the parser to resolve the entity itself in the usual way.

Code sample from the referenced answer on custom EntityResolvers:

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                return new InputSource(new StringReader(""));
            } else {
                return null;
            }
        }
    });
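The other branch of that choice, serving a local copy, might look like the sketch below. To keep it self-contained the "local copy" is an in-memory string rather than a file, and the DTD content, the `greeting` entity and the class name are invented for illustration. It also shows why the DTD can matter: the document's text depends on an entity that only the DTD declares.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver {

    // Hypothetical stand-in for a local copy of foo.dtd
    private static final String LOCAL_DTD =
            "<!ELEMENT foo (#PCDATA)>\n<!ENTITY greeting \"Hello\">";

    // Parses the XML, answering requests for foo.dtd from LOCAL_DTD.
    public static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                if (systemId != null && systemId.endsWith("foo.dtd")) {
                    return new InputSource(new StringReader(LOCAL_DTD));
                }
                // Anything else: let the parser use its default resolution
                return null;
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE foo SYSTEM \"foo.dtd\"><foo>&greeting;, world</foo>";
        System.out.println(parseText(xml));
    }
}
```

With an empty InputSource in place of `LOCAL_DTD`, the same document would be expected to fail, because `&greeting;` would then be undeclared.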

