Which Is the Best Library for Xml Parsing in Java

Which is the best library for XML parsing in Java

Actually Java supports 4 methods to parse XML out of the box:

DOM Parser/Builder: The whole XML structure is loaded into memory and you can use the well-known DOM methods to work with it. DOM also allows you to modify the document and write it back out, for example through an XSLT (javax.xml.transform) transformation.
Example:

public static void parse() throws ParserConfigurationException, IOException, SAXException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    File file = new File("test.xml");
    Document doc = builder.parse(file);
    // Do something with the document here.
}
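To make the "work with it and write it back" part concrete, here is a minimal sketch of a DOM round trip. The class name DomRoundTrip, the method addItem and the <item> element are made up for illustration; the identity Transformer (one created without a stylesheet) is one common way to serialize a DOM tree, not the only one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class DomRoundTrip {
    // Parses the XML, appends <item>value</item> to the root element,
    // and returns the modified document as a string.
    static String addItem(String xml, String value) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Element item = doc.createElement("item");
        item.setTextContent(value);
        doc.getDocumentElement().appendChild(item);

        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addItem("<list><item>a</item></list>", "b"));
    }
}
```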

SAX Parser: Solely for reading an XML document. The SAX parser runs through the document and calls callback methods supplied by the user. There are methods for the start and end of a document, of an element and so on. They're defined in org.xml.sax.ContentHandler, and there's an empty helper class, DefaultHandler.

public static void parse() throws ParserConfigurationException, SAXException, IOException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    SAXParser saxParser = factory.newSAXParser();
    File file = new File("test.xml");
    // parse(File, DefaultHandler) can also throw IOException
    saxParser.parse(file, new ElementHandler()); // specify handler
}
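The ElementHandler used above is not shown in the slides. A minimal sketch of what such a handler could look like, assuming all you want is to count element names (the counting logic and the helper method count are my own illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how often each element name occurs.
class ElementHandler extends DefaultHandler {
    final Map<String, Integer> counts = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // qName is used because the factory is not namespace-aware by default.
        counts.merge(qName, 1, Integer::sum);
    }

    // Convenience method for the example: parses a string and returns the counts.
    static Map<String, Integer> count(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementHandler handler = new ElementHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<a><b/><b/><c/></a>")); // prints {a=1, b=2, c=1}
    }
}
```

DefaultHandler implements every callback as a no-op, so you only override the ones you care about.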

StAX Reader/Writer: This works with a stream-oriented interface. The program asks for the next element when it is ready, just like a cursor or iterator. You can also create documents with it.
Read document:

public static void parse() throws XMLStreamException, IOException {
    try (FileInputStream fis = new FileInputStream("test.xml")) {
        XMLInputFactory xmlInFact = XMLInputFactory.newInstance();
        XMLStreamReader reader = xmlInFact.createXMLStreamReader(fis);
        while (reader.hasNext()) {
            reader.next(); // do something here
        }
    }
}
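As a sketch of what "do something here" might be: the loop below pulls events and collects the text of every <title> element. The element name, the class StaxTitles and the method titles are assumptions for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class.
class StaxTitles {
    // Collects the text content of every <title> element in document order.
    static List<String> titles(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag;
                    // it only works for elements with text-only content.
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(titles(xml)); // prints [A, B]
    }
}
```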

Write document:

public static void write() throws XMLStreamException, IOException {
    try (FileOutputStream fos = new FileOutputStream("test.xml")) {
        XMLOutputFactory xmlOutFact = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = xmlOutFact.createXMLStreamWriter(fos);
        writer.writeStartDocument();
        writer.writeStartElement("test");
        // write stuff
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close(); // flushes buffered output; does not close the underlying stream
    }
}
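For the "write stuff" part, a small sketch that writes one element with an attribute and text into a string. The class StaxWrite, its method and the attribute name id are illustrative only.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class.
class StaxWrite {
    // Writes <name id="...">text</name> preceded by an XML declaration.
    static String element(String name, String id, String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement(name);
        writer.writeAttribute("id", id);   // attributes must follow the start element directly
        writer.writeCharacters(text);      // escapes &, < and > for you
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();                    // flushes any buffered output
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(element("test", "7", "a & b"));
    }
}
```

Note that the writer does not check much for you: forgetting writeEndElement() or calling writeAttribute() too late is your problem, not the API's.");
assert out.contains("7");
</test>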

JAXB: The newest of the four: version 2 is part of Java 6. (It was removed from the JDK in Java 11, so on newer versions it has to be added as a separate dependency.) It lets us deserialize Java objects from a document. You read the document with an implementation of the javax.xml.bind.Unmarshaller interface, which you obtain from a JAXBContext created with JAXBContext.newInstance. The context has to be initialized with the classes used, but you only have to specify the root classes and don't have to worry about statically referenced classes.
You use annotations to specify which classes should be elements (@XmlRootElement) and which fields are elements (@XmlElement) or attributes (@XmlAttribute, what a surprise!).

public static void parse() throws JAXBException, IOException {
    try (FileInputStream adrFile = new FileInputStream("test")) {
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Unmarshaller um = ctx.createUnmarshaller();
        RootElementClass rootElement = (RootElementClass) um.unmarshal(adrFile);
    }
}

Write document:

public static void write(RootElementClass out) throws IOException, JAXBException {
    try (FileOutputStream adrFile = new FileOutputStream("test.xml")) {
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Marshaller ma = ctx.createMarshaller();
        ma.marshal(out, adrFile);
    }
}

Examples shamelessly copied from some old lecture slides ;-)

Edit: About "which API should I use?". Well, it depends. Not all APIs have the same capabilities, as you can see, but if you have control over the classes you map the XML document to, JAXB is my personal favorite: a really elegant and simple solution (though I haven't used it for really large documents, where it could get a bit complex). SAX is pretty easy to use too. Stay away from DOM unless you have a really good reason to use it; it is an old, clunky API in my opinion. I don't think there are any modern third-party libraries that offer anything especially useful that's missing from the standard library, and the standard libraries have the usual advantages of being extremely well tested, documented and stable.

Best XML parser for Java

If speed and memory are no problem, dom4j is a really good option. If you need speed, using a StAX parser like Woodstox is the right way, but you have to write more code to get things done and you have to get used to processing XML as a stream.

Where I can find a detailed comparison of Java XML frameworks?

As Blaise pointed out, stick with the standards. But multiple standards have been created over time to solve different problems and use cases. Which one to choose depends entirely on your requirements. I hope the comparison below helps you choose the right one.

There are two things you have to choose: the API, and the implementation of that API (there are many).

API

SAX: Pros

  • event based
  • memory efficient
  • faster than DOM
  • supports schema validation

SAX: Cons

  • No object model, you have to tap into
    the events and build one yourself
  • Single pass over the XML; can only
    go forward
  • read only api
  • no xpath support
  • little bit harder to use

DOM: Pros

  • in-memory object model
  • preserves element order
  • bi-directional
  • read and write api
  • xml MANIPULATION
  • simple to use
  • supports schema validation

DOM: Cons

  • memory hog for larger XML documents
    (typically used for XML documents
    smaller than 10 MB)
  • slower
  • generic model i.e. you work with Nodes

StAX: Pros

  • Best of SAX and DOM i.e. Ease of DOM
    and efficiency of SAX
  • memory efficient
  • Pull model
  • read and write api
  • supports subparsing
  • can read multiple documents same time
    in one single thread
  • parallel processing of XML is easier

StAX: Cons

  • no schema validation support (as far
    as I remember, not sure if they have
    added it now)
  • can only go forward like SAX
  • no xml MANIPULATION

JAXB: Pros

  • allows you to access and process XML
    data without having to know XML
  • bi-directional
  • more memory efficient than DOM
  • SAX and DOM are generic parsers,
    whereas JAXB creates a parser specific
    to your XML Schema
  • data conversion: JAXB can convert xml
    to java types
  • supports XML MANIPULATION via object
    API

JAXB: Cons

  • can only parse valid XML

TrAX: For transforming XML from one form to another using XSLT
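
A minimal TrAX sketch, assuming a tiny stylesheet that turns <greeting name="..."/> into plain text. The class XsltDemo, the constant GREETING_XSLT and the stylesheet itself are invented for the example; only the javax.xml.transform calls are standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class.
class XsltDemo {
    // Toy stylesheet: emits "Hello, <name>!" as plain text.
    static final String GREETING_XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@name'/>!</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xslt, String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(GREETING_XSLT, "<greeting name='World'/>")); // prints Hello, World!
    }
}
```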

Implementations

SAX, DOM, StAX and JAXB are just specifications. There are many open source and commercial implementations of these specifications. Most of the time you can just stick with what comes with the JDK or your application server. But sometimes you need to use a different implementation than the one provided by default. This is where you can appreciate the JAXP wrapper API. JAXP allows you to switch implementations through configuration without the need to modify your code. It also provides a parser- and spec-independent API for parsing, transforming, validating and querying XML documents.
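
The querying part is XPath. A minimal sketch, assuming a made-up <order> document (the class XPathDemo and the sample data are illustrative). The code only talks to the factory, never to a concrete implementation class, which is what lets the implementation be swapped by configuration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class.
class XPathDemo {
    // Evaluates the XPath expression against the XML and returns the result as a string.
    static String evaluate(String expression, String xml) throws XPathExpressionException {
        // newInstance() looks up whichever XPath implementation is configured.
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<order>"
                   + "<item id='1'><name>tea</name></item>"
                   + "<item id='2'><name>coffee</name></item>"
                   + "</order>";
        System.out.println(evaluate("/order/item[@id='2']/name", xml)); // prints coffee
    }
}
```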

Performance and other comparisons of various implementations

  • StAX:
    http://java.sun.com/performance/reference/whitepapers/StAX-1_0.pdf
  • Most of the API:
    http://www.ibm.com/developerworks/xml/library/x-databdopt2/
  • JAXB versus XmlBeans discussion on SO: JAXB vs Apache XMLBeans
  • http://www.ibm.com/developerworks/xml/library/x-injava/

Standards are good, but once in a while you encounter a crazy use case where you have to parse an XML document that is 100 gigabytes in size, or you need ultra-fast processing of XML (maybe you are implementing an XML parser chip). That is when you need to drop the standards and look for a different way of doing things. It's about using the right tool for the right job! This is where I suggest you have a look at vtd-xml.

During the early days of SAX and DOM, people wanted simpler APIs than either of them provided. JDOM, dom4j, XmlBeans, JiBX and Castor are the ones I know of that became popular.

Best way for XML parsing in java

This question is excessively broad, so I had to downvote it. I have no idea what the circumstances of your XML interpretation are, so this answer will be limited.

However, I can tell you that classically SAX and JAXP have been used; they don't strictly require a DTD, and with some clever enumerations you can parse just about anything.

JSoup, as mentioned by Rafael Cardoso, is generally an HTML parser, not an HTML-in-XML parser; but it may work for you. If all you're looking for are the attributes to a specific tag, along with (presumably) associated data, then the JDK may have all that you need.

We also have JDOM, DOM4J, and a bunch of others, all of which have their strengths and weaknesses. This question, thus, isn't particularly constructive, and is basically a duplicate of this one, which you might take a look at.

I recommend looking at this tutorial, which explains how to build a parser with the standard library.

In the future, if possible please specify the conditions that your program is operating under, provide us with an objective and clearly defined question, and research Stack Overflow a little more thoroughly first. All the same, I hope this does it for you. Good luck!

Android: Best XML Parsing Library?

Or you could use the org.xmlpull.v1.XmlPullParser - I've found it much easier to use than the SAX Parser and it has other benefits:

http://developer.android.com/reference/org/xmlpull/v1/XmlPullParser.html

http://www.bearcave.com/software/java/xml/xmlpull.html

Small, minimalistic and fast XML library for Java?

Have a look at NanoXML - download site

It is a very small DOM-based parser library, I've used it in the past and it worked well. It is not necessarily efficient but it is tiny.


