Best XML Parser for Java

Best XML parser for Java

If speed and memory are not a concern, dom4j is a really good option. If you need speed, a StAX parser such as Woodstox is the right way to go, but you have to write more code to get things done and you have to get used to processing XML as a stream.

Which is the best library for XML parsing in java

Java actually supports four ways to parse XML out of the box:

DOM Parser/Builder: The whole XML structure is loaded into memory and you can use the well-known DOM methods to work with it. DOM also lets you write the document back out by way of XSLT transformations (javax.xml.transform).
Example:

public static void parse() throws ParserConfigurationException, IOException, SAXException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // DTD validation: the document is expected to declare a DOCTYPE
    factory.setValidating(true);
    // Only takes effect in validating mode
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    File file = new File("test.xml");
    Document doc = builder.parse(file);
    // Do something with the document here.
}
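To make the remark about writing through a transformation more concrete, here is a minimal sketch. It assumes the XML arrives as a String rather than a file, and the class name DomEditSketch, the method appendItem and the "item" element are made up for illustration. A Transformer created without a stylesheet performs an identity transformation, which is the usual way to serialize a DOM tree with only the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomEditSketch {

    // Parses the XML text, appends <item>value</item> to the root element
    // and returns the serialized document (hypothetical helper).
    public static String appendItem(String xml, String value) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Modify the in-memory tree with plain DOM calls
        Element item = doc.createElement("item");
        item.setTextContent(value);
        doc.getDocumentElement().appendChild(item);

        // No stylesheet given, so this is an identity transformation
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(appendItem("<list><item>a</item></list>", "b"));
    }
}
```

The whole document lives in memory between parse and transform, which is exactly the trade-off described for DOM above.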

SAX Parser: Solely for reading an XML document. The SAX parser runs through the document and invokes callback methods supplied by the user. There are callbacks for the start and end of the document, of each element, and so on. They are defined in org.xml.sax.ContentHandler, and org.xml.sax.helpers.DefaultHandler is a helper class with empty implementations.

public static void parse() throws ParserConfigurationException, SAXException, IOException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    SAXParser saxParser = factory.newSAXParser();
    File file = new File("test.xml");
    // ElementHandler is your own subclass of DefaultHandler
    saxParser.parse(file, new ElementHandler()); // specify handler
}
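The ElementHandler used above is not shown in the slides. The following is only a guess at what such a handler could look like: a sketch that counts how often each element name occurs. The wrapper class SaxHandlerSketch and the method countElements are hypothetical, and the input is read from a String to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandlerSketch {

    // Hypothetical handler: overrides only the callback it cares about,
    // everything else keeps DefaultHandler's empty implementation.
    static class ElementHandler extends DefaultHandler {
        final Map<String, Integer> counts = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Called by the parser once for every opening tag
            counts.merge(qName, 1, Integer::sum);
        }
    }

    public static Map<String, Integer> countElements(String xml) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        ElementHandler handler = new ElementHandler();
        saxParser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<book><person/><person/></book>"));
    }
}
```

Note that the handler has to keep its own state (here the map), because the parser never hands you the document as a whole.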

StAX Reader/Writer: This works with a stream-oriented interface. The program asks for the next element when it is ready for it, much like a cursor or iterator. You can also create documents with it.
Read document:

public static void parse() throws XMLStreamException, IOException {
    try (FileInputStream fis = new FileInputStream("test.xml")) {
        XMLInputFactory xmlInFact = XMLInputFactory.newInstance();
        XMLStreamReader reader = xmlInFact.createXMLStreamReader(fis);
        while (reader.hasNext()) {
            reader.next(); // do something here
        }
    }
}

Write document:

public static void write() throws XMLStreamException, IOException {
    try (FileOutputStream fos = new FileOutputStream("test.xml")) {
        XMLOutputFactory xmlOutFact = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = xmlOutFact.createXMLStreamWriter(fos);
        writer.writeStartDocument();
        writer.writeStartElement("test");
        // write stuff
        writer.writeEndElement();
        writer.writeEndDocument();
        // Flushes buffered output; does not close the underlying stream
        writer.close();
    }
}

JAXB: The newest of the four APIs for reading XML documents. Version 2 is part of Java 6 (note that JAXB was removed from the JDK in Java 11, so newer versions need it as a separate dependency). It lets you deserialize Java objects from a document. You read the document with an object that implements the javax.xml.bind.Unmarshaller interface, which you obtain from a JAXBContext created with JAXBContext.newInstance. The context has to be initialized with the classes used, but you only have to specify the root classes and don't have to worry about statically referenced classes.
You use annotations to specify which classes should be elements (@XmlRootElement) and which fields are elements (@XmlElement) or attributes (@XmlAttribute, what a surprise!).

Read document:

public static void parse() throws JAXBException, IOException {
    try (FileInputStream adrFile = new FileInputStream("test")) {
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Unmarshaller um = ctx.createUnmarshaller();
        RootElementClass rootElement = (RootElementClass) um.unmarshal(adrFile);
    }
}

Write document:

public static void write(RootElementClass out) throws IOException, JAXBException {
    try (FileOutputStream adrFile = new FileOutputStream("test.xml")) {
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Marshaller ma = ctx.createMarshaller();
        ma.marshal(out, adrFile);
    }
}

Examples shamelessly copied from some old lecture slides ;-)

Edit: About "which API should I use?". Well, it depends. As you can see, not all the APIs have the same capabilities, but if you have control over the classes you map the XML document to, JAXB is my personal favorite: a really elegant and simple solution (though I haven't used it for really large documents, where it could get a bit complex). SAX is pretty easy to use too. Just stay away from DOM unless you have a really good reason to use it; it is an old, clunky API in my opinion. I don't think any modern third-party library offers anything especially useful that is missing from the standard library, and the standard libraries have the usual advantages of being extremely well tested, documented and stable.

Fastest and optimized way to read the xml

When I compare ReadAndPrintXMLFileWithStAX (below) with ReadAndPrintXMLFileWithSAX from the answer given by gontard, the StAX approach is faster. My test ran each code sample 500000 times on JDK 1.7.0_07 for the Mac.

ReadAndPrintXMLFileWithStAX: 103 seconds
ReadAndPrintXMLFileWithSAX: 125 seconds

ReadAndPrintXMLFileWithStAX (using Java SE 7)

Below is a more optimized StAX (JSR-173) example using XMLStreamReader instead of XMLEventReader.

import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.*;

public class ReadAndPrintXMLFileWithStAX {

    public static void main(String[] argv) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // try-with-resources closes the file even if parsing fails
        try (InputStream in = new FileInputStream("book.xml")) {
            XMLStreamReader streamReader = inputFactory.createXMLStreamReader(in);
            streamReader.nextTag(); // Advance to "book" element
            streamReader.nextTag(); // Advance to "person" element

            int persons = 0;
            while (streamReader.hasNext()) {
                if (streamReader.isStartElement()) {
                    // getElementText() reads the text and leaves the reader on the end tag
                    switch (streamReader.getLocalName()) {
                        case "first": {
                            System.out.print("First Name : ");
                            System.out.println(streamReader.getElementText());
                            break;
                        }
                        case "last": {
                            System.out.print("Last Name : ");
                            System.out.println(streamReader.getElementText());
                            break;
                        }
                        case "age": {
                            System.out.print("Age : ");
                            System.out.println(streamReader.getElementText());
                            break;
                        }
                        case "person": {
                            persons++;
                        }
                    }
                }
                streamReader.next();
            }
            System.out.print(persons);
            System.out.println(" persons");
        }
    }

}

Output

First Name : Kiran
Last Name : Pai
Age : 22
First Name : Bill
Last Name : Gates
Age : 46
First Name : Steve
Last Name : Jobs
Age : 40
3 persons

Better way to parse xml

Here's an example of using JAXB with StAX.

Input document:

<?xml version="1.0" encoding="UTF-8"?>
<Personlist xmlns="http://example.org">
    <Person>
        <Name>Name 1</Name>
        <Address>
            <StreetAddress>Somestreet</StreetAddress>
            <PostalCode>00001</PostalCode>
            <CountryName>Finland</CountryName>
        </Address>
    </Person>
    <Person>
        <Name>Name 2</Name>
        <Address>
            <StreetAddress>Someotherstreet</StreetAddress>
            <PostalCode>43400</PostalCode>
            <CountryName>Sweden</CountryName>
        </Address>
    </Person>
</Personlist>

Person.java:

import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "Person", namespace = "http://example.org")
public class Person {
    @XmlElement(name = "Name", namespace = "http://example.org")
    private String name;
    @XmlElement(name = "Address", namespace = "http://example.org")
    private Address address;

    public String getName() {
        return name;
    }

    public Address getAddress() {
        return address;
    }
}

Address.java:

import javax.xml.bind.annotation.XmlElement;

public class Address {
    @XmlElement(name = "StreetAddress", namespace = "http://example.org")
    private String streetAddress;
    @XmlElement(name = "PostalCode", namespace = "http://example.org")
    private String postalCode;
    @XmlElement(name = "CountryName", namespace = "http://example.org")
    private String countryName;

    public String getStreetAddress() {
        return streetAddress;
    }

    public String getPostalCode() {
        return postalCode;
    }

    public String getCountryName() {
        return countryName;
    }
}

PersonlistProcessor.java:

import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class PersonlistProcessor {
    public static void main(String[] args) throws Exception {
        new PersonlistProcessor().processPersonlist(PersonlistProcessor.class
                .getResourceAsStream("personlist.xml"));
    }

    // TODO: Instead of throws Exception, all exceptions should be wrapped
    // inside runtime exception
    public void processPersonlist(InputStream inputStream) throws Exception {
        JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
        XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
        // Create unmarshaller
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        // Go to next tag
        xss.nextTag();
        // Require Personlist
        xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Personlist");
        // Go to next tag. After unmarshal() the reader sits just past </Person>,
        // so this relies on whitespace separating the Person elements, as in the sample input.
        while (xss.nextTag() == XMLStreamReader.START_ELEMENT) {
            // Require Person
            xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Person");
            // Unmarshal one person; only this subtree is held in memory
            Person person = (Person) unmarshaller.unmarshal(xss);
            // Process person
            processPerson(person);
        }
        // Require Personlist
        xss.require(XMLStreamReader.END_ELEMENT, "http://example.org", "Personlist");
    }

    private void processPerson(Person person) {
        System.out.println(person.getName());
        System.out.println(person.getAddress().getCountryName());
    }
}

Best way to parse an XML String in Java?

To answer your question directly: to my knowledge, there is no better way. As I understand it, InputSource is used because it is more universal and can handle input from a file, from a String, or from across the wire.

You could also try the SAX XML parser. It is a little more basic and works through handler callbacks (in the spirit of the Visitor pattern), but it gets the job done, and for smallish data sets and simple XML schemas it is pretty easy to use. SAX is also included with the core JRE.
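As an illustration of why InputSource is the common denominator, here is a small sketch in which a single parsing routine serves both a String and a byte array. The class XmlStringSketch and its method names are invented for this example; only the JDK's DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlStringSketch {

    // One parsing routine for every kind of input:
    // the caller decides what backs the InputSource.
    public static Document parse(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    // A String is wrapped in a character stream
    public static Document fromString(String xml) throws Exception {
        return parse(new InputSource(new StringReader(xml)));
    }

    // Raw bytes (for example from a file or a socket) are wrapped in a byte stream
    public static Document fromBytes(byte[] xmlBytes) throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(xmlBytes)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<greeting lang=\"en\">hello</greeting>";
        Document a = fromString(xml);
        Document b = fromBytes(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(a.getDocumentElement().getTextContent());
        System.out.println(b.getDocumentElement().getAttribute("lang"));
    }
}
```

With a StringReader the parser receives characters directly, so the encoding named in an XML declaration does not come into play; with a byte stream the parser detects the encoding itself.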

Java XML Parser for huge files

Besides the recommended SAX parsing, you could use the StAX API (a kind of SAX evolution), which is included in the JDK (package javax.xml.stream).

  • StAX Project Home: http://stax.codehaus.org/Home
  • Brief introduction: http://www.xml.com/pub/a/2003/09/17/stax.html
  • Javadoc: https://docs.oracle.com/javase/8/docs/api/javax/xml/stream/package-summary.html
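For a rough idea of how this looks for large inputs, the sketch below pulls one event at a time and keeps nothing but a counter, so memory use should stay roughly independent of document size. The class StaxStreamingSketch, the method countElements and the "record" element are assumptions made for the example; in practice the Reader would wrap a large file rather than a String.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxStreamingSketch {

    // Counts elements with the given local name without building a tree.
    public static long countElements(Reader source, String elementName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        long count = 0;
        try {
            while (reader.hasNext()) {
                // next() advances the cursor and returns the type of the new event
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            // Releases parser resources; the underlying Reader is not closed by this call
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<log><record/><record/><other/></log>";
        System.out.println(countElements(new StringReader(xml), "record"));
    }
}
```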

Is there an easier way to parse XML in Java?

There are two different types of processors for XML in Java (3 actually, but one is weird). What you have is a SAX parser and what you want is a DOM parser. Take a look at http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/ for how to use the DOM parser. DOM will create a tree which you can navigate pretty easily. SAX is best for large documents, but DOM is much easier to work with, though slower and much more memory intensive.
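To give a feel for what navigating the tree means, here is a minimal sketch. The document structure (person elements with a first child) and the names DomNavigationSketch and firstNames are assumptions for illustration, not taken from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomNavigationSketch {

    // Collects the text of the first <first> element inside every <person>.
    public static List<String> firstNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> names = new ArrayList<>();
        // getElementsByTagName searches the whole subtree in document order
        NodeList persons = doc.getElementsByTagName("person");
        for (int i = 0; i < persons.getLength(); i++) {
            Element person = (Element) persons.item(i);
            Node first = person.getElementsByTagName("first").item(0);
            if (first != null) { // item(0) is null when the child is missing
                names.add(first.getTextContent());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><person><first>Kiran</first></person>"
                + "<person><first>Bill</first></person></book>";
        System.out.println(firstNames(xml));
    }
}
```

Unlike with SAX, there is no handler state to manage: you can jump around the tree freely, at the price of having the whole document in memory.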


