Normalization in Dom Parsing With Java - How Does It Work

Normalization in DOM parsing with java - how does it work?

The rest of the sentence is:

where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.

This basically means that the following XML element

<foo>hello 
wor
ld</foo>

could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this

Element foo
    Text node: "Hello world"

And the same goes for attributes: <foo bar="Hello world"/>, comments, etc.

Normalization DOM same effect without normalize

The parser is already creating a normalized DOM tree.

The normalize() method is useful for when you're building/modifying the DOM, which might not result in a normalized tree, in which case the method will normalize it for you.

Common Helper

private static void printDom(String indent, Node node) {
    System.out.println(indent + node);
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
        printDom(indent + "  ", child);
}

Example 1

public static void main(String[] args) throws Exception {
    String xml = "<Root>text 1<!-- test -->text 2</Root>";
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.parse(new InputSource(new StringReader(xml)));
    printDom("", doc);
    deleteComments(doc);
    printDom("", doc);
    doc.normalizeDocument();
    printDom("", doc);
}
private static void deleteComments(Node node) {
    if (node.getNodeType() == Node.COMMENT_NODE)
        node.getParentNode().removeChild(node);
    else {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            deleteComments(children.item(i));
    }
}

Output

[#document: null]
  [Root: null]
    [#text: text 1]
    [#comment:  test ]
    [#text: text 2]

[#document: null]
  [Root: null]
    [#text: text 1]
    [#text: text 2]

[#document: null]
  [Root: null]
    [#text: text 1text 2]

Example 2

public static void main(String[] args) throws Exception {
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.newDocument();
    Element root = doc.createElement("Root");
    doc.appendChild(root);
    root.appendChild(doc.createTextNode("Hello"));
    root.appendChild(doc.createTextNode(" "));
    root.appendChild(doc.createTextNode("World"));
    printDom("", doc);
    doc.normalizeDocument();
    printDom("", doc);
}

Output

[#document: null]
  [Root: null]
    [#text: Hello]
    [#text:  ]
    [#text: World]

[#document: null]
  [Root: null]
    [#text: Hello World]

What does Java Node normalize method do?

You can programmatically build a DOM tree that has extraneous structure not corresponding to actual XML structures - specifically things like multiple nodes of type text next to each other, or empty nodes of type text. The normalize() method removes these, i.e. it combines adjacent text nodes and removes empty ones.

This can be useful when you have other code that expects DOM trees to always look like something built from an actual XML document.

This basically means that the following XML element

<foo>hello 
wor
ld</foo>

could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this

Element foo
    Text node: "Hello world"

xml dom parser in java?

Following are tutorial for using DOM in java:

xml dom
DOM-Parser
java-xml-dom
dom example

Hope this helps.

XML DOM parsing returns only the first node

Solved: the problem was in the saving stuff, as I was trying to save a tag existing only for the first node. The reported code is working for me.

Normalization in Dom Parsing With Java - How Does It Work