Normalization in DOM parsing with Java - how does it work?
The rest of the sentence in the Node.normalize() documentation is:
where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.
This basically means that the following XML element
<foo>Hello
wor
ld</foo>
could be represented like this in a denormalized node:
Element foo
Text node: ""
Text node: "Hello "
Text node: "wor"
Text node: "ld"
When normalized, the node will look like this
Element foo
Text node: "Hello world"
And the same goes for attributes (<foo bar="Hello world"/>), comments, etc.
Normalization in DOM: same effect without normalize()
The parser is already creating a normalized DOM tree.
The normalize() method is useful when you're building or modifying the DOM, which might not result in a normalized tree; in that case the method will normalize it for you.
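To make the point above concrete, here is a minimal sketch, assuming the default JAXP parser bundled with the JDK. The class name ParsedTreeDemo and the helper childCounts are made up for illustration. It counts the children of the root element right after parsing, after Text.splitText() has cut the text in two, and after normalize().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
public class ParsedTreeDemo {

    // Returns the root element's child count at three points:
    // [0] right after parsing, [1] after splitText(), [2] after normalize().
    // Assumes the root element contains text only.
    static int[] childCounts(String xml, int splitAt) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        int[] counts = new int[3];

        counts[0] = root.getChildNodes().getLength();

        // Modifying the tree can denormalize it: splitText() turns
        // one Text node into two adjacent Text nodes.
        Text text = (Text) root.getFirstChild();
        text.splitText(splitAt);
        counts[1] = root.getChildNodes().getLength();

        // normalize() merges the adjacent Text nodes again.
        root.normalize();
        counts[2] = root.getChildNodes().getLength();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] c = childCounts("<foo>Hello world</foo>", 5);
        System.out.println("after parse:     " + c[0]);
        System.out.println("after splitText: " + c[1]);
        System.out.println("after normalize: " + c[2]);
    }
}
```

With the JDK's built-in parser the three counts should be 1, 2 and 1: the parsed tree starts out normalized, the edit breaks that, and normalize() restores it.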
Common Helper
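import org.w3c.dom.Node;

// Prints a node and, recursively, all of its children, one level deeper each time.
private static void printDom(String indent, Node node) {
    System.out.println(indent + node);
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
        printDom(indent + " ", child);
}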
Example 1
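import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static void main(String[] args) throws Exception {
    String xml = "<Root>text 1<!-- test -->text 2</Root>";
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.parse(new InputSource(new StringReader(xml)));
    printDom("", doc);        // as parsed: text, comment, text
    deleteComments(doc);
    printDom("", doc);        // two adjacent Text nodes remain
    doc.normalizeDocument();
    printDom("", doc);        // adjacent Text nodes merged into one
}

private static void deleteComments(Node node) {
    if (node.getNodeType() == Node.COMMENT_NODE)
        node.getParentNode().removeChild(node);
    else {
        // The NodeList is live, so walk it backwards; removing a child
        // while counting up would skip the node that follows it.
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--)
            deleteComments(children.item(i));
    }
}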
Output
[#document: null]
 [Root: null]
  [#text: text 1]
  [#comment:  test ]
  [#text: text 2]
[#document: null]
 [Root: null]
  [#text: text 1]
  [#text: text 2]
[#document: null]
 [Root: null]
  [#text: text 1text 2]
Example 2
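import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public static void main(String[] args) throws Exception {
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.newDocument();
    Element root = doc.createElement("Root");
    doc.appendChild(root);
    // Three separate Text nodes in a row: a denormalized tree.
    root.appendChild(doc.createTextNode("Hello"));
    root.appendChild(doc.createTextNode(" "));
    root.appendChild(doc.createTextNode("World"));
    printDom("", doc);
    doc.normalizeDocument();  // merges the three Text nodes into one
    printDom("", doc);
}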
Output
[#document: null]
 [Root: null]
  [#text: Hello]
  [#text:  ]
  [#text: World]
[#document: null]
 [Root: null]
  [#text: Hello World]
What does Java Node normalize method do?
You can programmatically build a DOM tree that has extraneous structure not corresponding to actual XML structures, specifically things like multiple nodes of type text next to each other, or empty nodes of type text. The normalize() method removes these, i.e. it combines adjacent text nodes and removes empty ones.
This can be useful when you have other code that expects DOM trees to always look like something built from an actual XML document.
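As a small illustration of the "removes empty ones" part, here is a sketch that builds a Root element from separate text pieces, calls Node.normalize() on it, and reports how many children are left. The class name EmptyTextNodeDemo and the helper childCountAfterNormalize are invented for this example, and the behaviour described is that of the DOM implementation shipped with the JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original answer.
public class EmptyTextNodeDemo {

    // Builds <Root> with each string as its own Text child,
    // normalizes the element, and returns the number of children left.
    static int childCountAfterNormalize(String... pieces) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("Root");
        doc.appendChild(root);
        for (String piece : pieces) {
            root.appendChild(doc.createTextNode(piece));
        }
        // Node.normalize(): merges adjacent Text nodes and drops empty ones.
        root.normalize();
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Three Text nodes, one of them empty, collapse into a single node.
        System.out.println(childCountAfterNormalize("Hello", "", " World")); // 1
        // A lone empty Text node is removed altogether.
        System.out.println(childCountAfterNormalize(""));                     // 0
    }
}
```

Calling normalize() on the element only touches that subtree, which is usually enough after a local edit; the examples earlier use Document.normalizeDocument(), which processes the whole document.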
xml dom parser in java?
The following are tutorials on using DOM in Java:
- xml dom
- DOM-Parser
- java-xml-dom
- dom example
Hope this helps.
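For a quick start without leaving the page, here is a minimal sketch of DOM parsing with the standard JAXP API. The class name DomParseDemo, the helper textOfAll and the sample XML are all made up for illustration. It parses a string and collects the text of every element with a given tag name by looping over the whole NodeList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from any of the linked tutorials.
public class DomParseDemo {

    // Returns the text content of every <tagName> element in the document.
    static List<String> textOfAll(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        // Loop over all matches, not just item(0).
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        String xml = "<books><title>A</title><title>B</title></books>";
        System.out.println(textOfAll(xml, "title")); // [A, B]
    }
}
```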
XML DOM parsing returns only the first node
Solved: the problem was in the code that saves the result, as I was trying to save a tag that exists only in the first node. The posted code works for me.