What is the shortest way to pretty print an org.w3c.dom.Document to stdout?
Call printDocument(doc, System.out), where that method looks like this:
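import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public static void printDocument(Document doc, OutputStream out) throws IOException, TransformerException {
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    transformer.transform(new DOMSource(doc),
            new StreamResult(new OutputStreamWriter(out, "UTF-8")));
}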
(The indent-amount property is optional and specific to Xalan-derived transformers, so it might not work with your particular configuration.)
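As a usage sketch, the same Transformer settings can be wrapped so that they take an XML string and return the indented text. The class name PrettyPrintDemo and the method prettyPrint are hypothetical names chosen for illustration, and the declaration is omitted here only to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of the original answer.
final class PrettyPrintDemo {

    /** Parses the XML string and returns it re-serialized with indentation. */
    static String prettyPrint(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Implementation-specific property; the indent width may differ elsewhere.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not pretty-print XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(prettyPrint("<a><b>x</b></a>"));
    }
}
```

Writing to a StringWriter instead of System.out makes the result easy to inspect or log; the exact whitespace may vary between JDK versions.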
Pretty print XML in Java 8
I guess that the problem is related to blank text nodes (i.e. text nodes containing only whitespace) in the original file. You should try to remove them programmatically just after parsing, using the following code. If you don't remove them, the Transformer is going to preserve them.
original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);
for (int i = 0; i < blankTextNodes.getLength(); i++) {
    blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}
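To see the effect end to end, here is a minimal self-contained sketch that parses a string with stray blank lines, removes the whitespace-only text nodes with the same XPath expression, and then serializes with indentation. BlankTextNodeDemo and stripAndIndent are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of the original answer.
final class BlankTextNodeDemo {

    /** Removes every whitespace-only text node from the document. */
    private static void removeBlankTextNodes(Document doc) throws XPathExpressionException {
        NodeList blank = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        int count = blank.getLength();
        for (int i = 0; i < count; i++) {
            Node node = blank.item(i);
            node.getParentNode().removeChild(node);
        }
    }

    /** Parses the XML, strips blank text nodes, and returns indented output. */
    static String stripAndIndent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            removeBlankTextNodes(doc);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not reformat XML", e);
        }
    }

    public static void main(String[] args) {
        // The blank lines in the input do not survive into the output.
        System.out.print(stripAndIndent("<a>\n\n   <b>x</b>\n\n\n</a>"));
    }
}
```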
Is there a way to pretty print XML with vertical alignment?
I created the following script to align the columns. I first pass my XML through xmllint, and then through the following:
#!/usr/bin/env ruby
#
# vertically aligns columns
def print_buf(b)
  max_lengths={}
  max_lengths.default=0
  b.each do |line|
    for i in (0..line.size() - 1)
      d = line[i]
      s = d.size()
      if s > max_lengths[i] then
        max_lengths[i] = s
      end
    end
  end
  b.each do |line|
    for i in (0..line.size() - 1)
      print line[i], ' ' * (max_lengths[i] - line[i].size())
    end
  end
end
cols=0
buf=[]
ARGF.each do |line|
  columns=line.split(/( |\r\n|\n|\r)(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/m)
  if columns.size != cols then
    print_buf(buf) if !buf.empty?
    buf=[]
  end
  buf << columns
  cols = columns.size
end
print_buf(buf)
How to pretty print XML from Java?
Now that it's 2012 and Java can do more than it used to with XML, I'd like to add an alternative to my accepted answer. This has no dependencies outside of Java 6.
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {
    public String format(String xml) {
        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));
            //May need this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");
            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Set this to true if the output needs to be beautified.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Set this to true if the declaration is needed to be outputted.
            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                " <Query>\n" +
                " <query:CategorySchemeWhere>\n" +
                " \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                " </query:CategorySchemeWhere>\n" +
                " </Query>\n\n\n\n\n" +
                "</QueryMessage>";
        System.out.println(new XmlFormatter().format(unformattedXml));
    }
}
What is the rationale behind XmlDocument mixed content pretty-printing behavior?
This behavior is unfortunate, but I think it can be explained by the description of the Formatting.Indented option for XmlTextWriter (which is what XmlDocument.Save is using here):
Causes child elements to be indented according to the Indentation and IndentChar settings.
This option indents element content only; mixed content is not affected.
The intent of this option is to preserve the formatting of XML like
<p>Here is some <b>bold</b> text.</p>
and not have it reformatted as
<p>
    Here is some
    <b>
        bold
    </b>
    text.
</p>
But there's a problem: How does XmlTextWriter know an element contains mixed content? Because XmlTextWriter is a non-cached, forward-only writer, the answer is that it doesn't until it actually encounters character data. At that point, it switches to "mixed content" mode and suppresses formatting. Unfortunately, it's too late to undo the formatting of child nodes that have already been written to the underlying stream.
Why does javax.xml.xpath.XPath act differently with a cloned node?
The XPath expression //name is an absolute path (beginning with a /), so it selects a node set containing all name elements in the document to which the context node belongs. Thus evaluating that expression as a string according to the XPath 1.0 data model will give you the string value of the first such node in document order.
The crucial part of that first sentence is "the document to which the context node belongs": a cloned node is not attached to a document, so the XPath evaluator treats the node itself as the root of a document fragment and evaluates the expression against that fragment (which contains only one name element) instead of against the original document (which contains two).
If in printNameAndValue you instead used relative XPath expressions
public static void printNameAndValue(Node node) throws XPathExpressionException {
    System.out.println("Name=" + (String) factoryXpath.evaluate("name", node, XPathConstants.STRING));
    System.out.println("Value=" + (String) factoryXpath.evaluate("value", node, XPathConstants.STRING));
}
(or .//name if the name element might be a grandchild or deeper rather than an immediate child) then you should get the output you expect, i.e. the value of the first name (respectively value) element child of the specified node.
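The difference can be checked with a small self-contained sketch. The document below (two item elements, each with a name and a value) is an assumed stand-in for the question's data, and RelativeXPathDemo, sample and eval are hypothetical names introduced only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample XML is an assumption, not the asker's data.
final class RelativeXPathDemo {

    /** Builds a small document with two item elements. */
    static Document sample() {
        String xml = "<items>"
                + "<item><name>first</name><value>1</value></item>"
                + "<item><name>second</name><value>2</value></item>"
                + "</items>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    /** Evaluates the expression against the context node and returns its string value. */
    static String eval(String expression, Node context) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        NodeList items = sample().getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Node attached = items.item(i);
            Node cloned = attached.cloneNode(true);
            // Relative paths start at the context node, so both forms agree.
            System.out.println("attached: Name=" + eval("name", attached)
                    + " Value=" + eval("value", attached));
            System.out.println("cloned:   Name=" + eval("name", cloned)
                    + " Value=" + eval("value", cloned));
            // The absolute path searches the whole document of an attached node.
            System.out.println("attached //name=" + eval("//name", attached));
        }
    }
}
```

With relative expressions the attached and cloned nodes report the same name and value, whereas //name on an attached node always yields the first name in the document.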