What Is the Shortest Way to Pretty Print an org.w3c.dom.Document to Stdout

What is the shortest way to pretty print an org.w3c.dom.Document to stdout?

Call printDocument(doc, System.out), where that method looks like this:

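import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public static void printDocument(Document doc, OutputStream out) throws IOException, TransformerException {
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer(); // identity transform: copies the DOM unchanged
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    // Xalan extension property; other XSLT processors may ignore it
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    transformer.transform(new DOMSource(doc),
         new StreamResult(new OutputStreamWriter(out, "UTF-8")));
}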

(The indent-amount property is optional and specific to Xalan-based transformers, such as the one bundled with the JDK, so it might not work with your particular configuration.)
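A minimal usage sketch, assuming the JDK's built-in parser and Transformer. It parses a compact string into a Document and applies the same output properties as printDocument, but writes into a StringWriter instead of an OutputStream; that choice is made here only so the result can be inspected. PrintDocumentDemo, parse and toPrettyString are illustrative names, not part of the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PrintDocumentDemo {

    // Parse a compact (unindented) XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Same Transformer settings as printDocument, but the result goes into a
    // StringWriter so it can be returned instead of written to stdout.
    static String toPrettyString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><child>hello</child><child>world</child></root>");
        System.out.println(toPrettyString(doc));
    }
}
```

Returning a String keeps the formatting logic separate from where the text ends up; calling System.out.println on the result gives the same stdout behavior as printDocument.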

Pretty print XML in Java 8

I suspect the problem is caused by blank text nodes (text nodes containing only whitespace) in the original file. Try removing them programmatically right after parsing, using the following code. If you don't remove them, the Transformer preserves them, so the original whitespace ends up in the output alongside the Transformer's own indentation.

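import javax.xml.xpath.*;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// 'original' is the parsed Document; merge adjacent text nodes first
original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

// Detach each whitespace-only text node from its parent
for (int i = 0; i < blankTextNodes.getLength(); i++) {
    Node blank = blankTextNodes.item(i);
    blank.getParentNode().removeChild(blank);
}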
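To see the effect end to end, here is a minimal self-contained sketch, assuming the JDK's built-in parser, XPath and Transformer. It applies the same XPath-based clean-up and then lets the Transformer indent from scratch. ReindentDemo, stripBlankTextNodes and reindent are illustrative names. One deliberate difference from the snippet above: the matches are copied into a List before any node is removed, so the removal does not depend on whether the returned NodeList reflects later changes to the tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReindentDemo {

    // Detaches every text node that contains only whitespace.
    static void stripBlankTextNodes(Document doc) throws Exception {
        doc.getDocumentElement().normalize(); // merge adjacent text nodes first
        NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        // Copy into a plain list before mutating the tree
        List<Node> blanks = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            blanks.add(found.item(i));
        }
        for (Node blank : blanks) {
            blank.getParentNode().removeChild(blank);
        }
    }

    // Parses, strips blank text nodes, then indents with the identity Transformer.
    static String reindent(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripBlankTextNodes(doc);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        // Inconsistently indented input whose whitespace would otherwise be kept
        System.out.println(reindent("<root>\n<a>\n        <b>x</b>\n</a>\n</root>"));
    }
}
```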

Is there a way to pretty print XML with vertical alignment?

I created the following script to align the columns. I first pass my XML through xmllint, and then through the following:

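#!/usr/bin/env ruby
#
# Vertically aligns columns. Each input line is split on spaces and line
# breaks that are outside double-quoted strings (attribute values); the
# separators are captured, so they are kept as columns too. Consecutive
# lines with the same number of columns are padded to a common width.

def print_buf(buf)
  # widest entry seen in each column position
  max_lengths = Hash.new(0)
  buf.each do |line|
    line.each_with_index do |d, i|
      max_lengths[i] = d.size if d.size > max_lengths[i]
    end
  end

  buf.each do |line|
    line.each_with_index do |d, i|
      print d, ' ' * (max_lengths[i] - d.size)
    end
  end
end

cols = 0
buf = []

ARGF.each do |line|
  # split on a space or line break only when an even number of quotes follows,
  # i.e. when the separator is not inside a quoted attribute value
  columns = line.split(/( |\r\n|\n|\r)(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/m)
  if columns.size != cols
    print_buf(buf) unless buf.empty?
    buf = []
  end
  buf << columns
  cols = columns.size
end

print_buf(buf)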

How to pretty print XML from Java?

Now that it's 2012 and Java can do more with XML than it used to, I'd like to add an alternative to my accepted answer. This has no dependencies beyond Java 6.

import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public String format(String xml) {

        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));

            // May be needed if the registry cannot find an "LS" implementation in your environment:
            // System.setProperty(DOMImplementationRegistry.PROPERTY, "com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");

            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();

            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Indent the output.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Emit an XML declaration only if the input had one.

            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }
}
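One caveat with writeToString: the DOM Level 3 Load and Save specification defines its result as a UTF-16 string, so when xml-declaration is true the declaration says encoding="UTF-16". To write straight to stdout in another encoding, the same API can serialize through an LSOutput instead. A minimal sketch, assuming the JDK's built-in LS implementation; LsPrettyPrint, parse and write are illustrative names.

```java
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsPrettyPrint {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the whole Document to a byte stream with an explicit encoding.
    static void write(Document doc, OutputStream out) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer serializer = impl.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);

        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8"); // encoding for the bytes and the XML declaration
        output.setByteStream(out);
        serializer.write(doc, output);
    }

    public static void main(String[] args) throws Exception {
        write(parse("<tag><nested>hello</nested></tag>"), System.out);
    }
}
```

Passing the Document rather than getDocumentElement() lets the serializer write the declaration itself; set "xml-declaration" to false on the DOM configuration if you don't want one.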

What is the rationale behind XmlDocument mixed content pretty-printing behavior?

This behavior is unfortunate, but I think it can be explained by the description of the Formatting.Indented option for XmlTextWriter (which is what XmlDocument.Save uses here):

Causes child elements to be indented according to the Indentation and IndentChar settings.
This option indents element content only; mixed content is not affected.

The intent of this option is to preserve the formatting of XML like

<p>Here is some <b>bold</b> text.</p>

and not have it reformatted as

<p>
    Here is some 
    <b>
        bold
    </b>
     text.
</p>

But there's a problem: How does XmlTextWriter know an element contains mixed content? Because XmlTextWriter is a non-cached, forward-only writer, the answer is that it doesn't until it actually encounters character data. At that point, it switches to "mixed content" mode and suppresses formatting. Unfortunately, it's too late to undo the formatting of child nodes that have already been written to the underlying stream.
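The explanation implies that a writer which has the whole tree in memory could decide up front. As an illustration only (written in Java with org.w3c.dom, since that is the language used elsewhere here; it says nothing about how .NET is implemented), a hypothetical helper hasMixedContent shows the look-ahead a cached writer could do before indenting an element's children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentCheck {

    // Hypothetical helper: true if the element has at least one child text or
    // CDATA node with non-whitespace characters, i.e. character data rather than
    // pure element content. A cached writer could call this before deciding
    // whether to indent the element's children.
    static boolean hasMixedContent(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
                    && !child.getNodeValue().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hasMixedContent(parseRoot("<p>Here is some <b>bold</b> text.</p>"))); // true
        System.out.println(hasMixedContent(parseRoot("<list>\n  <item/>\n  <item/>\n</list>"))); // false
    }
}
```

A forward-only writer has no equivalent of getChildNodes() for content it has not received yet, which is exactly the limitation described above.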

Why does javax.xml.xpath.XPath act differently with a cloned node?

The XPath expression //name is an absolute path (beginning with a /), so it selects a node set containing all name elements in the document to which the context node belongs. Thus, evaluating that expression as a string according to the XPath 1.0 data model gives you the string value of the first such node in document order.

The crucial part of that first sentence is "the document to which the context node belongs". A cloned node has no parent: it is not part of the original document's tree, even though its ownerDocument still points to that document. So the XPath evaluator treats the clone itself as the root of its own tree and evaluates the expression against that fragment (which contains only one name element) instead of against the original document (which contains two).

If in printNameAndValue you instead used relative XPath expressions

// factoryXpath is presumably the javax.xml.xpath.XPath instance from the question
public static void printNameAndValue(Node node) throws XPathExpressionException {
    System.out.println("Name=" + (String) factoryXpath.evaluate("name", node, XPathConstants.STRING));
    System.out.println("Value=" + (String) factoryXpath.evaluate("value", node, XPathConstants.STRING));
}

(or .//name if the name element might be a grandchild or deeper rather than an immediate child) then you should get the output you expect, i.e. the value of the first name (respectively value) element child of the specified node.
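A small self-contained sketch to reproduce the difference, assuming the JDK's built-in XPath implementation. The XML (two param elements, each with name and value children) and the names ClonedNodeXPath, parse and evalString are hypothetical stand-ins for the question's code. The result of //name on the clone carries no expected-value comment because it depends on how the evaluator determines the root of a detached node, as explained above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ClonedNodeXPath {

    // Hypothetical input with two <name> elements
    static final String XML =
            "<params>"
            + "<param><name>first</name><value>1</value></param>"
            + "<param><name>second</name><value>2</value></param>"
            + "</params>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates an XPath 1.0 expression as a string against the given context node
    static String evalString(String expr, Node context) throws Exception {
        return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expr, context, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        Node second = doc.getElementsByTagName("param").item(1); // attached to doc
        Node clone = second.cloneNode(true); // same ownerDocument, but no parent

        // Absolute path: searched from the root of whatever tree the context node is in
        System.out.println("//name on attached node: " + evalString("//name", second)); // first
        System.out.println("//name on clone:         " + evalString("//name", clone));

        // Relative path: searched from the context node itself, so both agree
        System.out.println("name on attached node:   " + evalString("name", second)); // second
        System.out.println("name on clone:           " + evalString("name", clone));  // second
    }
}
```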
