Pretty Printing XML in Python

Pretty printing XML in Python

import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
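
For a self-contained illustration of the same call, here is a minimal sketch that parses from a string instead of a file. The sample document and the two-space indent are assumptions chosen for the example, not part of the original answer.

```python
import xml.dom.minidom

# Hypothetical sample document; any well-formed XML string works here.
xml_string = "<root><item>1</item><item>2</item></root>"

dom = xml.dom.minidom.parseString(xml_string)

# indent controls the per-level indentation (minidom defaults to a tab).
pretty_xml_as_string = dom.toprettyxml(indent="  ")
print(pretty_xml_as_string)
```

Note that toprettyxml() always prepends an XML declaration to the output.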

How do I get Python's ElementTree to pretty print to an XML file?

Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.

import xml.etree.ElementTree as ET
from xml.dom import minidom

# root is the ElementTree root element you have already built
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)

There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If toprettyxml() hands back a Unicode string (u"something"), you may want to encode it to a suitable file encoding, such as UTF-8. For example, replace the write line with:

f.write(xmlstr.encode('utf-8'))
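
On Python 3, where toprettyxml() returns a str when no encoding is given, an alternative is to let open() handle the encoding. The following is a minimal sketch under that assumption; the sample tree and the temporary output directory are hypothetical stand-ins for your own data and path.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the asker's `root`.
root = ET.Element("database")
ET.SubElement(root, "entry").text = "caf\u00e9"

# Without an encoding argument, toprettyxml() returns text (str) on Python 3.
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")

# Write to a temporary directory so the example does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "New_Database.xml")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(xmlstr)
```

Passing encoding="utf-8" to open() avoids depending on the platform's default encoding.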

Write .xml in Python with pretty print and encoding declaration

The most elegant solution is certainly using the third-party library lxml, which is being used a lot – for good reasons.
It offers both a pretty_print and an xml_declaration parameter in the tostring() method, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:

>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print(etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
...                      pretty_print=True).decode('UTF-8'))
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>

However, I understand your desire to use the "included batteries" only.
As far as I can see, xml.etree.ElementTree has no means of changing the indentation automatically (this changed in Python 3.9, which added xml.etree.ElementTree.indent).
But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!

>>> doc = minidom.parseString(ET.tostring(root))
>>> print(doc.toprettyxml(encoding='utf8').decode('utf8'))
<?xml version="1.0" encoding="utf8"?>
<main>
    <sub>
        <name>Ana</name>
        <detail/>
        <type>smart</type>
    </sub>
</main>

(Be aware that the returned string is already encoded and that you should write it to a file opened in binary mode ("wb") and without further encoding.)
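
To make the binary-mode advice concrete, here is a minimal Python 3 sketch. The sample tree, the two-space indent, and the temporary output path are assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree resembling the example output above.
root = ET.Element("main")
sub = ET.SubElement(root, "sub")
ET.SubElement(sub, "name").text = "Ana"

doc = minidom.parseString(ET.tostring(root))

# With an encoding argument, toprettyxml() returns bytes, not str.
pretty_bytes = doc.toprettyxml(indent="  ", encoding="utf-8")

# Bytes must go to a file opened in binary mode, with no further encoding.
out_path = os.path.join(tempfile.mkdtemp(), "out.xml")
with open(out_path, "wb") as f:
    f.write(pretty_bytes)
```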

Python pretty print an XML given an XML string

Here's how to parse from a text string to the lxml structured data type.

Python 2:

from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print etree.tostring(root, pretty_print=True)

Python 3:

from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print(etree.tostring(root, pretty_print=True).decode())

Outputs:

<parent>
  <child>text</child>
  <child>other text</child>
</parent>

Python/XML: Pretty-printing ElementTree

As the docs say, in the write method:

file is a file name, or a file object opened for writing.

This includes a StringIO object. So:

import cStringIO  # Python 2 only; on Python 3, use io.BytesIO() instead

outfile = cStringIO.StringIO()
tree.write(outfile)

Then you can pretty-print outfile using your favorite method: call outfile.seek(0) and pass outfile itself to a function that takes a file, or pass outfile.getvalue() to a function that takes a string.
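
As a rough Python 3 equivalent of the idea above, the sketch below writes the tree into an in-memory io.BytesIO buffer and hands the result to minidom for pretty-printing. The sample tree and the choice of minidom as the "favorite method" are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the asker's `tree`.
tree = ET.ElementTree(ET.fromstring("<one><two>2</two></one>"))

# tree.write() emits bytes by default, so use a bytes buffer.
outfile = io.BytesIO()
tree.write(outfile)

# getvalue() returns the whole buffer, ready for a string-based pretty-printer.
pretty = minidom.parseString(outfile.getvalue()).toprettyxml(indent="  ")
print(pretty)
```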


However, notice that many of the ways to pretty-print XML in the question you linked don't even need this. For example:

  • lxml.etree.tostring (answer #2): lxml.etree is a near-perfect superset of the stdlib etree, so if you're going to use it for pretty-printing, just use it to build the XML in the first place.
  • Effbot indent/prettyprint (answer #3): This expects an ElementTree tree, which is exactly what you already have, not a string or file.

XML ElementTree with pretty print

Update: See xml.etree.ElementTree.indent as of Python 3.9.
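
A minimal sketch of that Python 3.9+ approach, using the same sample data as the example further down (the two-space indent is simply the function's default, spelled out here):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<one><two><three>3</three><four>4</four></two></one>")

# indent() modifies the tree in place by inserting whitespace text and tails.
ET.indent(root, space="  ")

result = ET.tostring(root, encoding="unicode")
print(result)
```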

I couldn't reproduce the bad indentation from your example, but according to http://effbot.org/zone/element-lib.htm#prettyprint, your function is mis-copied. For these lines:

if not elem.text or not elem.text.strip():
    elem.text = i + ""

There should be two spaces between the quotes:

if not elem.text or not elem.text.strip():
    elem.text = i + "  "

I ran this code and it displays properly.

from xml.etree import ElementTree as et

def indent(elem, level=0):
    # Newline followed by two spaces per nesting level
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for e in elem:
            indent(e, level+1)
        # After the loop, e is the last child: dedent its tail for the closing tag
        if not e.tail or not e.tail.strip():
            e.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

data = '''<one><two><three>3</three><four>4</four></two></one>'''
tree = et.fromstring(data)
indent(tree)
et.dump(tree)

Output:

<one>
  <two>
    <three>3</three>
    <four>4</four>
  </two>
</one>

Notes for future:

  • Images of text can't be copied, so I leave it as an exercise for you to test on your own XML.
  • Paste the exact code and input data as text so that answerers can easily reproduce the issue.

XML pretty print fails in Python lxml

Use the "xml" output method when writing (that's the default so it does not have to be given explicitly).

Set the text property of the c element to an empty string to ensure that the element gets serialized as <c></c>.

Code:

import lxml.etree as et

parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)

b = xml_doc.getroot().find('b')
c = et.Element('c')
c.text = ''  # empty string, so lxml serializes <c></c> rather than <c/>
b.append(c)

xml_doc.write('out.xml', pretty_print=True)

Result (out.xml):

<a>
  <b>
    <c></c>
  </b>
</a>
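
For comparison only, and not part of the lxml answer above: the standard-library ElementTree handles empty elements differently, since setting text to an empty string does not change the output there. A sketch of the stdlib analogue, assuming Python 3.9+ (for indent) and a hypothetical in-memory document instead of in.xml, uses the short_empty_elements argument:

```python
import xml.etree.ElementTree as ET

# Hypothetical input standing in for in.xml.
a = ET.fromstring("<a><b/></a>")
b = a.find("b")
ET.SubElement(b, "c")

ET.indent(a)  # Python 3.9+

# Default serialization collapses empty elements to a self-closing tag.
collapsed = ET.tostring(a, encoding="unicode")

# short_empty_elements=False forces explicit open/close tags instead.
expanded = ET.tostring(a, encoding="unicode", short_empty_elements=False)
print(expanded)
```

Note that short_empty_elements applies to every empty element in the serialized tree, whereas the lxml approach above targets a single element.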

