How to Get Python's ElementTree to Pretty Print to an XML File

How do I get Python's ElementTree to pretty print to an XML file?

Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.

import xml.etree.ElementTree as ET
from xml.dom import minidom

# root is the Element you built with ElementTree
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)

There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), you may want to encode it to a suitable file encoding, such as UTF-8. In Python 2, replace the single write line with the following (in Python 3, pass encoding="utf-8" to open() instead, because a file opened in text mode accepts only str):

f.write(xmlstr.encode('utf-8'))
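In Python 3 the same minidom round trip can be wrapped in a small helper that picks the file encoding through open() rather than by encoding the string. This is a minimal sketch, assuming Python 3; the helper name write_pretty_xml and the example tree are hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def write_pretty_xml(root, path, indent="  "):
    """Hypothetical helper: serialize an Element, re-parse it with minidom,
    and write the indented result to path.

    Assumes Python 3, where str is already Unicode, so the file encoding is
    chosen with open(..., encoding=...) instead of calling .encode().
    """
    xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=indent)
    with open(path, "w", encoding="utf-8") as f:
        f.write(xmlstr)


# Hypothetical example tree, for illustration only.
root = ET.Element("main")
sub = ET.SubElement(root, "sub")
ET.SubElement(sub, "name").text = "Ana"
write_pretty_xml(root, "New_Database.xml")
```

The declaration written by toprettyxml() without an encoding argument carries no encoding attribute, which is fine here because UTF-8 is the XML default.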

Pretty printing XML in Python

import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
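For the string variant, a short sketch follows. The input string is hypothetical, and the indent= argument (default is a tab) is shown to control the per-level indentation:

```python
import xml.dom.minidom

# Hypothetical input; any well-formed XML string works the same way.
xml_string = "<parent><child>text</child><child>other text</child></parent>"

dom = xml.dom.minidom.parseString(xml_string)
# indent= sets the per-level indentation (default "\t");
# newl= would set the line separator (default "\n").
pretty_xml_as_string = dom.toprettyxml(indent="  ")
print(pretty_xml_as_string)
```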

Python/XML: Pretty-printing ElementTree

As the docs say, in the write method:

file is a file name, or a file object opened for writing.

This includes a StringIO object. So:

import io  # Python 2: cStringIO.StringIO()
outfile = io.BytesIO()
tree.write(outfile)  # tree is your existing ElementTree

Then you can pretty-print outfile using your favorite method: call outfile.seek(0) and pass outfile itself to a function that takes a file, or pass outfile.getvalue() to a function that takes a string.
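Both options can be sketched with minidom as the "favorite method". This assumes Python 3, where ElementTree.write() with its default encoding produces bytes, so io.BytesIO is the in-memory file; the small tree is a hypothetical stand-in for yours:

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the one you already have.
tree = ET.ElementTree(ET.fromstring("<one><two>2</two></one>"))

# write() accepts a file object; with the default encoding it writes bytes.
outfile = io.BytesIO()
tree.write(outfile)

# Option 1: rewind, then hand the file object to a function that takes a file.
outfile.seek(0)
pretty_from_file = minidom.parse(outfile).toprettyxml(indent="  ")

# Option 2: hand the raw bytes to a function that takes a string.
pretty_from_string = minidom.parseString(outfile.getvalue()).toprettyxml(indent="  ")

print(pretty_from_file)
```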


However, notice that many of the ways to pretty-print XML in the question you linked don't even need this. For example:

  • lxml.etree.tostring (answer #2): lxml.etree is a near-perfect superset of the stdlib etree, so if you're going to use it for pretty-printing, just use it to build the XML in the first place.
  • Effbot indent/prettyprint (answer #3): This expects an ElementTree tree, which is exactly what you already have, not a string or file.

Write .xml in Python with pretty print and encoding declaration

The most elegant solution is certainly the third-party library lxml, which is widely used for good reason.
It offers both a pretty_print and an xml_declaration parameter in the tostring() method, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:

>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print(etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
...                      pretty_print=True).decode())
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>

However, I understand your desire to use the "included batteries" only.
Before Python 3.9, xml.etree.ElementTree had no means of indenting its output automatically (3.9 added xml.etree.ElementTree.indent).
But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!

>>> from xml.dom import minidom
>>> doc = minidom.parseString(ET.tostring(root))
>>> print(doc.toprettyxml(encoding='utf8').decode('utf8'))
<?xml version="1.0" encoding="utf8"?>
<main>
	<sub>
		<name>Ana</name>
		<detail/>
		<type>smart</type>
	</sub>
</main>

(Be aware that the returned value is already encoded bytes, so write it to a file opened in binary mode ("wb") without further encoding.)
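Writing that encoded result to disk might look like the sketch below. It assumes Python 3; the input string mirrors the <main>/<sub> example, and the output file name pretty.xml is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree mirroring the <main>/<sub> example above.
root = ET.fromstring("<main><sub><name>Ana</name><detail/><type>smart</type></sub></main>")

doc = minidom.parseString(ET.tostring(root))
# With encoding= set, toprettyxml() returns bytes that already carry
# the encoding declaration, so the file is opened in binary mode.
data = doc.toprettyxml(indent="  ", encoding="utf-8")
with open("pretty.xml", "wb") as f:  # hypothetical output file name
    f.write(data)
```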

Use xml.etree.ElementTree to print nicely formatted xml files

You can use the function toprettyxml() from xml.dom.minidom in order to do that:

from xml.dom import minidom
from xml.etree import ElementTree

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

The idea is to serialize your Element to a string, parse that string with minidom, and convert it back to XML text with the toprettyxml function.

Source: http://pymotw.com/2/xml/etree/ElementTree/create.html
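A usage sketch for prettify() follows. The function is repeated so the snippet runs on its own, and the <top>/<child> tree is a hypothetical example built with SubElement:

```python
from xml.dom import minidom
from xml.etree import ElementTree


def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")


# Hypothetical tree built with the SubElement API.
top = ElementTree.Element("top")
child = ElementTree.SubElement(top, "child")
child.text = "This child contains text."
print(prettify(top))
```

minidom writes its own XML declaration, so the one produced by ElementTree.tostring(elem, 'utf-8') does not appear twice in the result.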

XML ElementTree with pretty print

Update: See xml.etree.ElementTree.indent as of Python 3.9.
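With Python 3.9 or later, the stdlib can indent the tree in place before writing it. This sketch reuses the input from the answer below; the output file name indented.xml is hypothetical:

```python
import xml.etree.ElementTree as ET

# Requires Python 3.9+ for ET.indent.
tree = ET.ElementTree(ET.fromstring("<one><two><three>3</three><four>4</four></two></one>"))
ET.indent(tree, space="  ", level=0)  # adds whitespace text/tails in place

print(ET.tostring(tree.getroot(), encoding="unicode"))
tree.write("indented.xml", encoding="utf-8", xml_declaration=True)  # hypothetical file name
```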

I couldn't reproduce the bad indentation from your example, but according to http://effbot.org/zone/element-lib.htm#prettyprint, your function is mis-copied. For these lines:

if not elem.text or not elem.text.strip():
    elem.text = i + ""

There should be two spaces between the quotes:

if not elem.text or not elem.text.strip():
    elem.text = i + "  "

I ran this code and it displays properly.

from xml.etree import ElementTree as et

def indent(elem, level=0):
    # Newline plus two spaces per nesting level
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for e in elem:
            indent(e, level+1)
        # After the loop, e is the last child: dedent its tail to this level
        if not e.tail or not e.tail.strip():
            e.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

data = '''<one><two><three>3</three><four>4</four></two></one>'''
tree = et.fromstring(data)
indent(tree)
et.dump(tree)

Output:

<one>
  <two>
    <three>3</three>
    <four>4</four>
  </two>
</one>

Notes for future:

  • Images of text can't be copied, so I leave it as an exercise for you to test on your own XML.
  • Paste the exact code and input data as text so that answerers can easily reproduce the issue.

Python pretty print an XML given an XML string

Here's how to parse from a text string to the lxml structured data type.

Python 2:

from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print etree.tostring(root, pretty_print=True)

Python 3:

from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print(etree.tostring(root, pretty_print=True).decode())

Outputs:

<parent>
  <child>text</child>
  <child>other text</child>
</parent>

How to use lxml and python to pretty print a subtree of an xml file?

You can use tree.find to extract the XML element you need, then wrap it in an ElementTree. You can then call write on the resulting ElementTree (et in this case).

python -c '
from lxml import etree;
from sys import stdout, stdin;
parser=etree.XMLParser(remove_blank_text=True,strip_cdata=False);
tree=etree.parse(stdin, parser)
e = tree.find("project")
et = etree.ElementTree(e)
et.write(stdout.buffer, pretty_print = True)'

[Note: for Python 2, just use 'stdout' instead of 'stdout.buffer']
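If lxml is not available, the same find / wrap / write steps can be sketched with the stdlib instead, swapping lxml's pretty_print=True for ET.indent (Python 3.9+). This is a substitute technique, not the lxml approach above, and the input document is hypothetical:

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical input; in practice this would come from ET.parse(...).
root = ET.fromstring(
    "<root><project><name>demo</name><version>1</version></project><other/></root>"
)

e = root.find("project")   # first direct child named "project"
et = ET.ElementTree(e)     # wrap the subtree in its own ElementTree
ET.indent(et)              # Python 3.9+: indent the subtree in place
et.write(sys.stdout, encoding="unicode")  # "unicode" writes str to a text stream
```

Note that ET.indent modifies the shared element, so the subtree is also indented inside the original root.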


