How do I get Python's ElementTree to pretty print to an XML file?
Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.
import xml.etree.ElementTree as ET
from xml.dom import minidom
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)
There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), then you may want to encode it to a suitable file encoding, such as UTF-8. E.g. replace the one write line with:
f.write(xmlstr.encode('utf-8'))
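In Python 3 the same idea can be expressed by letting open() handle the encoding. The following is a minimal sketch, not the original answer's code: the sample tree and the temporary output path are hypothetical stand-ins for your own root and file name.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample tree, standing in for the asker's `root`.
root = ET.Element("database")
ET.SubElement(root, "entry", name="café")

# In Python 3, toprettyxml() without an encoding argument returns a str.
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")

# Let open() do the encoding instead of calling .encode() by hand.
# The path is an assumption for this sketch; use your own file name.
path = os.path.join(tempfile.gettempdir(), "New_Database.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write(xmlstr)
```

Passing encoding="utf-8" explicitly avoids depending on the platform's default text encoding.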
Pretty printing XML in Python
import xml.dom.minidom
dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
Python/XML: Pretty-printing ElementTree
As the docs say, in the write method:
file is a file name, or a file object opened for writing.
This includes a StringIO object. So:
import cStringIO
outfile = cStringIO.StringIO()
tree.write(outfile)
Then you can just pretty-print outfile using your favorite method: just outfile.seek(0), then pass outfile itself to a function that takes a file, or pass outfile.getvalue() to a function that takes a string.
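cStringIO exists only in Python 2. A rough Python 3 equivalent of the steps above, assuming io.BytesIO as the in-memory buffer and minidom as the "favorite method" (both are choices made for this sketch, and the sample tree is hypothetical), might look like this:

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree, standing in for the asker's `tree`.
tree = ET.ElementTree(ET.fromstring("<a><b>1</b></a>"))

# write() emits bytes when a real encoding such as "utf-8" is given,
# so an in-memory bytes buffer takes the place of cStringIO.StringIO.
outfile = io.BytesIO()
tree.write(outfile, encoding="utf-8")

# Option 1: rewind and pass the file-like object to a function that takes a file.
outfile.seek(0)
pretty_from_file = minidom.parse(outfile).toprettyxml(indent="  ")

# Option 2: pass the buffer's contents to a function that takes a string.
pretty_from_string = minidom.parseString(outfile.getvalue()).toprettyxml(indent="  ")
```

Both options should produce the same pretty-printed text; which one to use depends only on what the pretty-printing function accepts.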
However, notice that many of the ways to pretty-print XML in the question you linked don't even need this. For example:
- lxml.etree.tostring (answer #2): lxml.etree is a near-perfect superset of the stdlib etree, so if you're going to use it for pretty-printing, just use it to build the XML in the first place.
- Effbot indent/prettyprint (answer #3): This expects an ElementTree tree, which is exactly what you already have, not a string or file.
Write .xml in Python with pretty print and encoding declaration
The most elegant solution is certainly using the third-party library lxml, which is being used a lot – for good reasons.
It offers both a pretty_print and an xml_declaration parameter in the tostring() function, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:
>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
...                      pretty_print=True)
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>
However, I understand your desire to use the "included batteries" only.
As far as I can see, xml.etree.ElementTree has no means of changing the indentation automatically.
But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!
>>> doc = minidom.parseString(ET.tostring(root))
>>> print doc.toprettyxml(encoding='utf8')
<?xml version="1.0" encoding="utf8"?>
<main>
	<sub>
		<name>Ana</name>
		<detail/>
		<type>smart</type>
	</sub>
</main>
(Be aware that the returned string is already encoded and that you should write it to a file opened in binary mode ("wb") and without further encoding.)
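The note about binary mode can be illustrated with a short Python 3 sketch. The sample tree mirrors the output shown above; the temporary file path is a hypothetical choice for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree mirroring the example output above.
root = ET.fromstring(
    "<main><sub><name>Ana</name><detail/><type>smart</type></sub></main>"
)

# With an encoding argument, toprettyxml() returns bytes, not str,
# and the XML declaration names that encoding.
pretty_bytes = minidom.parseString(ET.tostring(root)).toprettyxml(encoding="utf-8")

# Binary mode, no further encoding. The path is an assumption for this sketch.
path = os.path.join(tempfile.gettempdir(), "pretty_with_declaration.xml")
with open(path, "wb") as f:
    f.write(pretty_bytes)
```

Opening the file in text mode instead would raise a TypeError in Python 3, because pretty_bytes is already encoded.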
Use xml.etree.ElementTree to print nicely formatted xml files
You can use the function toprettyxml() from xml.dom.minidom in order to do that:
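from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    # Serialize to bytes, then let minidom re-parse and indent the result.
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")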
The idea is to serialize your Element to a string, parse it using minidom and convert it back to XML using the toprettyxml function.
Source: http://pymotw.com/2/xml/etree/ElementTree/create.html
XML ElementTree with pretty print
Update: See xml.etree.ElementTree.indent as of Python 3.9.
I couldn't reproduce the bad indentation from your example, but according to http://effbot.org/zone/element-lib.htm#prettyprint, your function is mis-copied. For these lines:
if not elem.text or not elem.text.strip():
    elem.text = i + ""
There should be two spaces between the quotes:
if not elem.text or not elem.text.strip():
    elem.text = i + "  "
I ran this code and it displays properly.
from xml.etree import ElementTree as et

def indent(elem, level=0):
    # Newline plus two spaces per nesting level.
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for e in elem:
            indent(e, level+1)
        # After the loop, e is the last child: dedent its tail to close the parent.
        if not e.tail or not e.tail.strip():
            e.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

data = '''<one><two><three>3</three><four>4</four></two></one>'''
tree = et.fromstring(data)
indent(tree)
et.dump(tree)
Output:
<one>
  <two>
    <three>3</three>
    <four>4</four>
  </two>
</one>
Notes for future:
- Images of text can't be copied so I leave it as an exercise for you to test on your own XML.
- Paste the exact code and input data as text so that answerers can easily reproduce the issue.
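For reference, the xml.etree.ElementTree.indent function mentioned in the update at the top of this answer can replace the hand-written helper on Python 3.9 or later. A minimal sketch using the same input data:

```python
import xml.etree.ElementTree as ET

data = "<one><two><three>3</three><four>4</four></two></one>"
root = ET.fromstring(data)

# Requires Python 3.9+. Modifies the tree in place by inserting
# whitespace; `space` is the indentation used per nesting level.
ET.indent(root, space="  ")

# encoding="unicode" makes tostring() return a str instead of bytes.
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

To write the result to a file with a declaration, wrapping the root in ET.ElementTree(root) and calling its write() method with encoding and xml_declaration=True should work.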
Python pretty print an XML given an XML string
Here's how to parse from a text string to the lxml structured data type.
Python 2:
from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print etree.tostring(root, pretty_print=True)
Python 3:
from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print(etree.tostring(root, pretty_print=True).decode())
Outputs:
<parent>
  <child>text</child>
  <child>other text</child>
</parent>
How to use lxml and python to pretty print a subtree of an xml file?
You can use tree.find to extract the XML element you need, then wrap it in an ElementTree. You can then call write on the resulting ElementTree (et in this case).
python -c '
from lxml import etree;
from sys import stdout, stdin;
parser=etree.XMLParser(remove_blank_text=True,strip_cdata=False);
tree=etree.parse(stdin, parser)
e = tree.find("project")
et = etree.ElementTree(e)
et.write(stdout.buffer, pretty_print = True)'
[Note: for Python 2, just use 'stdout' instead of 'stdout.buffer']