Pretty printing XML in Python
import xml.dom.minidom
dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
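For a self-contained illustration of the parseString variant, here is a minimal sketch. The sample XML string is made up for the example, and indent="  " is passed only because toprettyxml() indents with a tab by default.

```python
import xml.dom.minidom

# Hypothetical sample input; any well-formed XML string would do.
xml_string = "<root><item id='1'>first</item><item id='2'>second</item></root>"

dom = xml.dom.minidom.parseString(xml_string)

# Without an encoding argument, toprettyxml() returns a str.
# The default indent is a tab, so two spaces are requested explicitly.
pretty_xml_as_string = dom.toprettyxml(indent="  ")
print(pretty_xml_as_string)
```

The result starts with an XML declaration line, followed by one element per line.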
How do I get Python's ElementTree to pretty print to an XML file?
Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.
import xml.etree.ElementTree as ET
from xml.dom import minidom
# root is the ElementTree root element you have already built
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)
There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), then you may want to encode it to a suitable file encoding, such as UTF-8. E.g. replace the one write line with:
f.write(xmlstr.encode('utf-8'))
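In Python 3, toprettyxml() without an encoding argument returns a str, so one way to handle the encoding is to leave the string alone and pass encoding="utf-8" to open() instead. The sketch below assumes a small made-up tree and writes into a temporary directory. Only the file name comes from the answer above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree containing a non-ASCII character.
root = ET.Element("database")
ET.SubElement(root, "name").text = "Zoë"

# ET.tostring() returns bytes by default; minidom.parseString() accepts them.
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")

# The temporary directory is an assumption made to keep the example tidy.
out_path = os.path.join(tempfile.mkdtemp(), "New_Database.xml")

# Let the file object do the encoding instead of calling .encode() by hand.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(xmlstr)
```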
Write .xml in Python with pretty print and encoding declaration
The most elegant solution is certainly using the third-party library lxml, which is being used a lot – for good reasons.
It offers both a pretty_print and an xml_declaration parameter in the tostring() method, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:
>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print(etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
...                      pretty_print=True).decode('UTF-8'))
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>
However, I understand your desire to use the "included batteries" only.
As far as I can see, xml.etree.ElementTree has no means of changing the indentation automatically (Python 3.9 has since added xml.etree.ElementTree.indent).
But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!
>>> import xml.etree.ElementTree as ET
>>> from xml.dom import minidom
>>> doc = minidom.parseString(ET.tostring(root))
>>> print(doc.toprettyxml(encoding='utf8').decode('utf8'))
<?xml version="1.0" encoding="utf8"?>
<main>
	<sub>
		<name>Ana</name>
		<detail/>
		<type>smart</type>
	</sub>
</main>
(Be aware that the returned string is already encoded, i.e. it is a bytes object in Python 3, and that you should write it to a file opened in binary mode ("wb") without further encoding.)
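The binary-mode advice can be sketched as follows for Python 3. The input document mirrors the example output above, while the output file name and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

root = ET.fromstring(
    "<main><sub><name>Ana</name><detail/><type>smart</type></sub></main>"
)

doc = minidom.parseString(ET.tostring(root))

# With an encoding argument, toprettyxml() returns bytes and includes
# the encoding in the XML declaration.
pretty_bytes = doc.toprettyxml(indent="  ", encoding="utf-8")

# Hypothetical output location.
out_path = os.path.join(tempfile.mkdtemp(), "pretty.xml")

# Binary mode: the data is already encoded, so it is written as-is.
with open(out_path, "wb") as f:
    f.write(pretty_bytes)
```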
Python pretty print an XML given an XML string
Here's how to parse from a text string to the lxml structured data type.
Python 2:
from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print etree.tostring(root, pretty_print=True)
Python 3:
from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print(etree.tostring(root, pretty_print=True).decode())
Outputs:
<parent>
  <child>text</child>
  <child>other text</child>
</parent>
Python/XML: Pretty-printing ElementTree
As the docs say, in the write method:
file is a file name, or a file object opened for writing.
This includes a StringIO object. So:
import cStringIO
outfile = cStringIO.StringIO()
tree.write(outfile)
Then you can just pretty-print outfile using your favorite method: either call outfile.seek(0) and pass outfile itself to a function that takes a file, or pass outfile.getvalue() to a function that takes a string.
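cStringIO exists only in Python 2. A rough Python 3 equivalent, assuming io.StringIO as the in-memory file and minidom as the "favorite method", might look like the sketch below. The sample document is made up, and encoding="unicode" is needed so that write() produces text rather than bytes.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the one you already have.
tree = ET.ElementTree(
    ET.fromstring("<parent><child>text</child><child>other text</child></parent>")
)

# Write the tree into an in-memory text buffer instead of a file on disk.
outfile = io.StringIO()
tree.write(outfile, encoding="unicode")

# Hand the buffered string to a pretty-printer that takes a string.
pretty = minidom.parseString(outfile.getvalue()).toprettyxml(indent="  ")
print(pretty)
```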
However, notice that many of the ways to pretty-print XML in the question you linked don't even need this. For example:
- lxml.etree.tostring (answer #2): lxml.etree is a near-perfect superset of the stdlib etree, so if you're going to use it for pretty-printing, just use it to build the XML in the first place.
- Effbot indent/prettyprint (answer #3): This expects an ElementTree tree, which is exactly what you already have, not a string or file.
XML ElementTree with pretty print
Update: See xml.etree.ElementTree.indent as of Python 3.9.
I couldn't reproduce the bad indentation from your example, but according to http://effbot.org/zone/element-lib.htm#prettyprint, your function is mis-copied. For these lines:
if not elem.text or not elem.text.strip():
    elem.text = i + ""
There should be two spaces between the quotes:
if not elem.text or not elem.text.strip():
    elem.text = i + "  "
I ran this code and it displays properly.
from xml.etree import ElementTree as et

def indent(elem, level=0):
    # Newline plus two spaces per nesting level
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for e in elem:
            indent(e, level+1)
        # e is now the last child: its tail closes the parent at the parent's level
        if not e.tail or not e.tail.strip():
            e.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

data = '''<one><two><three>3</three><four>4</four></two></one>'''
tree = et.fromstring(data)
indent(tree)
et.dump(tree)
Output:
<one>
  <two>
    <three>3</three>
    <four>4</four>
  </two>
</one>
Notes for future:
- Images of text can't be copied so I leave it as an exercise for you to test on your own XML.
- Paste the exact code and input data as text so that answerers can easily reproduce the issue.
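For Python 3.9 or later, the xml.etree.ElementTree.indent function mentioned in the update can stand in for the hand-written indent helper. A minimal sketch, reusing the same input data as above:

```python
import xml.etree.ElementTree as ET

data = "<one><two><three>3</three><four>4</four></two></one>"
tree = ET.fromstring(data)

# indent() modifies the tree in place; it requires Python 3.9 or later.
ET.indent(tree, space="  ")

# encoding="unicode" makes tostring() return a str instead of bytes.
result = ET.tostring(tree, encoding="unicode")
print(result)
```

Unlike the helper above, ET.indent leaves the root element's tail untouched, so the serialized string does not end with a newline.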
XML pretty print fails in Python lxml
Use the "xml" output method when writing (that's the default so it does not have to be given explicitly).
Set the text property of the c element to an empty string to ensure that the element gets serialized as <c></c>.
Code:
import lxml.etree as et
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)
b = xml_doc.getroot().find('b')
c = et.Element('c')
c.text=''
b.append(c)
xml_doc.write('out.xml', pretty_print=True)
Result (out.xml):
<a>
  <b>
    <c></c>
  </b>
</a>