Converting XML to JSON using Python?
There is no "one-to-one" mapping between XML and JSON, so converting one to the other necessarily requires some understanding of what you want to do with the results.
That being said, Python's standard library has several modules for parsing XML (including DOM, SAX, and ElementTree). As of Python 2.6, support for converting Python data structures to and from JSON is included in the json module.
So the infrastructure is there.
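As a rough illustration of that infrastructure, here is a minimal sketch that uses only ElementTree and json. The mapping it applies is an assumption chosen for the example, not a standard: attributes go under "@name" keys, text under "#text", and repeated child tags become lists. The function names element_to_dict and xml_to_json are hypothetical. Tail text after child elements is ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one Element into plain Python data (illustrative convention)."""
    node = {}
    # Attributes are stored under "@name" keys.
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Child elements: a tag that repeats is collected into a list.
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf without attributes: collapse to its text (None when empty).
        return text or None
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    """Parse XML text and return a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


print(xml_to_json("<note><to>Tove</to><to>Jani</to><body lang='en'>Hi</body></note>"))
```

Whether a single child should be a scalar or a one-element list, and how to treat mixed content, are exactly the decisions that the lack of a one-to-one mapping forces on you.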
Python convert XML to JSON: a bytes-like object is required
I am using Python 3.7.6
ET.fromstring() parses XML that is already held in a string, so passing it a file path does not work:
import os
import xml.etree.ElementTree as et
xml_doc_path = os.path.abspath(r"C:\dir1\path\to\file\example.xml")
root = et.fromstring(xml_doc_path)
print(root)
This example raises the following error, because the path string itself is parsed as XML:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 2
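The difference is that fromstring() treats its argument as XML markup, while parse() takes a file name or file object. A minimal sketch of that difference, using a temporary file as a stand-in for example.xml (the file contents and variable names are illustrative assumptions):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = "<root><item>value</item></root>"

# fromstring() expects the XML markup itself and returns the root Element.
root_from_text = ET.fromstring(xml_text)

with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = os.path.join(tmp_dir, "example.xml")  # hypothetical file
    with open(xml_path, "w", encoding="utf-8") as fh:
        fh.write(xml_text)

    # parse() expects a file name (or file object) and returns an ElementTree.
    root_from_file = ET.parse(xml_path).getroot()

    # Passing the path to fromstring() tries to parse the path text as XML.
    try:
        ET.fromstring(xml_path)
    except ET.ParseError as exc:
        parse_error = exc

print(root_from_text.tag, root_from_file.find("item").text)
print("fromstring(path) failed:", parse_error)
```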
Instead, I parsed the file with ET.parse() and used ET.tostring() to generate a string representation of the XML data, which is a valid argument for xmltodict.parse().
The code below parses an XML file and writes the result to a JSON file. I used my own XML example. Make sure all the XML tags are closed properly.
XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element1 attribute1 = 'first attribute'>
</element1>
<element2 attribute1 = 'second attribute'>
some data
</element2>
</root>
PYTHON CODE:
import os
import xmltodict
import xml.etree.ElementTree as et
import json
xml_doc_path = os.path.abspath(r"C:\directory\path\to\file\example.xml")
xml_tree = et.parse(xml_doc_path)
root = xml_tree.getroot()
# set the encoding and the serialization method explicitly
to_string = et.tostring(root, encoding='UTF-8', method='xml')
xml_to_dict = xmltodict.parse(to_string)
with open("json_data.json", "w") as json_file:
    json.dump(xml_to_dict, json_file, indent=2)
OUTPUT:
The above code will create the following JSON file:
{
  "root": {
    "element1": {
      "@attribute1": "first attribute"
    },
    "element2": {
      "@attribute1": "second attribute",
      "#text": "some data"
    }
  }
}
How can I convert an XML file into JSON using Python?
This is probably what you are looking for:
https://github.com/mutaku/xml2json
import xml2json
s = '''<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>'''
print(xml2json.xml2json(s))
How to convert XML to JSON in Python?
Soviut's advice for lxml objectify is good. With a specially subclassed simplejson, you can turn an lxml objectify result into json.
import simplejson as json
import lxml.objectify

class objectJSONEncoder(json.JSONEncoder):
    """A specialized JSON encoder that can handle simple lxml objectify types
    >>> from lxml import objectify
    >>> obj = objectify.fromstring("<Book><price>1.50</price><author>W. Shakespeare</author></Book>")
    >>> objectJSONEncoder().encode(obj)
    '{"price": 1.5, "author": "W. Shakespeare"}'
    """

    def default(self, o):
        # IntElement is checked first because it is also a NumberElement
        if isinstance(o, lxml.objectify.IntElement):
            return int(o)
        if isinstance(o, (lxml.objectify.NumberElement, lxml.objectify.FloatElement)):
            return float(o)
        if isinstance(o, lxml.objectify.ObjectifiedDataElement):
            return str(o)
        if hasattr(o, '__dict__'):
            # For objects with a __dict__, return the encoding of the __dict__
            return o.__dict__
        return json.JSONEncoder.default(self, o)
See the docstring for an example of usage. Essentially, you pass the result of lxml objectify to the encode method of an instance of objectJSONEncoder.
Note that Koen's point is very valid here: the solution above only works for simply nested XML and doesn't include the name of the root element. This could be fixed.
I've included this class in a gist here: http://gist.github.com/345559
Transforming xml to json with python lxml
I think if you need to preserve document order (what you referenced as "text-order"), XSLT is a good option. XSLT can output plain text which can be loaded as json. Luckily lxml supports XSLT 1.0.
Example...
XML Input (input.xml)
<root>
<tag>
Some tag-text<subtag>Some subtag-text</subtag> Some tail-text
</tag>
</root>
XSLT 1.0 (xml2json.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*">
<xsl:if test="position() != 1">, </xsl:if>
<xsl:value-of select="concat('{&quot;',
local-name(),
'&quot;: ')"/>
<xsl:choose>
<xsl:when test="count(node()) > 1">
<xsl:text>[</xsl:text>
<xsl:apply-templates/>
<xsl:text>]</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
<xsl:text>}</xsl:text>
</xsl:template>
<xsl:template match="text()">
<xsl:if test="position() != 1">, </xsl:if>
<xsl:value-of select="concat('{&quot;text&quot;: &quot;',
normalize-space(),
'&quot;}')"/>
</xsl:template>
</xsl:stylesheet>
Python
import json
from lxml import etree
tree = etree.parse("input.xml")
xslt_root = etree.parse("xml2json.xsl")
transform = etree.XSLT(xslt_root)
result = transform(tree)
json_load = json.loads(str(result))
json_dump = json.dumps(json_load, indent=2)
print(json_dump)
For informational purposes, the output of the XSLT (result) is:
{"root": {"tag": [{"text": "Some tag-text"}, {"subtag": {"text": "Some subtag-text"}}, {"text": "Some tail-text"}]}}
The printed output from Python (after loads()/dumps()) is:
{
  "root": {
    "tag": [
      {
        "text": "Some tag-text"
      },
      {
        "subtag": {
          "text": "Some subtag-text"
        }
      },
      {
        "text": "Some tail-text"
      }
    ]
  }
}
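For comparison, the same document-ordered structure can be built without XSLT. The following is a minimal sketch using only the standard library, not part of the XSLT approach above. The helper name ordered_node is hypothetical. It walks each element's text, children, and tail in order, and leaves quoting and escaping to the json module. Unlike the stylesheet, it emits an empty list for an element with no content.

```python
import json
import xml.etree.ElementTree as ET


def ordered_node(elem):
    """Return {tag: content}, keeping text and child elements in document order."""
    parts = []
    # Text that appears before the first child element.
    if elem.text and elem.text.strip():
        parts.append({"text": " ".join(elem.text.split())})
    for child in elem:
        parts.append(ordered_node(child))
        # Tail text follows the child's closing tag inside the parent.
        if child.tail and child.tail.strip():
            parts.append({"text": " ".join(child.tail.split())})
    # A single node is emitted directly, several nodes as a list.
    if len(parts) == 1:
        return {elem.tag: parts[0]}
    return {elem.tag: parts}


xml_text = """<root>
<tag>
Some tag-text<subtag>Some subtag-text</subtag> Some tail-text
</tag>
</root>"""

ordered = ordered_node(ET.fromstring(xml_text))
print(json.dumps(ordered, indent=2))
```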
File not converting to JSON properly from xml
json.dumps() returns a string, not a Python object. Add the line below after json_data = json.dumps(my_dict) to convert that JSON string back into a Python object:
json_data = json.loads(json_data)
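A short sketch of the round trip, assuming my_dict is an ordinary dictionary (the sample data here is made up for illustration):

```python
import json

my_dict = {"note": {"to": "Tove", "from": "Jani"}}  # hypothetical data

json_text = json.dumps(my_dict)   # a str containing JSON text
restored = json.loads(json_text)  # parsed back into a dict

print(type(json_text).__name__, type(restored).__name__)
```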
How to convert XML to JSON using python?
I would recommend using XSLT to transform the XML to JSON:
import json
from lxml import etree
XSL = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:template match="/collection">
<xsl:text>{</xsl:text>
<xsl:apply-templates/>
<xsl:text>}</xsl:text>
</xsl:template>
<xsl:template match="genre">
<xsl:text>"</xsl:text>
<xsl:value-of select="@category"/>
<xsl:text>": [</xsl:text>
<xsl:for-each select="descendant::movie" >
<xsl:text>"</xsl:text>
<xsl:value-of select="@title"/>
<xsl:text>"</xsl:text>
<xsl:if test="position() != last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>]</xsl:text>
<xsl:if test="following-sibling::*">
<xsl:text>,
</xsl:text>
</xsl:if>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>'''
# load input
dom = etree.parse('movies.xml')
# load XSLT
transform = etree.XSLT(etree.fromstring(XSL))
# apply XSLT on loaded dom
json_text = str(transform(dom))
# json_text contains the data converted to JSON format.
# you can use it with the JSON API. Example:
data = json.loads(json_text)
print(data)
Output:
{'Action': ['Indiana Jones: The raiders of the lost Ark', 'THE KARATE KID', 'Back 2 the Future', 'X-Men', 'Batman Returns', 'Reservoir Dogs'], 'Thriller': ['ALIEN', "Ferris Bueller's Day Off", 'American Psycho']}
I don't understand what you want to achieve with "second output" and "third output", though, as these outputs seem to be constants.
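One caveat with assembling JSON by string concatenation in XSLT is that a quote character inside a title is not escaped. As an alternative sketch (not the approach above), the same genre-to-titles mapping can be built as a dictionary and serialized with json.dumps, which handles escaping. The sample XML below is a made-up stand-in for movies.xml, modeled only on the element and attribute names the stylesheet selects (collection, genre/@category, movie/@title).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical input; the real movies.xml is not shown in the question.
SAMPLE = """<collection>
  <genre category="Action">
    <decade>
      <movie title="First Movie"/>
      <movie title='A "Quoted" Title'/>
    </decade>
  </genre>
  <genre category="Thriller">
    <movie title="Second Movie"/>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)

# For each genre, collect the titles of all descendant movie elements.
data = {
    genre.get("category"): [movie.get("title") for movie in genre.iter("movie")]
    for genre in root.findall("genre")
}

json_text = json.dumps(data)
print(json_text)
```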