Converting XML to JSON Using Python

Converting XML to JSON using Python?

There is no "one-to-one" mapping between XML and JSON, so converting one to the other necessarily requires some understanding of what you want to do with the results.

That being said, Python's standard library has several modules for parsing XML (including DOM, SAX, and ElementTree). As of Python 2.6, support for converting Python data structures to and from JSON is included in the json module.

So the infrastructure is there.
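As a minimal sketch of how those two standard-library pieces fit together, the snippet below parses XML with ElementTree, folds it into plain dicts and lists, and serializes the result with json. The helper name element_to_dict and the mapping rules (attributes prefixed with "@", text stored under "#text", repeated child tags collected into a list) are assumptions chosen for illustration, not a standard. Because there is no one-to-one mapping, another application may well want different rules.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical helper: fold an Element into dicts, lists, and strings."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag becomes a list (one of several possible choices).
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element without attributes: keep just the text (or None).
        return text or None
    if text:
        node["#text"] = text
    return node


xml_text = "<note id='1'><to>Tove</to><to>Jani</to><body>Hi</body></note>"
root = ET.fromstring(xml_text)
converted = {root.tag: element_to_dict(root)}
print(json.dumps(converted, indent=2))
```

Note how the two to elements end up in a list while body stays a plain string; that asymmetry is exactly the kind of decision the conversion forces on you.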

Python: converting XML to JSON fails with "a bytes-like object is required"

I am using Python 3.7.6

ET.fromstring() parses XML that is already held in a string. It does not open files, so passing it a file path fails:

import os
import xml.etree.ElementTree as et
xml_doc_path = os.path.abspath(r"C:\dir1\path\to\file\example.xml")
root = et.fromstring(xml_doc_path)
print(root)

This example raises the following error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 2
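
The error occurs because fromstring() treats its argument as the XML text itself, so the characters of the path are parsed as if they were a document. As a minimal sketch of the difference, the snippet below writes a small XML file to a temporary directory (the file name and contents are made up for illustration) and reads it both ways: parse() takes a path, fromstring() takes the document text.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = "<root><item>value</item></root>"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file, created only so the example is self-contained.
    xml_path = os.path.join(tmp_dir, "example.xml")
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)

    # parse() accepts a file name (or file object) and returns an ElementTree.
    root_from_file = ET.parse(xml_path).getroot()

    # fromstring() accepts the XML content itself and returns the root Element.
    with open(xml_path, encoding="utf-8") as handle:
        root_from_text = ET.fromstring(handle.read())

    # Passing the path to fromstring() raises ParseError, as in the example above.
    try:
        ET.fromstring(xml_path)
        path_parsed = True
    except ET.ParseError:
        path_parsed = False

print(root_from_file.tag, root_from_text.tag, path_parsed)
```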

I used ET.tostring() to generate a serialized representation of the XML data, which is a valid argument for xmltodict.parse(). See the ET.tostring() documentation for details.

The code below parses an XML file and generates a JSON file. I used my own XML example. Make sure all the XML tags are closed properly.

XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<element1 attribute1 = 'first attribute'>
</element1>
<element2 attribute1 = 'second attribute'>
some data
</element2>
</root>

PYTHON CODE:

import os
import xmltodict
import xml.etree.ElementTree as et
import json
xml_doc_path = os.path.abspath(r"C:\directory\path\to\file\example.xml")

xml_tree = et.parse(xml_doc_path)

root = xml_tree.getroot()
# set the encoding and the serialization method explicitly
to_string = et.tostring(root, encoding='UTF-8', method='xml')

xml_to_dict = xmltodict.parse(to_string)

with open("json_data.json", "w") as json_file:
    json.dump(xml_to_dict, json_file, indent=2)

OUTPUT:
The above code will create the following JSON file:

{
  "root": {
    "element1": {
      "@attribute1": "first attribute"
    },
    "element2": {
      "@attribute1": "second attribute",
      "#text": "some data"
    }
  }
}

How can I convert an XML file into JSON using Python?

This is probably what you are looking for:

https://github.com/mutaku/xml2json

import xml2json

s = '''<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>'''
print(xml2json.xml2json(s))

How to convert XML to JSON in Python?

Soviut's advice for lxml objectify is good. With a specially subclassed simplejson encoder, you can turn an lxml objectify result into JSON.

import simplejson as json
import lxml.objectify  # "import lxml" alone does not load the objectify submodule


class objectJSONEncoder(json.JSONEncoder):
    """A specialized JSON encoder that can handle simple lxml objectify types
    >>> from lxml import objectify
    >>> obj = objectify.fromstring("<Book><price>1.50</price><author>W. Shakespeare</author></Book>")
    >>> objectJSONEncoder().encode(obj)
    '{"price": 1.5, "author": "W. Shakespeare"}'
    """

    def default(self, o):
        if isinstance(o, lxml.objectify.IntElement):
            return int(o)
        if isinstance(o, (lxml.objectify.NumberElement, lxml.objectify.FloatElement)):
            return float(o)
        if isinstance(o, lxml.objectify.ObjectifiedDataElement):
            return str(o)
        if hasattr(o, '__dict__'):
            # For objects with a __dict__, return the encoding of the __dict__
            return o.__dict__
        return json.JSONEncoder.default(self, o)

See the docstring for an example of usage. Essentially, you pass the result of lxml objectify to the encode method of an instance of objectJSONEncoder.

Note that Koen's point is very valid here: the solution above only works for simply nested XML and doesn't include the name of the root element. This could be fixed.

I've included this class in a gist here: http://gist.github.com/345559

Transforming xml to json with python lxml

I think if you need to preserve document order (what you referenced as "text-order"), XSLT is a good option. XSLT can output plain text, which can then be loaded as JSON. Luckily, lxml supports XSLT 1.0.

Example...

XML Input (input.xml)

<root>
<tag>
Some tag-text<subtag>Some subtag-text</subtag> Some tail-text
</tag>
</root>

XSLT 1.0 (xml2json.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<xsl:template match="*">
<xsl:if test="position() != 1">, </xsl:if>
<xsl:value-of select="concat('{&quot;',
local-name(),
'&quot;: ')"/>
<xsl:choose>
<xsl:when test="count(node()) > 1">
<xsl:text>[</xsl:text>
<xsl:apply-templates/>
<xsl:text>]</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
<xsl:text>}</xsl:text>
</xsl:template>

<xsl:template match="text()">
<xsl:if test="position() != 1">, </xsl:if>
<xsl:value-of select="concat('{&quot;text&quot;: &quot;',
normalize-space(),
'&quot;}')"/>
</xsl:template>

</xsl:stylesheet>

Python

import json
from lxml import etree

tree = etree.parse("input.xml")

xslt_root = etree.parse("xml2json.xsl")
transform = etree.XSLT(xslt_root)

result = transform(tree)

json_load = json.loads(str(result))

json_dump = json.dumps(json_load, indent=2)

print(json_dump)

For informational purposes, the output of the xslt (result) is:

{"root": {"tag": [{"text": "Some tag-text"}, {"subtag": {"text": "Some subtag-text"}}, {"text": "Some tail-text"}]}}

The printed output from Python (after loads()/dumps()) is:

{
  "root": {
    "tag": [
      {
        "text": "Some tag-text"
      },
      {
        "subtag": {
          "text": "Some subtag-text"
        }
      },
      {
        "text": "Some tail-text"
      }
    ]
  }
}
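
For comparison, the same ordered structure can be sketched without XSLT by walking each element's text, children, and tails in document order. This swaps in the standard-library ElementTree for the stylesheet. The function name ordered_nodes is hypothetical, and the whitespace handling (collapse runs of whitespace, skip whitespace-only text) is an assumption meant to mimic normalize-space() and xsl:strip-space.

```python
import json
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse whitespace runs, roughly like XPath normalize-space()."""
    return " ".join((text or "").split())


def ordered_nodes(elem):
    """Hypothetical helper: list an element's content in document order."""
    nodes = []
    if normalize(elem.text):
        nodes.append({"text": normalize(elem.text)})
    for child in elem:
        nodes.append({child.tag: ordered_nodes(child)})
        # Text that follows a child is stored on that child as .tail.
        if normalize(child.tail):
            nodes.append({"text": normalize(child.tail)})
    # Mirror the stylesheet: a single node is not wrapped in a list.
    return nodes[0] if len(nodes) == 1 else nodes


xml_text = """<root>
<tag>
Some tag-text<subtag>Some subtag-text</subtag> Some tail-text
</tag>
</root>"""

root = ET.fromstring(xml_text)
result = {root.tag: ordered_nodes(root)}
print(json.dumps(result, indent=2))
```

Because the structure is built as Python objects and serialized by json.dumps(), quoting and escaping of the text are handled by the json module rather than by string concatenation.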

File not converting to JSON properly from xml

json.dumps() returns a string of JSON text, not a Python object. Add the line below after json_data = json.dumps(my_dict) to parse that string back into a Python object:

json_data = json.loads(json_data)
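
To make the distinction concrete, here is a small sketch (my_dict is a stand-in with made-up contents): json.dumps() returns a str, and json.loads() parses that string back into Python dicts and lists.

```python
import json

# Stand-in for the dictionary produced from the XML.
my_dict = {"root": {"item": ["a", "b"]}}

json_data = json.dumps(my_dict)      # a str containing JSON text
print(type(json_data).__name__)

json_data = json.loads(json_data)    # back to a Python dict
print(type(json_data).__name__)
print(json_data["root"]["item"][0])
```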

How to convert XML to JSON using python?

I would recommend using XSLT to transform the XML to JSON:

import json

from lxml import etree

XSL = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>

<xsl:template match="/collection">
<xsl:text>{</xsl:text>
<xsl:apply-templates/>
<xsl:text>}</xsl:text>
</xsl:template>

<xsl:template match="genre">
<xsl:text>"</xsl:text>
<xsl:value-of select="@category"/>
<xsl:text>": [</xsl:text>
<xsl:for-each select="descendant::movie" >
<xsl:text>"</xsl:text>
<xsl:value-of select="@title"/>
<xsl:text>"</xsl:text>
<xsl:if test="position() != last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>]</xsl:text>
<xsl:if test="following-sibling::*">
<xsl:text>,
</xsl:text>
</xsl:if>
</xsl:template>

<xsl:template match="text()"/>
</xsl:stylesheet>'''

# load input
dom = etree.parse('movies.xml')
# load XSLT
transform = etree.XSLT(etree.fromstring(XSL))

# apply XSLT on loaded dom
json_text = str(transform(dom))

# json_text contains the data converted to JSON format.
# you can use it with the JSON API. Example:
data = json.loads(json_text)
print(data)

Output:

{'Action': ['Indiana Jones: The raiders of the lost Ark', 'THE KARATE KID', 'Back 2 the Future', 'X-Men', 'Batman Returns', 'Reservoir Dogs'], 'Thriller': ['ALIEN', "Ferris Bueller's Day Off", 'American Psycho']}
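
One caveat with building JSON as text in a stylesheet: a title containing a double quote or a backslash is written out unescaped, which would make json.loads() fail. As a hedged alternative sketch, the same grouping can be built as a Python dict and serialized with json.dumps(), which takes care of escaping. The sample document below is hypothetical. It only assumes, as the stylesheet does, a collection root with genre elements carrying a category attribute and descendant movie elements carrying a title attribute. It uses the standard-library ElementTree rather than lxml.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape the stylesheet expects.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie title='The "Quoted" Title'/>
      <movie title="Another Title"/>
    </decade>
  </genre>
  <genre category="Thriller">
    <movie title="Third Title"/>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)

# Map each genre's category to the titles of all movies nested below it.
data = {
    genre.get("category"): [movie.get("title") for movie in genre.iter("movie")]
    for genre in root.findall("genre")
}

json_text = json.dumps(data, indent=2)
print(json_text)
```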

I don't understand what you want to achieve with "second output" and "third output", though, as these outputs seem to be constants.
