How to Transform an XML File Using XSLT in Python

How to transform an XML file using XSLT in Python?

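Using lxml:

import lxml.etree as ET

# Placeholders: paths to your input XML document and your XSLT stylesheet
xml_filename = 'input.xml'
xsl_filename = 'transform.xsl'

dom = ET.parse(xml_filename)     # parse the source document
xslt = ET.parse(xsl_filename)    # parse the stylesheet
transform = ET.XSLT(xslt)        # compile the stylesheet
newdom = transform(dom)          # apply it to the document
# tostring() returns bytes, so decode before printing
print(ET.tostring(newdom, pretty_print=True).decode())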

Transform xml using xslt in python

ET.parse(testing.xml) should be ET.parse('testing.xml'). Without the quotes, Python reads testing.xml as the attribute xml of a variable named testing. The same change is needed on the second line.

It is not clear what you want to achieve with print(ET.tostring(newdom, pretty_print=True)) when the XSLT uses output method text.

Anyway, the lxml documentation suggests newdom.write_output(), e.g. newdom.write_output('result.txt'), as a better way to write out XSLT transformation results, because it serialises them as specified by the stylesheet's <xsl:output> element.

Python: Generate xml using xml & xslt file

Saxon-C 1.2.1 is the latest release of Saxon-C (https://www.saxonica.com/saxon-c/index.xml), and it has a Python API (https://www.saxonica.com/saxon-c/doc/html/saxonc.html). If you think you need to use XSLT 3, you can download it (https://www.saxonica.com/download/c.xml), install it (https://www.saxonica.com/saxon-c/documentation/index.html#!starting/installing) and run it from Python (https://www.saxonica.com/saxon-c/documentation/index.html#!samples/samples_python).

The HE edition does not require you to buy a license.

As for the error in your XSLT: if you test XSLT 1.0 code with xsltfiddle, choose XslCompiledTransform as the XSLT processor and you will get a similar error for your code. If you really want to declare the variable value inline as a result tree fragment, the easiest fix is to use exsl:node-set:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl"
    version="1.0">

  <xsl:output method="xml"/>

  <!-- <xsl:variable name="childDoc" select="document('child.xml')"/> -->
  <xsl:variable name="childDoc">
    <root>
      <child1 value="child1"/>
      <child2 value="child2"/>
    </root>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="inserthere">
    <xsl:variable name="currentParent" select="."/>
    <xsl:copy-of select="exsl:node-set($childDoc)/root/node()"/>
  </xsl:template>

</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/aiyned/4

Or, if you want to change the values, use this template:

<xsl:template match="inserthere">
  <xsl:variable name="currentParent" select="."/>
  <xsl:for-each select="exsl:node-set($childDoc)/root/node()">
    <xsl:copy>
      <xsl:attribute name="value">
        <xsl:value-of select="concat($currentParent/@value,'_',@value)"/>
      </xsl:attribute>
    </xsl:copy>
  </xsl:for-each>
</xsl:template>

https://xsltfiddle.liberty-development.net/aiyned/3
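lxml's XSLT engine (libxslt) supports exsl:node-set, so both inserthere templates can also be tried from Python. The harness below is a sketch built on assumptions. The sample input <doc><inserthere value="p"/></doc> is made up, and __TEMPLATE__ is a hypothetical placeholder used to swap in either template. The rest of the stylesheet is the one from the answer.

```python
import lxml.etree as ET

# The answer's XSLT 1.0 stylesheet; __TEMPLATE__ is a hypothetical placeholder
# for one of the two "inserthere" templates
SKELETON = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl" version="1.0">
  <xsl:output method="xml"/>
  <xsl:variable name="childDoc">
    <root>
      <child1 value="child1"/>
      <child2 value="child2"/>
    </root>
  </xsl:variable>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  __TEMPLATE__
</xsl:stylesheet>"""

# First variant: copy the inline result tree fragment as-is
COPY_TEMPLATE = """<xsl:template match="inserthere">
    <xsl:copy-of select="exsl:node-set($childDoc)/root/node()"/>
  </xsl:template>"""

# Second variant: copy each child but rewrite its value attribute
RENAME_TEMPLATE = """<xsl:template match="inserthere">
    <xsl:variable name="currentParent" select="."/>
    <xsl:for-each select="exsl:node-set($childDoc)/root/node()">
      <xsl:copy>
        <xsl:attribute name="value">
          <xsl:value-of select="concat($currentParent/@value,'_',@value)"/>
        </xsl:attribute>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>"""


def run(template, xml):
    """Compile the stylesheet with the given template and transform xml."""
    xslt = ET.XSLT(ET.fromstring(SKELETON.replace("__TEMPLATE__", template)))
    return ET.tostring(xslt(ET.fromstring(xml)).getroot()).decode()


sample = '<doc><inserthere value="p"/></doc>'  # hypothetical input
print(run(COPY_TEMPLATE, sample))
print(run(RENAME_TEMPLATE, sample))
```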

How to parse huge XML with Python and XSLT file iteratively and write to CSV

One possibility is to use XSLT 3.0 streaming. There are two challenges here:

(a) making your code streamable. We can't judge how difficult that is without seeing the stylesheet code.

(b) installing and running a streaming XSLT 3.0 processor. This depends on how locked in to the Python environment you are. If it has to be done in Python, you could try installing Saxon/C. The alternative is to call out to a different environment, in which case you have more options: for example, you could run Saxon-EE on Java.

LATER

Looking at the code you have posted, this part is rather strange:

<xsl:for-each select="level1/level2/level3/level4">
<xsl:value-of select="ancestor::root/level1/level2/topid" />

I suspect you want to output the topid of the "current" level2 element, but that's not what this does. In XSLT 1.0 it will print the value of the first level2/topid, and in XSLT 2.0+ it will print the values of all the level2/topid elements. I suspect you really want something like this:

<xsl:for-each select="level1/level2/level3/level4">
  <xsl:value-of select="ancestor::level2/topid" />
  <xsl:value-of select="$delimiter" />
  <xsl:value-of select="ancestor::level3/subtopid" />
  <xsl:value-of select="$delimiter" />
  <xsl:value-of select="subid" />
  <xsl:value-of select="$delimiter" />
  <xsl:value-of select="descr" />
  <xsl:value-of select="$newline" />
</xsl:for-each>
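The XSLT 1.0 behaviour described above follows from an XPath 1.0 rule: converting a node-set to a string uses only its first node in document order. lxml evaluates XPath 1.0, so the original expression and the suggested one can be compared directly from Python. The small document below is a made-up stand-in for the asker's data.

```python
from lxml import etree

# Hypothetical document with the level1/level2/level3/level4 shape from the question
doc = etree.fromstring(
    "<root><level1>"
    "<level2><topid>T1</topid><level3><subtopid>S1</subtopid>"
    "<level4><subid>a</subid></level4></level3></level2>"
    "<level2><topid>T2</topid><level3><subtopid>S2</subtopid>"
    "<level4><subid>b</subid></level4></level3></level2>"
    "</level1></root>"
)

rows = []
for level4 in doc.xpath("level1/level2/level3/level4"):
    # XPath 1.0: string() of a node-set is the string value of its first node
    original = level4.xpath("string(ancestor::root/level1/level2/topid)")
    # Only the enclosing level2 is selected, so this is the "current" topid
    fixed = level4.xpath("string(ancestor::level2/topid)")
    rows.append((level4.findtext("subid"), str(original), str(fixed)))

for row in rows:
    print(*row)
```

The second level4 element shows the problem. The original expression still returns T1, while ancestor::level2/topid returns T2.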

That's almost streamable, but not quite. Streaming doesn't allow you to go back to the topid and subtopid elements. The easiest way to make it streamable might be to save the most recent values of these elements in accumulators:

<xsl:accumulator name="topid" as="xs:string" initial-value="''">
  <xsl:accumulator-rule match="topid/text()" select="string(.)"/>
</xsl:accumulator>

<xsl:accumulator name="subtopid" as="xs:string" initial-value="''">
  <xsl:accumulator-rule match="subtopid/text()" select="string(.)"/>
</xsl:accumulator>

and then access the values as:

<xsl:for-each select="level1/level2/level3/level4">
  <xsl:value-of select="accumulator-before('topid')" />
  <xsl:value-of select="$delimiter" />
  <xsl:value-of select="accumulator-before('subtopid')" />
  <xsl:value-of select="$delimiter" />
  <xsl:value-of select="subid" />
  <xsl:value-of select="$delimiter" />
  <xsl:value-of select="descr" />
  <xsl:value-of select="$newline" />
</xsl:for-each>
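lxml (libxslt) implements only XSLT 1.0, so it cannot run the streaming stylesheet above. If staying in Python matters more than staying in XSLT, a different technique can be swapped in: lxml.etree.iterparse, which reads the file incrementally. It can carry the same idea as the accumulators, remembering the most recent topid and subtopid. This is not XSLT streaming. The sample document and the "|" delimiter are assumptions standing in for the asker's data and $delimiter.

```python
import csv
import io

from lxml import etree


def level4_rows(source):
    """Yield (topid, subtopid, subid, descr) for every level4 element.

    Like the accumulators, only the most recently seen topid and subtopid
    values are kept while the document is read incrementally.
    """
    topid = subtopid = ""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == "topid":
            topid = elem.text or ""
        elif elem.tag == "subtopid":
            subtopid = elem.text or ""
        elif elem.tag == "level4":
            yield (topid, subtopid,
                   elem.findtext("subid", default=""),
                   elem.findtext("descr", default=""))
            # Empty the finished level4 element so its children can be freed
            elem.clear()


# Hypothetical input following the root/level1/level2/level3/level4 shape
sample = b"""<root><level1>
  <level2><topid>T1</topid>
    <level3><subtopid>S1</subtopid>
      <level4><subid>a</subid><descr>first</descr></level4>
      <level4><subid>b</subid><descr>second</descr></level4>
    </level3>
  </level2>
  <level2><topid>T2</topid>
    <level3><subtopid>S2</subtopid>
      <level4><subid>c</subid><descr>third</descr></level4>
    </level3>
  </level2>
</level1></root>"""

out = io.StringIO()
writer = csv.writer(out, delimiter="|", lineterminator="\n")
writer.writerows(level4_rows(io.BytesIO(sample)))
print(out.getvalue())
```

Like accumulator-before(), this only works if each topid and subtopid appears before the level4 elements that depend on it.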

xslt template to transform xml file to xml

When asking a question, it is a good idea to provide a minimal reproducible example, i.e. an XML/XSLT pair.

Please try the following conceptual example.

I am using SAXON 9.7.0.15

It is very possible that the last Python line is causing the issue:

outfile.write(ET.tostring(newdom,pretty_print=True,xml_declaration=True,standalone='yes').decode())

Please try ending your Python code as follows:

import os
...
newdom = transform(dom)
# A file object's write() accepts only the text to write; encoding, the XML
# declaration and pretty printing are options of lxml's _ElementTree.write()
newdom.write(os.path.join(structure, filename), encoding='utf-8',
             xml_declaration=True, pretty_print=True)

https://lxml.de/api/lxml.etree._ElementTree-class.html#write

Reference link: How to transform an XML file using XSLT in Python

Input XML

<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
  <a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="xsd:string">あいうえお@domain.com</a:value>
  <a:nameRef>email</a:nameRef>
  <a:id>1</a:id>
</a:ruleInputTestConfigs>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Output XML

<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
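The output above came from Saxon. For readers following the original lxml question, here is a sketch of the same identity transform run through lxml (libxslt) instead. It is not expected to match Saxon's output byte for byte, since indentation and namespace placement may differ. The point is that the Japanese text survives a UTF-8 round trip.

```python
import lxml.etree as ET

# Input document and identity stylesheet from the answer above
XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
  <a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="xsd:string">あいうえお@domain.com</a:value>
  <a:nameRef>email</a:nameRef>
  <a:id>1</a:id>
</a:ruleInputTestConfigs>"""

XSL_TEXT = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# lxml rejects str input that carries an encoding declaration, so pass bytes
dom = ET.fromstring(XML_TEXT.encode("utf-8"))
transform = ET.XSLT(ET.fromstring(XSL_TEXT.encode("utf-8")))
newdom = transform(dom)

# str() serialises the result as described by <xsl:output>
output = str(newdom)
print(output)
```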

Python Transforming XML using XSLT

You're doing some clean-up on the XML and saving it to a file whose path is stored in stix_report:

with open(stix_report, "wb") as stix_file:  # tostring() returns bytes
    stix_file.write(ET.tostring(tree))
print("Save STIX report successful")

Then you build the path to an XSLT file:

xslt_path = os.path.join("/home",
"user",
"Desktop",
"stix_to_openioc",
"stix_to_openioc.xsl")
print("Retrieve XSL file successful")

But then you load the XML and XSL files from different, hard-coded file names:

from lxml import etree
f_xsl = 'stix_to_openioc.xsl'
f_xml = 'report.stix.xml'
f_out = 'report.ioc.xml'

transform = etree.XSLT(etree.parse(f_xsl))
result = transform(etree.parse(f_xml))
result.write(f_out)

I can't guarantee that it will work (since I have no idea what's in these files), but I think a good start would be to change that code to use the paths you built earlier:

from lxml import etree
f_xsl = xslt_path
f_xml = stix_report
f_out = 'report.ioc.xml'

transform = etree.XSLT(etree.parse(f_xsl))
result = transform(etree.parse(f_xml))
result.write(f_out)

Identity Transform in Python XSLT XML merging text between indexed attributes

Does this generate the output you expect?

import lxml.etree

xml_in = 'footnotes.xml'
xml_out = 'result.xml'

xslt = lxml.etree.fromstring('''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<!--Identity Transform.-->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>

<!--do nothing for <note prev=""> tags-->
<xsl:template match="note[@place='foot'][@prev]"/>

<xsl:template match="note[@place='foot'][@next]">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
<xsl:text>∀</xsl:text>
<xsl:copy-of select="id(substring(@next, 2))/node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>'''
)

doc = lxml.etree.parse(xml_in)

# str() serialises the result as described by <xsl:output>; write it as UTF-8
with open(xml_out, 'w', encoding='utf-8') as f:
    print(str(doc.xslt(xslt)), file=f)

Before:

<note place="foot" n="(h)" xml:id="seg2pn_2_1" next="#seg2pn_2_2">
aaa1 <hi rendition="#aq">some text</hi> <hi rendition="#g">aaa2</hi>
</note>
<note place="foot" n="(m)" xml:id="seg2pn_3_1" next="#seg2pn_3_2">
some text CCC some text
</note>
<note place="foot" n="(h)" xml:id="seg2pn_2_2" prev="#seg2pn_2_1">
<hi rendition="#aq">bbb1</hi> <hi rendition="#g">some text bbb2</hi>
</note>
<note place="foot" n="(m)" xml:id="seg2pn_3_2" prev="#seg2pn_3_1">
DDD1 <hi rendition="#aq">some Text</hi> <hi rendition="#g">DDD2</hi>
</note>
<note place="foot" n="(ii)" xml:id="seg2pn_10_10" next="#seg2pn_10_11">
one one one
</note>
<note place="foot" n="(ii)" xml:id="seg2pn_10_11" prev="#seg2pn_10_10">
two two two
</note>

After:

<note place="foot" n="(h)" xml:id="seg2pn_2_1" next="#seg2pn_2_2">
aaa1 <hi rendition="#aq">some text</hi> <hi rendition="#g">aaa2</hi>

<hi rendition="#aq">bbb1</hi> <hi rendition="#g">some text bbb2</hi>
</note>

<note place="foot" n="(m)" xml:id="seg2pn_3_1" next="#seg2pn_3_2">
some text CCC some text

DDD1 <hi rendition="#aq">some Text</hi> <hi rendition="#g">DDD2</hi>
</note>

<note place="foot" n="(ii)" xml:id="seg2pn_10_10" next="#seg2pn_10_11">
one one one

two two two
</note>

How to split an XML file using XSLT 2.0 in Python

Python (via lxml) uses libxslt, which implements XSLT 1.0 rather than 2.0 but supports the EXSLT exsl:document extension element (http://exslt.org/exsl/index.html):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
  <xsl:template match="dataset/tags">
    <xsl:for-each select="tag">
      <xsl:variable name="tagName" select="@name" />

      <exsl:document method="xml" href="{$tagName}.xml">
        <dataset>
          <xsl:copy-of select="/dataset/name"/>
          <xsl:copy-of select="/dataset/comment"/>
          <tags>
            <xsl:copy-of select="/dataset/tags/tag[@name = $tagName]"/>
          </tags>
          <images>
            <xsl:for-each select="/dataset/images/image[box/label = $tagName]">
              <image>
                <xsl:copy-of select="@file"/>
                <xsl:copy-of select="box[label = $tagName]"/>
              </image>
            </xsl:for-each>
          </images>
        </dataset>
      </exsl:document>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

