How to transform an XML file using XSLT in Python?
Using lxml,
import lxml.etree as ET
dom = ET.parse(xml_filename)
xslt = ET.parse(xsl_filename)
transform = ET.XSLT(xslt)
newdom = transform(dom)
print(ET.tostring(newdom, pretty_print=True))
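For a version that runs without any files on disk, here is a minimal sketch of the same steps. The sample XML and stylesheet (XML_SOURCE, XSL_SOURCE) are hypothetical stand-ins for xml_filename and xsl_filename, and it assumes lxml is installed.

```python
import lxml.etree as ET

# Hypothetical sample data, used here in place of xml_filename / xsl_filename.
XML_SOURCE = b"<greeting><name>World</name></greeting>"
XSL_SOURCE = b"""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/greeting">
    <message>Hello, <xsl:value-of select="name"/>!</message>
  </xsl:template>
</xsl:stylesheet>"""

dom = ET.fromstring(XML_SOURCE)
transform = ET.XSLT(ET.fromstring(XSL_SOURCE))
newdom = transform(dom)

# tostring() returns bytes by default; encoding="unicode" yields a str instead.
result_text = ET.tostring(newdom, pretty_print=True, encoding="unicode")
print(result_text)
```

Asking for encoding="unicode" avoids printing a bytes literal (b'...') under Python 3.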
Transform xml using xslt in python
ET.parse(testing.xml) should be ET.parse('testing.xml'). The same change needs to be made for the second line.
I don't understand what you want to achieve with print(ET.tostring(newdom, pretty_print=True)) when the XSLT uses output method text.
Anyway, the lxml documentation suggests that newdom.write_output, e.g. newdom.write_output('result.txt'), is a better way to deal with outputting XSLT transformation results.
Python: Generate xml using xml & xslt file
Saxon-C 1.2.1 is the latest release of Saxon-C (https://www.saxonica.com/saxon-c/index.xml) and has a Python API (https://www.saxonica.com/saxon-c/doc/html/saxonc.html), so you can download it (https://www.saxonica.com/download/c.xml), install it (https://www.saxonica.com/saxon-c/documentation/index.html#!starting/installing) and run it from Python (https://www.saxonica.com/saxon-c/documentation/index.html#!samples/samples_python) if you think you need to use XSLT 3.
The HE edition does not require you to buy a license.
As for the error in XSLT: if you want to test XSLT 1.0 code with the xsltfiddle, choose XslCompiledTransform as the XSLT processor and you will get a similar error for your code. The easiest fix, if you really declare your variable value inline as a result tree fragment, is to use exsl:node-set:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl"
version="1.0">
<xsl:output method="xml"/>
<!-- <xsl:variable name="childDoc" select="document('child.xml')"/> -->
<xsl:variable name="childDoc">
<root>
<child1 value="child1"/>
<child2 value="child2"/>
</root>
</xsl:variable>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="inserthere">
<xsl:variable name="currentParent" select="."/>
<xsl:copy-of select="exsl:node-set($childDoc)/root/node()"/>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/aiyned/4
Or if you want to change values then use the template
<xsl:template match="inserthere">
<xsl:variable name="currentParent" select="."/>
<xsl:for-each select="exsl:node-set($childDoc)/root/node()">
<xsl:copy>
<xsl:attribute name="value">
<xsl:value-of select="concat($currentParent/@value,'_',@value)"/>
</xsl:attribute>
</xsl:copy>
</xsl:for-each>
</xsl:template>
https://xsltfiddle.liberty-development.net/aiyned/3
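If you stay with lxml (libxslt) rather than Saxon, the same exsl:node-set approach can be tried directly from Python. This is only a sketch: the input document is invented for the example and the stylesheet is the second variant above.

```python
import lxml.etree as ET

XSL_SOURCE = b"""\
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl"
    version="1.0">
  <xsl:variable name="childDoc">
    <root>
      <child1 value="child1"/>
      <child2 value="child2"/>
    </root>
  </xsl:variable>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="inserthere">
    <xsl:variable name="currentParent" select="."/>
    <xsl:for-each select="exsl:node-set($childDoc)/root/node()">
      <xsl:copy>
        <xsl:attribute name="value">
          <xsl:value-of select="concat($currentParent/@value,'_',@value)"/>
        </xsl:attribute>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical input: a single placeholder element to be replaced.
doc = ET.fromstring(b'<doc><inserthere value="parent"/></doc>')
result = ET.XSLT(ET.fromstring(XSL_SOURCE))(doc)

# Collect the generated attribute values from the children of <doc>.
values = [child.get("value") for child in result.getroot()]
print(values)
```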
How to parse huge XML with Python and XSLT file iteratively and write to CSV
One possibility is to use XSLT 3.0 streaming. There are two challenges here:
(a) making your code streamable. We can't judge how difficult that is without seeing the stylesheet code.
(b) installing and running a streaming XSLT 3.0 processor. This depends how locked in to the Python environment you are. If it has to be done in Python, you could try installing Saxon/C. The alternative is to call out to a different environment in which case you have more options, for example you could run Saxon-EE on Java.
LATER
Looking at the code you have posted, it's rather strange:
<xsl:for-each select="level1/level2/level3/level4">
<xsl:value-of select="ancestor::root/level1/level2/topid" />
I suspect you want to output the topid of the "current" level2 element, but that's not what this is doing (in XSLT 1.0 it will print the value of the first level2/topid; in XSLT 2.0+ it will print the values of all the level2/topid elements). I suspect you really want something like this:
<xsl:for-each select="level1/level2/level3/level4">
<xsl:value-of select="ancestor::level2/topid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="ancestor::level3/subtopid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="subid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="descr" />
<xsl:value-of select="$newline" />
</xsl:for-each>
That's almost streamable, but not quite. Streaming doesn't allow you to go back to the topid and subtopid elements. The easiest way to make it streamable might be to save the most recent values of these elements in accumulators:
<xsl:accumulator name="topid" as="xs:string" initial-value="''">
<xsl:accumulator-rule match="topid/text()" select="string(.)"/>
</xsl:accumulator>
<xsl:accumulator name="subtopid" as="xs:string" initial-value="''">
<xsl:accumulator-rule match="subtopid/text()" select="string(.)"/>
</xsl:accumulator>
and then access the values as:
<xsl:for-each select="level1/level2/level3/level4">
<xsl:value-of select="accumulator-before('topid')" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="accumulator-before('subtopid')" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="subid" />
<xsl:value-of select="$delimiter" />
<xsl:value-of select="descr" />
<xsl:value-of select="$newline" />
</xsl:for-each>
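If installing a streaming XSLT 3.0 processor is not an option, the same "remember the most recent value" idea can be approximated in plain Python. Note that this swaps XSLT for xml.etree.ElementTree.iterparse from the standard library; it is not the accumulator mechanism itself. The sketch assumes the element layout implied by the paths above (topid inside level2 and before level3, subtopid inside level3 and before level4), and the function name and sample data are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def level4_rows(source):
    """Yield one [topid, subtopid, subid, descr] row per level4 element."""
    topid = subtopid = ""
    # "end" events fire once an element and all of its children have been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "topid":
            topid = elem.text or ""
        elif elem.tag == "subtopid":
            subtopid = elem.text or ""
        elif elem.tag == "level4":
            yield [topid, subtopid,
                   elem.findtext("subid", default=""),
                   elem.findtext("descr", default="")]
            elem.clear()  # drop the children that were just written out
        elif elem.tag == "level2":
            elem.clear()  # release the finished level2 subtree


# Hypothetical sample input; a real run would pass a file name or file object.
SAMPLE = b"""<root><level1><level2><topid>T1</topid><level3><subtopid>S1</subtopid>
<level4><subid>a</subid><descr>first</descr></level4>
<level4><subid>b</subid><descr>second</descr></level4>
</level3></level2></level1></root>"""

rows = list(level4_rows(io.BytesIO(SAMPLE)))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
csv_text = out.getvalue()
```

For a large file, pass the path to level4_rows() and write each row to the CSV as it is yielded instead of collecting them in a list.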
xslt template to transform xml file to xml
When asking a question, it is a good idea to provide a minimal reproducible example, i.e. an XML/XSLT pair.
Please try the following conceptual example.
I am using SAXON 9.7.0.15
It is very possible that the last Python line is causing the issue:
outfile.write(ET.tostring(newdom,pretty_print=True,xml_declaration=True,standalone='yes').decode())
Please try the last Python lines as follows:
...
newdom = transform(dom)
# write() on the result tree handles the encoding and XML declaration itself,
# so the file is opened in binary mode ('ab' keeps the original append behaviour)
with open(structure + "\\" + filename, 'ab') as outfile:
    newdom.write(outfile, encoding='utf-8', xml_declaration=True, pretty_print=True)
https://lxml.de/api/lxml.etree._ElementTree-class.html#write
Reference link: How to transform an XML file using XSLT in Python
Input XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="xsd:string">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" standalone="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output XML
<?xml version="1.0" encoding="UTF-8"?>
<a:ruleInputTestConfigs xmlns:a="URI">
<a:value xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">あいうえお@domain.com</a:value>
<a:nameRef>email</a:nameRef>
<a:id>1</a:id>
</a:ruleInputTestConfigs>
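To see why the serialization call matters for non-ASCII text, here is a small sketch comparing the default ET.tostring() call with an explicit UTF-8 one. The one-element input is a cut-down, hypothetical version of the XML above, and lxml is assumed to be installed.

```python
import lxml.etree as ET

# Hypothetical, reduced input containing the same non-ASCII text.
XML_SOURCE = "<value>あいうえお@domain.com</value>".encode("utf-8")

IDENTITY_XSL = b"""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

transform = ET.XSLT(ET.fromstring(IDENTITY_XSL))
newdom = transform(ET.fromstring(XML_SOURCE))

# Default encoding is ASCII: non-ASCII characters come out as character references.
ascii_bytes = ET.tostring(newdom)

# Explicit UTF-8 keeps the characters themselves and can add an XML declaration.
utf8_bytes = ET.tostring(newdom, encoding="UTF-8", xml_declaration=True)

print(ascii_bytes)
print(utf8_bytes.decode("utf-8"))
```

Both results are bytes, so they should be written to a file opened in binary mode, or decoded with the matching encoding first.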
Python Transforming XML using XSLT
You're doing some cleanup on the XML and saving it into a file whose path string is stix_report:
with open(stix_report, "w") as stix_file:
    stix_file.write(ET.tostring(tree))
print("Save STIX report successful")
Then you create a string path to an xslt file:
xslt_path = os.path.join("/home",
"user",
"Desktop",
"stix_to_openioc",
"stix_to_openioc.xsl")
print("Retrieve XSL file successful")
But then you load up xml and xsl files from new variables:
from lxml import etree
f_xsl = 'stix_to_openioc.xsl'
f_xml = 'report.stix.xml'
f_out = 'report.ioc.xml'
transform = etree.XSLT(etree.parse(f_xsl))
result = transform(etree.parse(f_xml))
result.write(f_out)
I can't guarantee that it will work (since I have no idea what's in these files), but I think a good start here will be to change this code:
from lxml import etree
f_xsl = xslt_path
f_xml = stix_report
f_out = 'report.ioc.xml'
transform = etree.XSLT(etree.parse(f_xsl))
result = transform(etree.parse(f_xml))
result.write(f_out)
Identity Transform in Python XSLT XML merging text between indexed attributes
Does this generate the output you expect?
import lxml.etree
xml_in = 'footnotes.xml'
xml_out = 'result.xml'
xslt = lxml.etree.fromstring('''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
<!--Identity Transform.-->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!--do nothing for <note prev=""> tags-->
<xsl:template match="note[@place='foot'][@prev]"/>
<xsl:template match="note[@place='foot'][@next]">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
<xsl:text>∀</xsl:text>
<xsl:copy-of select="id(substring(@next, 2))/node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>'''
)
doc = lxml.etree.parse(xml_in)
with open(xml_out, 'w') as f:
    print(str(doc.xslt(xslt)), file=f)
Before:
<note place="foot" n="(h)" xml:id="seg2pn_2_1" next="#seg2pn_2_2">
aaa1 <hi rendition="#aq">some text</hi> <hi rendition="#g">aaa2</hi>
</note>
<note place="foot" n="(m)" xml:id="seg2pn_3_1" next="#seg2pn_3_2">
some text CCC some text
</note>
<note place="foot" n="(h)" xml:id="seg2pn_2_2" prev="#seg2pn_2_1">
<hi rendition="#aq">bbb1</hi> <hi rendition="#g">some text bbb2</hi>
</note>
<note place="foot" n="(m)" xml:id="seg2pn_3_2" prev="#seg2pn_3_1">
DDD1 <hi rendition="#aq">some Text</hi> <hi rendition="#g">DDD2</hi>
</note>
<note place="foot" n="(ii)" xml:id="seg2pn_10_10" next="#seg2pn_10_11">
one one one
</note>
<note place="foot" n="(ii)" xml:id="seg2pn_10_11" prev="#seg2pn_10_10">
two two two
</note>
After:
<note place="foot" n="(h)" xml:id="seg2pn_2_1" next="#seg2pn_2_2">
aaa1 <hi rendition="#aq">some text</hi> <hi rendition="#g">aaa2</hi>
∀
<hi rendition="#aq">bbb1</hi> <hi rendition="#g">some text bbb2</hi>
</note>
<note place="foot" n="(m)" xml:id="seg2pn_3_1" next="#seg2pn_3_2">
some text CCC some text
∀
DDD1 <hi rendition="#aq">some Text</hi> <hi rendition="#g">DDD2</hi>
</note>
<note place="foot" n="(ii)" xml:id="seg2pn_10_10" next="#seg2pn_10_11">
one one one
∀
two two two
</note>
How to split an XML file using XSLT 2.0 in Python
Python's lxml uses libxslt, which is an XSLT 1.0 processor but supports the EXSLT exsl:document extension element (http://exslt.org/exsl/index.html):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:template match="dataset/tags">
<xsl:for-each select="tag">
<xsl:variable name="tagName" select="@name" />
<exsl:document method="xml" href="{$tagName}.xml">
<dataset>
<xsl:copy-of select="/dataset/name"/>
<xsl:copy-of select="/dataset/comment"/>
<tags>
<xsl:copy-of select="/dataset/tags/tag[@name = $tagName]"/>
</tags>
<images>
<xsl:for-each select="/dataset/images/image[box/label = $tagName]">
<image>
<xsl:copy-of select="@file"/>
<xsl:copy-of select="box[label = $tagName]"/>
</image>
</xsl:for-each>
</images>
</dataset>
</exsl:document>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>