Generate/Get Xpath from Xml Node Java

Generate/get xpath from XML node java

Update:

@c0mrade has updated his question. Here is a solution to it:

This XSLT transformation:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<xsl:variable name="vApos">'</xsl:variable>

<xsl:template match="*[@* or not(*)] ">
<xsl:if test="not(*)">
<xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
<xsl:value-of select="concat('=',$vApos,.,$vApos)"/>
<xsl:text> </xsl:text>
</xsl:if>
<xsl:apply-templates select="@*|*"/>
</xsl:template>

<xsl:template match="*" mode="path">
<xsl:value-of select="concat('/',name())"/>
<xsl:variable name="vnumPrecSiblings" select=
"count(preceding-sibling::*[name()=name(current())])"/>
<xsl:if test="$vnumPrecSiblings">
<xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
</xsl:if>
</xsl:template>

<xsl:template match="@*">
<xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
<xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<root>
<elemA>one</elemA>
<elemA attribute1='first' attribute2='second'>two</elemA>
<elemB>three</elemB>
<elemA>four</elemA>
<elemC>
<elemB>five</elemB>
</elemC>
</root>

produces exactly the wanted, correct result:

/root/elemA='one'
/root/elemA[2]='two'
/root/elemA[2][@attribute1='first']
/root/elemA[2][@attribute2='second']
/root/elemB='three'
/root/elemA[3]='four'
/root/elemC/elemB='five'

When applied to the newly-provided document by @c0mrade:

<root>
<elemX serial="kefw90234kf2esda9231">
<id>89734</id>
</elemX>
</root>

again the correct result is produced:

/root/elemX[@serial='kefw90234kf2esda9231']
/root/elemX/id='89734'

Explanation:

  • Only elements that have no children elements, or have attributes are matched and processed.

  • For any such element, if it doesn't have children-elements all of its ancestor-or self elements are processed in a specific mode, named 'path'. Then the "='theValue'" part is output and then a NL character.

  • All attributes of the matched element are then processed.

  • Then finally, templates are applied to all children-elements.

  • Processing an element in the 'path' mode is simple: A / character and the name of the element are output. Then, if there are preceding siblings with the same name, a "[numPrecSiblings+1]` part is output.

  • Processing of attributes is simple: First all ancestor-or-self:: elements of its parent are processed in 'path' mode, then the [attrName=attrValue] part is output, followed by a NL character.

Do note:

  • Names that are in a namespace are displayed without any problem and in their initial readable form.

  • To aid readability, an index of [1] is never displayed.


Below is my initial answer (may be ignored)

Here is a pure XSLT 1.0 solution:

Below is a sample xml document and a stylesheet that takes a node-set parameter and produces one valid XPath expression for every member-node.

stylesheet (buildPath.xsl):



<xsl:stylesheet version='1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
>

<xsl:output method="text"/>
<xsl:variable name="theParmNodes" select="//namespace::*[local-name() =
'myNamespace']"/>
<xsl:template match="/">
<xsl:variable name="theResult">
<xsl:for-each select="$theParmNodes">
<xsl:variable name="theNode" select="."/>
<xsl:for-each select="$theNode |
$theNode/ancestor-or-self::node()[..]">
<xsl:element name="slash">/</xsl:element>
<xsl:choose>
<xsl:when test="self::*">
<xsl:element name="nodeName">
<xsl:value-of select="name()"/>
<xsl:variable name="thisPosition"
select="count(preceding-sibling::*[name(current()) =
name()])"/>
<xsl:variable name="numFollowing"
select="count(following-sibling::*[name(current()) =
name()])"/>
<xsl:if test="$thisPosition + $numFollowing > 0">
<xsl:value-of select="concat('[', $thisPosition +
1, ']')"/>
</xsl:if>
</xsl:element>
</xsl:when>
<xsl:otherwise> <!-- This node is not an element -->
<xsl:choose>
<xsl:when test="count(. | ../@*) = count(../@*)">
<!-- Attribute -->
<xsl:element name="nodeName">
<xsl:value-of select="concat('@',name())"/>
</xsl:element>
</xsl:when>
<xsl:when test="self::text()"> <!-- Text -->
<xsl:element name="nodeName">
<xsl:value-of select="'text()'"/>
<xsl:variable name="thisPosition"
select="count(preceding-sibling::text())"/>
<xsl:variable name="numFollowing"
select="count(following-sibling::text())"/>
<xsl:if test="$thisPosition + $numFollowing > 0">
<xsl:value-of select="concat('[', $thisPosition +
1, ']')"/>
</xsl:if>
</xsl:element>
</xsl:when>
<xsl:when test="self::processing-instruction()">
<!-- Processing Instruction -->
<xsl:element name="nodeName">
<xsl:value-of select="'processing-instruction()'"/>
<xsl:variable name="thisPosition"
select="count(preceding-sibling::processing-instruction())"/>
<xsl:variable name="numFollowing"
select="count(following-sibling::processing-instruction())"/>
<xsl:if test="$thisPosition + $numFollowing > 0">
<xsl:value-of select="concat('[', $thisPosition +
1, ']')"/>
</xsl:if>
</xsl:element>
</xsl:when>
<xsl:when test="self::comment()"> <!-- Comment -->
<xsl:element name="nodeName">
<xsl:value-of select="'comment()'"/>
<xsl:variable name="thisPosition"
select="count(preceding-sibling::comment())"/>
<xsl:variable name="numFollowing"
select="count(following-sibling::comment())"/>
<xsl:if test="$thisPosition + $numFollowing > 0">
<xsl:value-of select="concat('[', $thisPosition +
1, ']')"/>
</xsl:if>
</xsl:element>
</xsl:when>
<!-- Namespace: -->
<xsl:when test="count(. | ../namespace::*) =
count(../namespace::*)">

<xsl:variable name="apos">'</xsl:variable>
<xsl:element name="nodeName">
<xsl:value-of select="concat('namespace::*',
'[local-name() = ', $apos, local-name(), $apos, ']')"/>

</xsl:element>
</xsl:when>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="msxsl:node-set($theResult)"/>
</xsl:template>
</xsl:stylesheet>

xml source (buildPath.xml):



<!-- top level Comment -->
<root>
<nodeA>textA</nodeA>
<nodeA id="nodeA-2">
<?myProc ?>
xxxxxxxx
<nodeB/>
<nodeB xmlns:myNamespace="myTestNamespace">
<!-- Comment within /root/nodeA[2]/nodeB[2] -->
<nodeC/>
<!-- 2nd Comment within /root/nodeA[2]/nodeB[2] -->
</nodeB>
yyyyyyy
<nodeB/>
<?myProc2 ?>
</nodeA>
</root>
<!-- top level Comment -->

Result:

/root/nodeA[2]/nodeB[2]/namespace::*[local-name() = 'myNamespace']
/root/nodeA[2]/nodeB[2]/nodeC/namespace::*[local-name() =
'myNamespace']

Get Xpath from the org.w3c.dom.Node

There's no generic method for getting the XPath, mainly because there's no one generic XPath that identifies a particular node in the document. In some schemas, nodes will be uniquely identified by an attribute (id and name are probably the most common attributes.) In others, the name of each element (that is, the tag) is enough to uniquely identify a node. In a few (unlikely, but possible) cases, there's no one unique name or attribute that takes you to a specific node, and so you'd need to use cardinality (get the n'th child of the m'th child of...).

EDIT:
In most cases, it's not hard to create a schema-dependent function to assemble an XPath for a given node. For example, suppose you have a document where every node is uniquely identified by an id attribute, and you're not using namespaces. Then (I think) the following pseudo-Java would work to return an XPath based on those attributes. (Warning: I have not tested this.)

String getXPath(Node node)
{
Node parent = node.getParent();
if (parent == null) {
return "/" + node.getTagName();
}
return getXPath(parent) + "/" + "[@id='" + node.getAttribute("id") + "']";
}

Attempting to use Java & Xpath to updated the value of any selected node inside an XML doc

For the test you performed online it seems your XML file contains namespace information.

With that information in mind, probably both of your examples of XPath evaluation would work, or not, dependent on several things.

For example, you probably can use the attempt #4, and the XPath evaluation will be adequate, if you are using a non namespace aware (the default) DocumentBuilderFactory and you o not provide any namespace information in your XPath expression.

But the XPath evaluation in attempt #3 can also be adequate if the inverse conditions apply, i.e., you are using a namespace aware DocumentBuilderFactory:

DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setNamespaceAware(true);

and you provide namespace information in your XPath expression and a convenient NamespaceContext implementation. Please, see this related SO question and this great IBM article.

Be aware that you do not need to provide the same namespace prefixes in both your XML file an XPath expression, the only requirement is namespace awareness in XML (XPath is always namespace aware).

Given that conditions, I think you can apply both approaches.

In any case, I think the problem may have to do with the way you are dealing with the actual text replacement: you are looking for a node with a value attribute, and reviewing the associated XML Schema this attribute does not exist.

Please, consider instead the following approach:

// You can get here following both attempts #3 an #4
Node node = (Node) xpath.evaluate(sb.toString(), doc, XPathConstants.NODE);
boolean changed = node != null;
if (changed) {
// See https://docs.oracle.com/javase/9/docs/api/org/w3c/dom/Node.html#setTextContent-java.lang.String-
node.setTextContent(newValue);
SingleTask.currentTask.setDoc(doc);
}

return changed;

This code assumes that the selected node will be unique to work properly.

Although probably unlike, please, be aware that the way in which you are constructing the XPath selector from the JTree model can provide duplicates if you define the same value for repeated elements in your XML. Consider the elements external_id_property_maps in your screenshot, for instance.

In order to avoid that, you can take a different approach when constructing the XPath selector.

It is unclear for your code snippet, but probably you are using DefaultMutableTreeNode as the base JTree node type. If that is the case, you can associate with every node the arbitrary information you need to.

Consider for example the creation of a simple POJO with two fields, the name of the Element that the node represents, and some kind of unique, generated, id, let's name it uid or uuid to avoid confusion with the id attribute, most likely included in the original XML document.

This uid should be associated with every node. Maybe you can take advantage of the JTree creation process and, while processing every node of your XML file, include this attribute as well, generated using the UUID class, for example.

Or you can apply a XSLT transform to the original XML document prior to representation:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:attribute name="uid">
<xsl:value-of select="generate-id(.)"/>
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>

With this changes, your XPath query should looks like:

/ns:Ingest_LDD[@uid='w1ab1']/ns:Property_Maps[@uid='w1ab1a']/ns:identifier[@uid='w1ab1aq']

Of course, it will be necessary to modify the code devoted to the construction of this expression from the selected path of the JTree to take the custom object into account.

You can take this approach to the limit and use a single selector based solely in this uid attribute, although I think that for performance reasons it will be not appropriate:

//*[@uid='w1ab1']

Putting it all together, you can try something like the following.

Please, consider this XML file:

<?xml version="1.0" encoding="utf-8" ?>
<Ingest_LDD xmlns="http://pds.nasa.gov/pds4/pds/v1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd">
<!-- Please, forgive me, I am aware that the document is not XML Schema conformant,
only for exemplification of the default namespace -->
<Property_Maps>
<identifier>identifier1</identifier>
</Property_Maps>
</Ingest_LDD>

First, let's parse the document:

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
// As the XML contains namespace, let's configure the parser namespace aware
// This will be of relevance when evaluating XPath
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// Parse the document from some source
Document document = builder.parse(...);
// See http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
document.getDocumentElement().normalize();

Now, create a JTree structure corresponding to the input XML file. First, let's create a convenient POJO to store the required tree node information:

public class NodeInformation {
// Node name
private String name;
// Node uid
private String did;
// Node value
private String value;

// Setters and getters

// Will be reused by DefaultMutableTreeNode
@Override
public String toString() {
return this.name;
}
}

Convert the XML file to its JTree counterpart:

// Get a reference to root element
Element rootElement = document.getDocumentElement();
// Create root tree node
DefaultMutableTreeNode rootTreeNode = getNodeInformation(rootElement);
// Traverse DOM
traverse(rootTreeNode, rootElement);
// Create tree and tree model based on the computed root tree node
DefaultTreeModel treeModel = new DefaultTreeModel(rootTreeNode);
JTree tree = new JTree(treeModel);

Where:

private NodeInformation getNodeInformation(Node childElement) {
NodeInformation nodeInformation = new NodeInformation();
String name = childElement.getNodeName();
nodeInformation.setName(name);
// Provide a new unique identifier for every node
String uid = UUID.randomUUID().toString();
nodeInformation.setUid(uid);
// Uhnn.... We need to associate the new uid with the DOM node as well.
// There is nothing wrong with it but mutating the DOM in this way in
// a method that should be "read-only" is not the best solution.
// It would be interesting to study the above-mentioned XSLT approach
chilElement.setAttribute("uid", uid);

// Compute node value
StringBuffer buffer = new StringBuffer();
NodeList childNodes = childElement.getChildNodes();
boolean found = false;
for (int i = 0; i < childNodes.getLength(); i++) {
Node node = childNodes.item(i);
if (node.getNodeType() == Node.TEXT_NODE) {
String value = node.getNodeValue();
buffer.append(value);
found = true;
}
}

if (found) {
nodeInformation.setValue(buffer.toString());
}
}

And:

// Finds all the child elements and adds them to the parent node recursively
private void traverse(DefaultMutableTreeNode parentTreeNode, Node parentXMLElement) {
NodeList childElements = parentXMLElement.getChildNodes();
for(int i=0; i<childElements.getLength(); i++) {
Node childElement = childElements.item(i);
if (childElement.getNodeType() == Node.ELEMENT_NODE) {
DefaultMutableTreeNode childTreeNode =
new DefaultMutableTreeNode
(getNodeInformation(childElement));
parentTreeNode.add(childTreeNode);
traverse(childTreeNode, childElement);
}
}
}

Although the NamespaceContext implementation you provided looks fine, please, at a first step, try something simpler, to minimize the possibility of error. See the provided implementation below.

Then, your updateXMLData method should looks like:

public boolean updateXmlData(JTree tree, org.w3c.dom.Document doc, TreeNode parentNode, String oldValue, String newValue) throws XPathExpressionException {
boolean changed = false;

TreePath selectedPath = tree.getSelectionPath();
int count = getPathCount();
StringBuilder sb = new StringBuilder();
NodeInformation lastNodeInformation;
if (count > 0) {
for (int i = 1; i < trp.getPathCount(); i++) {
DefaultMutableTreeNode treeNode = (DefaultMutableTreeNode) trp.getPathComponent(i);
NodeInformation nodeInformation = (NodeInformation) treeNode.getUserObject();
sb.append(String.format("/ns:%s[@uid='%s']", nodeInformation.getName(), nodeInformation.getUid());
lastNodeInformation = nodeInformation;
}
}

System.out.println("Constructed XPath Query:" + sb.toString());

// Although the `NamespaceContext` implementation you provided looks
// fine, please, at a first step, try something simpler, to minimize the
// possibility of error. For example:
NamespaceContext nsContext = new NamespaceContext() {

public String getNamespaceURI(String prefix) {
if (prefix == null) {
throw new IllegalArgumentException("No prefix provided!");
} else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
return "http://pds.nasa.gov/pds4/pds/v1";
} else if (prefix.equals("ns")) {
return "http://pds.nasa.gov/pds4/pds/v1";
} else {
return XMLConstants.NULL_NS_URI;
}
}

public String getPrefix(String namespaceURI) {
// Not needed in this context.
return null;
}

public Iterator getPrefixes(String namespaceURI) {
// Not needed in this context.
return null;
}

};

//new xpath instance
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
// As the parser is namespace aware, we can safely use XPath namespaces
xpath.setNamespaceContext(nsContext);

Node node = (Node) xpath.evaluate(sb.toString(), doc, XPathConstants.NODE);
boolean changed = node != null;
if (changed) {
// See https://docs.oracle.com/javase/9/docs/api/org/w3c/dom/Node.html#setTextContent-java.lang.String-
node.setTextContent(newValue);
SingleTask.currentTask.setDoc(doc);
// Probably the information has been updated in the node, but just in case:
lastNodeInformation.setValue(newValue);
}

return changed;
}

The generated XPath expression will look like:

/ns:Ingest_LDD[@uid='w1ab1']/ns:Property_Maps[@uid='w1ab1a']/ns:identifier[@uid='w1ab1aq']

If you want to use the default namespace, you can also try with:

/:Ingest_LDD[@uid='w1ab1']/:Property_Maps[@uid='w1ab1a']/:identifier[@uid='w1ab1aq']

Please, be aware that I haven't tested the code, but I hope you get the idea.

Just for clarification, in order to give you a proper answer, as mentioned before, if you now remove or comment this line of code:

builderFactory.setNamespaceAware(true);

Then, the XPath expression:

/ns:Ingest_LDD[@uid='w1ab1']/ns:Property_Maps[@uid='w1ab1a']/ns:identifier[@uid='w1ab1aq']

will no longer find the required node. Now, if you remove the namespace information from the XPath expression:

/Ingest_LDD[@uid='w1ab1']/Property_Maps[@uid='w1ab1a']/identifier[@uid='w1ab1aq']

It will find the right node again.

Java, XPath Expression to read all node names, node values, and attributes

//* will select all element nodes, //@* all attribute nodes. However, an element node does not have a meaningful node value in the DOM, so you would need to read out getTextContent() instead of getNodeValue.

As you seem to consider an element with child elements to have a "null" value I think you need to check whether there are any child elements:

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);

DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

Document doc = docBuilder.parse("sampleInput1.xml");

XPathFactory fact = XPathFactory.newInstance();
XPath xpath = fact.newXPath();

NodeList allElements = (NodeList)xpath.evaluate("//*", doc, XPathConstants.NODESET);

ArrayList<String> elementNames = new ArrayList<>();
ArrayList<String> elementValues = new ArrayList<>();

for (int i = 0; i < allElements.getLength(); i++)
{
Node currentElement = allElements.item(i);
elementNames.add(i, currentElement.getLocalName());
elementValues.add(i, xpath.evaluate("*", currentElement, XPathConstants.NODE) != null ? null : currentElement.getTextContent());
}

for (int i = 0; i < elementNames.size(); i++)
{
System.out.println("Name: " + elementNames.get(i) + "; value: " + (elementValues.get(i)));
}

For the sample input

<Data xmlns="Somenamespace.nsc">
<Test>blah</Test>
<Foo>bar</Foo>
<Date id="2">12242016</Date>
<Phone>
<Home>5555555555</Home>
<Mobile>5555556789</Mobile>
</Phone>
</Data>

the output is

Name: Data; value: null
Name: Test; value: blah
Name: Foo; value: bar
Name: Date; value: 12242016
Name: Phone; value: null
Name: Home; value: 5555555555
Name: Mobile; value: 5555556789

how to get a node value in Xpath - Java

The xpath expression to get that node is

//entry/link/@href

In java you can write

Document doc = ... // your XML document
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//entry/link/@href");
String href = xp.evaluate(doc);

Then if you need to get the link value of the entry with a specific id you can change the xpath expression to

//entry[id='tag:example.com,2005:Release/343597']/link/@href

Finally if you want to get all the links in the documents, if the document has many entry elements you can write

Document doc = ... // your XML document
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//entry/link/@href");
NodeList links = (NodeList) xp.evaluate(doc, XPathConstants.NODESET);
// and iterate on links


Related Topics



Leave a reply



Submit