How to Query XML Using Namespaces in Java with XPath

How to query XML using namespaces in Java with XPath?

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:

/*[local-name()='workbook'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheets'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheet'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
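
If you do need the prefix-free form, the repeated predicates can be generated instead of typed by hand. The following is only a minimal sketch: the class name QualifiedStep, the helper step(), and the assumption that each sheet element carries a name attribute are illustrative and do not come from the original question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class QualifiedStep {
    static final String MAIN_NS =
            "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Builds one child step that tests both the local name and the namespace URI.
    // Assumes neither argument contains a single quote.
    static String step(String namespaceUri, String localName) {
        return "/*[local-name()='" + localName
                + "' and namespace-uri()='" + namespaceUri + "']";
    }

    // Returns the name attribute of the first sheet in the given workbook XML.
    static String firstSheetName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the DOM carries no namespace URIs
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String path = step(MAIN_NS, "workbook") + step(MAIN_NS, "sheets")
                    + step(MAIN_NS, "sheet") + "[1]/@name";
            return XPathFactory.newInstance().newXPath().evaluate(path, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The generated expression is exactly as long as the hand-written one, so this only helps with typing, not with readability.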

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies that use the same local-name() (which may not be an issue in this instance), your XPath could match the wrong elements and select the wrong content.
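
To make that risk concrete, here is a small sketch with two title elements that share a local name but live in different namespaces. The document, the urn:example:* namespace URIs, and the class name LocalNameCollision are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LocalNameCollision {
    // Two <title> elements with the same local name in different (made-up) namespaces.
    static final String XML =
            "<doc xmlns:b='urn:example:book' xmlns:p='urn:example:person'>"
          + "<b:title>XPath Basics</b:title>"
          + "<p:title>Dr.</p:title>"
          + "</doc>";

    // Returns how many nodes the expression selects in the sample document.
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling count("//*[local-name()='title']") selects both elements, while adding and namespace-uri()='urn:example:book' to the predicate narrows the match to the book title only.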

XPath with namespace in Java

  1. Short answer: use the XPath local-name() function. For example, xPathFactory.newXPath().compile("//*[local-name()='requestURL']/text()"); selects /CAMERA/Streaming/status
  2. Or you can implement a NamespaceContext that maps namespace prefixes to URIs and set it on the XPath object before querying.
  3. Take a look at this blog article. Update: the article is down, but you can still find it on the Web Archive.

Solution 1 sample:

XPath xpath = XPathFactory.newInstance().newXPath();
String responseStatus = xpath.evaluate("//*[local-name()='ResponseStatus']/text()", document);
System.out.println("-> " + responseStatus);

Solution 2 sample:

// load the Document
Document document = ...;
NamespaceContext ctx = new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        return prefix.equals("urn") ? "urn:camera-org" : null;
    }
    public Iterator getPrefixes(String val) {
        return null;
    }
    public String getPrefix(String uri) {
        return null;
    }
};
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(ctx);
String responseStatus = xpath.evaluate("//urn:ResponseStatus/text()", document);
System.out.println("-> " + responseStatus);
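
The anonymous class above returns null from getPrefix() and getPrefixes(), which is enough for XPath evaluation in practice but does not follow the full NamespaceContext contract. If you want something reusable, one possible shape is a map-backed implementation. This is a sketch only; the class name MapNamespaceContext is hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> uriByPrefix = new LinkedHashMap<>();

    MapNamespaceContext(Map<String, String> bindings) {
        uriByPrefix.putAll(bindings);
        // The interface documents these two bindings as always present.
        uriByPrefix.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        uriByPrefix.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        // Unbound prefixes map to the empty string, as the interface specifies.
        return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (namespaceURI.equals(entry.getValue())) {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }
}
```

With this in place, the sample above could call xpath.setNamespaceContext(new MapNamespaceContext(Collections.singletonMap("urn", "urn:camera-org"))) instead of defining an anonymous class.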

Edit

This is a complete example that correctly retrieves the element:

String xml = "<urn:ResponseStatus version=\"1.0\" xmlns:urn=\"urn:camera-org\">\r\n" + //
"\r\n" + //
"<urn:requestURL>/CAMERA/Streaming/status</urn:requestURL>\r\n" + //
"<urn:statusCode>4</urn:statusCode>\r\n" + //
"<urn:statusString>Invalid Operation</urn:statusString>\r\n" + //
"<urn:id>0</urn:id>\r\n" + //
"\r\n" + //
"</urn:ResponseStatus>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new java.io.ByteArrayInputStream(xml.getBytes()));
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        return prefix.equals("urn") ? "urn:camera-org" : null;
    }

    public Iterator<?> getPrefixes(String val) {
        return null;
    }

    public String getPrefix(String uri) {
        return null;
    }
});
XPathExpression expr = xpath.compile("//urn:ResponseStatus");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    Node currentItem = nodes.item(i);
    System.out.println("found node -> " + currentItem.getLocalName() + " (namespace: " + currentItem.getNamespaceURI() + ")");
}

How does XPath deal with XML namespaces?

Defining namespaces in XPath (recommended)

XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.

It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.


Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.

(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)

C#:

XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);

Java (SAX):

NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");

Java (XPath):

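xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        switch (prefix) {
            case "i": return "http://schema.intuit.com/finance/v3";
            // ...
            default: return javax.xml.XMLConstants.NULL_NS_URI;
        }
    }
    // getPrefix() and getPrefixes() must also be implemented; omitted here
});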
  • Remember to call
    DocumentBuilderFactory.setNamespaceAware(true).
  • See also:
    Java XPath: Queries with default namespace xmlns

JavaScript:

See Implementing a User Defined Namespace Resolver:

function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );

Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a custom nsResolver().

Perl (LibXML):

my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');

Python (lxml):

from io import StringIO
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
              namespaces={'i':'http://schema.intuit.com/finance/v3'})

Python (ElementTree):

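namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
# root is assumed to be the i:IntuitResponse element; ElementTree rejects
# absolute paths on an element, so the path is written relative to root
root.findall('i:QueryResponse', namespaces)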

Python (Scrapy):

response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()

PHP:

Adapted from @Tomalak's answer using DOMDocument:

$result = new DOMDocument();
$result->loadXML($xml);

$xpath = new DOMXPath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");

$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");

See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.

Ruby (Nokogiri):

puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")

Note that Nokogiri supports removal of namespaces,

doc.remove_namespaces!

but see the below warnings discouraging the defeating of XML namespaces.

VBA:

xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement = doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")

VB.NET:

xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(xmlDoc.NameTable)
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3")
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
                                           nsmgr)

SoapUI (doc):

declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse

xmlstarlet:

-N i="http://schema.intuit.com/finance/v3"

XSLT:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...

Once you've declared a namespace prefix, your XPath can be written to use it:

/i:IntuitResponse/i:QueryResponse


Defeating namespaces in XPath (not recommended)

An alternative is to write predicates that test against local-name():

/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']

Or, in XPath 2.0:

/*:IntuitResponse/*:QueryResponse

Skirting namespaces in this manner works but is not recommended because it

  • Under-specifies the full element/attribute name.

  • Fails to differentiate between element/attribute names in different
    namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly [1]:

    /*[namespace-uri()='http://schema.intuit.com/finance/v3'
       and local-name()='IntuitResponse']
    /*[namespace-uri()='http://schema.intuit.com/finance/v3'
       and local-name()='QueryResponse']

    [1] Thanks to Daniel Haley for the namespace-uri() note.

  • Is excessively verbose.

Parsing XML with multiple namespaces with xPath in Java

Your XML elements are bound to the namespace http://iptc.org/std/nar/2006-10-01/, but your XPath is not using any namespace-prefixes, so /newsItem/itemMeta is asking for elements that are bound to no namespace.

You could address them by just the local-name():

/*[local-name()='newsItem']/*[local-name()='itemMeta']

Otherwise, you need to register the namespace with a namespace prefix, or use a custom NamespaceContext to resolve the namespace from your chosen namespace-prefix:

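xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        switch (prefix) {
            case "i": return "http://iptc.org/std/nar/2006-10-01/";
            // ...
            default: return javax.xml.XMLConstants.NULL_NS_URI;
        }
    }
    // getPrefix() and getPrefixes() must also be implemented; omitted here
});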

and then use that namespace-prefix in your XPath:

/i:newsItem/i:itemMeta

How to parse xml with namespace using Xpath java

You need to make sure you use

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);

if you want to use XPath on a DOM tree. I have not looked further for other problems.
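
The effect of that flag can be seen directly on the DOM. The sketch below parses the same document twice and reports the namespace URI of the root element; the sample XML, the urn:example:a namespace, and the class name NamespaceAwareness are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceAwareness {
    static final String XML = "<a xmlns='urn:example:a'><b>text</b></a>";

    // Parses the sample and returns the namespace URI the DOM reports for the root.
    static String rootNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // the factory default is false
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With rootNamespace(true) the root element reports urn:example:a. With rootNamespace(false) it reports null, so a prefixed XPath step has no namespace URI to match against.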

Java XPath resolver for documents with namespaces

Your XML is not namespace-well-formed: It uses undefined namespace prefixes.

First fix your XML. Then fix your getNamespaceURI() method to return the right namespace URI for each used namespace prefix.

See How does XPath deal with XML namespaces? for an example of a working getNamespaceURI() method.
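
A quick way to confirm the first problem is to run the document through a namespace-aware parser, which rejects prefixes that are never declared. This is a rough sketch; the class name UnboundPrefixCheck and the sample strings in the note below are hypothetical, and the method returns false for any parse error, not only for unbound prefixes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class UnboundPrefixCheck {
    // Returns true if a namespace-aware parser accepts the XML without a fatal error.
    static boolean parsesNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // covers unbound prefixes and any other well-formedness error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, "<x:a xmlns:x='urn:example:x'/>" is accepted, while "<x:a/>" is rejected because the prefix x is never bound.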

Parse XML with namespaces in Java using xpath

You need to set a NamespaceContext on the XPath:

Demo

package forum11644994;

import java.io.StringReader;
import java.util.Iterator;

import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.*;
import javax.xml.xpath.*;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Demo {

public static void main(String[] args) throws Exception {
String xml = "<soapenv:Envelope xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:ser=\"http://services.web.post.list.com\"><soapenv:Header><authInfo xsi:type=\"soap:authentication\" xmlns:soap=\"http://list.com/services/SoapRequestProcessor\"><!--You may enter the following 2 items in any order--><username xsi:type=\"xsd:string\">dfasf@google.com</username><password xsi:type=\"xsd:string\">PfasdfRem91</password></authInfo></soapenv:Header></soapenv:Envelope>";
System.out.println(xml);
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(new NamespaceContext() {

@Override
public Iterator getPrefixes(String arg0) {
return null;
}

@Override
public String getPrefix(String arg0) {
return null;
}

@Override
public String getNamespaceURI(String arg0) {
if("soapenv".equals(arg0)) {
return "http://schemas.xmlsoap.org/soap/envelope/";
}
return null;
}
});
// XPath Query for showing all nodes value

try {
XPathExpression expr = xpath
.compile("/soapenv:Envelope/soapenv:Header/authInfo/password");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println("Got " + nodes.getLength() + " nodes");
// System.out.println(nodes.item(0).getNodeValue());
} catch (Exception E) {
System.out.println(E);
}

}
}

Output

<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ser="http://services.web.post.list.com"><soapenv:Header><authInfo xsi:type="soap:authentication" xmlns:soap="http://list.com/services/SoapRequestProcessor"><!--You may enter the following 2 items in any order--><username xsi:type="xsd:string">dfasf@google.com</username><password xsi:type="xsd:string">PfasdfRem91</password></authInfo></soapenv:Header></soapenv:Envelope>
Got 1 nodes

XPath, XML Namespaces and Java

Aha, I tried to debug your expression + got it to work. You missed a few things. This XPath expression should do it:

/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number

  1. You need to include the root element (XFDL in this case).
  2. I didn't end up needing to use any namespaces in the expression for some reason. Not sure why. If this is the case, then the NamespaceContext.getNamespaceURI() never gets called. If I replace instance with xforms:instance then getNamespaceURI() gets called once with xforms as the input argument, but the program throws an exception.
  3. The syntax for attribute values is @attr, not [attr].

My complete sample code:

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class XPathNamespaceExample {
static public class MyNamespaceContext implements NamespaceContext {
final private Map<String, String> prefixMap;
MyNamespaceContext(Map<String, String> prefixMap)
{
if (prefixMap != null)
{
this.prefixMap = Collections.unmodifiableMap(new HashMap<String, String>(prefixMap));
}
else
{
this.prefixMap = Collections.emptyMap();
}
}
public String getPrefix(String namespaceURI) {
// TODO Auto-generated method stub
return null;
}
public Iterator getPrefixes(String namespaceURI) {
// TODO Auto-generated method stub
return null;
}
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Invalid Namespace Prefix");
else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("custom".equals(prefix))
return "http://www.PureEdge.com/XFDL/Custom";
else if ("designer".equals(prefix))
return "http://www.PureEdge.com/Designer/6.1";
else if ("pecs".equals(prefix))
return "http://www.PureEdge.com/PECustomerService";
else if ("xfdl".equals(prefix))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("xforms".equals(prefix))
return "http://www.w3.org/2003/xforms";
else
return XMLConstants.NULL_NS_URI;
}

}

protected static final String QUERY_FORM_NUMBER =
"/XFDL/globalpage/global/xmlmodel/xforms:instances/instance" +
"/form_metadata/title/documentnbr[number]";

public static void main(String[] args) {
try
{
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new File(args[0]));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/@sid"));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/global/xmlmodel/instances/instance/@id" ));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number" ));
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}

private static String extractNodeValue(Document doc, String expression) {
try{

XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new MyNamespaceContext(null));

Node result = (Node)xPath.evaluate(expression, doc, XPathConstants.NODE);
if(result != null) {
return result.getNodeValue();
} else {
throw new RuntimeException("can't find expression");
}

} catch (XPathExpressionException err) {
throw new RuntimeException(err);
}
}
}

Java XPath: Queries with default namespace xmlns

In your Namespace context, bind a prefix of your choice (e.g. df) to the namespace URI in the document

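xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        switch (prefix) {
            case "df": return "http://xml.sap.com/2002/10/metamodel/webdynpro";
            // ...
            default: return javax.xml.XMLConstants.NULL_NS_URI;
        }
    }
    // getPrefix() and getPrefixes() must also be implemented; omitted here
});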

and then use that prefix in your path expressions to qualify element names e.g. /df:ModelClass/df:ModelClass.Parent/df:Core.Reference[@type = 'Model']/@package.
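
As a self-contained illustration of the idea, the sketch below queries a one-element document that uses the same default namespace. The sample XML, its name attribute, and the class name DefaultNamespaceQuery are assumptions made for the example; only the namespace URI and the df prefix come from the answer above.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DefaultNamespaceQuery {
    static final String NS = "http://xml.sap.com/2002/10/metamodel/webdynpro";
    // Hypothetical sample: a single element in the default namespace.
    static final String XML = "<ModelClass xmlns='" + NS + "' name='Demo'/>";

    // Evaluates the expression against the sample and returns its string value.
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "df".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here evaluate("/df:ModelClass/@name") yields Demo, while the unprefixed /ModelClass/@name yields an empty string, because an unprefixed name in XPath 1.0 always means "no namespace" regardless of the document's default namespace.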


