How to Access OWL Documents Using XPath in Java

How to access OWL documents using XPath in Java?

XPath does not know the namespaces you are using.
Try using:

"/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()"

local-name() ignores the namespaces, so this expression will work (it returns the first match it finds).
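
A minimal sketch of running that expression from Java with the JDK's built-in XPath API. The OWL snippet, its ontology IRI, and the label text are made up for illustration, and the class and method names are hypothetical. The parser is switched to namespace-aware mode so that local-name() sees RDF, Ontology, and label without their prefixes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OwlLabelLookup {
    // Hypothetical OWL document, used only for illustration.
    static final String SAMPLE_OWL =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'"
        + " xmlns:owl='http://www.w3.org/2002/07/owl#'>"
        + "<owl:Ontology rdf:about='http://example.org/onto'>"
        + "<rdfs:label>Example ontology</rdfs:label>"
        + "</owl:Ontology>"
        + "</rdf:RDF>";

    // The expression from the answer above, unchanged.
    static final String LABEL_QUERY =
        "/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()";

    static String firstLabel(String owlXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so prefix and local name are kept apart
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(owlXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The two-argument evaluate() returns the string value of the first match.
            return xpath.evaluate(LABEL_QUERY, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLabel(SAMPLE_OWL)); // prints "Example ontology"
    }
}
```

Note that this only works for RDF/XML files that happen to be laid out with an Ontology element directly under the root; other serializations of the same ontology may not match.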

XPath, XML Namespaces and Java

Aha, I tried to debug your expression + got it to work. You missed a few things. This XPath expression should do it:

/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number
  1. You need to include the root element (XFDL in this case)
  2. I didn't end up needing to use any namespaces in the expression for some reason. Not sure why. If this is the case, then the NamespaceContext.getNamespaceURI() never gets called. If I replace instance with xforms:instance then getNamespaceURI() gets called once with xforms as the input argument, but the program throws an exception.
  3. The syntax for attribute values is @attr, not [attr].

My complete sample code:

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class XPathNamespaceExample {
static public class MyNamespaceContext implements NamespaceContext {
final private Map<String, String> prefixMap;
MyNamespaceContext(Map<String, String> prefixMap)
{
if (prefixMap != null)
{
this.prefixMap = Collections.unmodifiableMap(new HashMap<String, String>(prefixMap));
}
else
{
this.prefixMap = Collections.emptyMap();
}
}
public String getPrefix(String namespaceURI) {
// TODO Auto-generated method stub
return null;
}
public Iterator getPrefixes(String namespaceURI) {
// TODO Auto-generated method stub
return null;
}
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Invalid Namespace Prefix");
else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("custom".equals(prefix))
return "http://www.PureEdge.com/XFDL/Custom";
else if ("designer".equals(prefix))
return "http://www.PureEdge.com/Designer/6.1";
else if ("pecs".equals(prefix))
return "http://www.PureEdge.com/PECustomerService";
else if ("xfdl".equals(prefix))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("xforms".equals(prefix))
return "http://www.w3.org/2003/xforms";
else
return XMLConstants.NULL_NS_URI;
}

}

protected static final String QUERY_FORM_NUMBER =
"/XFDL/globalpage/global/xmlmodel/xforms:instances/instance" +
"/form_metadata/title/documentnbr[number]";

public static void main(String[] args) {
try
{
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new File(args[0]));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/@sid"));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/global/xmlmodel/instances/instance/@id" ));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number" ));
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}

private static String extractNodeValue(Document doc, String expression) {
try{

XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new MyNamespaceContext(null));

Node result = (Node)xPath.evaluate(expression, doc, XPathConstants.NODE);
if(result != null) {
return result.getNodeValue();
} else {
throw new RuntimeException("can't find expression");
}

} catch (XPathExpressionException err) {
throw new RuntimeException(err);
}
}
}
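
A likely explanation for point 2 above: DocumentBuilderFactory is not namespace-aware unless setNamespaceAware(true) is called, so the elements in the parsed DOM carry no namespace and plain names match them. Below is a sketch of the namespace-aware variant, assuming a heavily trimmed, hypothetical stand-in for the XFDL form (the real document structure is not shown in the question). The class name, the query helper, and the xfdl prefix choice are illustrative only.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceAwareXPath {
    // Hypothetical, trimmed-down form; only the namespace URIs come from the sample above.
    static final String SAMPLE_FORM =
        "<XFDL xmlns='http://www.PureEdge.com/XFDL/6.5'"
        + " xmlns:xforms='http://www.w3.org/2003/xforms'>"
        + "<globalpage sid='global'><xforms:instance id='metadata'/></globalpage>"
        + "</XFDL>";

    static String query(String xml, String expression, final Map<String, String> prefixes) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // elements now keep their namespace URIs
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) throw new IllegalArgumentException("prefix is null");
                    return prefixes.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                // Reverse lookups are not needed for evaluating expressions here.
                public String getPrefix(String namespaceURI) { return null; }
                public Iterator<String> getPrefixes(String namespaceURI) { return null; }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("xfdl", "http://www.PureEdge.com/XFDL/6.5");
        prefixes.put("xforms", "http://www.w3.org/2003/xforms");
        // Every step needs a prefix, including elements in the default namespace.
        System.out.println(query(SAMPLE_FORM,
            "/xfdl:XFDL/xfdl:globalpage/xforms:instance/@id", prefixes)); // prints "metadata"
        // Unprefixed names mean "no namespace" in XPath 1.0, so this selects nothing.
        System.out.println(query(SAMPLE_FORM, "/XFDL/globalpage/@sid", prefixes).isEmpty()); // prints "true"
    }
}
```

The prefixes used in the expression only have to agree with the NamespaceContext, not with the prefixes in the document; what must match is the namespace URI.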

How to query XML using namespaces in Java with XPath?

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:

/*[local-name()='workbook'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheets'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheet'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue for this instance) that use the same local-name(), your XPath could match on the wrong elements and select the wrong content.
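
To make that risk concrete, here is a small sketch using a made-up document in which two hypothetical vocabularies (urn:example:a and urn:example:b) both define a sheet element. The class and helper names are illustrative. Matching on local-name() alone picks up both elements, while adding namespace-uri() narrows the match to one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LocalNameCollision {
    // Hypothetical document mixing two vocabularies that share a local name.
    static final String MIXED =
        "<root xmlns:a='urn:example:a' xmlns:b='urn:example:b'>"
        + "<a:sheet>from vocabulary A</a:sheet>"
        + "<b:sheet>from vocabulary B</b:sheet>"
        + "</root>";

    // Counts the nodes selected by a node-set expression.
    static int count(String xml, String nodeSetExpression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                "count(" + nodeSetExpression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // local-name() alone cannot tell the two vocabularies apart.
        System.out.println(count(MIXED, "/*/*[local-name()='sheet']")); // prints 2
        // Adding namespace-uri() restricts the match to one vocabulary.
        System.out.println(count(MIXED,
            "/*/*[local-name()='sheet' and namespace-uri()='urn:example:b']")); // prints 1
    }
}
```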

How to extract RDF triples from XML file using an existing ontology?

Forget about using XPath to extract triples; it's way easier and less problematic with Jena.

You can use the interface SimpleSelector together with model.listStatements from Jena.

In this example I am using SimpleSelector to find all the triples with a single property, but you can implement any search you need by customizing the selects method.

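// Package names assume Apache Jena 3.x or later; older releases use com.hp.hpl.jena.* instead.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.SimpleSelector;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.util.FileManager;

FileManager fManager = FileManager.get();
Model model = fManager.loadModel("some_file.rdf");

// The property whose triples we want to list.
final Property someRelevantProperty =
    model.createProperty("http://your.data.org/ontology/",
                         "someRelevantProperty");

// Match any subject and object; selects() narrows the match to our property.
SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
    @Override
    public boolean selects(Statement s) {
        return s.getPredicate().equals(someRelevantProperty);
    }
};

StmtIterator iter = model.listStatements(selector);
while (iter.hasNext()) {
    Statement stmt = iter.nextStatement();
    System.out.print(stmt.getSubject().toString());
    System.out.print(stmt.getPredicate().toString());
    System.out.println(stmt.getObject().toString());
}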

You'll find more information here.

If you describe a bit more the ontology you are using and the type of search you need we might be able to help more.

Accessing RDF/XML/OWL file nodes using Perl

Parsing RDF as if it were XML is a folly. The exact same data can appear in many different ways. For example, all of the following RDF files carry the same data. Any conforming RDF implementation MUST handle them identically...

<!-- example 1 -->
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<rdf:Description rdf:about="#me">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
<foaf:name>Toby Inkster</foaf:name>
</rdf:Description>
</rdf:RDF>

<!-- example 2 -->
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person rdf:about="#me">
<foaf:name>Toby Inkster</foaf:name>
</foaf:Person>
</rdf:RDF>

<!-- example 3 -->
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person rdf:about="#me" foaf:name="Toby Inkster" />
</rdf:RDF>

<!-- example 4 -->
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<rdf:Description rdf:about="#me"
rdf:type="http://xmlns.com/foaf/0.1/Person"
foaf:name="Toby Inkster" />
</rdf:RDF>

<!-- example 5 -->
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<rdf:Description rdf:ID="me">
<rdf:type>
<rdf:Description rdf:about="http://xmlns.com/foaf/0.1/Person" />
</rdf:type>
<foaf:name>Toby Inkster</foaf:name>
</rdf:Description>
</rdf:RDF>

<!-- example 6 -->
<foaf:Person
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
rdf:about="#me"
foaf:name="Toby Inkster" />

I could easily list half a dozen other variations too, but I'll stop there. And this RDF file contains just two statements - I'm a Person; my name is "Toby Inkster" - the OP's data contains over 50,000 statements.

And this is just the XML serialization of RDF; there are other serializations too.

If you try handling all that with XPath, you're likely to end up becoming a lunatic locked away in a tower somewhere, muttering in his sleep about the triples; the triples...

Luckily, Greg Williams has taken that mental health bullet for you. RDF::Trine and RDF::Query are not only the best RDF frameworks for Perl; they're amongst the best in any programming language.

Here is how the OP's task could be achieved using RDF::Trine and RDF::Query:

#!/usr/bin/env perl

use v5.12;
use RDF::Trine;
use RDF::Query;

my $model = 'RDF::Trine::Model'->new(
'RDF::Trine::Store::DBI'->new(
'vo',
'dbi:SQLite:dbname=/tmp/vo.sqlite',
'', # no username
'', # no password
),
);

'RDF::Trine::Parser::RDFXML'->new->parse_url_into_model(
'http://svn.code.sf.net/p/vaccineontology/code/trunk/src/ontology/VO.owl',
$model,
) unless $model->size > 0;

my $query = RDF::Query->new(<<'SPARQL');
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?super_label ?sub_label
WHERE {
?sub rdfs:subClassOf ?super .
?sub rdfs:label ?sub_label .
?super rdfs:label ?super_label .
}
LIMIT 5
SPARQL

print $query->execute($model)->as_string;

Sample output:

+----------------------------+----------------------------------+
| super_label                | sub_label                        |
+----------------------------+----------------------------------+
| "Aves vaccine"             | "Ducks vaccine"                  |
| "route of administration"  | "intravaginal route"             |
| "Shigella gene"            | "aroA from Shigella"             |
| "Papillomavirus vaccine"   | "Bovine papillomavirus vaccine"  |
| "virus protein"            | "Feline leukemia virus protein"  |
+----------------------------+----------------------------------+

UPDATE: Here's a SPARQL query that can be plugged into the script above to retrieve the data you wanted:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?subclass ?label
WHERE {
?subclass
rdfs:subClassOf obo:VO_0000001 ;
rdfs:label ?label .
}

JDOM2 xpath finding nodes within a different namespace

In your XML, the dc prefix is mapped to the namespace URI http://purl.org/dc/elements/1.1/, so make sure the namespace prefix mapping you declare for use in the XPath matches it. This is the part of your XML where the namespace prefixes are declared:

<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

An XML parser only sees the namespaces explicitly declared in the XML. It won't try to open the namespace URL, since a namespace name is not necessarily a URL. For example, the following URI, which I found in this recent SO question, is also acceptable as a namespace: uuid:ebfd9-45-48-a9eb-42d
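
A quick way to convince yourself of this, sketched with the JDK's DOM parser rather than JDOM2 (the element name item and the class name are made up): the namespace name comes back as an opaque string, and parsing works even though nothing can be fetched from a uuid: URI.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceIsJustAName {
    // Returns the namespace name of the root element exactly as declared in the XML.
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical one-element document using the uuid: namespace mentioned above.
        String xml = "<x:item xmlns:x='uuid:ebfd9-45-48-a9eb-42d'/>";
        System.out.println(rootNamespace(xml)); // prints "uuid:ebfd9-45-48-a9eb-42d"
    }
}
```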

Namespace prefix not declared error after extracting a node in OWL/XML file with Java & xPath

Is there any other way to do like this job?

Yes, there are other, more appropriate ways to do this job.

It's typically not a great idea to try to process RDF documents using XML tools, since the same RDF graph can often be represented a number of different ways in RDF/XML. This is discussed in more detail in my answer to "How to access OWL documents using XPath in Java?", but we can see the issue pretty quickly here. After adding some additional namespace declarations, your data looks like this:

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:Situation="https://stackoverflow.com/q/22170071/1281433/"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<owl:Class/>
<owl:Class/>
<owl:ObjectProperty/>
<Situation:Situation rdf:about="http://localhost/rdf#situa0">
<Situation:composedBy></Situation:composedBy>
</Situation:Situation>
</rdf:RDF>

The same RDF graph can be serialized like this, too:

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:Situation="https://stackoverflow.com/q/22170071/1281433/"
xmlns:owl="http://www.w3.org/2002/07/owl#" >
<rdf:Description rdf:nodeID="A0">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
</rdf:Description>
<rdf:Description rdf:about="http://localhost/rdf#situa0">
<rdf:type rdf:resource="https://stackoverflow.com/q/22170071/1281433/Situation"/>
<Situation:composedBy></Situation:composedBy>
</rdf:Description>
<rdf:Description rdf:nodeID="A1">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A2">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
</rdf:Description>
</rdf:RDF>

If you're looking for a Situation:Situation element, you'll find one in the first serialization, but not the second, even though they're the same RDF graph.
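
The difference can be checked with the JDK's XPath API. The sketch below uses trimmed versions of the two serializations above (only the Situation resource is kept) and counts Situation elements in each; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SameGraphDifferentXml {
    static final String NS_DECLS =
        " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:Situation='https://stackoverflow.com/q/22170071/1281433/'";

    // Trimmed form of the first serialization: a typed node element.
    static final String FIRST =
        "<rdf:RDF" + NS_DECLS + ">"
        + "<Situation:Situation rdf:about='http://localhost/rdf#situa0'>"
        + "<Situation:composedBy></Situation:composedBy>"
        + "</Situation:Situation></rdf:RDF>";

    // Trimmed form of the second serialization: rdf:Description plus rdf:type.
    static final String SECOND =
        "<rdf:RDF" + NS_DECLS + ">"
        + "<rdf:Description rdf:about='http://localhost/rdf#situa0'>"
        + "<rdf:type rdf:resource='https://stackoverflow.com/q/22170071/1281433/Situation'/>"
        + "<Situation:composedBy></Situation:composedBy>"
        + "</rdf:Description></rdf:RDF>";

    // Counts elements named Situation in the Situation namespace.
    static int situationElements(String rdfXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(rdfXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                "count(//*[local-name()='Situation' and "
                + "namespace-uri()='https://stackoverflow.com/q/22170071/1281433/'])",
                doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(situationElements(FIRST));  // prints 1
        System.out.println(situationElements(SECOND)); // prints 0
    }
}
```

Both strings describe the same resource with the same type, yet the element-based XPath finds it in only one of them.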

You could probably use a SPARQL query to get what you're looking for. The typical implementation of describe queries might do what you want. E.g., the very simple query

describe <http://localhost/rdf#situa0>

produces this result (in RDF/XML):

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:Situation="https://stackoverflow.com/q/22170071/1281433/"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<Situation:Situation rdf:about="http://localhost/rdf#situa0">
<Situation:composedBy></Situation:composedBy>
</Situation:Situation>
</rdf:RDF>

Alternatively, you could ask for everything that has the type Situation:Situation:

prefix s: <https://stackoverflow.com/q/22170071/1281433/>
describe ?situation where {
?situation a s:Situation .
}

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:s="https://stackoverflow.com/q/22170071/1281433/"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<s:Situation rdf:about="http://localhost/rdf#situa0">
<s:composedBy></s:composedBy>
</s:Situation>
</rdf:RDF>

The important point here is to use an appropriate query language for the type of data that you have. You have RDF, which is a graph-based data representation. An RDF graph is a set of triples. Your data is five triples:

_:BX2D6970b66dX3A1448f4e1bcfX3AX2D7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://localhost/rdf#situa0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stackoverflow.com/q/22170071/1281433/Situation> .
<http://localhost/rdf#situa0> <https://stackoverflow.com/q/22170071/1281433/composedBy> "" .
_:BX2D6970b66dX3A1448f4e1bcfX3AX2D7ffd <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
_:BX2D6970b66dX3A1448f4e1bcfX3AX2D7fff <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

In the Turtle serialization, the graph is:

@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix Situation: <https://stackoverflow.com/q/22170071/1281433/> .

[ a owl:Class ] .

<http://localhost/rdf#situa0>
a Situation:Situation ;
Situation:composedBy "" .

[ a owl:Class ] .

[ a owl:ObjectProperty ] .

You should use SPARQL (the standard RDF query language) or an RDF-based API for extracting data from RDF documents.

How to get values of child node from RDF file using xpath PHP

The short answer is that you really shouldn't try to access the RDF with XPath. "Solutions" based on the RDF/XML serialization of an RDF graph are very brittle, because the same RDF graph can be serialized as many different RDF/XML documents. It's different XML, but the same RDF. See, for instance, my answer to "How to access OWL documents using XPath in Java?" If you insist, though, the accepted answer to that question may help you. I'd suggest that instead you use dedicated RDF tools.

At the moment, I can't help much with the PHP side of things, although it appears that there's a library called EasyRDF that may let you run SPARQL queries against your data. Coming up with the SPARQL query I can help you with. RDF is a graph-based data representation. The fundamental "thing" is the triple, which is just a three-tuple of the form (subject, predicate, object). We treat that as a directed edge from subject to object, labeled by predicate.
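
To make the triple idea concrete, here is a purely illustrative Java sketch. It is not how an RDF library stores data; the Triple class and objectsOf helper are hypothetical, and the two sample triples are taken from the Project Gutenberg data in question. A graph is just a collection of (subject, predicate, object) edges, and a lookup follows the edges with a given label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TripleSketch {
    // One directed edge: subject --predicate--> object.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    static final String AGENT = "http://www.gutenberg.org/2009/agents/1609";
    static final String PGTERMS = "http://www.gutenberg.org/2009/pgterms/";

    // Two triples from the data in the question.
    static final List<Triple> SAMPLE = Arrays.asList(
        new Triple(AGENT, PGTERMS + "alias", "Strindberg, Johan August"),
        new Triple(AGENT, PGTERMS + "name", "Strindberg, August"));

    // Follows every edge labeled `predicate` that leaves `subject`.
    static List<String> objectsOf(List<Triple> graph, String subject, String predicate) {
        List<String> objects = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                objects.add(t.object);
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        System.out.println(objectsOf(SAMPLE, AGENT, PGTERMS + "name")); // prints [Strindberg, August]
    }
}
```

A SPARQL engine does this kind of matching for you, across many triple patterns at once, which is why it is a better fit than XPath here.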

RDF/XML is just one representation of it. It's handy because there are so many XML processing tools, but it's inconvenient because it doesn't make the triples very clear, and it's not easy to read as plain text, or to write by hand. If we convert your data to N-Triples, which is a format that just puts one triple per line, it looks like this (just a part of it):

_:BX2D39ae9d40X3A1468ac2fcd1X3AX2D7ff9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "text/plain; charset=iso-8859-1"^^<http://purl.org/dc/terms/IMT> .
_:BX2D39ae9d40X3A1468ac2fcd1X3AX2D7ff9 <http://purl.org/dc/dcam/memberOf> <http://purl.org/dc/terms/IMT> .
<http://www.gutenberg.org/2009/agents/1609> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.gutenberg.org/2009/pgterms/agent> .
<http://www.gutenberg.org/2009/agents/1609> <http://www.gutenberg.org/2009/pgterms/webpage> <http://en.wikipedia.org/wiki/August_Strindberg> .
<http://www.gutenberg.org/2009/agents/1609> <http://www.gutenberg.org/2009/pgterms/deathdate> "1912"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.gutenberg.org/2009/agents/1609> <http://www.gutenberg.org/2009/pgterms/birthdate> "1849"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.gutenberg.org/2009/agents/1609> <http://www.gutenberg.org/2009/pgterms/alias> "Strindberg, Johan August" .
<http://www.gutenberg.org/2009/agents/1609> <http://www.gutenberg.org/2009/pgterms/name> "Strindberg, August" .

That's very easy to write, but it's hard to read, and it's hard to see the graph structure. The Turtle serialization is very nice because it's easy to read and write, and it makes the graph structure more apparent, and it's very similar to the SPARQL query language syntax. The part about August Strindberg in Turtle is:

<http://www.gutenberg.org/2009/agents/1609>
a pgterms:agent ;
pgterms:alias "Strindberg, Johan August" ;
pgterms:birthdate 1849 ;
pgterms:deathdate 1912 ;
pgterms:name "Strindberg, August" ;
pgterms:webpage <http://en.wikipedia.org/wiki/August_Strindberg> .

Now, it sounds like what you've actually got is one RDF file per ebook, and you're looking for the creator information about the ebook. Here's a query that will get the pgterms:name property of each author for each ebook in the document. Of course, if you expect there to be only one ebook description in the file, you could select just the name (i.e., select ?name where …) instead of select ?ebook ?name where ….

prefix dcterms: <http://purl.org/dc/terms/>
prefix pgterms: <http://www.gutenberg.org/2009/pgterms/>

select ?ebook ?name where {
?ebook a pgterms:ebook ;
dcterms:creator ?creator .
?creator pgterms:name ?name .
}

------------------------------------------------------------------
| ebook                                   | name                 |
==================================================================
| <http://www.gutenberg.org/ebooks/45916> | "Strindberg, August" |
------------------------------------------------------------------

Now, it's pretty clear that this data is coming from Project Gutenberg, in which case you may also find the question "why sparql query below do not return cartesian product" useful. It's got some more examples of SPARQL queries against Project Gutenberg data. It's also got some discussion about the differences between the new and the legacy RDF representations of the data, but it looks like you're already using the new representation, so that's not as important. In fact, the final query in that question is similar to this one, and uses property paths, which are actually kind of like XPaths, and sort of like regular expressions. You can simplify the query above using property paths as:

prefix dcterms: <http://purl.org/dc/terms/>
prefix pgterms: <http://www.gutenberg.org/2009/pgterms/>

select ?ebook ?name where {
?ebook a pgterms:ebook ;
dcterms:creator/pgterms:name ?name .
}

Java How to extract a complete XML block

Adding to lwburk's solution, to convert a DOM Node to string form, you can use a Transformer:

private static String nodeToString(Node node)
throws TransformerException
{
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return(buf.toString());
}

Complete example:

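import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// The class name is arbitrary; any name works.
public class NodeToStringExample {
    public static void main(String... args)
    throws Exception
    {
        String xml = "<A><B><id>0</id></B><B><id>1</id></B></A>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        // Select the B element whose id child equals '1'.
        XPath xPath = XPathFactory.newInstance().newXPath();
        Node result = (Node)xPath.evaluate("A/B[id = '1']", doc, XPathConstants.NODE);

        System.out.println(nodeToString(result));
    }

    // Serializes a DOM node (and its subtree) to a string without the XML declaration.
    private static String nodeToString(Node node)
    throws TransformerException
    {
        StringWriter buf = new StringWriter();
        Transformer xform = TransformerFactory.newInstance().newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.transform(new DOMSource(node), new StreamResult(buf));
        return(buf.toString());
    }
}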

