Undefined Namespace Prefix in Nokogiri and XPath

I'm not sure why, but it seems that you have to drop the namespace prefix to get the node:

xmlfeed.at_xpath("//totalresults")

Also note that I added the double forward slash, which scopes the search over the whole document (it won't work without it).

UPDATE:

Based on the answer to "How do I get Nokogiri to understand my namespaces?", I'd guess that the namespace prefix (openSearch, as in openSearch:totalResults) is not correctly declared as an xmlns attribute on the root node of the document. Nokogiri therefore ignores it, which is why the selector above works but the namespaced one doesn't.

Avoiding Nokogiri::XML::XPath::SyntaxError: ERROR: Undefined namespace prefix

I ended up solving the problem by editing the XML file and adding the namespaces in the root. Here is an example:

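# Parse once, declare the missing namespace prefix on the root element,
# then serialize and re-parse so that Nokogiri picks up the declaration.
temp = Nokogiri::XML(@document_xml)
temp.root['xmlns:w'] = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
@doc = Nokogiri::XML(temp.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML))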

Nokogiri/Xpath namespace query

All namespaces need to be registered when parsing. Nokogiri automatically registers namespaces on the root node. Any namespaces that are not on the root node you have to register yourself. This should work:

puts doc.xpath('//dc:title', 'dc' => "URI")

Alternatively, you can remove namespaces altogether. Only do this if you are certain there will be no conflicting node names.

doc.remove_namespaces!
puts doc.xpath('//title')

How do I use xpath on nodes with a prefix but without a namespace?

The problem is that the namespace is not properly defined in the XML document. As a result, Nokogiri treats the whole string "a:root" as the node name, instead of treating "a" as a namespace prefix and "root" as the node name:

xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
doc = Nokogiri::XML(xml)
puts doc.at_xpath('*').node_name
#=> "a:root"
puts doc.at_xpath('*').namespace
#=> ""

Solution 1 - Specify node name with colon

One solution is to search for nodes with the name "a:thing". You cannot use //a:thing, since XPath will treat "a" as a namespace prefix. You can get around this with //*[name()="a:thing"]:

xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
doc = Nokogiri::XML(xml)
things = doc.xpath('//*[name()="a:thing"]')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>

Solution 2 - Modify the XML document to define the namespace

An alternative solution is to modify the XML file that you get to properly define the namespace. The document will then behave with namespaces as expected:

xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
xml.gsub!('<a:root>', '<a:root xmlns:a="foo">')
doc = Nokogiri::XML(xml)
things = doc.xpath('//a:thing')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>

Syntax error about XPath in Nokogiri, when combining namespace and node()

Unlike with element name tests, you don't need a namespace prefix to match by node(). The following returns all nodes, in any namespace, just fine:

result = xml_doc.xpath("//node()")

There are several types of nodes in XPath: text nodes, comment nodes, element nodes, and so on. node() is a node test that simply returns true for any node type whatsoever. Compare it to text(), another node test, which returns true only for text nodes. (See "w3.org > XPath > Node Tests".)

In my understanding, the notions of local name and namespace only exist in the context of element nodes, so using a namespace prefix along with the node() test simply doesn't make sense.

If you meant to select all elements in a specific namespace, use * instead of node():

result = xml_doc.xpath("//x:*", 'x' => 'www.example.com')

How do I get Nokogiri to understand my namespaces?

It doesn't look like the namespaces in this document are correctly declared - there should be xmlns:samlp and xmlns:saml attributes on the root node. In cases like this, Nokogiri essentially ignores the namespaces (as it can't map them to URIs or URNs), so your XPath works if you remove them, i.e.

doc.xpath(XPATH_QUERY)

Splunk-client (with Nokogiri) giving Undefined Namespace Prefix

I found out the issue -- the splunk client wasn't authenticating properly, and so search was actually a broken SplunkJob object (with a nil username and authentication key). It's strange that there was no error raised until the wait command, but upon inspecting the search object, one of the fields stated that the object was malformed.

Remove nokogiri attribute based on namespace prefix

Node objects have a remove method that drops them from the tree, so you can write something like this:

require 'nokogiri'

doc = Nokogiri::XML(DATA)
puts '--- Before'
puts doc.to_s

doc.traverse do |node|
  next unless node.respond_to? :attributes
  node.attributes.each do |key, val|
    val.remove if val&.namespace&.prefix == 'opf'
  end
end

puts
puts '--- After'
puts doc.to_s

__END__
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden" opf:scheme="ISBN">xxxx</dc:identifier>
<dc:creator opf:role="aut" opf:file-as="Name">xxxx</dc:creator>
<dc:date opf:event="publication">xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>

Running it produces the following output:

➜  ~ ruby test.rb
--- Before
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden" opf:scheme="ISBN">xxxx</dc:identifier>
<dc:creator opf:role="aut" opf:file-as="Name">xxxx</dc:creator>
<dc:date opf:event="publication">xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>

--- After
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden">xxxx</dc:identifier>
<dc:creator>xxxx</dc:creator>
<dc:date>xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>

Note: If the Ruby version you are using doesn't support &. (the safe navigation operator, introduced in Ruby 2.3), you'll need to handle the namespace being potentially nil yourself.
