Getting the siblings of a node with Nokogiri
require 'nokogiri'
doc = Nokogiri::XML(File.read('info.xml'))
details = doc.css('details').find{|node| node.css('id').text == "5678"}
email = details.css('email').text # => "zzzz@zzz.com"
images = details.css('image').map(&:text) # => ["images/4.jpg", "images/5.jpg"]
Update: There are shorter, arguably better, ways to grab the details node you want:
details = doc.at('details:has(id[text()="5678"])')
or, to select the siblings that follow the matching id rather than the details node itself:
details = doc.search('id[text()="5678"] ~ *')
Those are both courtesy of pguardiario.
How to get siblings' child according to specific defined sibling content
You can use the XPath following-sibling axis for this purpose, assuming the target element is always located after role:
# Arrays that collect the display names for each role
writer = []
penciler = []
doc.xpath('//comic').each do |main_element|
  main_element.xpath("mainsection/credits/credit/role[@id='dfWriter']").each do |n|
    writer << n.xpath('following-sibling::person/displayname').text
  end
  main_element.xpath("mainsection/credits/credit/role[@id='dfPenciler']").each do |n|
    penciler << n.xpath('following-sibling::person/displayname').text
  end
end
Or you can just iterate through credit instead of role in the first place:
# Arrays that collect the display names for each role
writer = []
penciler = []
doc.xpath('//comic').each do |main_element|
  main_element.xpath("mainsection/credits/credit[role/@id='dfWriter']").each do |n|
    writer << n.xpath('person/displayname').text
  end
  main_element.xpath("mainsection/credits/credit[role/@id='dfPenciler']").each do |n|
    penciler << n.xpath('person/displayname').text
  end
end
CSS/Xpath sibling selector in Nokogiri
The problem is actually with your XPath for getting the surname and given names; the XPath is incorrect in these lines:
puts surname = corrdetails.xpath( "//surname" ).text
puts givennames = corrdetails.xpath("//given-names").text
Starting the XPath with // means to look for the node anywhere in the document. You only want to look within the corrdetails node, which means the XPath needs to start with a dot, e.g., .//.
Change the two lines to:
puts surname = corrdetails.xpath( ".//surname" ).text
puts givennames = corrdetails.xpath(".//given-names").text
Using Nokogiri to find element before another element
Nokogiri allows you to use XPath expressions to locate an element:
categories = []
doc.xpath("//li").each do |elem|
  categories << elem.parent.xpath("preceding-sibling::h2").last.text
end
categories.uniq!
p categories
The first part finds all li elements; for each one, we go up to its parent (ul or ol), then look for a preceding sibling that is an h2. There can be more than one, so we take the last one (i.e., the one closest to the list).
We need to call uniq! because we get the h2 once for each li (since the li is the starting point).
Using your own HTML example, this code outputs:
["Destinations", "Shopping List"]
XPath to find all following siblings up until the next sibling of a particular type
One possible solution:
dl.xpath('dt').each_with_index do |dt, i|
  dds = dt.xpath("following-sibling::dd[not(../dt[#{i + 2}]) or " +
                 "following-sibling::dt[1]=../dt[#{i + 2}]]")
  puts "#{dt.text}: #{dds.map(&:text).join(', ')}"
end
This relies on a value comparison of dt elements and will fail when there are duplicates. The following (much more complicated) expression does not depend on unique dt values (here $n stands for the position of the next dt, i.e., i + 2 in the code above):
following-sibling::dd[not(../dt[$n]) or
(following-sibling::dt[1] and count(following-sibling::dt[1]|../dt[$n])=1)]
Note: Your use of self fails because you're not using it as an axis (self::). Also, self always contains just the context node, so it would refer to each dd inspected by the expression, not back to the original dt.
Use XPath to group siblings from an HTML/XML document?
Updated Answer
Here's a general solution that creates a hierarchy of <section> elements based on heading levels and their following siblings:
class Nokogiri::XML::Node
  # Create a hierarchy on a document based on heading levels
  # wrap   : e.g. "<section>" or "<div class='section'>"
  # stops  : array of tag names that stop all sections; use nil for none
  # levels : array of tag names that control nesting, in order
  def auto_section(wrap='<section>', stops=%w[hr], levels=%w[h1 h2 h3 h4 h5 h6])
    levels = Hash[ levels.zip(0...levels.length) ]
    stops  = stops && Hash[ stops.product([true]) ]
    stack  = []
    children.each do |node|
      unless level = levels[node.name]
        level = stops && stops[node.name] && -1
      end
      stack.pop while (top=stack.last) && top[:level]>=level if level
      stack.last[:section].add_child(node) if stack.last
      if level && level >=0
        section = Nokogiri::XML.fragment(wrap).children[0]
        node.replace(section); section << node
        stack << { :section=>section, :level=>level }
      end
    end
  end
end
Here is this code in use, and the result it gives.
The original HTML
<body>
<h1>Main Section 1</h1>
<p>Intro</p>
<h2>Subhead 1.1</h2>
<p>Meat</p><p>MOAR MEAT</p>
<h2>Subhead 1.2</h2>
<p>Meat</p>
<h3>Caveats</h3>
<p>FYI</p>
<h4>ProTip</h4>
<p>Get it done</p>
<h2>Subhead 1.3</h2>
<p>Meat</p>
<h1>Main Section 2</h1>
<h3>Jumpin' in it!</h3>
<p>Level skip!</p>
<h2>Subhead 2.1</h2>
<p>Back up...</p>
<h4>Dive! Dive!</h4>
<p>...and down</p>
<hr /><p id="footer">Copyright © All Done</p>
</body>
The conversion code
# Use XML only so that we can pretty-print the results; HTML works fine, too
doc = Nokogiri::XML(html,&:noblanks) # stripping whitespace allows indentation
doc.at('body').auto_section # make the magic happen
puts doc.to_xhtml # show the result with indentation
The result
<body>
  <section>
    <h1>Main Section 1</h1>
    <p>Intro</p>
    <section>
      <h2>Subhead 1.1</h2>
      <p>Meat</p>
      <p>MOAR MEAT</p>
    </section>
    <section>
      <h2>Subhead 1.2</h2>
      <p>Meat</p>
      <section>
        <h3>Caveats</h3>
        <p>FYI</p>
        <section>
          <h4>ProTip</h4>
          <p>Get it done</p>
        </section>
      </section>
    </section>
    <section>
      <h2>Subhead 1.3</h2>
      <p>Meat</p>
    </section>
  </section>
  <section>
    <h1>Main Section 2</h1>
    <section>
      <h3>Jumpin' in it!</h3>
      <p>Level skip!</p>
    </section>
    <section>
      <h2>Subhead 2.1</h2>
      <p>Back up...</p>
      <section>
        <h4>Dive! Dive!</h4>
        <p>...and down</p>
      </section>
    </section>
  </section>
  <hr />
  <p id="footer">Copyright © All Done</p>
</body>
Original Answer
Here's an answer that uses Nokogiri but no XPath. I've taken the liberty of making the solution somewhat flexible: it handles arbitrary starts and stops (but not nested sections).
html = "<h2>Header</h2>
<p>First paragraph</p>
<p>Second paragraph</p>
<h2>Second header</h2>
<p>Third paragraph</p>
<p>Fourth paragraph</p>
<hr>
<p id='footer'>All done!</p>"
require 'nokogiri'
class Nokogiri::XML::Node
  # Provide a block that returns:
  #   true  - for nodes that should start a new section
  #   false - for nodes that should not start a new section
  #   :stop - for nodes that should stop any current section but not start a new one
  def group_under(name="section")
    group = nil
    element_children.each do |child|
      case yield(child)
      when false, nil
        group << child if group
      when :stop
        group = nil
      else
        group = document.create_element(name)
        child.replace(group)
        group << child
      end
    end
  end
end
doc = Nokogiri::HTML(html)
doc.at('body').group_under do |node|
  if node.name == 'hr'
    :stop
  else
    %w[h1 h2 h3 h4 h5 h6].include?(node.name)
  end
end
puts doc
#=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#=> <html><body>
#=> <section><h2>Header</h2>
#=> <p>First paragraph</p>
#=> <p>Second paragraph</p></section>
#=>
#=> <section><h2>Second header</h2>
#=> <p>Third paragraph</p>
#=> <p>Fourth paragraph</p></section>
#=>
#=> <hr>
#=> <p id="footer">All done!</p>
#=> </body></html>
For a pure XPath approach, see the question "XPath: select all following siblings until another sibling".