Get text directly inside a tag in Nokogiri
To get the text nodes that are direct children of a tag, without the text of any nested elements, you can use XPath like so:
doc.xpath('//dt/text()')
Or if you wish to use search:
doc.search('dt').xpath('text()')
How do I get all the text within a tag using a Nokogiri CSS selector?
You can simply call the #text method on the target element; it includes the text of all descendant text nodes:
doc = Nokogiri::HTML(your_html_snippet)
str = doc.css('td').text
str # => "\n\nsome text\n\n\nmore text\n\n"
Nokogiri: Get text which is not inside the a tag
This is straightforward using XPath and the text() node test. If you have extracted the lis into nodeset, you can get the text with:
nodeset.xpath('./text()')
Or you can get it directly from the whole doc:
doc.xpath('//li/text()')
This uses the text() node test as part of the XPath expression, not the text Ruby method. It extracts any text nodes that are direct children of the li node, so it doesn't include the contents of the a element.
Getting only the text from a certain HTML structure with Nokogiri
If you're not using the document any further, I would delete the other nodes in this section and then read the remaining text. Note that this modifies the parsed document:
nokogiri_object.css("div.line1 *").each(&:remove)
nokogiri_object.at_css("div.line1").text.strip # => "text I need"
Get content after header tag with Nokogiri
You can get the ul elements after each h4 using the following-sibling axis:
require 'nokogiri'
html = <<-EOF
<div class="colmask">
<div class="box box_1">
<h4>Alabama</h4>
<ul>
<li><a href="//auburn.craigslist.org/">auburn</a></li>
<li><a href="//bham.craigslist.org/">birmingham</a></li>
<li><a href="//dothan.craigslist.org/">dothan</a></li>
<li><a href="//shoals.craigslist.org/">florence / muscle shoals</a></li>
<li><a href="//gadsden.craigslist.org/">gadsden-anniston</a></li>
<li><a href="//huntsville.craigslist.org/">huntsville / decatur</a></li>
<li><a href="//mobile.craigslist.org/">mobile</a></li>
<li><a href="//montgomery.craigslist.org/">montgomery</a></li>
<li><a href="//tuscaloosa.craigslist.org/">tuscaloosa</a></li>
</ul>
<h4>Alaska</h4>
<ul>
<li><a href="//anchorage.craigslist.org/">anchorage / mat-su</a></li>
<li><a href="//fairbanks.craigslist.org/">fairbanks</a></li>
<li><a href="//kenai.craigslist.org/">kenai peninsula</a></li>
<li><a href="//juneau.craigslist.org/">southeast alaska</a></li>
</ul>
</div>
</div>
EOF
doc = Nokogiri::HTML(html)
doc.xpath('//h4/following-sibling::ul').each do |node|
puts node.to_html
end
To select the ul that follows an h4 with specific text (following-sibling matches every later ul sibling, so [0] picks the first one):
puts doc.xpath("//h4[text()='Alabama']/following-sibling::ul")[0].to_html