Nokogiri recursively get all children
The traverse method yields the current node and all of its descendants to a block, recursively.
# if you would like it to be returned as an array, rather than each node being yielded to a block, you can do this
result = []
doc.traverse {|node| result << node }
result
# or,
require 'enumerator' # not needed on modern Rubies
result = doc.enum_for(:traverse).to_a
Nokogiri, fetch all classes from page
#classes returns only the classes of the node itself. It doesn't deal with the child nodes, so you need to scan all the child nodes recursively.
require 'nokogiri'
# Returns the node's descendant elements followed by the node itself.
def flatten(node)
  node.elements.flat_map { |child| flatten(child) } << node
end
page = Nokogiri::HTML.parse('<html><body class="a b"><b class="c">x</b></body></html>')
flatten(page).flat_map(&:classes)
# => ["c", "a", "b"]
You may also want to add .uniq to get rid of duplicates.
Accessing Nokogiri element children
name, number, date, title = *content[1].css('td').map(&:text)
If content[1] is a tr, content[1].css('td') will find all td elements beneath it, .map(&:text) will call td.text for each of those td elements and put the results into an array, which we then splat with * so we can do multiple assignment.
(Note: next time, please include the original HTML fragment, not the Nokogiri node inspect result.)
I can't get Nokogiri to loop through children nodes
Change this line of code:
content_options = xmldoc.xpath("//content_options")
to this:
content_options = xmldoc.xpath("//content_option")
Of course it will only show you one entry: in your XML there is only one content_options element, and there are two content_option elements.
find first level children in nokogiri rails
When you say this:
table = page.css('table')
you're grabbing both tables rather than just the top-level table. So you can either go back to the document root and use a selector that only matches the rows in the first table, as mosch says, or you can fix table to be only the outer table with something like this:
table = page.css('table').first
trs = table.xpath('./tr')
or even this (depending on the HTML's real structure):
table = page.xpath('/html/body/table')
trs = table.xpath('./tr')
or perhaps one of these for table (thanks Phrogz, again):
table = page.at('table')
table = page.at_css('table')
# or various other CSS and XPath incantations
Nokogiri: Merge neighbour text nodes recursively?
Okay, finally I got it right myself:
def merge_text_nodes(node)
  prev_is_text = false
  newnodes = []
  node.children.each do |element|
    if element.text?
      if prev_is_text
        newnodes[-1].content += element.text
      else
        newnodes << element
      end
      element.remove
      prev_is_text = true
    else
      newnodes << merge_text_nodes(element)
      element.remove
      prev_is_text = false
    end
  end
  node.children.remove
  newnodes.each do |item|
    node.add_child(item)
  end
  return node
end
How do I find direct children and not nested children using Rails and Nokogiri?
You can do it in a couple of steps using XPath. First you need to find the "level" of the table (i.e. how nested it is in other tables), then find all descendant tr elements that have the same number of table ancestors:
tables = doc.xpath('//table')
tables.each do |table|
  level = table.xpath('count(ancestor-or-self::table)')
  rows = table.xpath(".//tr[count(ancestor::table) = #{level}]")
  # do what you want with rows...
end
In the more general case, where you might have tr elements nested directly inside other tr elements, you could do something like this (this would be invalid HTML, but you might have XML or some other tags):
tables.each do |table|
  # Find the first descendant tr, and determine its level. This
  # will be a "top-level" tr for this table. "level" here means how
  # many tr elements (including itself) are between it and the
  # document root.
  level = table.xpath("count(descendant::tr[1]/ancestor-or-self::tr)")
  # Now find all descendant trs that have that same level. Since
  # the table itself is at a fixed level, this means all these nodes
  # will be "top-level" rows for this table.
  rows = table.xpath(".//tr[count(ancestor-or-self::tr) = #{level}]")
  # handle rows...
end
The first step could be broken into two separate queries, which may be clearer:
first_tr = table.at_xpath(".//tr")
level = first_tr.xpath("count(ancestor-or-self::tr)")
(This will fail if there is a table with no tr elements though, as first_tr will be nil. The combined XPath above handles that situation correctly.)