Nokogiri VS Hpricot

Nokogiri vs Hpricot?

Pick Nokogiri on every point, and especially on the first one: Hpricot is no longer maintained.

Meta answer: See ruby-toolbox to get an idea of the popularity of different tools in a given area.

Parse/Iterate html file with Hpricot/Nokogiri

Sure.

doc = Nokogiri::HTML(html)
doc.xpath('//a[@class="headline"]').each do |headline|
  puts headline.text
  puts headline.xpath('../following-sibling::div[1]').text
end

Hpricot / nokogiri - Parse SVG / XML file to get colors used

Something like this should work:

require 'nokogiri'
require 'open-uri'

url = 'http://upload.wikimedia.org/wikipedia/commons/e/e9/Pepsi_logo_2008.svg'
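# SVG is XML, so use the XML parser. URI.open is needed on Ruby 3.0+,
# where Kernel#open no longer accepts URLs.
doc = Nokogiri::XML(URI.open(url))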
puts doc.xpath('//*[contains(@style,"fill")]').map{|e| e[:style][/fill:([^;]*)/, 1]}.uniq

how to translate this hpricot code to nokogiri?

Nokogiri and Hpricot are pretty much interchangeable here, i.e. Nokogiri(html) is the equivalent of Hpricot(html). I'm not really sure what the linked article is trying to achieve, but if the goal is to:

Extract text from HTML body which includes ignoring large white spaces between tags and words.

then there is an easier approach in Hpricot that removes the need for the hpricot.search("script").remove bits, at least when the scripts live in the head: just get the body in the first place:

Hpricot(html).search('body').inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")

And in Nokogiri:

Nokogiri(html).search('body').inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")

Hpricot-style container method for Nokogiri? Select only certain node_types

To clarify, you want only child elements, but not child text nodes? If so, here are three techniques:

require 'nokogiri'
doc = Nokogiri::XML "<r>no<a1><b1/></a1><a2>no<b2>hi</b2>mom</a2>no</r>"

# If the element is uniquely selectable via CSS
kids1 = doc.css('r > *')

# ...or if we assume you found an element and want only its children
some_node = doc.at('r')

# One way to do it
kids2 = some_node.children.grep(Nokogiri::XML::Element)

# A geekier-but-shorter-way
kids3 = some_node.xpath('*')

# Confirm that they're the same (converting the NodeSets to arrays)
p [ kids1.to_a == kids2, kids2 == kids3.to_a ]
#=> [true, true]

p kids1.map(&:name), kids2.map(&:name), kids3.map(&:name)
#=> ["a1", "a2"]
#=> ["a1", "a2"]
#=> ["a1", "a2"]

open-uri + hpricot & nokogiri don't parse html correctly

There's no DIV with the id pasajes in the static HTML page. If you are running *nix you can see that by doing:

curl http://www.despegar.com.ar/ | grep pasajes

My guess is that it's JavaScript-generated.

If you are using MacRuby you could try Lyndon.

Rails Console - Hpricot, Nokogiri Unavailable in Rails Console?

Have you added the gems to your Gemfile? If they are listed there, they will be loaded automatically when the console starts.
