Nokogiri vs Hpricot?
Pick Nokogiri on every point you list, and especially on point one: Hpricot is no longer maintained.
Meta answer: check ruby-toolbox to get an idea of how popular the different tools in a given area are.
Parse/Iterate html file with Hpricot/Nokogiri
Sure.
doc = Nokogiri::HTML(html)
doc.xpath('//a[@class="headline"]').each do |headline|
  puts headline.text
  puts headline.xpath('../following-sibling::div[1]').text
end
Hpricot / nokogiri - Parse SVG / XML file to get colors used
Something like this should work:
require 'nokogiri'
require 'open-uri'
url = 'http://upload.wikimedia.org/wikipedia/commons/e/e9/Pepsi_logo_2008.svg'
# Kernel#open no longer opens URLs as of Ruby 3.0, so call URI.open explicitly
doc = Nokogiri::HTML URI.open(url)
puts doc.xpath('//*[contains(@style,"fill")]').map{|e| e[:style][/fill:([^;]*)/, 1]}.uniq
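The extraction step relies on `String#[]` with a regex and a capture index. As a rough sketch of just that part, using hypothetical style values rather than the ones in the Pepsi SVG:

```ruby
# Hypothetical style attribute values, standing in for e[:style].
styles = [
  "fill:#ffffff;stroke:none",
  "stroke:#000000;fill:#004b93",
  "fill:#ffffff",
  "fill-opacity:0.5"
]

# str[regex, 1] returns the first capture group, or nil when the pattern
# does not match (e.g. "fill-opacity" contains "fill" but not "fill:").
colors = styles.map { |style| style[/fill:([^;]*)/, 1] }.compact.uniq

puts colors
```

The `compact` call is an addition: without it, a style that mentions "fill" only as part of another property would leave a nil in the list.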
how to translate this hpricot code to nokogiri?
Nokogiri and Hpricot are pretty interchangeable, i.e. Nokogiri(html) is the equivalent of Hpricot(html). I'm not really sure what the linked article is trying to achieve, but to:
Extract text from HTML body which includes ignoring large white spaces between tags and words.
there is an easier approach in Hpricot that removes the need for the hpricot.search("script").remove bits. Just get the body in the first place:
Hpricot(html).search('body').inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")
And in Nokogiri:
Nokogiri(html).search('body').inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")
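The whitespace clean-up at the end of both one-liners is plain Ruby string handling, so it can be tried without either parser. A small sketch on a made-up string, which also shows that `split` with no argument (assuming the default field separator) collapses carriage returns and newlines on its own:

```ruby
# Made-up text standing in for the result of inner_text.
text = "Hello,\r\n   world!\n\n\tThis   is  \r text.  "

# The chain used above.
verbose = text.gsub("\r", " ").gsub("\n", " ").split(" ").join(" ")

# split with no argument splits on any run of whitespace, so the two gsub
# calls are not strictly needed.
concise = text.split.join(" ")

puts verbose
puts concise
```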
Hpricot-style container method for Nokogiri? Select only certain node_types
To clarify, you want only child elements, but not child text nodes? If so, here are three techniques:
require 'nokogiri'
doc = Nokogiri::XML "<r>no<a1><b1/></a1><a2>no<b2>hi</b2>mom</a2>no</r>"
# If the element is uniquely selectable via CSS
kids1 = doc.css('r > *')
# ...or if we assume you found an element and want only its children
some_node = doc.at('r')
# One way to do it
kids2 = some_node.children.grep(Nokogiri::XML::Element)
# A geekier-but-shorter-way
kids3 = some_node.xpath('*')
# Confirm that they're the same (converting the NodeSets to arrays)
p [ kids1.to_a == kids2, kids2 == kids3.to_a ]
#=> [true, true]
p kids1.map(&:name), kids2.map(&:name), kids3.map(&:name)
#=> ["a1", "a2"]
#=> ["a1", "a2"]
#=> ["a1", "a2"]
open-uri + hpricot & nokogiri don't parse html correctly
There's no DIV with the id pasajes in the static HTML page. If you are running *nix you can see that by doing:
curl http://www.despegar.com.ar/ | grep pasajes
My guess is that it's JavaScript-generated.
If you are using MacRuby you could try Lyndon.
Rails Console - Hpricot, Nokogiri Unavailable in Rails Console?
Have you added the gems to your Gemfile? If so, they are loaded automatically when the console starts.
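As a minimal sketch, assuming a standard Bundler-managed Rails app, the Gemfile entry would look something like this (run `bundle install` afterwards and restart the console):

```ruby
# Gemfile (fragment, not a complete file)
source "https://rubygems.org"

gem "nokogiri"
```

In a default Rails app the gems are picked up because config/application.rb calls Bundler.require; an entry marked `require: false` would still need an explicit require in the console.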