How do I get content from a website using Ruby / Rails?
This isn't really a Rails question. It's something you'd do using Ruby, then possibly display using Rails, or Sinatra or Padrino - pick your poison.
There are several different HTTP clients you can use:
Open-URI comes with Ruby and is the easiest. Net::HTTP comes with Ruby and is the standard toolbox, but it's lower-level so you'd have to do more work. HTTPClient and Typhoeus+Hydra are capable of threading and have both high-level and low-level interfaces.
I recommend using Nokogiri to parse the returned HTML. It's very full-featured and robust.
require 'nokogiri'
require 'open-uri'
# Use URI.open: since Ruby 3.0, plain open() no longer accepts URLs.
doc = Nokogiri::HTML(URI.open('http://www.example.com'))
puts doc.to_html
If you need to navigate through login screens or fill in forms before you get to the page you need to parse, then I'd recommend looking at Mechanize. It relies on Nokogiri internally, so you can ask it for a Nokogiri document and parse away once Mechanize retrieves the desired URL.
If you need to deal with dynamic HTML, then look into the various Watir tools. They drive various web browsers, then let you access the content as seen by the browser.
Once you have the content or data you want, you can "repurpose" it into text inside a Rails page.
How to scrape data from another website using Rails 3
I'd recommend a combination of Nokogiri and open-uri. Require both libraries, then do something along the lines of doc = Nokogiri::HTML(URI.open(YOUR_URL)). Next, find the element you want to capture, using the developer tools in Chrome (or the equivalent in another browser) or something like Selector Gadget. Then you can use doc.at_css(SELECTOR) for a single element, or doc.search(SELECTOR) for multiple elements. Calling the text method on the result should get you the product description you're looking for. There's no need to save anything to the database (unless you want to). Hope that helps!
Get the html from a website with ruby on rails
You can use HTTParty to just get the data.
Sample code (from example):
require 'httparty'
require 'pp'

class Google
  include HTTParty
  format :html
end

# google.com redirects to www.google.com, so this is a live test of redirection
pp Google.get('http://google.com')
puts '', '*' * 70, ''
# check that SSL requests work
pp Google.get('https://www.google.com')
Nokogiri really excels at parsing that data. Here's some example code from the Railscast:
require 'nokogiri'
require 'open-uri'

url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=batman&Find.x=0&Find.y=0&Find=Find"
doc = Nokogiri::HTML(URI.open(url))
puts doc.at_css("title").text
doc.css(".item").each do |item|
  title = item.at_css(".prodLink").text
  price = item.at_css(".PriceCompare .BodyS, .PriceXLBold").text[/\$[0-9\.]+/]
  puts "#{title} - #{price}"
  puts item.at_css(".prodLink")[:href]
end
ruby code to search and get a string from a html content
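# get() stands in for whatever call returns the HTML as a String.
# Match "commit", whitespace, then capture 10+ hex digits (group 1).
key = get()[/commit\s+([a-f0-9]{10,})/i, 1]
puts key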
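As a self-contained illustration of the same String#[] call, the sketch below applies the regex to a made-up fragment rather than a fetched page. The sample strings and the html and missing variables are assumptions for the example.

```ruby
# Hypothetical page fragment; any String of HTML works the same way.
html = '<p>Deployed commit 3f2a9c1b7e4d on Monday</p>'

# String#[] with a regex and a capture index returns that group,
# or nil when the pattern does not match.
key = html[/commit\s+([a-f0-9]{10,})/i, 1]
missing = '<p>no hash here</p>'[/commit\s+([a-f0-9]{10,})/i, 1]

puts key
p missing
```

Because a failed match yields nil rather than an exception, the caller can test the result directly before using it.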
How to get the HTML source of a webpage in Ruby
Use Net::HTTP:
require 'net/http'
source = Net::HTTP.get('stackoverflow.com', '/index.html')
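Many sites answer a plain HTTP request with a redirect, so the one-liner can return a redirect page rather than the content. A rough sketch of following redirects with Net::HTTP is below. The helper name fetch_html and the limit of 5 are assumptions for this example, not part of Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Fetch a page body, following up to `limit` redirects.
# fetch_html is a hypothetical helper name, not part of Net::HTTP.
def fetch_html(url, limit = 5)
  raise ArgumentError, 'too many HTTP redirects' if limit.zero?

  response = Net::HTTP.get_response(URI(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URL.
    fetch_html(URI.join(url, response['location']).to_s, limit - 1)
  else
    response.value # raises an exception describing the HTTP error
  end
end

# Example call (needs network access):
# source = fetch_html('https://stackoverflow.com/')
```

Passing a URI object to Net::HTTP.get_response lets it pick HTTP or HTTPS from the URL scheme, so the same helper covers both.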