How to Get The HTML Source of a Webpage in Ruby

How to get the HTML source of a webpage in Ruby

Use Net::HTTP:

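require 'net/http'

# Net::HTTP.get does not follow redirects, so ask for the HTTPS URL directly;
# passing a URI (rather than host + path) lets it use HTTPS automatically.
source = Net::HTTP.get(URI('https://stackoverflow.com/index.html'))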
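Net::HTTP.get hands back only the body, so an error page or a redirect notice looks exactly like real HTML. A minimal sketch, assuming you want an exception for any non-2xx status; fetch_html is a hypothetical helper name, not part of Net::HTTP:

```ruby
require 'net/http'
require 'uri'

# Fetch a page's HTML source, raising unless the server answers with a 2xx status.
# This does not follow redirects; Net::HTTP never does that on its own.
def fetch_html(url)
  uri = URI(url)
  response = Net::HTTP.get_response(uri) # uses HTTPS when the URI scheme is https
  unless response.is_a?(Net::HTTPSuccess)
    raise "GET #{url} failed: #{response.code} #{response.message}"
  end
  response.body
end
```

Usage: puts fetch_html('https://stackoverflow.com/')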

How to get the raw HTML source code for a page by using Ruby or Nokogiri?

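If you want the raw source of a web page, don't use Nokogiri at all. Nokogiri parses the markup, so anything you print from it is a re-serialized version rather than the original text. Just fetch the page directly as a string and don't pass it to Nokogiri. For example: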

require 'open-uri'
# Use URI.open: since Ruby 3.0, Kernel#open no longer fetches URLs
html = URI.open('http://phrogz.net').read
puts html.length #=> 8461
puts html        #=> ...raw source of the page...
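Besides the raw source, the IO object open-uri yields also carries some response metadata through OpenURI::Meta. A small sketch; fetch_source_with_meta is a hypothetical helper, and the charset is only as reliable as the server's Content-Type header:

```ruby
require 'open-uri'

# Fetch a page's raw source with open-uri and keep some of the response
# metadata that open-uri mixes into the returned IO (OpenURI::Meta).
def fetch_source_with_meta(url)
  URI.open(url) do |f|
    {
      source: f.read,
      status: f.status,             # e.g. ["200", "OK"]
      content_type: f.content_type, # media type only, e.g. "text/html"
      charset: f.charset            # from Content-Type; open-uri may default it for text/*
    }
  end
end
```

Knowing the charset matters if you save or compare the source, since the raw bytes are only meaningful in that encoding.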

If, on the other hand, you want the contents of a page after JavaScript has modified it (for example, a page that uses an AJAX library to fetch new content and update itself), then Nokogiri can't help, because it never executes JavaScript. You need to use Ruby to control a real web browser (e.g. read up on Selenium or Watir).

Get the HTML from a website with Ruby on Rails

You can use HTTParty to just fetch the data.

Sample code (adapted from one of HTTParty's bundled examples):

require 'httparty'
require 'pp'

class Google
  include HTTParty
  format :html
end

# google.com redirects to www.google.com, so this is a live test of redirect handling
pp Google.get('http://google.com')

puts '', '*'*70, ''

# check that SSL requests work
pp Google.get('https://www.google.com')
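HTTParty follows redirects such as the google.com to www.google.com hop for you; plain Net::HTTP does not. A rough standard-library sketch of the same behaviour, assuming a cap of five hops (an arbitrary choice); fetch_following_redirects is a hypothetical helper name:

```ruby
require 'net/http'
require 'uri'

# Fetch url, following up to `limit` redirects (3xx responses) by hand.
# The default limit of 5 is an arbitrary choice for this sketch.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  uri = URI(url)
  response = Net::HTTP.get_response(uri)

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URI.
    next_uri = uri + response['location']
    fetch_following_redirects(next_uri.to_s, limit - 1)
  else
    # Net::HTTPResponse#value raises an HTTP error for non-2xx statuses.
    response.value
  end
end
```

Usage: puts fetch_following_redirects('http://google.com'). The hop limit keeps a redirect loop from recursing forever.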

Nokogiri really excels at parsing that data. Here's some example code from the Railscast:

require 'nokogiri'
require 'open-uri'

url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=batman&Find.x=0&Find.y=0&Find=Find"
doc = Nokogiri::HTML(URI.open(url))
puts doc.at_css("title").text
doc.css(".item").each do |item|
  title = item.at_css(".prodLink").text
  price = item.at_css(".PriceCompare .BodyS, .PriceXLBold").text[/\$[0-9\.]+/]
  puts "#{title} - #{price}"
  puts item.at_css(".prodLink")[:href]
end

Ruby Watir: get the HTML of a page

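Assuming browser is a Watir::Browser instance that has already opened the page (e.g. with browser.goto), this should do it: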

puts browser.html

Ruby code to search for and extract a string from HTML content

# get() is the question's own method, assumed to return the page's HTML as a String
key = get()[/commit\s+([a-f0-9]{10,})/i, 1]
puts key

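How the regex works: it matches the word commit (case-insensitively, because of the i flag), then one or more whitespace characters, then captures a run of at least ten hexadecimal characters, such as a Git commit hash. Passing 1 as the second argument to String#[] returns just that captured group, or nil if the pattern doesn't match.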
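To see what the pattern captures, here it is wrapped in a hypothetical commit_hash helper and run against made-up strings instead of get():

```ruby
# Return the first hex hash that follows the word "commit", or nil.
# /i makes the match case-insensitive; the 1 asks String#[] for capture group 1.
def commit_hash(text)
  text[/commit\s+([a-f0-9]{10,})/i, 1]
end

html = '<p>Latest: commit 3f9a1c2b7d4e8a01 by someone</p>'
puts commit_hash(html)            # prints 3f9a1c2b7d4e8a01
p commit_hash('<p>no match</p>')  # prints nil
```

Note that a hash shorter than ten characters (e.g. "commit abc123") is not matched at all.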

Ruby on Rails: display HTML source code instead of rendering it

I fixed it. The problem was related to Mongrel. I found the solution here:

https://rails.lighthouseapp.com/projects/8994/tickets/4690

(RUBY) How to read HTML tag contents and print them in the console

Use Nokogiri to parse the HTML (install it first with: gem install nokogiri).

require 'nokogiri'
require 'open-uri'

# website holds a host name, e.g. "example.com"
html = Nokogiri::HTML(URI.open("http://#{website}"))

html.css('h3').each do |title_node|
  puts "Title: #{title_node.content}"
end
