getaddrinfo error with Mechanize
I found the solution. Mechanize was leaving connections open and relying on garbage collection to clean them up. After a certain point there were enough open connections that no additional outbound connection could be established, including the one needed for a DNS lookup. Here's the code that fixed it:
require 'mechanize'

agent = Mechanize.new do |a|
  a.follow_meta_refresh = true
  a.keep_alive = false # close each connection as soon as its request completes
end
With keep_alive set to false, each connection is closed and cleaned up as soon as its request completes, instead of lingering until garbage collection.
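To see the underlying idea without Mechanize, here is a minimal sketch that uses Ruby's standard-library Net::HTTP as a stand-in (this is not Mechanize's API). Passing a block to Net::HTTP#start closes the socket as soon as the block returns, which is analogous to what keep_alive = false does in Mechanize: nothing is left for the garbage collector to reap. The one-shot local server and the fetch_and_close helper are hypothetical, there only so the example runs offline.

```ruby
require 'net/http'
require 'socket'

# Hypothetical one-shot HTTP server on 127.0.0.1 so the sketch runs offline.
def start_local_server
  server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
  Thread.new do
    client = server.accept
    # Consume the request line and headers up to the blank line.
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
    client.close
  end
  server
end

# Hypothetical helper: one request per connection. The block form of
# Net::HTTP#start finishes (closes) the session when the block returns,
# so the socket is released immediately instead of waiting for GC.
def fetch_and_close(host, port, path)
  http = Net::HTTP.new(host, port, nil) # nil proxy: connect directly
  body = http.start { |conn| conn.get(path).body }
  [body, http.started?]
end

server = start_local_server
body, still_open = fetch_and_close('127.0.0.1', server.addr[1], '/')
puts body       # the response body
puts still_open # false: the session was finished when the block ended
```

The trade-off is the same as with keep_alive = false: every request pays for a fresh TCP connection, in exchange for never accumulating idle sockets.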
Getting error getaddrinfo: No such host is known. (SocketError) with mechanize gem
That error means DNS resolution is failing. In my experience, it's usually because your internet connection is down.
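A quick way to tell a DNS problem apart from a site-specific one is to ask the resolver directly, outside Mechanize. The sketch below uses the standard-library Addrinfo.getaddrinfo; the resolvable? helper name is hypothetical. If it returns false for a well-known host as well, the problem is your connection or DNS setup rather than the site or Mechanize.

```ruby
require 'socket'

# Hypothetical helper: true if the system resolver can turn +host+ into at
# least one address. Resolution failures raise SocketError (or a subclass
# of it), which is the same error class reported in the question above.
def resolvable?(host)
  !Addrinfo.getaddrinfo(host, nil).empty?
rescue SocketError
  false
end

puts resolvable?('localhost')
```

For example, resolvable?('www.example.com') returning false while localhost resolves points at the network or DNS configuration.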
Mechanize won't connect to site
While it's true that Mechanize doesn't support JavaScript, your problem is that you are trying to access a site that doesn't exist: you are requesting www.imbd.com instead of www.imdb.com. So the error message is accurate.
And FWIW, IMDB doesn't want you to scrape their site:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
Mechanize in Module, NameError 'agent'
Ruby uses naming conventions to determine a variable's scope. agent is a local variable in the class scope, so it isn't visible inside your methods, because each def starts a new local scope. To make it visible to other methods, you could make it a class variable by naming it @@agent, and it'll be shared among all the objects of WebBot. The preferred way, though, is to make it an instance variable by naming it @agent, so every object of WebBot will have its own @agent. But you should set it in initialize, which is invoked when you create a new object with new:
class WebBot
  def initialize
    @agent = Mechanize.new do |a|
      a.user_agent_alias = 'Windows Chrome'
    end
  end
  .....
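The three scoping options can be compared in plain Ruby. In this sketch the bot classes are hypothetical and Object.new stands in for Mechanize.new, so it runs without the gem: the class-body local raises NameError inside a method, the class variable is one object shared by all instances, and the instance variable is a separate object per instance.

```ruby
# Hypothetical bots; Object.new stands in for Mechanize.new.
class BrokenBot
  agent = Object.new # local variable of the class body only

  def agent_object
    agent            # not visible here: raises NameError
  end
end

class SharedBot
  @@agent = Object.new # class variable: one agent shared by every SharedBot

  def agent_object
    @@agent
  end
end

class InstanceBot
  def initialize
    @agent = Object.new # instance variable: each InstanceBot gets its own
  end

  def agent_object
    @agent
  end
end

begin
  BrokenBot.new.agent_object
rescue NameError => e
  puts "BrokenBot raised #{e.class}"
end
puts SharedBot.new.agent_object.equal?(SharedBot.new.agent_object)     # true
puts InstanceBot.new.agent_object.equal?(InstanceBot.new.agent_object) # false
```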
The same error will occur with page. You defined it in form as a local variable, so it goes out of scope as soon as form finishes executing. You should make it an instance variable too. Fortunately, you don't have to put it in initialize; you can define it right in form, and the object will have its own @page after form is invoked. Do this in form:
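def form(response)
  require "addressable/uri"
  require "addressable/template"
  template = Addressable::Template.new("http://www.domain.com/{?query*}")
  url = template.expand({"query" => response}).to_s
  # Use the @agent set in initialize; store the result in @page so
  # other methods can read it after form returns.
  @page = @agent.get(url)
end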
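If you'd rather not depend on the addressable gem, the standard library can build a similar URL. This is a swapped-in alternative, not what the answer uses, and it assumes response is a Hash of query parameters; the build_search_url helper is hypothetical. One difference to be aware of: URI.encode_www_form uses form encoding, so spaces become "+", whereas an RFC 6570 template expansion typically percent-encodes them as "%20".

```ruby
require 'uri'

# Hypothetical helper: builds http://<host>/?k1=v1&k2=v2 from a Hash using
# only the standard library (spaces are form-encoded as "+").
def build_search_url(host, params)
  URI::HTTP.build(host: host, path: '/', query: URI.encode_www_form(params)).to_s
end

puts build_search_url('www.domain.com', { 'q' => 'ruby gems', 'page' => '2' })
# => http://www.domain.com/?q=ruby+gems&page=2
```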
And remember to change every occurrence of page and agent to @page and @agent, for example in your get_products:
def get_products
  products = []
  @page.search("datatable").search('tr').each do |row|
  .....
These changes will resolve the name errors. Refactoring is another story btw.
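Putting the pieces together, here is a minimal end-to-end sketch of the instance-variable flow. FakeAgent and FakePage are hypothetical stand-ins for Mechanize and a parsed page so the code runs without the gem or a network; the single search('tr') call and the hard-coded query URL are simplifications, not the original code.

```ruby
# Hypothetical stand-ins: FakeAgent#get plays the role of Mechanize#get,
# and FakePage#search returns canned rows instead of parsing HTML.
class FakePage
  def initialize(url)
    @rows = ["row 1 from #{url}", "row 2 from #{url}"]
  end

  def search(_selector)
    @rows
  end
end

class FakeAgent
  def get(url)
    FakePage.new(url)
  end
end

class WebBot
  def initialize(agent = FakeAgent.new)
    @agent = agent # one agent per WebBot; a real bot would use Mechanize.new
  end

  # Stores the fetched page in @page so other methods can read it later.
  # The query is interpolated unencoded here for brevity.
  def form(query)
    @page = @agent.get("http://www.domain.com/?q=#{query}")
  end

  def get_products
    products = []
    @page.search('tr').each { |row| products << row }
    products
  end
end

bot = WebBot.new
bot.form('shoes')
puts bot.get_products
```

Note that get_products depends on form having run first; calling it on a fresh WebBot would fail because @page is still nil.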