Getaddrinfo Error with Mechanize

getaddrinfo error with Mechanize

I found the solution. Mechanize was leaving connections open and relying on GC to clean them up. After a certain point, there were enough open connections that no additional outbound connection could be established to do a DNS lookup. Here's the code that fixed it:

require 'mechanize'

agent = Mechanize.new do |a|
  a.follow_meta_refresh = true
  a.keep_alive = false  # close each connection right after the request
end

By setting keep_alive to false, the connection is immediately closed and cleaned up.

Getting error getaddrinfo: No such host is known. (SocketError) with mechanize gem

That error means DNS is not resolving the hostname. In my experience it's usually because your internet connection is down.
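
To tell a DNS failure apart from a Mechanize problem, you can try resolving the hostname directly with Ruby's standard socket library. This is only a minimal sketch; resolvable? is a hypothetical helper name and is not part of Mechanize:

```ruby
require 'socket'

# Hypothetical helper: returns true if the system resolver can turn
# the given host into at least one address, false if the lookup fails.
def resolvable?(host)
  Addrinfo.getaddrinfo(host, nil).any?
rescue SocketError
  # Raised for "getaddrinfo: No such host is known" and similar lookup failures.
  false
end

puts resolvable?("127.0.0.1")  # numeric addresses need no DNS lookup
```

If it returns false for a single hostname, check that name for a typo. If it returns false for every hostname, the network or DNS configuration is the more likely culprit.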

Mechanize won't connect to site

While it's true that Mechanize doesn't support JavaScript, your problem is that you are trying to access a site that doesn't exist: www.imbd.com instead of www.imdb.com. So the error message is accurate.

And FWIW, IMDB doesn't want you to scrape their site:

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.

Mechanize in Module, NameError 'agent'

Ruby decides a variable's scope from its name. agent is a local variable in the class body, so it is not visible inside method definitions. To make it visible to other methods you could make it a class variable by naming it @@agent, and it would be shared among all the objects of WebBot. The preferred way, though, is to make it an instance variable by naming it @agent, so every object of WebBot has its own @agent. Assign it in initialize, which is invoked when you create a new object with new:

class WebBot
  def initialize
    @agent = Mechanize.new do |a|
      a.user_agent_alias = 'Windows Chrome'
    end
  end
  .....

The same error will occur with page. You defined it in form as a local variable, so it is discarded when form finishes executing. Make it an instance variable as well. Fortunately, it doesn't have to go in initialize; you can assign it in form, and the object will have its own @page after form is invoked. Do this in form:

def form(response)
  require "addressable/uri"
  require "addressable/template"
  template = Addressable::Template.new("http://www.domain.com/{?query*}")
  url = template.expand({"query" => response}).to_s
  @page = @agent.get(url)  # @agent is the instance variable set in initialize
end

And remember to change every occurrence of page and agent to @page and @agent. In your get_products for example:

def get_products
  products = []
  @page.search("datatable").search('tr').each do |row|
  .....

These changes will resolve the name errors. Refactoring is another story btw.
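
The scoping behaviour can be reproduced without Mechanize. Here is a minimal sketch using hypothetical BrokenBot and WorkingBot classes, with a plain string standing in for the Mechanize agent (none of these names come from the original code):

```ruby
# Hypothetical stand-in classes; a string replaces the real Mechanize agent.
class BrokenBot
  agent = "client"  # local to the class body; method bodies cannot see it

  def fetch
    agent           # no such local or method here, so this raises NameError
  end
end

class WorkingBot
  def initialize
    @agent = "client"  # instance variable, visible in every instance method
  end

  def form(response)
    # Stored on the object so later methods can read it.
    @page = "#{@agent} fetched #{response}"
  end

  def get_products
    @page
  end
end

begin
  BrokenBot.new.fetch
rescue NameError => e
  puts "#{e.class}: #{e.message}"
end

bot = WorkingBot.new
bot.form("shoes")
puts bot.get_products
```

Note that @page is nil until form has been called, so get_products only returns something useful after form has run on the same object.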


