Catching Timeout Errors with Ruby Mechanize

Instead of retrying some timeouts on some Mechanize requests, I think you would do better to set the Mechanize::HTTP::Agent#read_timeout attribute to a reasonable number of seconds, like 2 or 5; in any case, one that prevents timeout errors for this request.

Then, it seems that your log-out procedure only requires a simple HTTP GET request; there is no form to fill in, so no HTTP POST request.
So if I were you, I would inspect the page source code (Ctrl+U in Firefox or Chrome) to identify the link that is reached by your agent.click(page.link_with(:text => /Log Out/i)).
It should be faster, because these types of pages are usually blank and Mechanize will not have to load a full HTML web page into memory.
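If you want to double-check which URL that click resolves to, Mechanize can also tell you directly. A minimal sketch, assuming page is the logged-in page you already have:

logout_link = page.link_with(:text => /Log Out/i)
puts logout_link.href if logout_link # the URL to use with agent.get below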

Here is the code I would use:

def logmeout(agent)
  begin
    agent.read_timeout = 2 # set the agent timeout
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop # delete this request from the history
  rescue Timeout::Error
    puts "Timeout!"
    puts "read_timeout attribute is set to #{agent.read_timeout}s" unless agent.read_timeout.nil?
    # retry # retry is no longer needed
  end
end

But you can use your retry function too:

def trythreetimes
  tries = 0
  begin
    yield
  rescue Exception => e # note: rescuing Exception is very broad; StandardError is usually enough
    tries += 1
    puts "Error: #{e.message}"
    puts "Trying again!" if tries <= 3
    retry if tries <= 3
    puts "No more attempts!"
  end
end

def logmeout(agent)
  trythreetimes do
    agent.read_timeout = 2 # set the agent timeout
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop # delete this request from the history
  end
end
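For completeness, here is how you might call it; a minimal sketch, assuming you already have a logged-in agent:

require 'mechanize'

agent = Mechanize.new
# ... log in and do your work here ...
logmeout(agent) # retries up to three times on error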

Hope it helps! ;-)

Mechanize dealing with errors

You'd want to rescue the failed request, just like here:

task :estimateone => :environment do
  require 'mechanize'
  require 'csv'

  begin
    # ...
    page = mechanize.get('http://www.theurbanlist.com/brisbane/a-list/50-brisbane-cafes-you-should-have-eaten-breakfast-at')
  rescue Mechanize::ResponseCodeError
    # do something with the result: log it, write it, mark it as failed, wait a bit, then continue the job
    next
  end
end

My guess is that you're hitting API rate limits. This will not solve your problem, as the cause is not on your side but on the server's; but it will give you room to work, since you can now flag the links that did not work and continue from there.
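For example, you could collect the failing URLs as you go and deal with them afterwards. A rough sketch, where urls, mechanize, and failed are assumed/hypothetical names:

failed = []

urls.each do |url|
  begin
    page = mechanize.get(url)
    # ... process the page ...
  rescue Mechanize::ResponseCodeError => e
    failed << [url, e.response_code] # flag the failure and move on
    sleep 1                          # back off a bit in case of rate limiting
    next
  end
end

puts "#{failed.size} links failed: #{failed.inspect}"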

Handling Timeout error

Well, that's the expected behaviour of Timeout. If the block takes too long, its execution is terminated and an exception is thrown.

You would probably like to catch the exception and handle it appropriately:

require 'timeout'

begin
  status = Timeout::timeout(5) {
    # Something that should be interrupted if it takes too much time...
  }
rescue Timeout::Error
  puts 'That took too long, exiting...'
end

What is the proper way to test error handling?

Sort of. You shouldn't re-test dependent libraries in your application. It's enough to catch the Net::HTTP::Persistent::Error without verifying that the underlying functionality works. Well-written gems provide their own tests, and you can run those as needed by testing the gem itself (Mechanize, for example).
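In application code, that catch might look like the following; a sketch, where fetch and FetchError are hypothetical names:

require 'mechanize'

class FetchError < StandardError; end

def fetch(url)
  Mechanize.new.get(url)
rescue Net::HTTP::Persistent::Error => e
  # re-raise as an application-level error; the gem's internals are the gem's to test
  raise FetchError, "could not fetch #{url}: #{e.message}"
end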

You could mock for those errors, but you should be judicious. Here is some code to mock an SMTP connection:

class Mock
  require 'net/smtp'

  def initialize(options)
    @options     = options
    @username    = options[:username]
    @password    = options[:password]
    @port        = options[:port] || 25
    @helo_domain = options[:helo_domain]
    @from_addr   = options[:from_address]
    @from_domain = options[:from_domain]

    # Mock object for SMTP connections
    mock_config = {}
    mock_config[:address] = options[:server]
    mock_config[:port]    = @port

    @connection = RSpec::instance_double(Net::SMTP, mock_config)

    allow(@connection).to receive(:start).and_yield(@connection)
    allow(@connection).to receive(:send_message).and_return(true)
    allow(@connection).to receive(:started?).and_return(true)
    allow(@connection).to receive(:finish).and_return(true)
  end
  # more stuff here
end

I don't see you testing for any custom errors, which would make more sense here. For example, you might test for URL-unfriendly characters in your parameter and rescue from that. In that case, your test would assert something explicit.

expect { get("???.net") }.to raise_error(CustomError)
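Spelled out, that might look like the following (CustomError and get are hypothetical application code; note that raise_error needs the block form of expect):

require 'mechanize'

class CustomError < StandardError; end

def get(url)
  raise CustomError, "URL-unfriendly characters in #{url}" if url =~ /[?\s]/
  Mechanize.new.get(url)
end

RSpec.describe 'get' do
  it 'raises CustomError for URL-unfriendly input' do
    expect { get('???.net') }.to raise_error(CustomError)
  end
end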

Ruby Mechanize Connection timed out

After some more programming experience, I realized that this was a simple error on my part: my code did not catch the error thrown and appropriately move to the next link when a link was corrupted.

For any novice Ruby programmers who encounter a similar problem:

The Connection timed out error is usually due to an invalid link, etc., on the page being scraped.

You need to wrap the code that accesses the link in a statement such as the one below:

begin
  # [1] your scraping code here
rescue
  # [2] code to move on to the next link/page/etc. that you are scraping, instead of sticking to the invalid one
end

For instance, if you have a loop that iterates over links and extracts information from each one, that code goes at [1], and the code to move on to the next link (consider using Ruby's next) goes at [2], as in the sketch below. You might also consider printing something to the console to let the user know that a link was invalid.
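A minimal sketch, assuming a Mechanize agent and a links array of URLs:

links.each do |link|
  begin
    page = agent.get(link) # [1] your scraping code here
    # ... extract information from page ...
  rescue => e
    puts "Skipping invalid link #{link}: #{e.message}"
    next # [2] move on to the next link
  end
end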

Retry testing sites after timeout error in Watir

Your loop can be:

MAX_ATTEMPTS = 3 # assumed limit; the original snippet does not define it

# Use Ruby's method for iterating through the array
testsite_array.each do |site|
  attempt = 1
  begin
    ie.goto site
    if ie.html.include? 'teststring'
      puts site + ' yes'
    else
      puts site + ' no'
    end
  rescue
    attempt += 1

    # Retry accessing the site or stop trying
    if attempt > MAX_ATTEMPTS
      puts site + ' site failed, moving on'
    else
      retry
    end
  end
end

