Ruby Net::FTP Timeout Threads

The trick that worked for me was to use Ruby's Timeout.timeout to ensure the FTP connection was not hanging:

require 'timeout'

begin
  Timeout.timeout(10) do
    ftp.getbinaryfile(rmls_path, local_path)
  end
  # ...
rescue Timeout::Error
  errors << "#{thread_num}> File download timed out for: #{rmls_path}"
  puts errors.last
rescue
  errors << "unable to get file > ftp response: #{ftp.last_response}"
  # ...
end

Hanging FTP downloads were making my threads appear to hang. Now that the downloads can no longer hang, I can deal with the threads the proper way:

threads.each { |t| t.join }

rather than the ugly:

# Wait until @last_updated is more than 20 seconds old, polling every 3 seconds.
while Time.now < @last_updated + 20 do
  sleep 3
end
# The threads are hanging, so joining them does not work.
threads.each { |t| t.kill }
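
Putting the pieces together, here's a minimal sketch of the full threaded-download pattern. Note that host, user, password, remote_paths, and local_dir are placeholder names for illustration, not from the original code:

require 'net/ftp'
require 'timeout'

errors  = []
threads = []

# Placeholder setup: split the remote paths into batches of 10, one thread per batch.
remote_paths.each_slice(10).each_with_index do |paths, thread_num|
  threads << Thread.new do
    ftp = Net::FTP.new(host, user, password)
    paths.each do |rmls_path|
      local_path = File.join(local_dir, File.basename(rmls_path))
      begin
        Timeout.timeout(10) do
          ftp.getbinaryfile(rmls_path, local_path)
        end
      rescue Timeout::Error
        errors << "#{thread_num}> File download timed out for: #{rmls_path}"
      rescue
        errors << "unable to get file > ftp response: #{ftp.last_response}"
      end
    end
    ftp.close
  end
end

# Downloads can no longer hang, so joining the threads is safe.
threads.each { |t| t.join }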

How to give a timeout to an FTP connection

I'm not sure whether it hangs indefinitely. If it doesn't, the best approach would be to capture the error code when/if it eventually times out; that would give a bit more information for analysis.

Two possible workarounds are described below.

Timeouts using Process.fork

In the meantime, you might switch to running the FTP task in a separate process and applying the timeout to that process. This keeps the Ruby global interpreter lock from suppressing the timeout event, which is what you suspect is happening now.

Something like this:

child = Process.fork do
  # Run the whole FTP task in here...
  ftp = Net::FTP.new(...)
  ...
end

# Timeout handling is done in the parent process
begin
  Timeout::timeout(...) do
    Process.wait(child)
  end
rescue Timeout::Error
  # Terminate the child in case of timeout
  Process.kill("KILL", child)
end
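
(If the timeout fires, you may also want to call Process.wait(child) again after the kill, so the dead child process is reaped rather than left as a zombie.)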

Timeouts using SystemTimer

Another option, since you're running Ruby 1.8.6, is to take a look at SystemTimer, which works around the limitations of the Ruby 1.8 Timeout implementation.
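
A minimal sketch, assuming the system_timer gem is installed and reusing the ftp/path variables from the first example (SystemTimer.timeout_after raises Timeout::Error by default):

require 'rubygems'
require 'system_timer'

begin
  SystemTimer.timeout_after(10) do
    ftp.getbinaryfile(rmls_path, local_path)
  end
rescue Timeout::Error
  puts "FTP download timed out for: #{rmls_path}"
end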

How to recursively download FTP folder in parallel in Ruby?

The syncftp gem may help you:

http://rubydoc.info/gems/syncftp/0.0.3/frames
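
A rough sketch of what usage looks like, based on the example in the gem's docs (the host and credential values are placeholders, and I haven't verified the options here, so check the link above for the exact API):

require 'rubygems'
require 'syncftp'

ftp = SyncFTP.new("ftp.example.com", :username => "user", :password => "secret")
ftp.sync(:local => "local_dir", :remote => "remote_dir")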

Ruby has a decent built-in FTP library in case you want to roll your own:

http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/ftp/rdoc/Net/FTP.html

To download files in parallel, you can use multiple threads with timeouts:

Ruby Net::FTP Timeout Threads (the first section above)

A great way to get parallel work done is Celluloid, the concurrent framework:

https://github.com/celluloid/celluloid
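
For example, a sketch of a Celluloid worker pool (the Downloader class and its names are made up for illustration; the pool and future calls are from Celluloid's documented API):

require 'celluloid'
require 'net/ftp'

class Downloader
  include Celluloid

  def download(host, remote_path, local_path)
    Net::FTP.open(host) do |ftp|
      ftp.login
      ftp.getbinaryfile(remote_path, local_path)
    end
  end
end

pool    = Downloader.pool(:size => 8)
futures = remote_paths.map { |p| pool.future.download(host, p, File.basename(p)) }
futures.each { |f| f.value }   # block until every download has finished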

All that said, if the download speed is limited to your overall network bandwidth, then none of these approaches will help much.

To speed up the transfers in this case, be sure you're only downloading the information that's changed: new files and changed sections of existing files.

Segmented downloading can give massive speedups in some cases, such as downloading big log files where only a small percentage of the file has changed, the changes are all at the end of the file, and they are all appends.
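
For the append-only case, Net::FTP's built-in resume support (the REST command) can fetch just the missing tail of a file. A minimal sketch, with the host and paths as placeholders:

require 'net/ftp'

Net::FTP.open("ftp.example.com") do |ftp|
  ftp.login
  ftp.resume = true   # resume transfers from the local file's current size
  ftp.getbinaryfile("logs/app.log", "app.log")   # downloads only the new bytes
end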

You can also consider shelling out to the command line. Many tools can help with this. A good general-purpose one is "curl", which also supports simple byte ranges for FTP files. For example, you can get the first 100 bytes of a document over FTP like this:

curl -r 0-99 ftp://www.get.this/README

Are you open to other protocols besides FTP? Take a look at the "rsync" command, which is excellent for download synchronization and has many optimizations to transfer just the changed data. For example, rsync can sync a remote directory to a local directory like this:

rsync -auvC me@my.com:/remote/foo/ /local/foo/ 
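
Here, -a preserves attributes and recurses into directories, -u skips files that are newer on the receiving side, -v is verbose, and -C excludes files the same way CVS does.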

