How to Download Only a Piece of a Big File via HTTP with Ruby

This seems to work when using sockets:

require 'socket'
host = "download.thinkbroadband.com"
path = "/1GB.zip" # get 1 GB sample file
request = "GET #{path} HTTP/1.0\r\n\r\n"
socket = TCPSocket.open(host, 80)
socket.print(request)

# find beginning of response body
buffer = ""
while !buffer.match("\r\n\r\n") do
  buffer += socket.read(1)
end

response = socket.read(100) # read the first 100 bytes of the body
puts response

I'm curious if there is a "ruby way".
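
One "ruby way" would be Net::HTTP with a Range header, provided the server supports partial content. This is only a sketch of the same request as the socket example above; the range bounds are arbitrary and the server is free to ignore them:

require 'net/http'

uri = URI('http://download.thinkbroadband.com/1GB.zip')

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Range'] = 'bytes=0-99' # ask for the first 100 bytes only

  response = http.request(request)
  puts response.code          # 206 if the server honored the range
  puts response.body.bytesize # 100 in that case; the whole file if it did not
end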

Download the last parts of a file, using Ruby?

You have to create a range request (partial download); here is the info on how to do it: How to make an HTTP GET with modified headers?

You'll need the size of the file, so you need another request that fetches only the headers to get that info, preferably a HEAD request (or a GET with Range: bytes=0-0).
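
As a sketch (the URL below is only a placeholder), fetching the last 1000 bytes could look like this:

require 'net/http'

uri = URI('http://example.com/large_file.zip') # placeholder URL

Net::HTTP.start(uri.host, uri.port) do |http|
  # 1) Fetch only the headers to learn the total size
  size = http.head(uri.request_uri)['Content-Length'].to_i

  # 2) Request only the last 1000 bytes with a Range header
  get = Net::HTTP::Get.new(uri.request_uri)
  get['Range'] = "bytes=#{size - 1000}-#{size - 1}"
  tail = http.request(get)

  puts tail.code          # 206 if the server honors the range
  puts tail.body.bytesize # 1000 in that case
end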

How do I download a binary file over HTTP?

The simplest way is the platform-specific solution:

#!/usr/bin/env ruby
`wget http://somedomain.net/flv/sample/sample.flv`

You are probably looking for:

require 'net/http'
# Use "somedomain.net" here (no trailing slash), otherwise an exception is raised.
Net::HTTP.start("somedomain.net") do |http|
  resp = http.get("/flv/sample/sample.flv")
  open("sample.flv", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Done."

Here is a solution that saves the file to disk in chunks while it downloads, instead of buffering the whole body in memory:

# instead of http.get
f = open('sample.flv', 'wb')
begin
  http.request_get('/flv/sample/sample.flv') do |resp|
    resp.read_body do |segment|
      f.write(segment)
    end
  end
ensure
  f.close
end

How to save pictures from URL to disk

You are almost done. The only thing left is to store the files. Let's do it.

LOCATION = 'C:\pickaxe\pictures'
if !File.exist? LOCATION # create the folder if it does not exist
  require 'fileutils'
  FileUtils.mkpath LOCATION
end

require 'net/http'
# ... your code with Nokogiri etc.
links.each do |link|
  Net::HTTP.start(PAGE_URL) do |http|
    localname = link['src'].gsub(/.*\//, '') # keep only the filename
    resp = http.get(link['src'])
    open("#{LOCATION}/#{localname}", "wb") do |file|
      file.write resp.body
    end
  end
end

That’s it.

Memory issues with HTTParty when downloading large files

You can use Net::HTTP. See the documentation (in particular the section titled "Streaming Response Bodies").

Here's the example from the docs:

uri = URI('http://example.com/large_file')

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri.request_uri

  http.request request do |response|
    open 'large_file', 'w' do |io|
      response.read_body do |chunk|
        io.write chunk
      end
    end
  end
end

Limit fetch size of Net::HTTP.request_get

I'm not sure how to do it with Net::HTTP, but with OpenURI I usually do the following:

require 'open-uri'

resource = open('http://google.com')

resource.read(5120) # reads the first 5120 bytes (5 KB)

Hope this helps.
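
Note that OpenURI buffers the whole response before open returns, so the snippet above limits what you read, not what gets transferred. If the goal is to stop the transfer itself with Net::HTTP, one possible approach (a sketch, not part of the original answer; the URL and limit are placeholders) is to stream the body and abort once enough bytes have arrived:

require 'net/http'

uri   = URI('http://example.com/large_file') # placeholder URL
limit = 5120                                 # stop after roughly 5 KB
data  = ''.b                                 # binary buffer

# Throwing out of the whole Net::HTTP.start block aborts the transfer;
# the connection is closed and cannot be reused afterwards.
catch(:limit_reached) do
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    http.request(request) do |response|
      response.read_body do |chunk|
        data << chunk
        throw :limit_reached if data.bytesize >= limit
      end
    end
  end
end

puts data.bytesize # at least the limit; the last chunk may overshoot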

How can I use Net::HTTP to download a file with UTF-8 characters in it?

How can I check the encoding of a remote file like that?

You can check the Content-Type header of the response, which, if present, may look something like this:

Content-Type: text/plain; charset=utf-8

As you can see, the encoding is specified there. If there's no Content-Type header, or if the charset is not specified, or if the charset is specified incorrectly, then you can't know the encoding of the text. There are gems that can try to guess the encoding (with increasing accuracy), e.g. rchardet and charlock_holmes, but for complete accuracy you have to know the encoding before reading the text.
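
For example (a sketch with a placeholder URL), you could look at the header like this:

require 'net/http'

uri = URI('http://example.com/some_file.txt') # placeholder URL

response = Net::HTTP.get_response(uri)

puts response['Content-Type']        # e.g. "text/plain; charset=utf-8"
puts response.type_params['charset'] # the charset parameter, if the server sent one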

This code somehow thinks all files that are downloaded are encoded in ASCII 8-bit.

In Ruby, ASCII-8BIT is equivalent to binary, which means the Net::HTTP library just gives you a string containing a series of single bytes, and it's up to you to decide how to interpret those bytes.

If you want to interpret those bytes as UTF-8, then you do that with String#force_encoding():

text = text.force_encoding("UTF-8")

You might want to do that if, for instance, you want to do some regex matching on the string and you want to match full characters (which might be multi-byte) rather than just single bytes.
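
A small illustration (the byte string is made up to stand in for a downloaded body):

# The UTF-8 bytes of "héllo", handed to us as a binary string
text = "h\xC3\xA9llo".b

puts text.encoding # ASCII-8BIT
puts text.length   # 6 -- counted as raw bytes

text = text.force_encoding("UTF-8")
puts text.encoding   # UTF-8
puts text.length     # 5 -- "é" now counts as one character
puts text =~ /h.llo/ # 0 -- the dot matches the whole character "é"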

Encoding::UndefinedConversionError: "\x95" from ASCII-8BIT to UTF-8

Using String#encode('UTF-8') to convert ASCII-8BIT to UTF-8 doesn't work for bytes whose ASCII codes are greater than 127:

(0..255).each do |ascii_code|
  str = ascii_code.chr("ASCII-8BIT")
  # puts str.encoding #=> ASCII-8BIT

  begin
    str.encode("UTF-8")
  rescue Encoding::UndefinedConversionError
    puts "Can't encode char with ascii code #{ascii_code} to UTF-8."
  end
end

--output:--
Can't encode char with ascii code 128 to UTF-8.
Can't encode char with ascii code 129 to UTF-8.
Can't encode char with ascii code 130 to UTF-8.
...
...
Can't encode char with ascii code 253 to UTF-8.
Can't encode char with ascii code 254 to UTF-8.
Can't encode char with ascii code 255 to UTF-8.

Ruby just reads one byte at a time from the ASCII-8BIT string and tries to convert the character in the byte to UTF-8. So, while 128 may be a legal byte in UTF-8 when part of a multi-byte character sequence, 128 is not a legal UTF-8 character as a single byte.
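
To make that concrete, a small sketch:

# A lone 0x80 byte is not valid UTF-8 on its own ...
lone = [0x80].pack("C").force_encoding("UTF-8")
puts lone.valid_encoding? # false

# ... but the same byte is fine as the continuation byte of a multi-byte
# sequence: 0xC2 0x80 encodes the character U+0080.
pair = [0xC2, 0x80].pack("C*").force_encoding("UTF-8")
puts pair.valid_encoding? # true
puts pair.ord             # 128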

As for writing the strings to a file, instead of this:

f = open(filename)

...if you want to output UTF-8 to the file, you would write:

f = open(filename, "w:UTF-8")

By default, Ruby uses whatever the value of Encoding.default_external is to encode output to a file. The default external encoding is pulled from your system's environment, or you can set it explicitly.
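
For illustration (the string stands in for downloaded text, and the file name is arbitrary):

text = "résumé".force_encoding("ASCII-8BIT") # pretend this came from Net::HTTP

puts Encoding.default_external # e.g. UTF-8, taken from your locale
# Encoding.default_external = Encoding::UTF_8 # or set the default yourself

File.open("out.txt", "w:UTF-8") do |f| # or be explicit per file
  f.write(text.force_encoding("UTF-8"))
end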


