How to download only a piece of a big file via HTTP with Ruby
This seems to work when using sockets:
require 'socket'
host = "download.thinkbroadband.com"
path = "/1GB.zip" # get 1gb sample file
request = "GET #{path} HTTP/1.0\r\n\r\n"
socket = TCPSocket.open(host,80)
socket.print(request)
# find beginning of response body
buffer = ""
while !buffer.match("\r\n\r\n") do
buffer += socket.read(1)
end
response = socket.read(100) #read first 100 bytes of body
puts response
I'm curious if there is a "ruby way".
Download the last parts of a file, using Ruby?
You have to create a range request (a partial download). For how to set the header, see: How to make an HTTP GET with modified headers?
You'll need the size of the file, so you need another request that fetches only the headers to get that information, preferably a HEAD request (or a GET with Range: bytes=0-0).
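A minimal sketch of such a range request with Net::HTTP is shown here. The helper names (range_request, suffix_range_request, total_size_from) and the example.com URL are made up for illustration, and the sketch assumes the server honours Range and answers 206 Partial Content with a Content-Range header. HTTP also defines a suffix form, bytes=-N, that asks for the last N bytes without knowing the size first.

```ruby
require 'net/http'
require 'uri'

# Build a GET request asking only for bytes first..last (inclusive).
def range_request(uri, first, last)
  Net::HTTP::Get.new(uri, 'Range' => "bytes=#{first}-#{last}")
end

# Build a GET request for the last `count` bytes (suffix range).
def suffix_range_request(uri, count)
  Net::HTTP::Get.new(uri, 'Range' => "bytes=-#{count}")
end

# Extract the total size from a Content-Range value such as
# "bytes 0-0/1073741824". Returns nil if the size is missing or "*".
def total_size_from(content_range)
  match = content_range.to_s.match(%r{/(\d+)\z})
  match && match[1].to_i
end

# Hypothetical usage (needs network access, so it is left commented out):
#
#   uri = URI('http://example.com/big.zip')
#   Net::HTTP.start(uri.host, uri.port) do |http|
#     response = http.request(suffix_range_request(uri, 100))
#     if response.code == '206' # the server honoured the range
#       puts total_size_from(response['Content-Range'])
#       puts response.body.bytesize
#     end
#   end
```

If the server answers 200 instead of 206, it ignored the Range header and is sending the whole file, so check the status code before trusting the body size.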
How do I download a binary file over HTTP?
The simplest way is the platform-specific solution:
#!/usr/bin/env ruby
`wget http://somedomain.net/flv/sample/sample.flv`
Probably you are searching for:
require 'net/http'
# Use "somedomain.net", not "somedomain.net/"; the trailing slash raises an exception.
Net::HTTP.start("somedomain.net") do |http|
resp = http.get("/flv/sample/sample.flv")
open("sample.flv", "wb") do |file|
file.write(resp.body)
end
end
puts "Done."
Edit: Changed. Thank You.
Edit 2: A solution that writes the file to disk in segments while it downloads:
# instead of http.get
f = File.open('sample.flv', 'wb') # open for binary writing; a plain open() is read-only
begin
http.request_get('/sample.flv') do |resp|
resp.read_body do |segment|
f.write(segment)
end
end
ensure
f.close()
end
How to save pictures from URL to disk
You are almost done. The only thing left is to store files. Let’s do it.
LOCATION = 'C:\pickaxe\pictures'
if !File.exist? LOCATION # create the folder if it does not exist
  require 'fileutils'
  FileUtils.mkpath LOCATION
end
require 'net/http'
# ... your code with nokogiri etc.
links.each do |link|
  # PAGE_URL must be a bare host name; Net::HTTP.start does not accept a full URL
  Net::HTTP.start(PAGE_URL) do |http|
    src = link['src']
    localname = src.gsub(%r{.*/}, '') # keep only the filename
    resp = http.get(src)
    File.open("#{LOCATION}/#{localname}", "wb") do |file|
      file.write(resp.body)
    end
  end
end
That’s it.
Memory issues with HTTParty and download large files
You can use Net::HTTP. See the documentation (in particular the section titled "Streaming Response Bodies").
Here's the example from the docs:
uri = URI('http://example.com/large_file')
Net::HTTP.start(uri.host, uri.port) do |http|
request = Net::HTTP::Get.new uri.request_uri
http.request request do |response|
open 'large_file', 'w' do |io|
response.read_body do |chunk|
io.write chunk
end
end
end
end
Limit fetch size of Net::HTTP.request_get
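I'm not sure how to do it with Net::HTTP, but with OpenURI I usually do the following:
require 'open-uri'
resource = URI.open('http://google.com') # Kernel#open no longer accepts URLs as of Ruby 3.0
resource.read(5120)
# => the first 5120 bytes (5 KiB) of the body
# Note: open-uri downloads the whole response before returning, so this limits what is read, not what is fetched.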
hope this helps.
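If the goal is to stop the transfer itself after a few kilobytes, one option with Net::HTTP is to read the body in chunks and leave the request block once enough bytes have arrived. This is a sketch rather than a definitive recipe: take_bytes and fetch_prefix are hypothetical helper names, and the server may already have sent more data than requested by the time the connection is closed.

```ruby
require 'net/http'
require 'uri'

# Collect chunks into a binary buffer until `limit` bytes have arrived.
# `chunks` is anything that responds to #each and yields strings.
def take_bytes(chunks, limit)
  buffer = String.new # an empty String.new is tagged ASCII-8BIT (binary)
  chunks.each do |chunk|
    buffer << chunk.b
    break if buffer.bytesize >= limit
  end
  buffer.byteslice(0, limit)
end

# Fetch at most `limit` bytes of the body at `url`.
def fetch_prefix(url, limit)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      # enum_for exposes the block-style read_body as an enumerable of chunks.
      # Returning from inside this block skips reading the rest of the body;
      # the connection is closed when Net::HTTP.start unwinds.
      return take_bytes(response.enum_for(:read_body), limit)
    end
  end
end

# Hypothetical usage (needs network access):
#   puts fetch_prefix('http://example.com/', 5120).bytesize
```

When the server supports it, a Range header is usually the cleaner way to ask for a prefix, because the server then sends only the requested bytes.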
How can I use Net::Http to download a file with UTF-8 characters in it?
How can I: 1) Check the encoding of a remote file like that.
You can check the Content-Type header of the response, which, if present, may look something like this:
Content-Type: text/plain; charset=utf-8
As you can see, the encoding is specified there. If there's no Content-Type header, or if the charset is not specified, or if the charset is specified incorrectly, then you can't know the encoding of the text. There are gems that can try to guess the encoding (with increasing accuracy), e.g. rchardet and charlock_holmes, but for complete accuracy, you have to know the encoding before reading the text.
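Reading the declared charset out of that header can be sketched as follows. The helper names charset_from and declared_encoding are made up for this example, and the regular expression is a simplification that covers the common charset=... form rather than every legal header syntax.

```ruby
# Return the charset named in a Content-Type header value, or nil.
def charset_from(content_type)
  return nil if content_type.nil?
  match = content_type.match(/;\s*charset\s*=\s*"?([^";\s]+)"?/i)
  match && match[1]
end

# Map the declared charset to a Ruby Encoding object.
# Returns nil when no charset is declared or Ruby does not know the name.
def declared_encoding(content_type)
  name = charset_from(content_type)
  name && Encoding.find(name)
rescue ArgumentError
  nil
end

puts charset_from('text/plain; charset=utf-8')      # utf-8
puts declared_encoding('text/plain; charset=utf-8') # UTF-8
p declared_encoding('text/html')                    # nil
```

With Net::HTTP, the header value would come from response['Content-Type']. A nil result is the case described above, where the encoding has to be guessed or known in advance.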
This code somehow thinks all files that are downloaded are encoded in ASCII 8-bit.
In Ruby, ASCII-8BIT is equivalent to binary, which means the Net::HTTP library just gives you a string containing a series of single bytes, and it's up to you to decide how to interpret those bytes.
If you want to interpret those bytes as UTF-8, then you do that with String#force_encoding():
text = text.force_encoding("UTF-8")
You might want to do that if, for instance, you want to do some regex matching on the string, and you want to match full characters (which might be multi-byte) rather than just single bytes.
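A small sketch of the difference between bytes and characters, using a hand-written binary string as a stand-in for a downloaded body (the sample bytes are an assumption, not data from the question). It also shows String#valid_encoding?, which tells you whether the relabelled bytes really form valid UTF-8.

```ruby
# Bytes as Net::HTTP would hand them over: tagged ASCII-8BIT (binary).
raw = "caf\xC3\xA9".b

# force_encoding only relabels the string; the bytes are untouched.
text = raw.dup.force_encoding('UTF-8')

puts raw.length                  # 5, counted in bytes
puts text.length                 # 4, counted in characters
puts text.valid_encoding?        # true
puts text.end_with?("\u00E9")    # true, matches the full two-byte character

# A lone 0x95 byte is not valid UTF-8, so relabelling it does not fix it.
broken = "\x95".b.force_encoding('UTF-8')
puts broken.valid_encoding?      # false
```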
Encoding::UndefinedConversionError: "\x95" from ASCII-8BIT to UTF-8
Using String#encode('UTF-8') to convert ASCII-8BIT to UTF-8 doesn't work for bytes whose ASCII codes are greater than 127:
(0..255).each do |ascii_code|
str = ascii_code.chr("ASCII-8BIT")
#puts str.encoding #=>ASCII-8BIT
begin
str.encode("UTF-8")
rescue Encoding::UndefinedConversionError
puts "Can't encode char with ascii code #{ascii_code} to UTF-8."
end
end
--output:--
Can't encode char with ascii code 128 to UTF-8.
Can't encode char with ascii code 129 to UTF-8.
Can't encode char with ascii code 130 to UTF-8.
...
...
Can't encode char with ascii code 253 to UTF-8.
Can't encode char with ascii code 254 to UTF-8.
Can't encode char with ascii code 255 to UTF-8.
Ruby reads the ASCII-8BIT string one byte at a time and tries to convert each byte to a UTF-8 character. ASCII-8BIT defines no character for byte values above 127, so there is nothing to convert them to. And while 128 may be a legal byte in UTF-8 as part of a multi-byte character sequence, 128 is not a legal UTF-8 character as a single byte.
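If the bytes were really produced in a single-byte encoding, the conversion works once Ruby is told which one. The sketch below assumes Windows-1252, where byte 0x95 is a bullet character; that is only a guess about the data behind the error above, not something the question confirms. The second option is a lossy fallback for when the source encoding is unknown.

```ruby
raw = "\x95 item".b # binary bytes, like the one in the error message

# Option 1: name the real source encoding, then transcode to UTF-8.
# Windows-1252 is an assumption here.
utf8 = raw.dup.force_encoding('Windows-1252').encode('UTF-8')
puts utf8.encoding        # UTF-8
puts utf8 == "\u2022 item" # true, 0x95 became the bullet U+2022

# Option 2: replace bytes that cannot be mapped instead of raising.
lossy = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
puts lossy                # ? item
```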
As for writing the strings to a file, instead of this:
f = open(filename)
...if you want to output UTF-8 to the file, you would write:
f = open(filename, "w:UTF-8")
By default, Ruby uses the value of Encoding.default_external to encode output to a file. The default_external encoding is pulled from your system's environment, or you can set it explicitly.
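To see the effect of the encoding given in the open mode, the following sketch writes the same string to two temporary files and compares their sizes. The file names and the sample string are arbitrary choices for the example.

```ruby
require 'tmpdir'

text = "caf\u00E9" # a UTF-8 string whose last character takes two bytes

sizes = Dir.mktmpdir do |dir|
  utf8_path   = File.join(dir, 'utf8.txt')
  latin1_path = File.join(dir, 'latin1.txt')

  # "w:UTF-8" sets the external encoding of the file explicitly.
  File.open(utf8_path, 'w:UTF-8') { |f| f.write(text) }
  # With a different external encoding, Ruby transcodes the string on write.
  File.open(latin1_path, 'w:ISO-8859-1') { |f| f.write(text) }

  [File.size(utf8_path), File.size(latin1_path)]
end

p sizes                        # [5, 4]
puts Encoding.default_external # depends on your environment
```

The one-byte difference is the last character, which takes two bytes in UTF-8 and one byte in ISO-8859-1.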