How to get HTTP headers before downloading with Ruby's OpenURI
It seems that what I wanted is not possible with OpenURI, at least not, as I said, without loading the whole file first.
I was able to do what I wanted using Net::HTTP's request_get.
Here is an example:
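# http is an already-started Net::HTTP session, max_length a byte limit
http.request_get('/largefile.jpg') do |response|
  # header values are strings, so convert before comparing
  if response['content-length'].to_i < max_length
    response.read_body do |str| # read body now
      # save to file
    end
  end
end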
Note that this only works when using a block. If you call it without one, like this:
response = http.request_get('/largefile.jpg')
the body will already have been read.
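One caveat: as far as I can tell from the Net::HTTP source, when the block returns normally Net::HTTP still reads whatever body is left, so merely skipping read_body may not avoid the transfer. Here is a minimal sketch that leaves the block early instead. The helper name download_if_small and its parameters are hypothetical, and it assumes the server sends a Content-Length header.

```ruby
require "net/http"

# Hypothetical helper (not part of Net::HTTP): saves the body to dest only
# when the advertised Content-Length is below max_length.
# Returns true if the file was saved, false otherwise.
def download_if_small(host, port, path, max_length, dest)
  Net::HTTP.start(host, port) do |http|
    http.request_get(path) do |response|
      return false unless response.is_a?(Net::HTTPSuccess)

      length = response.content_length # Integer, or nil if the header is missing
      # Returning here leaves the block before the body is read;
      # Net::HTTP.start then closes the connection.
      return false if length.nil? || length >= max_length

      File.open(dest, "wb") do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  true
end
```

Called as download_if_small("example.com", 80, "/largefile.jpg", 5_000_000, "largefile.jpg"), it only writes the file when the server reports fewer than 5,000,000 bytes.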
Display HTTP headers using OpenURI?
Use the meta method of the virtual filehandle:
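# requires 'open-uri' and 'pp'; on Ruby 3.0+ Kernel#open no longer accepts URLs, so call URI.open
URI.open('http://google.com'){|f| pp f.meta }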
{"x-frame-options"=>"SAMEORIGIN",
"expires"=>"-1",
"p3p"=>
"CP=\"This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info.\"",
"content-type"=>"text/html; charset=ISO-8859-1",
"date"=>"Mon, 17 Dec 2012 14:37:29 GMT",
"server"=>"gws",
"x-xss-protection"=>"1; mode=block",
"set-cookie"=>
"PREF=ID=d2fb8a93d369bcd2:FF=0:TM=1355755049:LM=1355755049:S=ONVSP6n2jtluFgll; expires=Wed, 17-Dec-2014 14:37:29 GMT; path=/; domain=.google.com, NID=67=OFEvvHCOa3C6wScQCUIKfu_89oL9MSmnFjwN-u5LX_foP8NLsX7G9dq48NLVrf4WUXhqOA1jb38s0e9qeRp1Iwx_LT_N8IuF0Qi6dXVtR2zdvA86INqtfg5uNrKvxJfJ; expires=Tue, 18-Jun-2013 14:37:29 GMT; path=/; domain=.google.com; HttpOnly",
"cache-control"=>"private, max-age=0",
"transfer-encoding"=>"chunked"}
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open-uri/rdoc/OpenURI/Meta.html
How do I only read the header for a URL in Ruby?
As far as I'm aware this isn't possible with OpenURI, but it is with Net::HTTP (this may have changed). See this answer:
Get HTTP Headers before downloading with open-uri (Ruby)
which suggests you use Net::HTTP's request_get method.
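If only the headers are needed, a HEAD request is another option, since the server then sends no body at all. This is a minimal sketch using Net::HTTP#head. The helper name fetch_headers and its parameters are hypothetical, and not every server answers HEAD with the same headers it would send for GET.

```ruby
require "net/http"

# Hypothetical helper (not part of Net::HTTP): issues a HEAD request and
# returns the response headers as a Hash of lowercase names to values.
def fetch_headers(host, port, path)
  Net::HTTP.start(host, port) do |http|
    response = http.head(path)
    headers = {}
    response.each_header { |name, value| headers[name] = value }
    headers
  end
end
```

For example, fetch_headers("example.com", 80, "/largefile.jpg")["content-length"] gives the advertised size as a string, or nil when the header is absent.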
How to specify http request header in OpenURI
According to the documentation, you can pass a hash of HTTP headers as the second argument to open:
# On Ruby 3.0+ call URI.open; Kernel#open no longer accepts URLs
URI.open("http://www.ruby-lang.org/en/",
  "User-Agent" => "Ruby/#{RUBY_VERSION}",
  "From" => "foo@bar.invalid",
  "Referer" => "http://www.ruby-lang.org/") {|f|
  # ...
}
How to download a file from a URL that requires a 'Bearer Token' from the Rails console?
You can add a header in the second parameter as described in https://ruby-doc.org/stdlib-2.3.1/libdoc/open-uri/rdoc/OpenURI.html.
require 'open-uri'
token = "f00"
url = "http://via.placeholder.com/150"
File.open('image.png', 'wb') do |file|
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
  file << URI.open(url, "Authorization" => "Bearer #{token}").read
end
Downloading a specific part of a file from a http server
You can do this with any library that lets you set request headers (all curl -r does is set the Range header), which should be pretty much any HTTP library. Net::HTTP, for its part, has a set_range convenience method that takes as arguments either a single Range object (e.g. 60...80) or a start index and length:
require "net/http"
require "uri"
url = URI.parse("http://example.com/foo")
req = Net::HTTP::Get.new(url.path)
req.set_range(60, 20)
res = Net::HTTP.new(url.host, url.port).start do |http|
  http.request(req)
end
puts res.body
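To see what set_range actually puts on the wire, the header can be inspected without sending anything. In this small sketch the /foo path is just the placeholder from the example above, and both calling forms should yield the same Range header, bytes=60-79:

```ruby
require "net/http"

# Build (but do not send) two requests for the same 20 bytes.
by_offset = Net::HTTP::Get.new("/foo")
by_offset.set_range(60, 20)   # start index 60, length 20

by_range = Net::HTTP::Get.new("/foo")
by_range.set_range(60...80)   # exclusive Range covering the same bytes

puts by_offset["Range"]
puts by_range["Range"]
```

Note that the header uses inclusive byte positions, which is why a length of 20 starting at 60 ends at 79. A server that does not support ranges may ignore the header and return the whole file.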
How do I download a binary file over HTTP?
The simplest way is the platform-specific solution:
#!/usr/bin/env ruby
`wget http://somedomain.net/flv/sample/sample.flv`
You are probably looking for:
require 'net/http'
# The host must be "somedomain.net" without a trailing slash; otherwise an exception is raised.
Net::HTTP.start("somedomain.net") do |http|
  resp = http.get("/flv/sample/sample.flv")
  File.open("sample.flv", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Done."
Edit 2: A solution that writes the file piece by piece while it downloads:
# instead of http.get
f = File.open('sample.flv', 'wb') # open for binary writing
begin
  http.request_get('/sample.flv') do |resp|
    resp.read_body do |segment|
      f.write(segment)
    end
  end
ensure
  f.close
end
OpenURI having issues when passed URLs with fragment identifiers
Problem
The fragment part of a URI should not be sent to the server.
From Wikipedia: Fragment Identifier
The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the web server — of course the server typically helps to determine the MIME type, and the MIME type determines the processing of fragments. When an agent (such as a Web browser) requests a web resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the document type and fragment value.
Solution
Strip the fragment part of the URI before passing it to open.
require "uri"
u = URI.parse "http://example.com#fragment"
u.fragment = nil
u.to_s #=> "http://example.com"
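The same steps can be wrapped in a small helper so every URL is cleaned before it reaches open. This is a minimal sketch; the name strip_fragment is hypothetical and not part of the URI library.

```ruby
require "uri"

# Hypothetical helper: returns the URL as a string with any "#fragment" removed.
def strip_fragment(url)
  uri = URI.parse(url)
  uri.fragment = nil
  uri.to_s
end

puts strip_fragment("http://example.com/page?x=1#section")
```

The query string is left untouched; only the part after the # is dropped.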