How to get HTTP headers before downloading with Ruby's OpenURI

It seems that what I wanted is not possible to achieve with OpenURI, at least not, as I said, without loading the whole file first.

I was able to do what I wanted using Net::HTTP's request_get.

Here is an example:

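# http is a started Net::HTTP session; max_length is a size limit in bytes.
http.request_get('/largefile.jpg') do |response|
  # Header values are strings, so convert before comparing.
  if response['content-length'].to_i < max_length
    response.read_body do |str|   # read body now
      # save to file
    end
  end
end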

Note that this only works with the block form. If you call it without a block, like this:

response = http.request_get('/largefile.jpg')

the body will already have been read by the time the call returns.
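
For completeness, here is a minimal, self-contained sketch of the block form above. The method names small_enough? and download_if_small are made up for illustration, and skipping the download when the server sends no Content-Length header is an assumption, not part of the original answer.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true only when a Content-Length header is present
# and its value is below max_length. Header values are strings.
def small_enough?(response, max_length)
  length = response['content-length']
  return false if length.nil? # assumption: skip when the size is unknown

  length.to_i < max_length
end

# Hypothetical helper: streams the body to dest only if the headers pass
# the size check. Returns true if the file was written, false otherwise.
def download_if_small(url, dest, max_length)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      # Inside the block the headers are available but the body is unread.
      # As far as I can tell, Net::HTTP reads any unread body once the block
      # finishes normally, so this returns early instead of falling through.
      return false unless small_enough?(response, max_length)

      File.open(dest, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
      return true
    end
  end
end
```

A call such as download_if_small('http://example.com/largefile.jpg', 'largefile.jpg', 1_000_000) would then save the file only when the reported size is under the limit (the URL is a placeholder).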

Display HTTP headers using Open::URI?

Use the meta method of the virtual filehandle (call URI.open on Ruby 2.5 and later; Kernel#open no longer opens URLs as of Ruby 3.0):

URI.open('http://google.com') { |f| pp f.meta }
{"x-frame-options"=>"SAMEORIGIN",
 "expires"=>"-1",
 "p3p"=>
  "CP=\"This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info.\"",
 "content-type"=>"text/html; charset=ISO-8859-1",
 "date"=>"Mon, 17 Dec 2012 14:37:29 GMT",
 "server"=>"gws",
 "x-xss-protection"=>"1; mode=block",
 "set-cookie"=>
  "PREF=ID=d2fb8a93d369bcd2:FF=0:TM=1355755049:LM=1355755049:S=ONVSP6n2jtluFgll; expires=Wed, 17-Dec-2014 14:37:29 GMT; path=/; domain=.google.com, NID=67=OFEvvHCOa3C6wScQCUIKfu_89oL9MSmnFjwN-u5LX_foP8NLsX7G9dq48NLVrf4WUXhqOA1jb38s0e9qeRp1Iwx_LT_N8IuF0Qi6dXVtR2zdvA86INqtfg5uNrKvxJfJ; expires=Tue, 18-Jun-2013 14:37:29 GMT; path=/; domain=.google.com; HttpOnly",
 "cache-control"=>"private, max-age=0",
 "transfer-encoding"=>"chunked"}

http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open-uri/rdoc/OpenURI/Meta.html

How do I only read the header for a URL in Ruby?

As far as I'm aware, this isn't possible with OpenURI but is with Net::HTTP (this may have changed). See this answer:

Get HTTP Headers before downloading with open-uri (Ruby)

which suggests you use Net::HTTP's request_get method.
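
Another option, not mentioned in the answer above, is an HTTP HEAD request, which asks the server for the headers only. Below is a minimal sketch using Net::HTTP; the helper names head_request_for and fetch_headers are hypothetical, and keep in mind that not every server answers HEAD with exactly the headers it would send for GET.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a HEAD request object for the given URI.
def head_request_for(uri)
  Net::HTTP::Head.new(uri.request_uri)
end

# Hypothetical helper: sends a HEAD request and returns the response headers
# as a Hash of lowercase header names to string values.
def fetch_headers(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(head_request_for(uri))
    headers = {}
    response.each_header { |name, value| headers[name] = value }
    headers
  end
end
```

With this, fetch_headers(url)['content-length'] would give the reported size as a string, provided the server sends that header.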

How to specify http request header in OpenURI

According to the documentation, you can pass a hash of HTTP headers as the second argument to open (URI.open since Ruby 2.5; Kernel#open no longer accepts URLs as of Ruby 3.0):

URI.open("http://www.ruby-lang.org/en/",
  "User-Agent" => "Ruby/#{RUBY_VERSION}",
  "From" => "foo@bar.invalid",
  "Referer" => "http://www.ruby-lang.org/") { |f|
  # ...
}

How to download a file from a URL that requires a 'Bearer Token' from the Rails console?

You can add a header in the second parameter as described in https://ruby-doc.org/stdlib-2.3.1/libdoc/open-uri/rdoc/OpenURI.html.

require 'open-uri'

token = "f00"

url = "http://via.placeholder.com/150"

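# URI.open is the open-uri entry point on Ruby 2.5+;
# Kernel#open no longer accepts URLs as of Ruby 3.0.
File.open('image.png', 'wb') do |file|
  file << URI.open(url, "Authorization" => "Bearer #{token}").read
end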

Downloading a specific part of a file from a http server

You can do this with any library that lets you set request headers (all curl -r does is set the Range header), which should be pretty much any HTTP library. Net::HTTP, for its part, has a set_range convenience method that takes as arguments either a single Range object (e.g. 60...80) or a start index and length:

require "net/http"
require "uri"

url = URI.parse("http://example.com/foo")
req = Net::HTTP::Get.new(url.path)

req.set_range(60, 20)

res = Net::HTTP.new(url.host, url.port).start do |http|
  http.request(req)
end
puts res.body
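
To make the two calling styles concrete, this small sketch builds requests without sending them and inspects the Range header that set_range produces. The helper name range_header is hypothetical. HTTP byte ranges are inclusive, so a start index of 60 with a length of 20 covers bytes 60 through 79.

```ruby
require 'net/http'

# Hypothetical helper: returns the Range header that Net::HTTP would send
# for the given set_range arguments. No network access is involved.
def range_header(*args)
  req = Net::HTTP::Get.new('/foo')
  req.set_range(*args)
  req['Range']
end

puts range_header(60, 20)   # start index and length
puts range_header(60...80)  # Range object with exclusive end
puts range_header(60..79)   # Range object with inclusive end
```

All three calls print bytes=60-79. Whether the server honors the header is a separate matter: a server that supports ranges replies with 206 Partial Content, while one that ignores them may send the whole file with 200.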

How do I download a binary file over HTTP?

The simplest way is a platform-specific solution that shells out to wget:

#!/usr/bin/env ruby
`wget http://somedomain.net/flv/sample/sample.flv`

You are probably looking for:

require 'net/http'
# The host must be "somedomain.net" without a trailing slash; otherwise an exception is raised.
Net::HTTP.start("somedomain.net") do |http|
  resp = http.get("/flv/sample/sample.flv")
  File.open("sample.flv", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Done."

Edit: Changed. Thank You.

Edit2: A solution that writes the file to disk piece by piece while downloading:

# instead of http.get
f = File.open('sample.flv', 'wb')   # open for binary writing, not reading
begin
  http.request_get('/sample.flv') do |resp|
    resp.read_body do |segment|
      f.write(segment)
    end
  end
ensure
  f.close
end
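
Putting the pieces of Edit2 together, a minimal sketch might look like the following. save_body and download are hypothetical names. save_body accepts any object with a block-taking read_body, which is what Net::HTTP passes to the request_get block.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: writes the response body to path segment by segment,
# so the whole file is never held in memory at once.
# Returns the number of bytes written.
def save_body(response, path)
  written = 0
  File.open(path, 'wb') do |file|
    response.read_body do |segment|
      written += file.write(segment)
    end
  end
  written
end

# Hypothetical helper: streams url to path using the block form of request_get.
def download(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      save_body(response, path)
    end
  end
end
```

Using File.open with a block closes the file even if the download raises, which replaces the explicit begin/ensure above.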

OpenURI having issues when passed URLs with fragment identifiers

Problem

The fragment part of a URI should not be sent to the server.

From Wikipedia: Fragment Identifier

The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the web server — of course the server typically helps to determine the MIME type, and the MIME type determines the processing of fragments. When an agent (such as a Web browser) requests a web resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the document type and fragment value.

Solution

Strip the fragment part of the URI before passing it to open.

require "uri"

u = URI.parse "http://example.com#fragment"
u.fragment = nil
u.to_s #=> "http://example.com"
