Ruby Net::HTTP - Following 301 Redirects

Ruby Net::HTTP - following 301 redirects

301 redirects are fairly common when you do not type the URL exactly as the web server expects it. They happen much more often than you'd think; you just don't normally notice them while browsing because the browser follows them automatically.

Two alternatives come to mind:

1: Use open-uri

open-uri handles redirects automatically. So all you'd need to do is:

require 'open-uri'
...
# Kernel#open no longer opens URLs as of Ruby 3.0; call URI.open instead.
response = URI.open('http://xyz...').read

If you have trouble redirecting between HTTP and HTTPS, see the related question "Ruby open-uri redirect forbidden" for a solution.

2: Handle redirects with Net::HTTP

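require 'net/http'
require 'uri'

# Fetches uri (a URI object) and follows a single 301 redirect, if any.
def get_response_with_redirect(uri)
  r = Net::HTTP.get_response(uri)
  if r.code == "301"
    r = Net::HTTP.get_response(URI.parse(r['location']))
  end
  r
end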
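
The Location header is not guaranteed to be an absolute URL. A server may send a relative value such as /new/page, which URI.parse alone would turn into a URI without a host. A minimal sketch of resolving it against the requested URL with URI.join follows; resolve_location is a hypothetical helper name and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not part of Net::HTTP): resolves a Location header
# value against the URL that was requested, so relative redirects work too.
def resolve_location(request_url, location)
  URI.join(request_url, location)
end

# A relative Location is resolved against the request URL's host.
puts resolve_location('http://example.com/old/page', '/new/page')
# An absolute Location is returned unchanged.
puts resolve_location('http://example.com/old/page', 'https://example.org/')
```

The returned URI object can be passed straight to Net::HTTP.get_response in place of URI.parse(r['location']).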

If you want to be even smarter, you could try adding or removing a trailing slash on the URL when you get a 404 response. You could do that by creating a method like get_response_smart, which handles this URL fiddling in addition to the redirects.
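
One possible shape for such a method is sketched below, assuming the "fiddling" means retrying once with the trailing slash toggled. toggle_trailing_slash is a hypothetical helper, and this version of get_response_smart only covers the 404 retry, not the redirect handling.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns a copy of uri with the trailing slash of its
# path added if it was missing, or removed if it was present.
def toggle_trailing_slash(uri)
  alt = uri.dup
  alt.path = uri.path.end_with?('/') ? uri.path.chomp('/') : "#{uri.path}/"
  alt
end

# Sketch only: retries once with the toggled URL when the first try is a 404.
def get_response_smart(uri)
  response = Net::HTTP.get_response(uri)
  return response unless response.code == '404'

  Net::HTTP.get_response(toggle_trailing_slash(uri))
end

puts toggle_trailing_slash(URI('https://example.com/articles/42'))
```

Calling get_response_smart(URI('https://example.com/articles/42')) would then issue at most two requests; example.com is a placeholder host.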

Ruby - net/http - following redirects

To follow redirects, you can do something like this (taken from ruby-doc)

Following Redirection

require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(uri_str)
  # request_uri keeps the query string and is never empty, unlike path.
  req = Net::HTTP::Get.new(url.request_uri, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # Enable TLS only for https URLs; forcing it on breaks plain http requests.
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http| http.request(req) }
  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

print fetch('http://www.ruby-lang.org/')
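
The case statement above matches on response classes rather than status codes. Because the classes for 301, 302 and 307 all inherit from Net::HTTPRedirection, a single when branch covers them. A small check that needs no network access; redirection_classes? is a hypothetical helper name.

```ruby
require 'net/http'

# True when every given response class would be matched by
# `when Net::HTTPRedirection` in a case statement.
def redirection_classes?(*classes)
  classes.all? { |klass| klass < Net::HTTPRedirection }
end

# 301, 302 and 307 response classes
puts redirection_classes?(Net::HTTPMovedPermanently, Net::HTTPFound, Net::HTTPTemporaryRedirect)
# 200 is a Net::HTTPSuccess, not a redirection
puts redirection_classes?(Net::HTTPOK)
```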

Ruby: Net::HTTP and redirects

This doesn't look like a Ruby issue; it's just how www.defense.gov behaves. https://www.defense.gov/News/Contracts/Contract-View/Article/14038760 returns a redirect (301) and then a 404, regardless of how you request it.

https://www.defense.gov/News/Contracts/Contract-View/Article/14038760 looks like a URL for missing data, whereas https://www.defense.gov/News/Contracts/Contract-View/Article/1403876/ works fine (as of 26.17.2017 03:24 +7). Why do you think the URL with id 14038760 is valid?

I've found that https://www.defense.gov/News/Contracts/Contract-View/Article/1403876 redirects to https://www.defense.gov/News/Contracts/Contract-View/Article/1403876/ (the same URL with a trailing slash), while the URL with the trailing slash returns a 200 response immediately.

What can you do? First fetch the list of current contracts from https://www.defense.gov/News/Contracts/source/nav/, then request each of them separately.

Net::HTTP returning 404 when I know it's 301

According to the docs for Net::HTTP#request_head, you want to pass the path, not the full URL, as the first parameter.

With that and a few other changes, here's one way to rewrite your method:

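require 'net/http'
require 'uri'

def obtain_final_url_in_chain(url)
  uri = URI url
  # Net::HTTP.start needs use_ssl for https URLs.
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # request_head takes a path; request_uri also keeps any query string.
    http.request_head uri.request_uri
  end

  case response
  when Net::HTTPRedirection
    obtain_final_url_in_chain response['location']
  else
    url
  end
end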
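
When choosing what to pass as the path, it may help to compare URI#path with URI#request_uri. path drops the query string and is empty for a bare host, while request_uri keeps the query and falls back to "/". A quick offline comparison; request_target is a hypothetical helper and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper: returns [path, request_uri] for a URL string.
def request_target(url)
  uri = URI(url)
  [uri.path, uri.request_uri]
end

p request_target('http://example.com/search?q=ruby')
p request_target('http://example.com')
```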

Net::HTTP follow maximum of three redirects?

Try this:

def self.get(url)
  # TODO: test with https too
  # Match the scheme explicitly so hosts like "httpbin.org" still get one.
  url = "http://#{url}" unless url.match(%r{\Ahttps?://})

  3.times do
    uri = URI.parse(url)
    if uri.respond_to?(:request_uri)
      response = Net::HTTP.get_response(uri)
      case response.code
      when '301', '302'
        url = response['location']
      else
        return response
      end
    end
  end
  nil # still redirecting after three requests (the bare loop would return 3)
end
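
The redirect limit can also be separated from the HTTP call, which makes the loop testable without a network. The sketch below is one way to do that, not the method above: follow_redirects is a hypothetical helper that takes a block returning [status_code, location], and the .test hosts are placeholders.

```ruby
require 'uri'

# Hypothetical helper: follows at most max_redirects redirects and returns the
# final URL. The block receives a URL string and must return
# [status_code, location], e.g. ["301", "/new"] or ["200", nil].
def follow_redirects(url, max_redirects: 3)
  (max_redirects + 1).times do
    code, location = yield(url)
    return url unless %w[301 302].include?(code) && location

    # Location may be relative, so resolve it against the current URL.
    url = URI.join(url, location).to_s
  end
  raise ArgumentError, "gave up after #{max_redirects} redirects"
end

# Stand-in for real responses: one redirect, then a 200.
fake_responses = {
  'http://a.test/'    => ['301', '/final'],
  'http://a.test/final' => ['200', nil]
}
puts follow_redirects('http://a.test/') { |u| fake_responses.fetch(u) }
```

With Net::HTTP, the block could be { |u| res = Net::HTTP.get_response(URI(u)); [res.code, res['location']] } after require 'net/http'.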

net/http automatically redirects webpage to another language

This is called content negotiation: the web server redirects based on your request. pt (Portuguese) seems to be the default (at least from my location):

$ curl -I https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1
HTTP/1.1 301 Moved Permanently
Set-Cookie: zl=pt; ...
Location: https://www.zomato.com/pt/grande-lisboa/fu-hao-massam%C3%A1

You can request another language by sending an Accept-Language header. Here's the response for Accept-Language: es (Spanish):

$ curl -I https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1 -H "Accept-Language: es"
HTTP/1.1 301 Moved Permanently
Set-Cookie: zl=es_cl; ...
Location: https://www.zomato.com/es/grande-lisboa/fu-hao-massam%C3%A1

And here's the response for Accept-Language: en (English):

$ curl -I https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1 -H "Accept-Language: en"
HTTP/1.1 200 OK
Set-Cookie: zl=en; ...

This seems to be the resource you've been looking for.

In Ruby you'd use:

require 'nokogiri'
require 'open-uri'

url = 'https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1'
headers = {'Accept-Language' => 'en'}

# URI.open is required on Ruby 3.0+, where Kernel#open no longer opens URLs.
doc = Nokogiri::HTML(URI.open(url, headers))
doc.at('html')[:lang]
#=> "en"
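
If you would rather stay with net/http than open-uri, the same header can be set on a request object. A minimal sketch, assuming a plain GET is enough; build_localized_request is a hypothetical helper name, and the network call is left commented out.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request carrying an Accept-Language header.
def build_localized_request(url, language)
  request = Net::HTTP::Get.new(URI(url))
  request['Accept-Language'] = language
  request
end

request = build_localized_request('https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1', 'en')
puts request['Accept-Language']

# Sending the request needs network access, so it is only shown here:
# uri = request.uri
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
#   http.request(request)
# end
```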

