Ruby Net::HTTP - following 301 redirects
301 redirects are fairly common when you do not type the URL exactly as the web server expects it. They happen much more often than you'd think; you just don't notice them while browsing because the browser follows them automatically.
Two alternatives come to mind:
1: Use open-uri
open-uri handles redirects automatically, so all you need to do is:
require 'open-uri'
...
# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open instead.
response = URI.open('http://xyz...').read
If you have trouble redirecting between HTTP and HTTPS, see "Ruby open-uri redirect forbidden" for a solution.
2: Handle redirects with Net::HTTP
require 'net/http'

# Fetch uri (a URI object) and follow a single 301 redirect, if there is one.
def get_response_with_redirect(uri)
  r = Net::HTTP.get_response(uri)
  if r.code == "301"
    r = Net::HTTP.get_response(URI.parse(r['location']))
  end
  r
end
If you want to be even smarter, you could try adding or removing a trailing slash on the URL when you get a 404 response. You could do that in a method like get_response_smart, which handles this URL fiddling in addition to the redirects.
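A get_response_smart along those lines might look roughly like this. It is only a sketch under a few assumptions: the "URL fiddling" is taken to mean adding or removing a trailing slash, and the toggle_trailing_slash helper, the limit parameter and the retry-once policy are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return a copy of uri with its trailing slash
# added if it is missing, or removed if it is present.
def toggle_trailing_slash(uri)
  toggled = uri.dup
  path = uri.path.empty? ? '/' : uri.path
  toggled.path = path.end_with?('/') ? path.chomp('/') : "#{path}/"
  toggled
end

# Follow up to `limit` redirects, then retry once with the trailing
# slash toggled if the final answer is a 404.
def get_response_smart(uri, limit = 5)
  response = Net::HTTP.get_response(uri)
  limit.times do
    location = response['location']
    # Net::HTTPRedirection also covers 304, which carries no Location.
    break unless response.is_a?(Net::HTTPRedirection) && location
    # URI.join also copes with a relative Location header.
    uri = URI.join(uri.to_s, location)
    response = Net::HTTP.get_response(uri)
  end
  if response.code == '404'
    retried = Net::HTTP.get_response(toggle_trailing_slash(uri))
    response = retried if retried.is_a?(Net::HTTPSuccess)
  end
  response
end
```

You would call it as get_response_smart(URI('http://xyz...')), the same way as get_response_with_redirect above.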
Ruby - net/http - following redirects
To follow redirects, you can do something like this (taken from the "Following Redirection" section on ruby-doc):
require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(uri_str)
  # request_uri keeps the query string and is never empty, unlike path.
  req = Net::HTTP::Get.new(url.request_uri, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # Enable TLS only for https URLs; a hard-coded use_ssl: true breaks plain http.
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http| http.request(req) }

  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

print fetch('http://www.ruby-lang.org/')
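One thing fetch does not cover: it hands response['location'] straight to URI.parse, which only works when the server sends an absolute URL. Some servers send a relative Location instead. A small sketch of how that could be handled with URI.join; resolve_location is a hypothetical helper name and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper: turn a Location header value into an absolute URL
# string by resolving it against the URL that was just requested.
def resolve_location(request_url, location)
  URI.join(request_url, location).to_s
end

resolve_location('http://example.com/docs/index.html', '/en/')
# => "http://example.com/en/"
resolve_location('http://example.com/docs/index.html', 'about.html')
# => "http://example.com/docs/about.html"
resolve_location('http://example.com/docs/', 'https://example.org/x')
# => "https://example.org/x"
```

Inside fetch, the recursive call would then pass resolve_location(uri_str, response['location']) instead of the raw header.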
Ruby: Net::HTTP and redirects
This doesn't look like a Ruby issue; it's just how www.defense.gov behaves. https://www.defense.gov/News/Contracts/Contract-View/Article/14038760 gives a redirect (301) and then a 404, no matter how you request it.
https://www.defense.gov/News/Contracts/Contract-View/Article/14038760 looks like a URL to missing data, but https://www.defense.gov/News/Contracts/Contract-View/Article/1403876/ works fine (as of 26.17.2017 03:24 +7). Why do you think the URL with id 14038760 is valid?
I've found that https://www.defense.gov/News/Contracts/Contract-View/Article/1403876 redirects to https://www.defense.gov/News/Contracts/Contract-View/Article/1403876/ (the same URL but with a trailing slash), while the URL with the trailing slash returns a 200 response immediately.
What can you do? First get the list of current contracts from https://www.defense.gov/News/Contracts/source/nav/ and then fetch each of them with a separate request.
Net::HTTP returning 404 when I know it's 301
According to the docs for Net::HTTP#request_head, you want to pass the path, not the full url, as the first parameter.
With that and a few other changes, here's one way to rewrite your method:
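require 'net/http'

def obtain_final_url_in_chain(url)
  uri = URI url
  # Enable TLS for https URLs; request_uri is the path plus query and is never empty.
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_head uri.request_uri
  end

  case response
  when Net::HTTPRedirection
    obtain_final_url_in_chain response['location']
  else
    url
  end
end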
Net::HTTP follow maximum of three redirects?
Try this:
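require 'net/http'

def self.get(url)
  # TODO: test with https too
  url = "http://#{url}" unless url.match(/^http/)
  3.times do
    uri = URI.parse(url)
    # Only HTTP(S) URIs respond to request_uri; give up on anything else.
    return nil unless uri.respond_to?(:request_uri)
    response = Net::HTTP.get_response(uri)
    case response.code
    when '301', '302'
      url = response['location']
    else
      return response
    end
  end
  # Still redirecting after three requests: return nil, not the 3 that Integer#times returns.
  nil
end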
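The hop limit is easier to check without hitting a real server if the fetch step is passed in as a callable. The following is a sketch of that idea, not a drop-in replacement: follow_redirects, the Hop struct and the stubbed paths are all invented for illustration.

```ruby
# Hypothetical stand-in for a response: a status code string and an
# optional Location value.
Hop = Struct.new(:code, :location)

# Follow at most max_hops requests. fetcher is any callable that takes a
# URL string and returns an object responding to #code and #location.
def follow_redirects(url, fetcher, max_hops = 3)
  max_hops.times do
    response = fetcher.call(url)
    return response unless %w[301 302].include?(response.code)
    url = response.location
  end
  nil # still redirecting after max_hops requests
end

# Stubbed site: /a -> /b -> /c (200), while /loop redirects to itself.
site = {
  '/a'    => Hop.new('301', '/b'),
  '/b'    => Hop.new('302', '/c'),
  '/c'    => Hop.new('200', nil),
  '/loop' => Hop.new('301', '/loop')
}
fetcher = ->(url) { site.fetch(url) }

follow_redirects('/a', fetcher).code # => "200"
follow_redirects('/loop', fetcher)   # => nil
```

For real requests, the fetcher could wrap Net::HTTP.get_response and copy the response's code and 'location' header into a Hop.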
net/http automatically redirects webpage to another language
This is called content negotiation: the web server redirects based on your request. pt (Portuguese) seems to be the default (at least from my location):
$ curl -I https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1
HTTP/1.1 301 Moved Permanently
Set-Cookie: zl=pt; ...
Location: https://www.zomato.com/pt/grande-lisboa/fu-hao-massam%C3%A1
You can request another language by sending an Accept-Language header. Here's the answer for Accept-Language: es (Spanish):
$ curl -I https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1 -H "Accept-Language: es"
HTTP/1.1 301 Moved Permanently
Set-Cookie: zl=es_cl; ...
Location: https://www.zomato.com/es/grande-lisboa/fu-hao-massam%C3%A1
And here's the answer for Accept-Language: en (English):
$ curl -I https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1 -H "Accept-Language: en"
HTTP/1.1 200 OK
Set-Cookie: zl=en; ...
This seems to be the resource you've been looking for.
In Ruby you'd use:
require 'nokogiri'
require 'open-uri'
url = 'https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1'
headers = {'Accept-Language' => 'en'}
# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open instead.
doc = Nokogiri::HTML(URI.open(url, headers))
doc.at('html')[:lang]
#=> "en"
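Since the question was about net/http rather than open-uri, the same header can be set on a Net::HTTP request. A minimal sketch, assuming you only need a plain GET; localized_request and fetch_localized are hypothetical helper names, not part of net/http.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a GET request that asks for a given language.
def localized_request(url, language)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Accept-Language'] = language
  [uri, request]
end

# Send the request, enabling TLS for https URLs.
def fetch_localized(url, language)
  uri, request = localized_request(url, language)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Usage (performs a real request):
#   response = fetch_localized('https://www.zomato.com/grande-lisboa/fu-hao-massam%C3%A1', 'en')
#   response.code
```

Note that Net::HTTP does not follow redirects on its own, so a 301 like the ones shown above would still need to be handled by the caller.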