How do I prevent Python's urllib(2) from following a redirect
You could do a couple of things:
- Build your own HTTPRedirectHandler that intercepts each redirect
- Create an instance of HTTPCookieProcessor and install that opener so that you have access to the cookiejar.
This is a quick little thing that shows both:
import urllib2

#redirect_handler = urllib2.HTTPRedirectHandler()

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")
print response.read()

print cookieprocessor.cookiejar
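On Python 3, where urllib2 became urllib.request, one way to stop redirects from being followed at all is to have redirect_request return None. The following is a minimal sketch under that assumption and is not part of the original answer; NoRedirectHandler and fetch_without_redirects are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse every redirect so the 3xx response surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "no follow-up request"; urllib's default
        # error handler then raises HTTPError for the 3xx response.
        return None


def fetch_without_redirects(url):
    """Return (status code, Location header or None) without following redirects."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Any HTTP error status (not only 3xx) ends up here.
        return err.code, err.headers.get("Location")
```

Calling fetch_without_redirects on a URL that redirects should then report the 3xx status and its Location header rather than the content of the final page.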
Is there an easy way to request a URL in python and NOT follow redirects?
Here is the Requests way:
import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers['Location'])
How to make python urllib2 follow redirect and keep post method
The more I thought about it, the more this looks like a really bad thing to do. For instance, if I submit a form to
http://example.com/add (with POST data to add an item)
and the response is a 302 redirect back to http://example.com/add, re-posting the same data I posted the first time would put me in an infinite loop. Not sure why I didn't think of this before. I'll leave the question here as a warning to anyone else thinking about doing this.
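If the POST really has to survive a redirect, a cautious option on Python 3 is to re-send the body only for 307 responses and to rely on the loop detection already built into urllib.request.HTTPRedirectHandler. This is a sketch under those assumptions rather than a recommended implementation; KeepPostOn307Handler and post_keeping_method are hypothetical names.

```python
import urllib.request


class KeepPostOn307Handler(urllib.request.HTTPRedirectHandler):
    """Re-send the original POST body when the server answers with 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.get_method() == "POST":
            # Build a new POST to the redirect target with the same body
            # and the same user-supplied headers.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=dict(req.headers),
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method="POST",
            )
        # Every other case keeps the stock behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def post_keeping_method(url, data):
    """POST data to url and return (final URL, response body)."""
    opener = urllib.request.build_opener(KeepPostOn307Handler)
    request = urllib.request.Request(url, data=data, method="POST")
    with opener.open(request) as resp:
        return resp.url, resp.read()
```

Because the base class still performs its own bookkeeping (the max_repeats and max_redirections class attributes), a server that keeps redirecting a URL to itself should end in an HTTPError instead of looping forever, although every repeated POST before that point is still sent.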
python 3.7 urllib.request doesn't follow redirect URL
The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question: given that data has been assigned, get_method automatically returns POST (as per how this method was implemented). Since the request method is POST and the response code is 307, an HTTPError is raised instead, as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.
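The effect of data on get_method can be checked without any network access. The helper name method_for below is hypothetical and only wraps the Request constructor for illustration.

```python
import urllib.request


def method_for(url, data=None):
    """Return the HTTP method urllib would use for this URL and body."""
    return urllib.request.Request(url, data=data).get_method()


print(method_for("http://httpbin.org/status/307"))            # GET
print(method_for("http://httpbin.org/status/307", b"hello"))  # POST
```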
For an experiment, try the following code:
import urllib.error
import urllib.parse
import urllib.request

url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling
try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.status != 307:
        raise  # not a status code that can be handled here
    # Location may be relative, so resolve it against the request URL
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)
    print('Redirected -> %s' % redirected_url)  # the original redirected url
print('Response URL -> %s ' % resp.url)  # the final url
Running the code as is may produce the following:
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note that the subsequent redirect to get was done automatically, as the subsequent request was a GET request. Commenting out the req.data assignment line will result in the lack of the "Redirected" output line.
A few other things are worth noting in the exception handling block. e.read() may be called to retrieve the response body produced by the server as part of the HTTP 307 response (since data was posted, there might be a short entity in the response that could be processed). Also, urljoin is needed because the Location header may be a relative URL (or may simply have the host missing) pointing to the subsequent resource.
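To illustrate why urljoin matters here, the sketch below resolves a few Location values against the request URL. The Location values and the helper name resolve_location are made up for the example.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Turn a possibly relative Location header into an absolute URL."""
    return urljoin(request_url, location)


base = "http://httpbin.org/status/307"
print(resolve_location(base, "/redirect/1"))               # http://httpbin.org/redirect/1
print(resolve_location(base, "redirect/1"))                # http://httpbin.org/status/redirect/1
print(resolve_location(base, "//example.org/next"))        # http://example.org/next
print(resolve_location(base, "https://example.org/next"))  # https://example.org/next
```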
Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that none of them ever got an answer. They are:
- How to handle 307 redirection using urllib2 from http to https
- HTTP Error 307: Temporary Redirect in Python3 - INTRANET
- HTTP Error 307 - Temporary redirect in python script
How to avoid country-based redirects with urlopen or urllib2 in Python
I would use mechanize: http://wwwsearch.sourceforge.net/mechanize/
You can then use:
import mechanize
br = mechanize.Browser()
# Don't handle Refresh redirections
br.set_handle_refresh(False)
Here br is the mechanize.Browser instance used to open the page. Mechanize also has proxy support.