Is there an easy way to request a URL in python and NOT follow redirects?
Here is the Requests way:
import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers['Location'])
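If you'd rather avoid the extra dependency, the standard library can do roughly the same thing. This is a sketch, not the only way to do it. It subclasses urllib.request.HTTPRedirectHandler and makes redirect_request return None, so urllib declines the redirect and the 3xx comes back as an HTTPError that still carries the Location header. NoRedirect and status_and_location are names made up for this example. The tiny local server, with hypothetical /old and /new paths, is only there so the snippet runs without network access.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that declines every redirect.

    Returning None means "don't follow this one", so urllib passes the
    3xx response on to the default error handler, which raises HTTPError.
    """

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def status_and_location(url):
    """Return (status code, Location header) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url) as resp:
            return resp.status, resp.headers.get('Location')
    except urllib.error.HTTPError as e:
        # 3xx responses land here because the redirect was declined
        return e.code, e.headers.get('Location')


# Hypothetical local server so the sketch runs offline: /old -> 301 -> /new
class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'ok'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(('127.0.0.1', 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

old_result = status_and_location(base + '/old')
new_result = status_and_location(base + '/new')
print(old_result)  # (301, '/new')
print(new_result)  # (200, None)
server.shutdown()
server.server_close()
```

Passing the class (not an instance) to build_opener makes it replace the default HTTPRedirectHandler.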
Python follow redirects and then download the page?
You might be better off with the Requests library, which has a better API for controlling redirect handling:
https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history
Requests:
https://pypi.org/project/requests/ (urllib replacement for humans)
How to follow page redirects using requests
Use response.history. From the documentation:
The Response.history list contains the Response objects that were
created in order to complete the request. The list is sorted from the
oldest to the most recent response.
So, to get the number of intermediate URLs, you could do something like:
import requests
response = requests.get(url)  # url is the address you want to fetch
print(len(response.history))
And to get what those URLs actually were and what their responses contain, you could do:
for resp in response.history:
    print(resp.url, resp.text)
If needed, you can also submit a new request to an intermediate URL with the optional parameter allow_redirects set to False:
r = requests.get(resp.url, allow_redirects=False)
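To see the whole chain at once, the history loop can be wrapped in a small helper. This is a sketch. redirect_chain is a hypothetical name, and the helper relies only on the .history, .status_code and .url attributes of a Requests Response. Because of that, the example feeds it stand-in objects with made-up example.com URLs instead of making a live request.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """List (status_code, url) pairs from the first hop to the final response.

    `response` is expected to behave like a requests.Response: it needs
    .history (oldest first), .status_code and .url.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Stand-in objects (hypothetical URLs) so the helper can be tried offline
hop = SimpleNamespace(status_code=301, url='http://example.com/old')
final = SimpleNamespace(status_code=200, url='https://example.com/new',
                        history=[hop])

chain = redirect_chain(final)
for status, url in chain:
    print(status, url)
print('intermediate URLs:', len(final.history))  # intermediate URLs: 1
```

With a real request you would call redirect_chain(requests.get(url)).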
Ignoring redirects when POSTing from Python Requests module?
By default, Django redirects to /accounts/profile/ after a successful login. To stop Requests from following that redirect, pass allow_redirects=False in the session.post() call.
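For comparison, here is a rough standard-library sketch of the same situation. The local server is a hypothetical stand-in for a Django login view: it answers a POST with a 302 to /accounts/profile/, and the form credentials are placeholders. urllib's default opener follows that 302 with a GET. An opener built with NoRedirect (a made-up handler whose redirect_request returns None) stops at the 302 instead. That is roughly the effect of allow_redirects=False in Requests, except that Requests returns the 302 response rather than raising.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class _LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a Django login view."""

    def do_POST(self):
        # Drain the posted form before answering with a redirect
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        self.send_response(302)
        self.send_header('Location', '/accounts/profile/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        body = b'profile page'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # decline: the 302 surfaces as HTTPError


server = HTTPServer(('127.0.0.1', 0), _LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
login_url = 'http://127.0.0.1:%d/accounts/login/' % server.server_port
form = urllib.parse.urlencode({'username': 'demo', 'password': 'demo'}).encode()

# Default opener: the 302 after the POST is followed with a GET
with urllib.request.urlopen(login_url, data=form) as resp:
    followed_url = resp.url

# No-redirect opener: the 302 itself comes back
try:
    urllib.request.build_opener(NoRedirect).open(login_url, data=form)
except urllib.error.HTTPError as e:
    stopped_at = (e.code, e.headers.get('Location'))

print(followed_url)  # http://127.0.0.1:<port>/accounts/profile/
print(stopped_at)    # (302, '/accounts/profile/')
server.shutdown()
server.server_close()
```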
python 3.7 urllib.request doesn't follow redirect URL
As I pointed out in the discussion in the comments, the redirect isn't followed automatically because of what the HTTP specification requires. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question: because data has been assigned, get_method automatically returns POST (that is how the method is implemented). Since the request method is POST and the response code is 307, an HTTPError is raised instead of following the redirect, as the specification above requires. In Python's urllib, the exception is raised by HTTPRedirectHandler.redirect_request in the urllib.request module.
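Both steps of that reasoning can be checked offline. HTTPRedirectHandler.redirect_request only inspects the request and then builds (or refuses to build) the follow-up Request, so no network call happens. In this minimal sketch, None for fp and an empty dict for the response headers are placeholders chosen purely for the demo.

```python
import urllib.error
import urllib.request

url = 'http://httpbin.org/status/307'
handler = urllib.request.HTTPRedirectHandler()

req = urllib.request.Request(url)
print(req.get_method())  # GET: no data assigned yet
# A GET answered with 307 is redirected: a new Request is returned
follow = handler.redirect_request(req, None, 307, 'Temporary Redirect', {},
                                  'http://httpbin.org/redirect/1')
print(follow.full_url)  # http://httpbin.org/redirect/1

req.data = b'hello'
print(req.get_method())  # POST: assigning data changes the default method
# A POST answered with 307 is refused: HTTPError is raised instead
try:
    handler.redirect_request(req, None, 307, 'Temporary Redirect', {},
                             'http://httpbin.org/redirect/1')
except urllib.error.HTTPError as e:
    refused_code = e.code
print(refused_code)  # 307
```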
For an experiment, try the following code:
import urllib.error
import urllib.parse
import urllib.request
url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling
try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.code != 307:
        raise  # not a status code that can be handled here
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)
    print('Redirected -> %s' % redirected_url)  # the original redirected url
print('Response URL -> %s ' % resp.url)  # the final url
Running the code as is may produce the following:
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note that the subsequent redirect to /get was done automatically, because the follow-up request was a GET request. If you comment out the req.data assignment line, the "Redirected" line is not printed.
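If httpbin.org isn't reachable, the experiment can be replayed against a local server. This sketch assumes a hypothetical local server that mimics the /status/307 -> /redirect/1 -> /get chain shown above. run is a made-up wrapper around the same try/except logic. For each variant it returns the manually redirected URL (or None) and the final URL.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical local stand-in for the httpbin chain: /status/307 -> /redirect/1 -> /get
ROUTES = {'/status/307': (307, '/redirect/1'), '/redirect/1': (302, '/get')}


class _Handler(BaseHTTPRequestHandler):
    def _reply(self):
        self.rfile.read(int(self.headers.get('Content-Length', 0)))  # drain any body
        status, location = ROUTES.get(self.path, (200, None))
        body = b'' if location else b'{}'
        self.send_response(status)
        if location:
            self.send_header('Location', location)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_POST = _reply

    def log_message(self, *args):
        pass  # silence per-request logging


def run(url, with_data):
    """Replay the experiment; return (manually redirected URL or None, final URL)."""
    req = urllib.request.Request(url)
    if with_data:
        req.data = b'hello'  # makes get_method() return POST
    redirected_url = None
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise
        redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
        resp = urllib.request.urlopen(redirected_url)
    with resp:
        return redirected_url, resp.url


server = HTTPServer(('127.0.0.1', 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/status/307' % server.server_port

with_post = run(url, with_data=True)
without_post = run(url, with_data=False)
print(with_post)     # ('http://127.0.0.1:<port>/redirect/1', 'http://127.0.0.1:<port>/get')
print(without_post)  # (None, 'http://127.0.0.1:<port>/get')
server.shutdown()
server.server_close()
```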
Two other things in the exception-handling block are worth noting. First, e.read() can be called to retrieve the response body the server sent along with the HTTP 307 response (since data was posted, the response may contain a short entity worth processing). Second, urljoin is needed because the Location header may hold a relative URL (for example, one with the host missing) to the subsequent resource.
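The urljoin point is easy to see in isolation. Here is a small sketch with a few possible Location values; the relative and absolute ones are just examples.

```python
from urllib.parse import urljoin

base = 'http://httpbin.org/status/307'

# Location with a leading slash: resolved against the host
print(urljoin(base, '/redirect/1'))              # http://httpbin.org/redirect/1
# Location without a leading slash: resolved against the current path
print(urljoin(base, 'redirect/1'))               # http://httpbin.org/status/redirect/1
# Absolute Location: used as-is
print(urljoin(base, 'https://example.com/get'))  # https://example.com/get
```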
Also, as a matter of interest (and for linkage purposes), this question has been asked several times before, and I am rather surprised that none of those questions got an answer:
- How to handle 307 redirection using urllib2 from http to https
- HTTP Error 307: Temporary Redirect in Python3 - INTRANET
- HTTP Error 307 - Temporary redirect in python script
Python requests not redirecting
The other answers don't actually make your request redirect. The cause is that you didn't send the right request header (here, Accept). Try the code below:
import requests
from bs4 import BeautifulSoup
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
}
page = requests.get('https://www.lexico.com/definition/agenesia', headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
print(page.url)
print(soup.find("span", {"class": "ind"}).get_text(), '\n')
print(soup.find("span", {"class": "pos"}).get_text())
It prints:
https://www.lexico.com/definition/agenesis?s=t
Failure of development, or incomplete development, of a part of the body.
noun
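I can't say exactly what lexico.com checks. The general effect, a server deciding whether to redirect based on request headers, can still be sketched with the standard library. The server below is entirely hypothetical: it redirects /definition/agenesia only when the Accept header mentions text/html. final_url is a made-up helper that returns the URL urllib ended up at, playing the role of page.url. urllib sends no Accept header by default.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ACCEPT = ('text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,'
          'image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9')


class _Handler(BaseHTTPRequestHandler):
    """Hypothetical server that only redirects browser-like requests."""

    def do_GET(self):
        wants_html = 'text/html' in self.headers.get('Accept', '')
        if self.path == '/definition/agenesia' and wants_html:
            self.send_response(302)
            self.send_header('Location', '/definition/agenesis?s=t')
            self.send_header('Content-Length', '0')
            self.end_headers()
            return
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def final_url(url, headers=None):
    """Fetch url (following redirects) and return the URL finally served."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return resp.url


server = HTTPServer(('127.0.0.1', 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/definition/agenesia' % server.server_port

plain = final_url(url)                             # no Accept header sent
browser_like = final_url(url, {'Accept': ACCEPT})  # same header as the answer
print(plain.endswith('/definition/agenesia'))              # True: not redirected
print(browser_like.endswith('/definition/agenesis?s=t'))   # True: redirected
server.shutdown()
server.server_close()
```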