How can I see the entire HTTP request that's being sent by my Python application?
A simple method: enable logging in recent versions of Requests (1.x and higher.)
Requests uses the http.client
and logging
module configuration to control logging verbosity, as described here.
Demonstration
Code excerpted from the linked documentation:
import requests
import logging
# These two lines enable debugging at httplib level (requests->urllib3->http.client)
# You will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# The only thing missing will be the response.body which is not logged.
try:
import http.client as http_client
except ImportError:
# Python 2
import httplib as http_client
http_client.HTTPConnection.debuglevel = 1
# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
requests.get('https://httpbin.org/headers')
Example Output
$ python requests-logging.py
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org
send: 'GET /headers HTTP/1.1\r\nHost: httpbin.org\r\nAccept-Encoding: gzip, deflate, compress\r\nAccept: */*\r\nUser-Agent: python-requests/1.2.0 CPython/2.7.3 Linux/3.2.0-48-generic\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: application/json
header: Date: Sat, 29 Jun 2013 11:19:34 GMT
header: Server: gunicorn/0.17.4
header: Content-Length: 226
header: Connection: keep-alive
DEBUG:requests.packages.urllib3.connectionpool:"GET /headers HTTP/1.1" 200 226
Python requests - print entire http request (raw)?
Since v1.2.3 Requests added the PreparedRequest object. As per the documentation "it contains the exact bytes that will be sent to the server".
One can use this to pretty print a request, like so:
import requests
req = requests.Request('POST','http://stackoverflow.com',headers={'X-Custom':'Test'},data='a=1&b=2')
prepared = req.prepare()
def pretty_print_POST(req):
"""
At this point it is completely built and ready
to be fired; it is "prepared".
However pay attention at the formatting used in
this function because it is programmed to be pretty
printed and may differ from the actual request.
"""
print('{}\n{}\r\n{}\r\n\r\n{}'.format(
'-----------START-----------',
req.method + ' ' + req.url,
'\r\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
req.body,
))
pretty_print_POST(prepared)
which produces:
-----------START-----------
POST http://stackoverflow.com/
Content-Length: 7
X-Custom: Test
a=1&b=2
Then you can send the actual request with this:
s = requests.Session()
s.send(prepared)
These links are to the latest documentation available, so they might change in content:
Advanced - Prepared requests and API - Lower level classes
log all http requests in python
See this answer from elsewhere on stackoverflow, if you're sure that all requests are using the requests
package:
https://stackoverflow.com/a/16337639/6709958
Essentially, you just need to activate logging.
Log all requests from the python-requests module
The underlying urllib3
library logs all new connections and URLs with the logging
module, but not POST
bodies. For GET
requests this should be enough:
import logging
logging.basicConfig(level=logging.DEBUG)
which gives you the most verbose logging option; see the logging HOWTO for more details on how to configure logging levels and destinations.
Short demo:
>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366
Depending on the exact version of urllib3, the following messages are logged:
INFO
: RedirectsWARN
: Connection pool full (if this happens often increase the connection pool size)WARN
: Failed to parse headers (response headers with invalid format)WARN
: Retrying the connectionWARN
: Certificate did not match expected hostnameWARN
: Received response with both Content-Length and Transfer-Encoding, when processing a chunked responseDEBUG
: New connections (HTTP or HTTPS)DEBUG
: Dropped connectionsDEBUG
: Connection details: method, path, HTTP version, status code and response lengthDEBUG
: Retry count increments
This doesn't include headers or bodies. urllib3
uses the http.client.HTTPConnection
class to do the grunt-work, but that class doesn't support logging, it can normally only be configured to print to stdout. However, you can rig it to send all debug information to logging instead by introducing an alternative print
name into that module:
import logging
import http.client
httpclient_logger = logging.getLogger("http.client")
def httpclient_logging_patch(level=logging.DEBUG):
"""Enable HTTPConnection debug logging to the logging framework"""
def httpclient_log(*args):
httpclient_logger.log(level, " ".join(args))
# mask the print() built-in in the http.client module to use
# logging instead
http.client.print = httpclient_log
# enable debugging
http.client.HTTPConnection.debuglevel = 1
Calling httpclient_logging_patch()
causes http.client
connections to output all debug information to a standard logger, and so are picked up by logging.basicConfig()
:
>>> httpclient_logging_patch()
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:http.client:send: b'GET /get?foo=bar&baz=python HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:http.client:reply: 'HTTP/1.1 200 OK\r\n'
DEBUG:http.client:header: Date: Tue, 04 Feb 2020 13:36:53 GMT
DEBUG:http.client:header: Content-Type: application/json
DEBUG:http.client:header: Content-Length: 366
DEBUG:http.client:header: Connection: keep-alive
DEBUG:http.client:header: Server: gunicorn/19.9.0
DEBUG:http.client:header: Access-Control-Allow-Origin: *
DEBUG:http.client:header: Access-Control-Allow-Credentials: true
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366
how to debug requests library?
Anatomy of an http response
Example (loading this page)
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Fri, 27 Sep 2013 19:22:41 GMT
Last-Modified: Fri, 27 Sep 2013 19:21:41 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 27 Sep 2013 19:21:41 GMT
Content-Length: 12706
<!DOCTYPE html>
<html>
... truncated rest of body ...
- The first line is the status line and consists of the status code and status text.
- Headers are key/value pairs. Headers are ended with an empty new line. The empty line denotes there are no more headers and the start of the payload / body follows.
- body consumes the rest of the message.
The following explains how to extract the 3 parts:
Status Line
Use the following to get the status line sent back from the server
>>> bad_r = requests.get('http://httpbin.org/status/404')
>>> bad_r.status_code
404
>>> bad_r.raise_for_status()
Traceback (most recent call last):
File "requests/models.py", line 832, in raise_for_status
raise http_error
requests.exceptions.HTTPError: 404 Client Error
(source)
Headers:
r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
# response headers:
r.headers
# request headers:
r.request.headers
Body
Use r.text
.
Post Request Encoding
The 'content-type' you send to the server in the request should match the content-type you're actually sending. In your case, you are sending json but telling the server you're sending form data (which is the default if you do not specify).
From the headers you show above:
"Content-Type":"application/x-www-form-urlencoded",
But your request.post call sets data=json.dumps(data)
which is JSON. The headers should say:
"Content-type": "application/json",
How do I read a response from Python Requests?
Requests doesn't have an equivalent to Urlib2's read()
.
>>> import requests
>>> response = requests.get("http://www.google.com")
>>> print response.content
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head>....'
>>> print response.content == response.text
True
It looks like the POST request you are making is returning no content. Which is often the case with a POST request. Perhaps it set a cookie? The status code is telling you that the POST succeeded after all.
Edit for Python 3:
Python now handles data types differently. response.content
returns a sequence of bytes
(integers that represent ASCII) while response.text
is a string
(sequence of chars).
Thus,
>>> print response.content == response.text
False
>>> print str(response.content) == response.text
True
scraping python requests soraredata
Here is a solution I got to work.
import http.client
import json
import socket
import ssl
import urllib.request
hostname = "www.soraredata.com"
path = "/api/stats/newFullRankings/all/false/all/7/0/sr_football"
http_msg = "GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept-Encoding: identity\r\nUser-Agent: python-urllib3/1.26.7\r\n\r\n".format(
host=hostname,
path=path
).encode("utf-8")
sock = socket.create_connection((hostname, 443), timeout=3.1)
context = ssl.create_default_context()
with sock:
with context.wrap_socket(sock, server_hostname=hostname) as ssock:
ssock.sendall(urllib3_msg)
response = http.client.HTTPResponse(ssock, method="GET")
response.begin()
print(response.status, response.reason)
data = response.read()
resp_data = json.loads(data.decode("utf-8"))
What was perplexing is that the HTTP message I used was the exact same one used by urllib3, as indicated when debugging the following code. (See the this answer for how to set up logging to debug requests, which also works for urllib3.)
Yet, this code gave a 403 HTTP status code.
import urllib3
http = urllib3.PoolManager()
r = http.request(
"GET",
"https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/0/sr_football",
)
assert r.status == 403
Moreover http.client
also gave a 403 status code, and it seems to be doing pretty much what I did above: wrap a socket in an SSL context and send the request.
conn = http.client.HTTPSConnection(hostname)
conn.request("GET", path)
res = conn.getresponse()
assert res.status == 403
Related Topics
What Is the Correct Way to Include Localisation in Python Packages
Settingwithcopywarning Even When Using .Loc[Row_Indexer,Col_Indexer] = Value
How to Rotate an Image Around an Off Center Pivot in Pygame
Running Unittest with Typical Test Directory Structure
Is It Worth Using Python's Re.Compile
Creating a Range of Dates in Python
Tkinter.Tclerror: Image "Pyimage3" Doesn't Exist
Encrypt & Decrypt Using Pycrypto Aes 256
How to Use Python Requests to Fake a Browser Visit A.K.A and Generate User Agent
How to One Hot Encode in Python
How to Efficiently Compare Two Unordered Lists (Not Sets)
Getting a Hidden Password Input
Python Dictionary: Are Keys() and Values() Always the Same Order
Create Own Colormap Using Matplotlib and Plot Color Scale