How to See the Entire HTTP Request That's Being Sent by My Python Application

How can I see the entire HTTP request that's being sent by my Python application?

A simple method: enable logging in recent versions of Requests (1.x and higher).

Requests uses the http.client and logging module configuration to control logging verbosity, as described here.

Demonstration

Code excerpted from the linked documentation:

import requests
import logging

# These two lines enable debugging at httplib level (requests->urllib3->http.client)
# You will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# The only thing missing will be the response.body which is not logged.
try:
    import http.client as http_client
except ImportError:
    # Python 2
    import httplib as http_client
http_client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

requests.get('https://httpbin.org/headers')

Example Output

$ python requests-logging.py 
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org
send: 'GET /headers HTTP/1.1\r\nHost: httpbin.org\r\nAccept-Encoding: gzip, deflate, compress\r\nAccept: */*\r\nUser-Agent: python-requests/1.2.0 CPython/2.7.3 Linux/3.2.0-48-generic\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: application/json
header: Date: Sat, 29 Jun 2013 11:19:34 GMT
header: Server: gunicorn/0.17.4
header: Content-Length: 226
header: Connection: keep-alive
DEBUG:requests.packages.urllib3.connectionpool:"GET /headers HTTP/1.1" 200 226
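Because the settings above are global, they stay on for the rest of the process. A minimal sketch of a way to scope them, assuming Python 3: http_debug is a hypothetical helper name, not part of Requests or the standard library, and it only touches http.client's debuglevel and the root logger level.

```python
import contextlib
import http.client
import logging


@contextlib.contextmanager
def http_debug(level=logging.DEBUG):
    """Temporarily enable http.client debug output and verbose logging.

    Hypothetical helper: it flips the same two switches as the snippet
    above and restores the previous values on exit.
    """
    root = logging.getLogger()
    old_debuglevel = http.client.HTTPConnection.debuglevel
    old_root_level = root.level

    logging.basicConfig()  # does nothing if the root logger already has handlers
    http.client.HTTPConnection.debuglevel = 1
    root.setLevel(level)
    try:
        yield
    finally:
        # Restore the previous settings even if the request raised.
        http.client.HTTPConnection.debuglevel = old_debuglevel
        root.setLevel(old_root_level)
```

Wrapping a call such as requests.get('https://httpbin.org/headers') in "with http_debug():" should then produce the send/reply/header lines only for requests made inside the block.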

Python requests - print entire http request (raw)?

Since v1.2.3, Requests has had the PreparedRequest object. According to the documentation, "it contains the exact bytes that will be sent to the server".

One can use this to pretty print a request, like so:

import requests

req = requests.Request('POST','http://stackoverflow.com',headers={'X-Custom':'Test'},data='a=1&b=2')
prepared = req.prepare()

def pretty_print_POST(req):
    """
    At this point it is completely built and ready
    to be fired; it is "prepared".

    However, pay attention to the formatting used in
    this function because it is programmed to be pretty
    printed and may differ from the actual request.
    """
    print('{}\n{}\r\n{}\r\n\r\n{}'.format(
        '-----------START-----------',
        req.method + ' ' + req.url,
        '\r\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
        req.body,
    ))

pretty_print_POST(prepared)

which produces:

-----------START-----------
POST http://stackoverflow.com/
Content-Length: 7
X-Custom: Test

a=1&b=2

Then you can send the actual request with this:

s = requests.Session()
s.send(prepared)

These links are to the latest documentation available, so they might change in content:
Advanced - Prepared requests and API - Lower level classes
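A prepared request's body can be None (for example on a GET) or bytes, in which case a plain format() call shows "None" or a b'...' literal. The following is a minimal sketch of a variant that returns a string and handles both cases. format_request is a hypothetical name, and the SimpleNamespace object is only a stand-in so the example runs without the requests package or a network connection; in practice you would pass the result of req.prepare().

```python
from types import SimpleNamespace


def format_request(req):
    """Return a readable dump of an object with method, url, headers and body.

    Intended for a requests.PreparedRequest. The layout is meant for reading
    and is not the exact byte stream sent on the wire.
    """
    body = req.body
    if body is None:
        body = ""  # e.g. a GET without a payload
    elif isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    header_lines = "\r\n".join(f"{k}: {v}" for k, v in req.headers.items())
    return f"{req.method} {req.url}\r\n{header_lines}\r\n\r\n{body}"


# Stand-in object (hypothetical) mirroring the attributes used above.
fake = SimpleNamespace(
    method="POST",
    url="http://stackoverflow.com/",
    headers={"Content-Length": "7", "X-Custom": "Test"},
    body=b"a=1&b=2",
)
print(format_request(fake))
```

If the request goes through a Session, preparing it with the session's prepare_request() method instead of req.prepare() also merges in session-level headers and cookies, so the dump is closer to what is actually sent.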

log all http requests in python

See this answer from elsewhere on Stack Overflow, if you're sure that all requests use the requests package:

https://stackoverflow.com/a/16337639/6709958

Essentially, you just need to activate logging.

Log all requests from the python-requests module

The underlying urllib3 library logs all new connections and URLs with the logging module, but not POST bodies. For GET requests this should be enough:

import logging

logging.basicConfig(level=logging.DEBUG)

which gives you the most verbose logging option; see the logging HOWTO for more details on how to configure logging levels and destinations.

Short demo:

>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366
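logging.basicConfig(level=logging.DEBUG) turns on debug output for every library in the process. A minimal sketch of a narrower setup, assuming the logger is named "urllib3" as in the output above (older Requests releases that bundled urllib3 used "requests.packages.urllib3"); enable_urllib3_logging is a hypothetical helper name.

```python
import logging
import sys


def enable_urllib3_logging(level=logging.DEBUG, stream=sys.stderr):
    """Attach a handler to the "urllib3" logger only, leaving the root logger alone."""
    logger = logging.getLogger("urllib3")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    # Avoid duplicate lines if the root logger also has handlers.
    logger.propagate = False
    return handler
```

Child loggers such as urllib3.connectionpool inherit the level and pass their records up to this handler, so other libraries stay at their default verbosity.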

Depending on the exact version of urllib3, the following messages are logged:

  • INFO: Redirects
  • WARN: Connection pool full (if this happens often increase the connection pool size)
  • WARN: Failed to parse headers (response headers with invalid format)
  • WARN: Retrying the connection
  • WARN: Certificate did not match expected hostname
  • WARN: Received response with both Content-Length and Transfer-Encoding, when processing a chunked response
  • DEBUG: New connections (HTTP or HTTPS)
  • DEBUG: Dropped connections
  • DEBUG: Connection details: method, path, HTTP version, status code and response length
  • DEBUG: Retry count increments

This doesn't include headers or bodies. urllib3 uses the http.client.HTTPConnection class to do the grunt work, but that class doesn't support logging; it can normally only be configured to print to stdout. However, you can rig it to send all debug information to logging instead by introducing an alternative print name into that module:

import logging
import http.client

httpclient_logger = logging.getLogger("http.client")

def httpclient_logging_patch(level=logging.DEBUG):
    """Enable HTTPConnection debug logging to the logging framework"""

    def httpclient_log(*args):
        httpclient_logger.log(level, " ".join(args))

    # mask the print() built-in in the http.client module to use
    # logging instead
    http.client.print = httpclient_log
    # enable debugging
    http.client.HTTPConnection.debuglevel = 1

Calling httpclient_logging_patch() causes http.client connections to send all debug information to a standard logger, so the output is picked up by logging.basicConfig():

>>> httpclient_logging_patch()
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:http.client:send: b'GET /get?foo=bar&baz=python HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:http.client:reply: 'HTTP/1.1 200 OK\r\n'
DEBUG:http.client:header: Date: Tue, 04 Feb 2020 13:36:53 GMT
DEBUG:http.client:header: Content-Type: application/json
DEBUG:http.client:header: Content-Length: 366
DEBUG:http.client:header: Connection: keep-alive
DEBUG:http.client:header: Server: gunicorn/19.9.0
DEBUG:http.client:header: Access-Control-Allow-Origin: *
DEBUG:http.client:header: Access-Control-Allow-Credentials: true
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366
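The patch stays in effect until the process exits. A hedged variant that can be reverted is sketched below: patch_httpclient_logging is a hypothetical name, the str() conversion is a defensive assumption in case a non-string argument is ever passed to print, and the returned undo function simply removes the module-level print name again.

```python
import http.client
import logging

httpclient_logger = logging.getLogger("http.client")


def patch_httpclient_logging(level=logging.DEBUG):
    """Route http.client debug output to logging; return a function that undoes it."""
    old_debuglevel = http.client.HTTPConnection.debuglevel

    def httpclient_log(*args):
        # str() guards against non-string arguments.
        httpclient_logger.log(level, " ".join(str(a) for a in args))

    # Shadow the built-in print() inside the http.client module only.
    http.client.print = httpclient_log
    http.client.HTTPConnection.debuglevel = 1

    def undo():
        # Drop the module-level name so http.client uses the built-in print again.
        if getattr(http.client, "print", None) is httpclient_log:
            del http.client.print
        http.client.HTTPConnection.debuglevel = old_debuglevel

    return undo
```

Calling undo = patch_httpclient_logging() before the requests of interest and undo() afterwards limits the verbose output to that stretch of code.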

how to debug requests library?

Anatomy of an HTTP response

Example (loading this page)

HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Fri, 27 Sep 2013 19:22:41 GMT
Last-Modified: Fri, 27 Sep 2013 19:21:41 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 27 Sep 2013 19:21:41 GMT
Content-Length: 12706

<!DOCTYPE html>
<html>
... truncated rest of body ...

  1. The first line is the status line. It consists of the HTTP version, the status code, and the status text (reason phrase).
  2. Headers are key/value pairs. An empty line ends the headers and marks the start of the payload / body.
  3. The body consumes the rest of the message.
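To make the three parts concrete, here is a minimal sketch that splits a raw response by hand. split_http_response is a hypothetical helper for illustration only and the sample bytes are made up; Requests does this parsing for you, and a real parser also has to deal with repeated headers and chunked bodies.

```python
def split_http_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status_line, headers, body)."""
    # The first blank line ends the headers; everything after it is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header name overwrites the earlier value.
        headers[name.strip()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 15\r\n"
    b"\r\n"
    b"<!DOCTYPE html>"
)
status_line, headers, body = split_http_response(raw)
version, code, reason = status_line.split(" ", 2)
print(version, code, reason)
print(headers["Content-Type"])
print(body)
```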

The following explains how to extract the 3 parts:

Status Line

Use the following to get the status code from the status line sent back by the server:

>>> bad_r = requests.get('http://httpbin.org/status/404')
>>> bad_r.status_code
404

>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error

(source)

Headers:

r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
# response headers:
r.headers
# request headers:
r.request.headers

Body

Use r.text.

Post Request Encoding

The Content-Type header you send in the request should match the format of the body you're actually sending. In your case, you are sending JSON but telling the server you're sending form data (which is the default if you do not specify a content type).

From the headers you show above:

"Content-Type":"application/x-www-form-urlencoded",

But your requests.post call sets data=json.dumps(data), which is JSON. The headers should say:

"Content-Type": "application/json",
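The mismatch is easier to see when the same data is encoded both ways. A small standard-library sketch; the data dictionary is made up for illustration.

```python
import json
from urllib.parse import urlencode

data = {"a": 1, "b": 2}  # made-up payload

# Form encoding: goes with Content-Type: application/x-www-form-urlencoded
form_body = urlencode(data)

# JSON encoding: goes with Content-Type: application/json
json_body = json.dumps(data)

print(form_body)
print(json_body)
```

With Requests, passing json=data instead of data=json.dumps(data) serializes the body and sets the Content-Type header to application/json, which avoids the mismatch.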

How do I read a response from Python Requests?

Requests doesn't have an equivalent to urllib2's read().

>>> import requests
>>> response = requests.get("http://www.google.com")
>>> print response.content
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head>....'
>>> print response.content == response.text
True

It looks like the POST request you are making is returning no content, which is often the case with a POST request. Perhaps it set a cookie? The status code is telling you that the POST succeeded after all.

Edit for Python 3:

Python 3 treats bytes and text as distinct types. response.content returns the raw body as bytes, while response.text is a str (a sequence of characters) decoded from those bytes. print is also a function now, so the calls need parentheses.

Thus, assuming response.encoding is set,

>>> print(response.content == response.text)
False

>>> print(response.content.decode(response.encoding) == response.text)
True
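The bytes/str distinction can be checked without a network call. In this minimal sketch the sample string is made up and stands in for a response body; response.text is, roughly, response.content decoded with response.encoding.

```python
text = "café"                    # str: a sequence of Unicode characters
content = text.encode("utf-8")   # bytes: what actually travels over the wire

# bytes and str never compare equal in Python 3.
print(content == text)

# str() on bytes returns the repr, b'caf\xc3\xa9', not the decoded text.
print(str(content) == text)

# Decoding with the right encoding recovers the original string.
print(content.decode("utf-8") == text)
```

Only the last comparison is true, which is why str(response.content) should not be used to turn a body into text.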

scraping python requests soraredata

Here is a solution I got to work.

import http.client
import json
import socket
import ssl

hostname = "www.soraredata.com"
path = "/api/stats/newFullRankings/all/false/all/7/0/sr_football"
# Raw HTTP/1.1 request message
http_msg = "GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept-Encoding: identity\r\nUser-Agent: python-urllib3/1.26.7\r\n\r\n".format(
    host=hostname,
    path=path
).encode("utf-8")

sock = socket.create_connection((hostname, 443), timeout=3.1)
context = ssl.create_default_context()

with sock:
    with context.wrap_socket(sock, server_hostname=hostname) as ssock:
        ssock.sendall(http_msg)
        # Let http.client parse the status line, headers, and body
        response = http.client.HTTPResponse(ssock, method="GET")
        response.begin()
        print(response.status, response.reason)
        data = response.read()

resp_data = json.loads(data.decode("utf-8"))

What was perplexing is that the HTTP message I used was exactly the same as the one sent by urllib3, as shown when debugging the following code. (See this answer for how to set up logging to debug requests, which also works for urllib3.)

Yet, this code gave a 403 HTTP status code.

import urllib3

# Named pool rather than http so it does not shadow the http.client import
pool = urllib3.PoolManager()

r = pool.request(
    "GET",
    "https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/0/sr_football",
)
assert r.status == 403

Moreover http.client also gave a 403 status code, and it seems to be doing pretty much what I did above: wrap a socket in an SSL context and send the request.

conn = http.client.HTTPSConnection(hostname)
conn.request("GET", path)
res = conn.getresponse()
assert res.status == 403

