How to Send a HEAD HTTP Request in Python 2

How do you send a HEAD HTTP request in Python 2?

Edit: This answer works, but nowadays you should just use the requests library, as other answers below mention.


Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There's also res.getheader(name) to get a specific header.
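In Python 3, httplib was renamed http.client. Here is a minimal sketch of the same HEAD request there. The helper name head_request is made up for illustration.

```python
import http.client


def head_request(host, path="/", port=None, timeout=10):
    """Send a HEAD request and return (status, reason, headers).

    head_request is a hypothetical helper; http.client is the Python 3
    name of the httplib module used above.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        # getheaders() returns a list of (name, value) pairs;
        # res.getheader(name) would return a single value, or None if absent.
        return res.status, res.reason, res.getheaders()
    finally:
        conn.close()
```

Calling head_request("www.google.com", "/index.html") mirrors the interactive session above.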

Making HTTP HEAD request with urllib2 from Python 2

This works just fine:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
print response.info()

Tested with a quick-and-dirty HTTPd hacked together in Python:

Server: BaseHTTP/0.3 Python/2.6.6
Date: Sun, 12 Dec 2010 11:52:33 GMT
Content-type: text/html
X-REQUEST_METHOD: HEAD

I've added a custom header field X-REQUEST_METHOD to show it works :)

Here is HTTPd log:

Sun Dec 12 12:52:28 2010 Server Starts - localhost:8080
localhost.localdomain - - [12/Dec/2010 12:52:33] "HEAD / HTTP/1.1" 200 -
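The server used for that test isn't shown. Below is a minimal sketch of a comparable one, assuming Python 3's http.server. The handler and function names are made up. It echoes the request method back in an X-REQUEST_METHOD header, so you can confirm that the client really sent HEAD.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>hello</body></html>"


class EchoMethodHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers GET/HEAD and names the method in a header."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        # HEAD should advertise the same Content-Length that GET would send.
        self.send_header("Content-Length", str(len(BODY)))
        # self.command holds the request method, e.g. "HEAD" or "GET".
        self.send_header("X-REQUEST_METHOD", self.command)
        self.end_headers()

    def do_HEAD(self):
        # HEAD responses carry headers only, never a body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)


def make_server(port=8080):
    """Create (but don't start) the server; call serve_forever() to run it."""
    return HTTPServer(("localhost", port), EchoMethodHandler)
```

Run make_server().serve_forever() in one terminal, then point the urllib2 snippet above at http://localhost:8080.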

Edit: there is also httplib2:

import httplib2
h = httplib2.Http()
resp, content = h.request("http://www.google.com", "HEAD")

How do I make a HTTP HEAD request using requests?

You use the head() function:

import requests
r = requests.head("http://example.com")  # replace with your URL

How do you send a HEAD HTTP request with Tornado?

Add method="HEAD" to your AsyncHTTPClient.fetch() call. Inside a coroutine, that looks like:

response = await http_client.fetch("http://example.com", method="HEAD")

Getting HEAD content with Python Requests

By definition, the responses to HEAD requests do not contain a message-body.

Send a GET request if you want to, well, get a response body. Send a HEAD request iff you are only interested in the response status code and headers.
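To see that difference in practice, here is a small sketch assuming Python 3's http.client. compare_head_and_get is a hypothetical helper name. It issues both methods against the same plain-HTTP URL and reports the status and body of each.

```python
import http.client
from urllib.parse import urlsplit


def compare_head_and_get(url, timeout=10):
    """Return {"HEAD": (status, body), "GET": (status, body)} for a plain-HTTP URL.

    compare_head_and_get is a hypothetical helper. A server should answer
    HEAD with the same headers as GET, but the HEAD response has no body.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    results = {}
    for method in ("HEAD", "GET"):
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            conn.request(method, path)
            res = conn.getresponse()
            # For HEAD, http.client knows not to wait for a body, so read() returns b"".
            results[method] = (res.status, res.read())
        finally:
            conn.close()
    return results
```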

HTTP transfers arbitrary content; the HTTP term header is completely unrelated to an HTML <head>. However, an HTTP client can ask the server to send only part of a document. If you know the length of the HTML <head> code (or an upper bound on it), you can include an HTTP Range header in your request that asks the remote server to return only a certain number of bytes. If the remote server supports HTTP ranges, it will then serve the reduced answer, with status 206 Partial Content.
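Here is a minimal sketch of such a Range request with Python 3's http.client. fetch_prefix is a hypothetical helper name. It assumes a plain-HTTP URL and handles both outcomes: a 206 when the server honours the range, and a 200 when it ignores it. In the 200 case, only the first max_bytes are read before the connection is closed.

```python
import http.client
from urllib.parse import urlsplit


def fetch_prefix(url, max_bytes, timeout=10):
    """GET at most max_bytes from the start of a plain-HTTP resource.

    fetch_prefix is a hypothetical helper. It sends
    "Range: bytes=0-<max_bytes-1>" (byte ranges are inclusive) and
    returns (status, body): 206 means the range was honoured,
    200 means the server ignored it and sent the whole document.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Range": "bytes=0-%d" % (max_bytes - 1)})
        res = conn.getresponse()
        # Read no more than max_bytes either way; closing the connection
        # afterwards discards whatever the server still had to send.
        return res.status, res.read(max_bytes)
    finally:
        conn.close()
```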

What is the fastest way to send 100,000 HTTP requests in Python?

Twistedless solution:

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200

def doWork():
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    try:
        url = urlparse(ourl)
        # A timeout keeps one hung server from blocking a worker forever
        conn = httplib.HTTPConnection(url.netloc, timeout=10)
        # An empty path (e.g. "http://example.com") must be sent as "/"
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        conn.close()
        return res.status, ourl
    except Exception:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

This one is slightly faster than the Twisted solution and uses less CPU.
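On Python 3, the same thread-pool idea can be written with concurrent.futures instead of a hand-rolled Queue. This is a minimal sketch assuming plain-HTTP URLs. check_urls and the default of 200 workers are illustrative choices, not tuned values.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def get_status(url, timeout=10):
    """HEAD one URL and return (status, url), or ("error", url) on failure."""
    try:
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            # An empty path must be sent as "/"
            conn.request("HEAD", parts.path or "/")
            return conn.getresponse().status, url
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, DNS failures, malformed responses...
        return "error", url


def check_urls(urls, workers=200):
    """Check many URLs concurrently (hypothetical helper); results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(get_status, urls))
```

Feed it the stripped lines of urllist.txt. One design difference: Executor.map submits every URL up front, unlike the bounded Queue above, so for very large lists you may prefer to feed it in batches.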

HEAD request in python not working as desired

The problem you see has nothing to do with Python. The website itself seems to require something more than just a HEAD request. Even a simple telnet session results in the error:

$ telnet www.nativeseeds.org 80
Trying 208.113.230.85...
Connected to www.nativeseeds.org (208.113.230.85).
Escape character is '^]'.
HEAD / HTTP/1.1
Host: www.nativeseeds.org

HTTP/1.1 503 Service Temporarily Unavailable
Date: Wed, 26 Sep 2012 14:29:33 GMT
Server: Apache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=iso-8859-1

Try adding some more headers; the http command-line client (HTTPie), which sends several extra headers, does get a 200 response:

$ http -v head http://www.nativeseeds.org
HEAD / HTTP/1.1
Host: www.nativeseeds.org
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: identity, deflate, compress, gzip
Accept: */*
User-Agent: HTTPie/0.2.2

HTTP/1.1 200 OK
Date: Wed, 26 Sep 2012 14:33:21 GMT
Server: Apache
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: f65129b0cd2c5e10c387f919ac90ad66=34hOijDSzeskKYtULx9V83; path=/
Last-Modified: Wed, 26 Sep 2012 14:33:23 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Content-Type: text/html; charset=utf-8
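From Python, the fix is the same: send the extra headers yourself. This is a minimal sketch with Python 3's http.client. The header values are illustrative, loosely modelled on the HTTPie request above. They are not a known-required set for this particular site, and the User-Agent string is made up.

```python
import http.client

# Illustrative headers; which ones a given server insists on is site-specific.
# The User-Agent value is a made-up example.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; head-check/0.1)",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
}


def head_with_headers(host, path="/", port=None, headers=None, timeout=10):
    """Send HEAD with extra request headers (hypothetical helper); return (status, reason)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # http.client adds Host itself, but sends no User-Agent unless we do.
        conn.request("HEAD", path, headers={**DEFAULT_HEADERS, **(headers or {})})
        res = conn.getresponse()
        return res.status, res.reason
    finally:
        conn.close()
```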

Python HTTP HEAD - dealing with redirects properly?

Good question! If you're set on using urllib2, you'll want to look at this answer about the construction of your own redirect handler.

In short (read: blatantly stolen from the previous answer):

import urllib2

#redirect_handler = urllib2.HTTPRedirectHandler()

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")
print response.read()

print cookieprocessor.cookiejar

Also, as mentioned in the errata, you can use Python Requests.
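In Python 3, urllib2 became urllib.request. Here is a minimal port of the handler above. RecordingRedirectHandler, its seen list and build_cookie_opener are illustrative names; the seen list stands in for the "cookie manip" print.

```python
import urllib.request
from http.cookiejar import CookieJar


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: follow redirects as usual, but record each hop first.

    http_error_302 is the hook point where the answer above does its cookie handling.
    """

    def __init__(self):
        super().__init__()
        self.seen = []  # (status code, Location header) for every redirect followed

    def http_error_302(self, req, fp, code, msg, headers):
        self.seen.append((code, headers.get("Location")))
        return super().http_error_302(req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


def build_cookie_opener():
    """Return (opener, redirect_handler, cookie_jar) wired together."""
    jar = CookieJar()
    redirects = RecordingRedirectHandler()
    # Passing a subclass instance replaces the default HTTPRedirectHandler.
    opener = urllib.request.build_opener(
        redirects, urllib.request.HTTPCookieProcessor(jar))
    return opener, redirects, jar
```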

Why am I able to read a HEAD http request in python 3 urllib.request?

The http://www.google.com URL redirects:

$ curl -I http://www.google.com
HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.co.uk/?gfe_rd=cr&ei=A8sXVZLOGvHH8ge1jYKwDQ
Content-Length: 261
Date: Sun, 29 Mar 2015 09:50:59 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=0.5

and urllib.request has followed the redirect, issuing a GET request to that new location:

>>> import urllib.request
>>> req = urllib.request.Request("http://www.google.com", method="HEAD")
>>> resp = urllib.request.urlopen(req)
>>> resp.url
'http://www.google.co.uk/?gfe_rd=cr&ei=ucoXVdfaJOTH8gf-voKwBw'

You'd have to build your own handler stack to prevent this; the HTTPRedirectHandler isn't smart enough to leave redirects alone when the request method is HEAD. Adapting Alan Duan's example from "How do I prevent Python's urllib(2) from following a redirect" to Python 3 would give you:

import urllib.request

class NoRedirection(urllib.request.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response
    https_response = http_response

opener = urllib.request.build_opener(NoRedirection)

req = urllib.request.Request("http://www.google.com", method="HEAD")
resp = opener.open(req)
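If you do want the redirect followed, but with the method kept as HEAD, one option is to override redirect_request. This is a minimal sketch assuming Python 3; HeadPreservingRedirectHandler is a made-up name. It forces the method on the follow-up request rather than relying on whatever the standard handler picks in your Python version.

```python
import urllib.request


class HeadPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: follow redirects, re-issuing HEAD (not GET) for HEAD requests."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        # redirect_request is documented to return a Request or None.
        if new_req is not None and req.get_method() == "HEAD":
            # Request.get_method() returns the .method attribute when it is set.
            new_req.method = "HEAD"
        return new_req


opener = urllib.request.build_opener(HeadPreservingRedirectHandler)
```

With this opener, opener.open(urllib.request.Request(url, method="HEAD")) ends up at the final location, and every hop is a HEAD request.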

You'd be better off using the requests library; requests.head() and requests.Session().head() explicitly default to allow_redirects=False, so there you can see the original response:

>>> import requests
>>> requests.head('http://www.google.com')
<Response [302]>
>>> _.headers['Location']
'http://www.google.co.uk/?gfe_rd=cr&ei=FcwXVbepMvHH8ge1jYKwDQ'

And even if redirection is enabled, the response.history list gives you access to the intermediate responses, and requests uses the correct method (HEAD) for the redirected call too:

>>> response = requests.head('http://www.google.com', allow_redirects=True)
>>> response.url
'http://www.google.co.uk/?gfe_rd=cr&ei=8e0XVYfGMubH8gfJnoKoDQ'
>>> response.history
[<Response [302]>]
>>> response.history[0].url
'http://www.google.com/'
>>> response.request.method
'HEAD'

