Python urllib2 with keep alive
Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:
>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>>
>>> fo = urllib2.urlopen('http://www.python.org')
Note: you should use urlgrabber version 3.9.0 or earlier, as the keepalive module was removed in version 3.9.1.
There is a port of the keepalive module to Python 3.
What is the best way to use HTTP Keep-Alive in Python 2.7
I would suggest using the requests library. It supports HTTP Keep-Alive (via connection pooling in a Session) in addition to many other features.
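As a minimal sketch of what that looks like (the URL and the fetch_twice helper are just placeholders, and this assumes requests is installed):

```python
try:
    import requests  # third-party: pip install requests
except ImportError:
    requests = None  # requests may not be available everywhere

def fetch_twice(url):
    # A Session pools connections, so consecutive requests to the same
    # host reuse one TCP connection (HTTP keep-alive) automatically.
    with requests.Session() as s:
        r1 = s.get(url)
        r2 = s.get(url)
        return r1.status_code, r2.status_code
```

With plain requests.get() each call may open a fresh connection; the Session is what enables the reuse.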
python keep alive response object
@ScottHunter: in your solution we are still reading the lines that were already read; all we do is read them again and skip the ones we have seen before.
So the solution I implemented is: read a limited number of characters at a time, using readline with a size limit:
from urllib2 import urlopen

response = urlopen(url)
while True:
    line = response.readline(4096)
    if not line:
        break
    do_some_job(line)
response.close()
Persistence of urllib.request connections to an HTTP server
urllib.request doesn't support persistent connections: 'Connection: close' is hardcoded in the code. But http.client partially supports persistent connections (including legacy http/1.0 keep-alive), so the question title might be misleading.
I want to do some performance testing on one of our web servers, to see how the server handles a lot of persistent connections. Unfortunately, I'm not terribly familiar with HTTP and web testing.
You could use an existing HTTP testing tool such as slowloris or httperf instead of writing one yourself.
How do I keep these connections alive?
To close an http/1.1 connection, a client should explicitly send a Connection: close header; otherwise the connection is considered persistent by the server (though the server may close it at any moment, and http.client won't know about it until it tries to read from or write to the connection).
conn.connect() returns almost immediately and your thread ends. To force each thread to maintain an HTTP connection to the server you could:
import http.client
import time

def make_http_connection(*args, **kwargs):
    while True:  # make new http connections
        h = http.client.HTTPConnection(*args, **kwargs)
        while True:  # make multiple requests using a single connection
            try:
                h.request('GET', '/')  # send request; make conn. on the first run
                response = h.getresponse()
                while True:  # read response slooowly
                    b = response.read(1)  # read 1 byte
                    if not b:
                        break
                    time.sleep(60)  # wait a minute before reading next byte
                    # note: the whole minute might pass before we notice that
                    #       the server has closed the connection already
            except Exception:
                break  # make new connection on any error
Note: if the server returns a 'Connection: close' header then there is a single request per connection.
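To see http.client's partial support in action, here is a self-contained sketch in Python 3 terms (the throwaway local server is purely for illustration) that sends two GETs over one HTTP/1.1 connection:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # HTTP/1.1 connections are persistent by default

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        # Content-Length lets the client know where the response ends
        # without the server having to close the socket.
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
conn.request('GET', '/')
first = conn.getresponse().read()   # must drain the response before reuse
conn.request('GET', '/')            # reuses the same TCP connection
second = conn.getresponse().read()
conn.close()
server.shutdown()
```

The key constraint is draining each response fully before issuing the next request; otherwise http.client raises an error.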
(Also, on an unrelated note, is there a better procedure for waiting for a keyboard interrupt than the ugly while True: block at the end of my code?)
To wait until all threads finish, or until a KeyboardInterrupt happens, you could:
while threads:
    try:
        for t in threads[:]:  # iterate over a copy of the list
            t.join(.1)  # timeout 0.1 seconds
            if not t.is_alive():
                threads.remove(t)
    except KeyboardInterrupt:
        break
Or something like this:
while threading.active_count() > 1:
    try:
        main_thread = threading.current_thread()
        for t in threading.enumerate():  # enumerate all alive threads
            if t is not main_thread:
                t.join(.1)
    except KeyboardInterrupt:
        break
The latter might not work for various reasons, e.g., if there are dummy threads such as threads started in C extensions without using the threading module.
concurrent.futures.ThreadPoolExecutor provides a higher abstraction level than the threading module and can hide some of this complexity.
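For example, a sketch of the pool-based version (task here is a made-up stand-in for whatever per-connection work each thread does):

```python
import concurrent.futures

def task(n):
    # stand-in for the real work, e.g. holding one slow connection open
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, n) for n in range(8)]
    try:
        # result() blocks until each task finishes, so this waits for
        # the whole pool without a manual join loop
        results = [f.result() for f in futures]
    except KeyboardInterrupt:
        pool.shutdown(wait=False)  # stop waiting on Ctrl-C
        raise
```

The executor's context manager replaces the hand-rolled join/remove bookkeeping above.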
Instead of a thread-per-connection model you could open multiple connections concurrently in a single thread, e.g., using requests.async or gevent directly.
How to Speed Up Python's urllib2 when doing multiple requests
If you switch to httplib, you will have finer control over the underlying connection.
For example:
import httplib
conn = httplib.HTTPConnection(url)
conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()
conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()
conn.close()
This would send 2 HTTP GETs on the same underlying TCP connection.
Why aren't persistent connections supported by URLLib2?
It's a well-known limitation of urllib2 (and urllib as well). IMHO the best attempt so far to fix it and make it right is Garry Bodsworth's coda_network for Python 2.6 or 2.7 -- replacement, patched versions of urllib2 (and some other modules) that support keep-alive (plus a bunch of other smaller but quite welcome fixes).
Python urllib2 - Freezes when connection temporarily dies
According to the docs, the default timeout is, indeed, no timeout. You can specify a timeout when calling urlopen though. :)
page = urllib2.urlopen(req, timeout=30)
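In Python 3 terms (urllib.request replaces urllib2), you would also want to catch the timeout rather than let it freeze or propagate; the fetch helper and its return values below are just illustrative:

```python
import socket
import urllib.error
import urllib.request

def fetch(url, timeout=30):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except socket.timeout:
        # the socket stalled mid-read for longer than `timeout` seconds
        return None
    except urllib.error.URLError as e:
        # timeouts during connect are wrapped in URLError, with the
        # underlying socket.timeout available as e.reason
        if isinstance(e.reason, socket.timeout):
            return None
        raise
```

Both except clauses matter: a timeout can fire while connecting (wrapped in URLError) or while reading the body (raised directly).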