What Is the Fastest Way to Send 100,000 HTTP Requests in Python

What is the fastest way to send 100,000 HTTP requests in Python?

Twistedless solution:

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200

def doWork():
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path)
        res = conn.getresponse()
        return res.status, ourl
    except:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

This one is slightly faster than the Twisted solution and uses less CPU.

What is the fastest way to send thousands of POST requests with Python?

The way you've written your code, it waits for the response to one request before sending the next. (On top of that, it may not reuse the HTTP connections, meaning you have to deal with the socket creation/shutdown overhead for each request. Then again, depending on what you're testing, there's a good chance that actually makes it a better test.)
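
For comparison, a purely sequential baseline that at least reuses one keep-alive connection might look like the sketch below. The host and port ('127.0.0.1', 8000), the '/post' path, and the 'word' form field are assumptions made up for illustration, since the original code isn't shown.

# Sequential baseline that reuses a single keep-alive connection (still one
# request at a time; only the per-request socket setup/teardown goes away).
import httplib, urllib

conn = httplib.HTTPConnection('127.0.0.1', 8000)
for word in words:
    body = urllib.urlencode({'word': word})
    conn.request('POST', '/post', body,
                 {'Content-Type': 'application/x-www-form-urlencoded'})
    resp = conn.getresponse()
    resp.read()   # the response must be drained before the connection can be reused
conn.close()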

The simplest way to make multiple requests at the same time is to use threads. And the easiest way to do that is with concurrent.futures (or futures from PyPI, if you're using 2.x or 3.1):

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(post, word) for word in words]
    concurrent.futures.wait(futures)

If you prefer, you can write your own threads and just give each thread 1/10th of words and have it loop over calling post:

import threading

def posts(words):
    for word in words:
        post(word)

groupsize = (len(words) + 9) // 10   # ceiling division, so no words are dropped
t = [threading.Thread(target=posts, args=[words[i*groupsize:(i+1)*groupsize]])
     for i in range(10)]
for thread in t:
    thread.start()
for thread in t:
    thread.join()

Either way, obviously I just pulled that number 10 out of thin air (because it's a little more than the max simultaneous connections most browsers or web service clients will allow you to create), but you'll want to do some performance testing to find the best value.
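
As a rough sketch of that kind of tuning, assuming the same post function and words list as above (the candidate pool sizes below are arbitrary), you could just time the pool at a few different sizes and compare:

import time
import concurrent.futures

for workers in (5, 10, 20, 50, 100):
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(post, words))   # list() forces all the work to finish
    print('%3d workers: %.2f seconds' % (workers, time.time() - start))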

If it turns out that the best value is huge, like 500 or something, you may be running into the limits of what you can do with threading. In that case, you should consider using greenlets. The simplest way to do this is with gevent, and the simplest way to do that is to rewrite your code to use grequests instead of urllib2.
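
A very rough sketch of what that could look like (the URL, the 'word' field, and the pool size of 100 are all made up; grequests builds the requests lazily and grequests.map sends them through a gevent pool):

import grequests

reqs = (grequests.post('http://127.0.0.1:8000/post', data={'word': w})
        for w in words)
for resp in grequests.map(reqs, size=100):
    # failed requests come back as None by default
    if resp is None or resp.status_code != 200:
        print('a request failed: %r' % (resp,))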

Meanwhile, if the actual reads are wasting time, and you don't actually need the responses, and they're reasonably big, and you're not trying to test the server's ability to send real responses, you may want to close the socket as soon as you know you're going to get the right data. You can do this with urllib2 by writing your own handlers, but that sounds like a lot of work. I think it would actually be simpler, in this case, to just drop down to the level of sockets. First, record the request that gets sent for each POST, and the expected 200 line that you get back when things work. Then do something like this:

import socket
from contextlib import closing

with closing(socket.socket()) as c:
    c.connect(('127.0.0.1', 8000))
    c.send(REQUEST_STRING_FORMAT.format(word))
    with closing(c.makefile()) as f:
        response = f.readline()
        if response != RESPONSE_200_STRING:
            response += f.read()
            with open('error.html', 'w') as k:
                k.write(response)
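
The two constants are just whatever you recorded from a working run. As one made-up example, if the server happened to accept the word in the query string of a POST to /post on 127.0.0.1:8000, they might look something like this (if your real client sends the word in the body instead, the format also needs a matching Content-Length):

# Purely illustrative; capture the bytes your real client actually sends and paste them in.
REQUEST_STRING_FORMAT = (
    'POST /post?word={0} HTTP/1.1\r\n'
    'Host: 127.0.0.1:8000\r\n'
    'Content-Length: 0\r\n'
    'Connection: close\r\n'
    '\r\n'
)
RESPONSE_200_STRING = 'HTTP/1.1 200 OK\r\n'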

