A Very Simple Multithreading Parallel URL Fetching (Without Queue)

Simplifying your original version as far as possible:

import threading
import urllib2
import time

start = time.time()
urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com", "http://www.amazon.com", "http://www.facebook.com"]

def fetch_url(url):
    urlHandler = urllib2.urlopen(url)
    html = urlHandler.read()
    print "'%s' fetched in %ss" % (url, (time.time() - start))

threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print "Elapsed Time: %s" % (time.time() - start)

The only new tricks here are:

  • Keep track of the threads you create.
  • Don't bother with a counter of threads if you just want to know when they're all done; join already tells you that.
  • If you don't need any state or external API, you don't need a Thread subclass, just a target function.
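The same start-then-join pattern carries over to Python 3 (where print is a function and urllib2 became urllib.request). In this sketch the fetch is simulated with a short sleep so it runs without network access; swap in urllib.request.urlopen for real downloads:

```python
import threading
import time

start = time.time()
urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com"]

results = {}

def fetch_url(url):
    # stand-in for urllib.request.urlopen(url).read(): simulate network latency
    time.sleep(0.2)
    results[url] = "<html>"  # each thread writes only its own key
    print("%r fetched in %.2fs" % (url, time.time() - start))

threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

elapsed = time.time() - start
print("Elapsed Time: %.2fs" % elapsed)  # roughly one sleep, not three
```

Because the threads run concurrently, the total elapsed time is close to a single fetch rather than the sum of all fetches.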

J2EE environment - Multithreading - Fetching data from different services in parallel

A simple and elegant way is to use a fixed thread pool and Guava's ListenableFuture on which you can call Futures.successfulAsList:

private MyResult getResult(MyRequest request) throws Exception {
    ExecutorService es = Executors.newFixedThreadPool(3);
    ListeningExecutorService les = MoreExecutors.listeningDecorator(es);

    ListenableFuture<?> lf1 = les.submit(getCallableForService1(request));
    ListenableFuture<?> lf2 = les.submit(getCallableForService2(request));
    ListenableFuture<?> lf3 = les.submit(getCallableForService3(request));
    ListenableFuture<List<?>> lfs = Futures.successfulAsList(lf1, lf2, lf3);

    // wait up to 7 seconds for the results
    List<?> res = lfs.get(7, TimeUnit.SECONDS);

    return extractRes(res);
}

You should of course handle the correct types for the Callables.
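For comparison, the same pattern (run several calls in parallel, then collect whatever finished within a deadline) can be sketched in Python with concurrent.futures. The three service functions below are hypothetical stand-ins that just sleep; the deliberately slow one misses the deadline, mimicking how successfulAsList keeps only the successful results:

```python
import concurrent.futures
import time

def service1(request):
    time.sleep(0.1)   # simulated fast remote call
    return "r1:" + request

def service2(request):
    time.sleep(0.1)   # simulated fast remote call
    return "r2:" + request

def service3(request):
    time.sleep(1.5)   # too slow: will miss the deadline below
    return "r3:" + request

def get_result(request):
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(s, request) for s in (service1, service2, service3)]
        # analogous to Futures.successfulAsList(...).get(timeout, ...):
        # keep only the futures that completed within the deadline
        done, not_done = concurrent.futures.wait(futures, timeout=0.5)
        for f in not_done:
            f.cancel()  # no effect on already-running tasks, but harmless
        return [f.result() for f in done]

results = get_result("req")
```

Note that exiting the `with` block still waits for the slow task's thread to finish; the point is that its result is excluded from what the caller gets back.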

Speed up multiple downloads with urllib2

import threading
import Queue  # in Python 2 the module is named Queue

MAX_THREADS = 10
urls = Queue.Queue()

def downloadFile():
    while not urls.empty():
        u = urls.get_nowait()
        job(u)

for url in your_url_list:
    urls.put(url)

for i in range(MAX_THREADS):
    t = threading.Thread(target=downloadFile)
    t.start()

Basically it imports the threading and Queue modules; the Queue object holds the data to be shared across multiple threads, and each thread executes the downloadFile() function.
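In Python 3 the module is queue (lowercase). A sketch of the same worker-pool pattern, which also closes the race between empty() and get_nowait() by catching queue.Empty instead; job() here is a placeholder for the real download:

```python
import queue
import threading

MAX_THREADS = 4
urls = queue.Queue()
downloaded = []
lock = threading.Lock()

def job(u):
    # placeholder for the real download work
    with lock:
        downloaded.append(u)

def download_file():
    while True:
        try:
            # raises queue.Empty when drained, instead of racing on empty()
            u = urls.get_nowait()
        except queue.Empty:
            return
        job(u)

your_url_list = ["http://example.com/%d" % i for i in range(20)]
for url in your_url_list:
    urls.put(url)

threads = [threading.Thread(target=download_file) for _ in range(MAX_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The empty()/get_nowait() pair in the snippet above can fail when two threads see a non-empty queue but only one item remains; catching the exception makes the drain safe.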

Easy to understand; if it is not, let me know.


