A very simple multithreading parallel URL fetching (without queue)
Simplifying your original version as far as possible:
import threading
import urllib2
import time

start = time.time()
urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com", "http://www.amazon.com", "http://www.facebook.com"]

def fetch_url(url):
    urlHandler = urllib2.urlopen(url)
    html = urlHandler.read()
    print "'%s' fetched in %ss" % (url, (time.time() - start))

threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print "Elapsed Time: %s" % (time.time() - start)
The only new tricks here are:
- Keep track of the threads you create.
- Don't bother with a counter of threads if you just want to know when they're all done; join already tells you that.
- If you don't need any state or external API, you don't need a Thread subclass, just a target function.
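If you also want the results back rather than just a printout, the same three tricks still apply. Here is a minimal sketch, assuming a stand-in fetch() that sleeps instead of hitting the network (the function names, the lock, and the example.* URLs are mine, not from the question). It should run unchanged on Python 2 and 3:

```python
import threading
import time

def fetch(url, results, lock):
    """Stand-in for a real download: sleeps briefly, then records a result."""
    time.sleep(0.1)              # placeholder for urlopen(url).read()
    with lock:                   # protect the shared dict while writing
        results[url] = len(url)  # placeholder payload

def fetch_all(urls):
    results = {}
    lock = threading.Lock()
    # Keep every Thread object so it can be joined later.
    threads = [threading.Thread(target=fetch, args=(u, results, lock))
               for u in urls]
    for t in threads:
        t.start()
    # join() blocks until each thread has finished, so no counter is needed.
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    start = time.time()
    print(fetch_all(["http://a.example", "http://b.example", "http://c.example"]))
    print("Elapsed: %.2fs" % (time.time() - start))
```

Swap the body of fetch() for your real download call; everything else stays the same.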
J2EE environment - Multithreading - Fetching data from different services in parallel
A simple and elegant way is to use a fixed thread pool and Guava's ListenableFuture, on which you can call Futures.successfulAsList:
private MyResult getResult(MyRequest request)
        throws InterruptedException, ExecutionException, TimeoutException {
    // In a real application, create the pool once and reuse it across requests.
    ExecutorService es = Executors.newFixedThreadPool(3);
    ListeningExecutorService les = MoreExecutors.listeningDecorator(es);
    ListenableFuture<?> lf1 = les.submit(getCallableForService1(request));
    ListenableFuture<?> lf2 = les.submit(getCallableForService2(request));
    ListenableFuture<?> lf3 = les.submit(getCallableForService3(request));
    // successfulAsList puts null in the list for any call that failed.
    ListenableFuture<List<Object>> lfs = Futures.successfulAsList(lf1, lf2, lf3);
    // wait 7 sec for results
    List<Object> res = lfs.get(7, TimeUnit.SECONDS);
    return extractRes(res);
}
You should of course handle the correct types for the Callables.
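For anyone following the Python answers on this page, roughly the same pattern can be sketched with concurrent.futures (Python 3 standard library). This is an analogue, not Guava's API: successful_as_list() and the three service_* functions are hypothetical names of mine. One difference to be aware of: this version returns None for a call that has not finished within the timeout, instead of raising a timeout error.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def service_a(request):   # hypothetical service call
    return "a:" + request

def service_b(request):   # hypothetical service call that fails
    raise RuntimeError("service b is down")

def service_c(request):   # hypothetical service call
    return "c:" + request

def successful_as_list(futures, timeout):
    """Return results in submission order, with None for any future that
    failed, was cancelled, or did not finish within `timeout` seconds."""
    done, _not_done = wait(futures, timeout=timeout)
    results = []
    for f in futures:
        if f in done and not f.cancelled() and f.exception() is None:
            results.append(f.result())
        else:
            results.append(None)
    return results

def get_result(request, timeout=7):
    # Leaving the with-block waits for running tasks, so a call that
    # overruns the timeout still delays the return of this function.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(s, request)
                   for s in (service_a, service_b, service_c)]
        return successful_as_list(futures, timeout)

if __name__ == "__main__":
    print(get_result("req"))
```

The failed call shows up as None in the middle slot, so the caller can tell which service did not answer.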
Speed up multiple downloads with urllib2
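import threading
import Queue  # the correct module name is Queue (Python 2)

MAX_THREADS = 10
urls = Queue.Queue()

def downloadFile():
    # Pull URLs until the queue is drained. Catching Queue.Empty avoids the
    # race between checking empty() and calling get_nowait().
    while True:
        try:
            u = urls.get_nowait()
        except Queue.Empty:
            break
        job(u)  # job() is your own download function

for url in your_url_list:
    urls.put(url)

for i in range(MAX_THREADS):
    t = threading.Thread(target=downloadFile)
    t.start()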
Basically, it imports the threading and Queue modules. The Queue object holds the data shared across the threads, and each thread executes the downloadFile() function.
It should be easy to follow; if it isn't, let me know.
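Since job and your_url_list are placeholders, here is a self-contained sketch of the same queue-and-workers idea that you can run as is. run_workers() and its parameters are names I made up for illustration; it also joins the threads and collects the return values, which the snippet above does not do. The import fallback is there so it works on both Python 2 and 3:

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def run_workers(items, job, max_threads=10):
    """Run job(item) for every item using max_threads worker threads."""
    pending = queue.Queue()
    for item in items:
        pending.put(item)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, let the thread exit
            value = job(item)
            with lock:  # protect the shared list while appending
                results.append((item, value))

    threads = [threading.Thread(target=worker) for _ in range(max_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    # The lambda stands in for a real download function.
    print(sorted(run_workers(range(5), lambda n: n * n, max_threads=3)))
```

Results come back in completion order, not submission order, which is why each one is stored together with its input item.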