Reusing the Same Curl Handle: Big Performance Increase?

Reusing the same curl handle. Big performance increase?

It depends on whether the URLs point to the same server or not. If they do, consecutive requests to the same server will reuse the connection; see CURLOPT_FORBID_REUSE.

If only some of the URLs are on the same server, you should sort the list so that same-server requests run back-to-back, since the default connection cache is limited to ten or twenty connections.
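One way to do that sorting, sketched with an illustrative URL list, is to order the array by host component before looping:

```php
// Illustrative URL list; sorting by host makes same-server requests
// adjacent, so a reused handle can keep each connection alive.
$urls = [
    'http://example.com/a',
    'http://example.org/x',
    'http://example.com/b',
    'http://example.org/y',
];

// Compare only the host part of each URL.
usort($urls, function ($a, $b) {
    return strcmp(parse_url($a, PHP_URL_HOST), parse_url($b, PHP_URL_HOST));
});

// $urls now groups the example.com entries before the example.org entries.
```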

If they are on different servers, there is no speed advantage in using the same handle.

With curl_multi_exec you can connect to different servers at the same time (in parallel). Even then, you need some queuing to avoid opening thousands of simultaneous connections.
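A minimal sketch of that queuing, assuming PHP 8 (where curl handles are objects, so spl_object_id() can key them): keep at most $window transfers in flight, and add a queued URL each time one finishes. The function name fetch_all is mine.

```php
// Rolling-window parallel fetch: at most $window transfers run at once.
function fetch_all(array $urls, int $window = 10): array
{
    $mh      = curl_multi_init();
    $queue   = $urls;
    $map     = [];   // spl_object_id(handle) => url
    $results = [];

    $add = function () use (&$queue, &$map, $mh): void {
        $url = array_shift($queue);
        $ch  = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $map[spl_object_id($ch)] = $url;
    };

    // Prime the window.
    while ($queue && count($map) < $window) {
        $add();
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 0.1);
        // Harvest finished transfers and top the window back up.
        while ($info = curl_multi_info_read($mh)) {
            $ch  = $info['handle'];
            $url = $map[spl_object_id($ch)];
            unset($map[spl_object_id($ch)]);
            $results[$url] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            if ($queue) {
                $add();
            }
        }
    } while ($running > 0 || $queue);

    curl_multi_close($mh);
    return $results;
}
```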

Will PHP's cURL ever reuse connections across requests made with different handles?

Maybe this would answer your question:

curl.haxx.se/docs/faq.html#What_about_Keep_Alive_or_persist

curl and libcurl have excellent support for persistent connections when
transferring several files from the same server. Curl will attempt to reuse
connections for all URLs specified on the same command line/config file, and
libcurl will reuse connections for all transfers that are made using the same
libcurl handle.

Should I close cURL or not?

There's a performance increase from reusing the same handle. See: Reusing the same curl handle. Big performance increase?

If you don't need the requests to be synchronous, consider using the curl_multi_* functions (e.g. curl_multi_init, curl_multi_exec, etc.) which also provide a big performance boost.

UPDATE:

I tried benchmarking curl, using a new handle for each request versus reusing the same handle, with the following code:

ob_start(); // Buffer the echoed responses instead of setting more curl options
$start_time = microtime(true);
for ($i = 0; $i < 100; ++$i) {
    $rand = rand();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.google.com/?rand=" . $rand);
    curl_exec($ch);
    curl_close($ch);
}
$end_time = microtime(true);
ob_end_clean();
echo 'Curl without handle reuse: ' . ($end_time - $start_time) . '<br>';

ob_start(); // Buffer the echoed responses instead of setting more curl options
$start_time = microtime(true);
$ch = curl_init();
for ($i = 0; $i < 100; ++$i) {
    $rand = rand();
    curl_setopt($ch, CURLOPT_URL, "http://www.google.com/?rand=" . $rand);
    curl_exec($ch);
}
curl_close($ch);
$end_time = microtime(true);
ob_end_clean();
echo 'Curl with handle reuse: ' . ($end_time - $start_time) . '<br>';

and got the following results:

Curl without handle reuse: 8.5690529346466
Curl with handle reuse: 5.3703031539917

So reusing the same handle actually provides a substantial performance increase when connecting to the same server multiple times. I tried connecting to different servers:

$url_arr = array(
    'http://www.google.com/',
    'http://www.bing.com/',
    'http://www.yahoo.com/',
    'http://www.slashdot.org/',
    'http://www.stackoverflow.com/',
    'http://github.com/',
    'http://www.harvard.edu/',
    'http://www.gamefaqs.com/',
    'http://www.mangaupdates.com/',
    'http://www.cnn.com/'
);
ob_start(); // Buffer the echoed responses instead of setting more curl options
$start_time = microtime(true);
foreach ($url_arr as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    curl_close($ch);
}
$end_time = microtime(true);
ob_end_clean();
echo 'Curl without handle reuse: ' . ($end_time - $start_time) . '<br>';

ob_start(); // Buffer the echoed responses instead of setting more curl options
$start_time = microtime(true);
$ch = curl_init();
foreach ($url_arr as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
}
curl_close($ch);
$end_time = microtime(true);
ob_end_clean();
echo 'Curl with handle reuse: ' . ($end_time - $start_time) . '<br>';

And got the following result:

Curl without handle reuse: 3.7672290802002
Curl with handle reuse: 3.0146431922913

Still quite a substantial performance increase.

Improving cURL performance (PHP Library)

Set CURLOPT_NOBODY to 1 (see the curl documentation) to tell curl not to ask for the body of the response. This will contact the web server and issue a HEAD request. The response code will tell you whether the URL is valid, without transferring the bulk of the data back.
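A sketch of that check (the function name url_status is illustrative): it returns the response code, or false if the request itself failed.

```php
// Issue a HEAD request and report the response code without downloading
// the body; returns false if the transfer fails entirely.
function url_status(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo anything
    $ok   = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $ok === false ? false : $code;
}
```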

If that's still too slow, then you'll likely see a vast improvement by running N threads (or processes) each doing 1/Nth of the work. The bottleneck may not be in your code, but in the response times of the remote servers. If they're slow to respond, then your loop will be slow to run.

Does PHP's curl_reset() close the underlying connection?

No. PHP's curl_reset() implementation calls libcurl's curl_easy_reset(), whose documentation explicitly states:

... does not change the following information kept in the handle: live connections, the Session ID cache, the DNS cache, the cookies and shares.
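As a sketch of the reuse pattern this enables (file:// URLs are used here only so the example runs without a network):

```php
// Reuse one easy handle for several transfers, calling curl_reset()
// between them: per-request options are wiped, but the handle's live
// connections, DNS cache, and cookies survive.
$files = [];
foreach (['first', 'second'] as $content) {
    $path = tempnam(sys_get_temp_dir(), 'cr');
    file_put_contents($path, $content);
    $files[] = $path;
}

$ch = curl_init();
$bodies = [];
foreach ($files as $path) {
    // Options must be set again after each reset.
    curl_setopt($ch, CURLOPT_URL, 'file://' . $path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $bodies[] = curl_exec($ch);
    curl_reset($ch);  // clears options; keeps the connection cache
}
curl_close($ch);
// $bodies now holds the two file contents, 'first' and 'second'.
```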
