PHP: scandir() is too slow

You can use readdir(), which may be faster. Something like this:

function readDirectory($Directory, $Recursive = true)
{
    if (is_dir($Directory) === false) {
        return false;
    }

    // opendir()/readdir() emit warnings rather than exceptions,
    // so check the resource explicitly instead of using try/catch
    $Resource = opendir($Directory);

    if ($Resource === false) {
        return false;
    }

    $Found = array();

    while (false !== ($Item = readdir($Resource))) {
        // Skip the "." and ".." entries
        if ($Item == "." || $Item == "..") {
            continue;
        }

        // Build the full path; don't rely on $Directory ending with a slash
        $Path = rtrim($Directory, '/') . '/' . $Item;

        if ($Recursive === true && is_dir($Path)) {
            // Merge the recursive results instead of nesting arrays
            $Sub = readDirectory($Path);

            if ($Sub !== false) {
                $Found = array_merge($Found, $Sub);
            }
        } else {
            $Found[] = $Path;
        }
    }

    closedir($Resource);

    return $Found;
}

This may require some tweaking, but it is essentially what scandir() does, and it should be faster. If it isn't, please write an update, as I would like to see if I can come up with a faster solution.
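
Usage is straightforward; for example:

// Recursively collect every path under /tmp
$Files = readDirectory('/tmp');
print_r($Files);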

Another issue: if you're reading a very large directory, you're filling up an array in internal memory, and that may be where your memory is going.

You could try creating a function that reads in offsets so that you can return 50 files at a time!

Reading chunks of files at a time would be just as simple to use; it would look like this:

$offset = 0;
while (false !== ($Batch = ReadFilesByOffset("/tmp", $offset))) {
    // Use $Batch here, which contains 50 or fewer files

    // Increment the offset:
    $offset += 50;
}
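
Here is a minimal sketch of what such a ReadFilesByOffset() function could look like. It's a naive version: it re-reads the directory from the start on every call, so it is illustrative rather than optimal:

function ReadFilesByOffset($directory, $offset, $limit = 50)
{
    $handle = opendir($directory);

    if ($handle === false) {
        return false;
    }

    $files = array();
    $position = 0;

    while (false !== ($entry = readdir($handle))) {
        if ($entry === "." || $entry === "..") {
            continue;
        }

        // Skip entries before the requested offset
        if ($position++ < $offset) {
            continue;
        }

        $files[] = $entry;

        if (count($files) >= $limit) {
            break;
        }
    }

    closedir($handle);

    // Return false on an empty batch so the caller's loop terminates
    return count($files) > 0 ? $files : false;
}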

Is it possible to speed up a recursive file scan in PHP?

PHP just cannot perform as fast as C, plain and simple.

scandir points to disk root

So apparently I can't spell. As @arkascha mentioned,

$foo = scandir('\css');

is not relative to my web root; the leading backslash makes the path resolve from the root of the drive. The actual code was

$foo = scandir('css\\');

which works as expected, obviously.

file_exists() versus in_array() of scandir() -- which is faster?

In my opinion, scandir() will be faster, as it only reads the directory once; additionally, file_exists() is known to be quite slow.

Furthermore, you could use glob(). This lists all files in a directory that match a particular pattern; see the PHP manual for details.
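
For example (a minimal illustration; the file names are placeholders):

// List all PHP files in the current directory
$phpFiles = glob('*.php');

// Or check for one specific file: an empty result means it does not exist
// (glob() can return false on error, hence the ?: fallback)
$exists = count(glob('config.php') ?: array()) > 0;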

Regardless of my opinion, you can run a simple script like so to test the speed:

<?php

// Get the start time
$time_start = microtime(true);

// Do the glob() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'glob()\' finished in ' . $time . ' seconds' . "\n";

// Reset the start time, otherwise the timings accumulate
$time_start = microtime(true);

// Do the file_exists() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'file_exists()\' finished in ' . $time . ' seconds' . "\n";

// Reset the start time again
$time_start = microtime(true);

// Do the scandir() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'scandir()\' finished in ' . $time . ' seconds' . "\n";

?>
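
For instance, the three placeholder sections could be filled in along these lines (a minimal sketch; /tmp and target.txt are just example names):

$dir  = '/tmp';
$file = 'target.txt';

// glob(): match the exact name and see if anything came back
$foundGlob = count(glob($dir . '/' . $file) ?: array()) > 0;

// file_exists(): a direct stat call for one path
$foundExists = file_exists($dir . '/' . $file);

// scandir(): read the whole directory once, then search the array
$foundScandir = in_array($file, scandir($dir) ?: array());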

I'm not sure how the above script will behave with the filesystem cache; you may have to separate the tests into separate files and run them individually.

Update 1

You could also call memory_get_usage(), which returns the amount of memory currently allocated to the PHP script. You may find this useful; see the PHP manual for more details.
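
For example:

// Report the script's current and peak memory usage in megabytes
printf(
    "Current: %.2f MB, peak: %.2f MB\n",
    memory_get_usage(true) / 1048576,
    memory_get_peak_usage(true) / 1048576
);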

Update 2

As for your second question, there are several ways you can list all files in a directory, including sub-directories. See the answers to this question:

Scan files in a directory and sub-directory and store their path in array using php

How to perform a statement inside a query to check the existence of a file that has the query id as its name, with fewer resources, in Laravel

Do you have good reason to believe that scandir on a directory with a large number of folders will actually slow you down?

You can do your query like this:

if (Input::has('field')) {
    $filenames = scandir('img/folders');

    $query = Model::whereIn('id', $filenames)->get();
}

Edit 1

You may find these links useful:

PHP: scandir() is too slow

Get the Files inside a directory

Edit 2

There are some really good suggestions in those links which you should be able to use as guidance for your own implementation. As I see it, based on the links included in the first edit, your options are to use DirectoryIterator, readdir, or chunking with scandir.

This is a very basic way of doing it, but I guess you could do something with readdir like this:

$ids = Model::lists('id');

$matches = [];

if ($handle = opendir('path/to/folders')) {
    while (($entry = readdir($handle)) !== false) {
        // Stop early once every id has been matched
        if (count($ids) === 0) {
            break;
        }

        if ($entry != "." && $entry != "..") {
            foreach ($ids as $key => $value) {
                // Cast the id to a string, since readdir() always returns strings
                if ((string) $value === $entry) {
                    $matches[] = $entry;

                    unset($ids[$key]);
                    break;
                }
            }
        }
    }

    closedir($handle);
}

return $matches;
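
For comparison, here is a rough sketch of the DirectoryIterator option mentioned above (assuming $ids is a plain array of ids, as before):

$matches = [];

foreach (new DirectoryIterator('path/to/folders') as $fileInfo) {
    // isDot() covers both "." and ".."
    if ($fileInfo->isDot()) {
        continue;
    }

    // Cast the ids to strings, since filenames are always strings
    if (in_array($fileInfo->getFilename(), array_map('strval', $ids), true)) {
        $matches[] = $fileInfo->getFilename();
    }
}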

How to improve the speed of a for loop in PHP?

Don't call glob(). Just use a loop that processes each file that matches the pattern in numeric order. You can stop the loop when the file doesn't exist.

I assume there are no gaps in your numeric sequence of filenames.

$path = 'images/'; // assumed base path; defined elsewhere in the question's code
$dataImage = [];
$dataImageTmp = [];

if (($handle = fopen("../RC_PRODUCT_HUB.csv", "r")) !== FALSE) {
    fgets($handle); // skip header line
    while (($data = fgetcsv($handle, 9000000, ";")) !== FALSE) {
        if ($data[0] != null) {
            $key = $data[6] . '_' . $data[7];

            // Probe filenames in numeric order; the loop stops at the first missing file
            for ($i = 1; file_exists($fileName = $path . $key . '-' . $i . '.JPG'); ++$i) {
                $dataImage[$key]['file'][$i] = $fileName;

                $shortName = str_replace($path, '', $fileName);
                if (!in_array($shortName, $dataImageTmp)) {
                    $dataImageTmp[] = $shortName;
                }

                // Keep a running total of images per product
                if (isset($dataImage[$key]['TOTAL'])) {
                    $dataImage[$key]['TOTAL']++;
                } else {
                    $dataImage[$key]['TOTAL'] = 1;
                }
            }
        }
    }
    fclose($handle);
}

