PHP: scandir() is too slow

You can use readdir(), which may be faster. Something like this:

function readDirectory($Directory, $Recursive = true)
{
    if (is_dir($Directory) === false) {
        return false;
    }

    // opendir()/readdir() emit warnings rather than exceptions,
    // so check the resource explicitly instead of using try/catch
    $Resource = opendir($Directory);

    if ($Resource === false) {
        return false;
    }

    $Found = array();

    while (false !== ($Item = readdir($Resource))) {
        // Skip the "." and ".." entries
        if ($Item == "." || $Item == "..") {
            continue;
        }

        // Build the full path; don't rely on $Directory ending with a slash
        $Path = rtrim($Directory, '/') . '/' . $Item;

        if ($Recursive === true && is_dir($Path)) {
            // Merge the recursive results instead of nesting arrays
            $Sub = readDirectory($Path);

            if ($Sub !== false) {
                $Found = array_merge($Found, $Sub);
            }
        } else {
            $Found[] = $Path;
        }
    }

    closedir($Resource);

    return $Found;
}

This may require some tweaking, but it is essentially what scandir() does, and it should be faster. If it isn't, please write an update, as I would like to see if I can come up with a faster solution.
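
Usage is straightforward; for example:

// Recursively collect every path under /tmp
$Files = readDirectory('/tmp');
print_r($Files);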

Another issue: if you're reading a very large directory, you're filling up an array in internal memory, and that may be where your memory is going.

You could try creating a function that reads in offsets so that you can return 50 files at a time!

Reading chunks of files at a time would be just as simple to use; it would look like this:

$offset = 0;
while (false !== ($Batch = ReadFilesByOffset("/tmp", $offset))) {
    // Use $Batch here, which contains 50 or fewer files

    // Increment the offset:
    $offset += 50;
}
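
Here is a minimal sketch of what such a ReadFilesByOffset() function could look like. It's a naive version: it re-reads the directory from the start on every call, so it is illustrative rather than optimal:

function ReadFilesByOffset($directory, $offset, $limit = 50)
{
    $handle = opendir($directory);

    if ($handle === false) {
        return false;
    }

    $files = array();
    $position = 0;

    while (false !== ($entry = readdir($handle))) {
        if ($entry === "." || $entry === "..") {
            continue;
        }

        // Skip entries before the requested offset
        if ($position++ < $offset) {
            continue;
        }

        $files[] = $entry;

        if (count($files) >= $limit) {
            break;
        }
    }

    closedir($handle);

    // Return false on an empty batch so the caller's loop terminates
    return count($files) > 0 ? $files : false;
}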

Is it possible to speed up a recursive file scan in PHP?

PHP just cannot perform as fast as C, plain and simple.

scandir points to disk root

So apparently I can't spell. As @arkascha mentioned,

$foo = scandir('\css');

is not relative to my web root; the leading backslash makes the path resolve from the root of the drive. The actual code was

$foo = scandir('css\\');

which works as expected, obviously.

file_exists() versus in_array() of scandir() -- which is faster?

In my opinion, scandir() will be faster, as it only reads the directory once; additionally, file_exists() is known to be quite slow.

Furthermore, you could use glob(). This lists all files in a directory that match a particular pattern; see the PHP manual for details.
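
For example (a minimal illustration; the file names are placeholders):

// List all PHP files in the current directory
$phpFiles = glob('*.php');

// Or check for one specific file: an empty result means it does not exist
// (glob() can return false on error, hence the ?: fallback)
$exists = count(glob('config.php') ?: array()) > 0;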

Regardless of my opinion, you can run a simple script like so to test the speed:

<?php

// Get the start time
$time_start = microtime(true);

// Do the glob() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'glob()\' finished in ' . $time . ' seconds' . "\n";

// Reset the start time, otherwise the timings accumulate
$time_start = microtime(true);

// Do the file_exists() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'file_exists()\' finished in ' . $time . ' seconds' . "\n";

// Reset the start time again
$time_start = microtime(true);

// Do the scandir() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'scandir()\' finished in ' . $time . ' seconds' . "\n";

?>
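
For instance, the three placeholder sections could be filled in along these lines (a minimal sketch; /tmp and target.txt are just example names):

$dir  = '/tmp';
$file = 'target.txt';

// glob(): match the exact name and see if anything came back
$foundGlob = count(glob($dir . '/' . $file) ?: array()) > 0;

// file_exists(): a direct stat call for one path
$foundExists = file_exists($dir . '/' . $file);

// scandir(): read the whole directory once, then search the array
$foundScandir = in_array($file, scandir($dir) ?: array());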

I'm not sure how the above script will behave with the filesystem cache; you may have to separate the tests into separate files and run them individually.

Update 1

You could also call memory_get_usage(), which returns the amount of memory currently allocated to the PHP script. You may find this useful; see the PHP manual for more details.
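
For example:

// Report the script's current and peak memory usage in megabytes
printf(
    "Current: %.2f MB, peak: %.2f MB\n",
    memory_get_usage(true) / 1048576,
    memory_get_peak_usage(true) / 1048576
);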

Update 2

As for your second question, there are several ways you can list all files in a directory, including sub-directories. See the answers to this question:

Scan files in a directory and sub-directory and store their path in array using php

How to perform a statement inside a query to check the existence of a file that has the query id as its name, with fewer resources, in Laravel

Do you have good reason to believe that scandir on a directory with a large number of folders will actually slow you down?

You can do your query like this:

if (Input::has('field')) {
    $filenames = scandir('img/folders');

    $query = Model::whereIn('id', $filenames)->get();
}

Edit 1

You may find these links useful:

PHP: scandir() is too slow

Get the Files inside a directory

Edit 2

There are some really good suggestions in those links which you should be able to use as guidance for your own implementation. As I see it, based on the links included in the first edit, your options are to use DirectoryIterator, readdir, or chunking with scandir.

This is a very basic way of doing it, but I guess you could do something with readdir like this:

$ids = Model::lists('id');

$matches = [];

if ($handle = opendir('path/to/folders')) {
    while (($entry = readdir($handle)) !== false) {
        // Stop early once every id has been matched
        if (count($ids) === 0) {
            break;
        }

        if ($entry != "." && $entry != "..") {
            foreach ($ids as $key => $value) {
                // Cast the id to a string, since readdir() always returns strings
                if ((string) $value === $entry) {
                    $matches[] = $entry;

                    unset($ids[$key]);
                    break;
                }
            }
        }
    }

    closedir($handle);
}

return $matches;
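
For comparison, here is a rough sketch of the DirectoryIterator option mentioned above (assuming $ids is a plain array of ids, as before):

$matches = [];

foreach (new DirectoryIterator('path/to/folders') as $fileInfo) {
    // isDot() covers both "." and ".."
    if ($fileInfo->isDot()) {
        continue;
    }

    // Cast the ids to strings, since filenames are always strings
    if (in_array($fileInfo->getFilename(), array_map('strval', $ids), true)) {
        $matches[] = $fileInfo->getFilename();
    }
}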

How to improve the speed of a for loop in PHP?

Don't call glob(). Just use a loop that processes each file that matches the pattern in numeric order. You can stop the loop when the file doesn't exist.

I assume there are no gaps in your numeric sequence of filenames.

$path = 'images/'; // assumed base path; defined elsewhere in the question's code
$dataImage = [];
$dataImageTmp = [];

if (($handle = fopen("../RC_PRODUCT_HUB.csv", "r")) !== FALSE) {
    fgets($handle); // skip header line
    while (($data = fgetcsv($handle, 9000000, ";")) !== FALSE) {
        if ($data[0] != null) {
            $key = $data[6] . '_' . $data[7];

            // Probe filenames in numeric order; the loop stops at the first missing file
            for ($i = 1; file_exists($fileName = $path . $key . '-' . $i . '.JPG'); ++$i) {
                $dataImage[$key]['file'][$i] = $fileName;

                $shortName = str_replace($path, '', $fileName);
                if (!in_array($shortName, $dataImageTmp)) {
                    $dataImageTmp[] = $shortName;
                }

                // Keep a running total of images per product
                if (isset($dataImage[$key]['TOTAL'])) {
                    $dataImage[$key]['TOTAL']++;
                } else {
                    $dataImage[$key]['TOTAL'] = 1;
                }
            }
        }
    }
    fclose($handle);
}

