How to Use RegexIterator in PHP

There are a couple of ways to go about this. I'll give two approaches for you to choose from: quick and dirty, versus longer and less dirty (though it's a Friday night, so we're allowed to go a little bit crazy).

1. Quick (and dirty)

This involves writing a single regular expression (it could be split into several) that filters the collection of files in one fell swoop.

(Only the two commented lines are really important to the concept.)

$directory = new RecursiveDirectoryIterator(__DIR__);
$flattened = new RecursiveIteratorIterator($directory);

// Make sure the path does not contain "/.Trash*" folders and ends with a .php or .html file
$files = new RegexIterator($flattened, '#^(?:[A-Z]:)?(?:/(?!\.Trash)[^/]+)+/[^/]+\.(?:php|html)$#Di');

foreach ($files as $file) {
    echo $file . PHP_EOL;
}

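This approach has a number of issues: the regex is a pain to decipher, it assumes forward-slash path separators, and the .Trash folders are still traversed before their contents are discarded. It is quick to implement, though, being just a one-liner.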
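
As a quick sanity check, here is a small sketch that runs the same pattern over a handful of path strings with preg_grep. The sample paths are made up for illustration and are not from the original question.

```php
<?php
// Same pattern as in the one-liner above.
$pattern = '#^(?:[A-Z]:)?(?:/(?!\.Trash)[^/]+)+/[^/]+\.(?:php|html)$#Di';

// Hypothetical sample paths, invented for this example.
$paths = [
    '/var/www/index.php',           // plain PHP file
    '/var/www/.Trash-1000/old.php', // sits inside a .Trash* folder
    '/var/www/docs/readme.txt',     // wrong extension
    'C:/sites/demo/page.HTML',      // drive letter, upper-case extension
];

// preg_grep() keeps the entries that match; array_values() reindexes them.
$matches = array_values(preg_grep($pattern, $paths));
print_r($matches);
```

Only the first and last paths should survive: the second is rejected by the negative lookahead on ".Trash", the third by the extension check.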

2. Less quick (and less dirty)

A more reusable approach is to create a couple of bespoke filters (using regex, or whatever you like!) to whittle the items in the initial RecursiveDirectoryIterator down to only those that you want. The following is only one example, written quickly, of extending the RecursiveRegexIterator.

We start with a base class whose main job is to keep hold of the regex that we want to filter with; everything else is deferred back to the RecursiveRegexIterator. Note that the class is abstract since it doesn't actually do anything useful: the actual filtering is done by the two classes that extend it. Also, it may be called FilesystemRegexFilter, but nothing forces it (at this level) to filter filesystem-related classes (I'd have chosen a better name if I weren't quite so sleepy).

abstract class FilesystemRegexFilter extends RecursiveRegexIterator {
    protected $regex;

    public function __construct(RecursiveIterator $it, $regex) {
        $this->regex = $regex;
        parent::__construct($it, $regex);
    }
}

These two classes are very basic filters, acting on the file name and directory name respectively.

class FilenameFilter extends FilesystemRegexFilter {
    // Filter files against the regex (directories always pass)
    public function accept(): bool {
        return ( ! $this->isFile() || preg_match($this->regex, $this->getFilename()));
    }
}

class DirnameFilter extends FilesystemRegexFilter {
    // Filter directories against the regex (files always pass)
    public function accept(): bool {
        return ( ! $this->isDir() || preg_match($this->regex, $this->getFilename()));
    }
}

To put those into practice, the following iterates recursively over the contents of the directory in which the script resides (feel free to edit this!), filters out the .Trash folders (by making sure that folder names match the specially crafted regex), and accepts only PHP and HTML files.

$directory = new RecursiveDirectoryIterator(__DIR__);
// Filter out ".Trash*" folders
$filter = new DirnameFilter($directory, '/^(?!\.Trash)/');
// Filter PHP/HTML files
$filter = new FilenameFilter($filter, '/\.(?:php|html)$/');

foreach (new RecursiveIteratorIterator($filter) as $file) {
    echo $file . PHP_EOL;
}

Of particular note is that since our filters are recursive, we can choose to play around with how to iterate over them. For example, we could easily limit ourselves to only scanning up to 2 levels deep (including the starting folder) by doing:

$files = new RecursiveIteratorIterator($filter);
$files->setMaxDepth(1); // Two levels, the parameter is zero-based.
foreach ($files as $file) {
    echo $file . PHP_EOL;
}
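
To see the zero-based depth in isolation, here is a minimal sketch that swaps the directory for nested arrays via RecursiveArrayIterator. The data is purely illustrative; the arrays stand in for folders.

```php
<?php
// Nested arrays stand in for nested folders (illustrative data only).
$tree = new RecursiveArrayIterator([
    'a.php',
    ['b.php', ['c.php']],
]);

$it = new RecursiveIteratorIterator($tree);
$it->setMaxDepth(1); // Depth 0 and depth 1 only.

$seen = [];
foreach ($it as $item) {
    // Record the depth at which each plain string was found.
    if (is_string($item)) {
        $seen[$item] = $it->getDepth();
    }
}
print_r($seen);
```

The string nested two levels down is never reached as a leaf, which mirrors how the directory scan stops descending.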

It is also super-easy to add yet more filters (by instantiating more of our filtering classes with different regexes; or, by creating new filtering classes) for more specialised filtering needs (e.g. file size, full-path length, etc.).
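
As one possible sketch of such an extra filter, here is a file-size filter built on RecursiveFilterIterator. The class name MinSizeFilter, the threshold, and the throwaway directory are all assumptions made up for this example.

```php
<?php
// Hypothetical filter: keeps files of at least $minBytes and lets
// directories through so the recursion can continue.
class MinSizeFilter extends RecursiveFilterIterator
{
    private int $minBytes;

    public function __construct(RecursiveIterator $it, int $minBytes)
    {
        parent::__construct($it);
        $this->minBytes = $minBytes;
    }

    public function accept(): bool
    {
        $item = $this->current(); // SplFileInfo with the default flags
        return !$item->isFile() || $item->getSize() >= $this->minBytes;
    }

    // Pass the threshold down so sub-directories are filtered the same way.
    public function getChildren(): RecursiveFilterIterator
    {
        return new static($this->getInnerIterator()->getChildren(), $this->minBytes);
    }
}

// Demo on a throwaway directory; the file names are invented.
$root = sys_get_temp_dir() . '/sizefilter_' . uniqid();
mkdir($root . '/sub', 0777, true);
file_put_contents($root . '/small.txt', 'x');
file_put_contents($root . '/sub/big.txt', str_repeat('x', 2048));

$dir = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
$names = [];
foreach (new RecursiveIteratorIterator(new MinSizeFilter($dir, 1024)) as $file) {
    $names[] = $file->getFilename();
}
print_r($names);
```

Because it is just another RecursiveIterator, it can wrap (or be wrapped by) the regex filters in any order.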

P.S. Hmm this answer babbles a bit; I tried to keep it as concise as possible (even removing vast swathes of super-babble). Apologies if the net result leaves the answer incoherent.

PHP SPL RegexIterator: how would files be ordered?

RecursiveDirectoryIterator uses opendir, which doesn't sort its results.

If you want sorted results, you can use scandir, but it's not recursive or iterative.
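
Another option, if the result set fits in memory, is to keep the recursive iterators and sort the collected paths afterwards. A rough sketch, using a throwaway directory and made-up file names:

```php
<?php
// Throwaway directory with a few files (names are illustrative).
$root = sys_get_temp_dir() . '/sortdemo_' . uniqid();
mkdir($root . '/sub', 0777, true);
foreach (['b.php', 'a.php', 'sub/c.php'] as $name) {
    touch($root . '/' . $name);
}

$files = new RegexIterator(
    new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    ),
    '/\.php$/'
);

// With the default flags the keys are the path names, so collect them
// into a plain array and sort that explicitly.
$paths = array_keys(iterator_to_array($files));
sort($paths);
print_r($paths);
```

Note that iterator_to_array() loads every entry at once, so this trades the iterator's laziness for a predictable order.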

RecursiveIterator return array with file extension

Try to var_dump $files and you will see. If you don't want to put both elements of the $file array into your $fileList, then don't use array_merge; simply do:

foreach ($files as $file) {
    $fileList[] = $file[0];
}

And for a pretty rough-and-ready fix for the \ issue, do a str_replace or similar. Something like:

foreach ($files as $file) {
    $fileList[] = str_replace('/', '\\', $file[0]);
}
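
To illustrate where the two-element array comes from, here is a small sketch. It assumes the question used RegexIterator::GET_MATCH with a capturing group around the extension; the pattern and the file list are stand-ins, not the asker's actual code.

```php
<?php
// Hypothetical file list standing in for a directory scan.
$paths = new ArrayIterator(['lib/a.php', 'lib/b.txt', 'c.php']);

// In GET_MATCH mode each item is the preg_match() result array:
// index 0 is the whole match, index 1 the first capture group.
$files = new RegexIterator($paths, '/^.+\.(php)$/', RegexIterator::GET_MATCH);

$fileList = [];
$extensions = [];
foreach ($files as $file) {
    $fileList[]   = $file[0]; // full path
    $extensions[] = $file[1]; // captured extension
}
print_r($fileList);
```

Dropping the capturing group, or using the default MATCH mode, avoids the extra element altogether.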

RecursiveDirectoryIterator + RecursiveIteratorIterator + RegexIterator are not working like they should

The comment doesn't tell the whole truth in this sentence:

$Regex will contain a single index array for each PHP file.

You actually need to iterate over $Regex, since dumping the iterator object itself won't give you back a usual array:

foreach ($Regex as $file) {
    var_dump($file);
}

Using literal numbers seems to break RegexIterator

What looks to be happening here is that the directory in $IMAGES_DIR is part of the string the pattern is tested against, and so part of what is returned to $r in your iteration. Using your working pattern, if you print_r($r); inside the loop you'll see the matched paths:

array(6) {
[0]=>
string(19) "./images/test/4.png"
[1]=>
string(19) "./images/test/6.png"
[2]=>
string(19) "./images/test/5.png"
[3]=>
string(14) "./images/3.png"
[4]=>
string(14) "./images/1.png"
[5]=>
string(14) "./images/2.png"
}

So, you need to construct your expression either to incorporate the directory, or to ignore it and not anchor with ^. Your pattern as attempted matches exactly strings like 1.png, but the input string it is testing is actually ./images/1.png.

Instead I would recommend using

$IMG_MASK = '#/[1-3]\.png$#';

This pattern does not ^ anchor the start of the string, and instead begins matching at the / before the digit.
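
A quick way to compare the two behaviours is to run both patterns over the paths from the dump above with preg_grep. The anchored pattern below is only a guess at what the original attempt looked like.

```php
<?php
// Paths taken from the dump above.
$paths = [
    './images/test/4.png', './images/test/6.png', './images/test/5.png',
    './images/3.png', './images/1.png', './images/2.png',
];

// Assumed shape of the original attempt: anchored at the start of the string.
$anchored = array_values(preg_grep('#^[1-3]\.png$#', $paths));

// Recommended pattern: starts matching at the last "/" instead.
$unanchored = array_values(preg_grep('#/[1-3]\.png$#', $paths));

print_r($anchored);
print_r($unanchored);
```

The anchored version finds nothing because every input starts with "./images", while the unanchored one picks out the three files numbered 1 to 3.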

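If you are interested in getting the full paths, restore your .+ to the start, and use DIRECTORY_SEPARATOR just before the digit. Run it through preg_quote so that a backslash separator on Windows does not end up escaping the [ that follows it:

$IMG_MASK = '#.+' . preg_quote(DIRECTORY_SEPARATOR, '#') . '[1-3]\.png$#';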

This will match anything (.+) up to a / (or your platform's separator), then match the single digit and .png. The result is an array like:

Array
(
[0] => ./images/3.png
[1] => ./images/1.png
[2] => ./images/2.png
)

Of course, if you also want the images in ./images/test/, adjust the regex to use \d\.png to match any digit instead of just [1-3].

The pattern

$IMG_MASK = '#.+' . preg_quote(DIRECTORY_SEPARATOR, '#') . '\d\.png$#';

...produces:

Array
(
[0] => ./images/test/4.png
[1] => ./images/test/6.png
[2] => ./images/test/5.png
[3] => ./images/3.png
[4] => ./images/1.png
[5] => ./images/2.png
)

How to use RecursiveDirectoryIterator with a Modified Date filter?

Ok you may try this example:

class FilesystemDateFilter extends RecursiveFilterIterator
{
    protected $earliest_date;

    public function __construct(RecursiveIterator $it, $earliest_date)
    {
        $this->earliest_date = $earliest_date;
        parent::__construct($it);
    }

    // Let directories through; keep files modified on or after the earliest date
    public function accept(): bool
    {
        return ( ! $this->isFile() || $this->getMTime() >= $this->earliest_date);
    }

    // Pass the date down to the filter created for each sub-directory
    public function getChildren(): RecursiveFilterIterator
    {
        return new static($this->getInnerIterator()->getChildren(), $this->earliest_date);
    }
}

$directory = new RecursiveDirectoryIterator("c:\\www");
$filter = new FilesystemDateFilter($directory, strtotime('2012-12-31'));

foreach (new RecursiveIteratorIterator($filter) as $filename => $file) {
    echo $filename . PHP_EOL;
}

Note that getMTime() (http://php.net/manual/en/directoryiterator.getmtime.php) returns a Unix timestamp, so the date you compare it against must also be a timestamp (hence the strtotime() call).

What you were missing was overriding getChildren, which passes the parameter down to the children.
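
If you'd rather not write a class at all, a similar result can be sketched with the built-in RecursiveCallbackFilterIterator, which reuses the same callback for child directories, so no getChildren override is needed. The temporary directory and file names below are made up for the demo.

```php
<?php
// Throwaway directory with one old and one recent file (names are invented).
$root = sys_get_temp_dir() . '/datedemo_' . uniqid();
mkdir($root . '/sub', 0777, true);
touch($root . '/old.txt', strtotime('2012-01-01'));
touch($root . '/sub/new.txt', strtotime('2013-06-01'));

$earliest = strtotime('2012-12-31');

$filter = new RecursiveCallbackFilterIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    function (SplFileInfo $current) use ($earliest): bool {
        // Always descend into directories; keep only recent enough files.
        return !$current->isFile() || $current->getMTime() >= $earliest;
    }
);

$names = [];
foreach (new RecursiveIteratorIterator($filter) as $file) {
    $names[] = $file->getFilename();
}
print_r($names);
```

The class-based filter is still the better fit if you want to reuse it or combine it with other named filters.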


