How to Get All Captures of Subgroup Matches with preg_match_all()

Get repeated matches with preg_match_all()

According to Kobi, in the comments:

PHP has no support for captures of the same group

Therefore this question has no direct solution: when a capturing group is repeated, PCRE keeps only the last value it matched.
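A minimal sketch of the limitation, assuming a made-up comma-separated input (the subject string and pattern are illustrative, not taken from the original question). The group is repeated three times, yet only its final iteration is reported:

```php
<?php
// Hypothetical input, chosen for illustration only.
$subject = 'a,b,c';

// ([a-z]) sits inside a repeated non-capturing group.
preg_match('/^(?:([a-z]),?)+$/', $subject, $m);

$wholeMatch  = $m[0]; // the full string that matched
$lastCapture = $m[1]; // only the last iteration of the group survives

var_dump($wholeMatch, $lastCapture);
```

The earlier iterations ("a" and "b") are not retrievable from `$m`.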

PHP preg_match_all() not capturing subgroups

I recommend you use DOM (or SimpleXML) for parsing RSS/Atom feeds. You will get way better results than with regular expressions.

Here's an example (using SimpleXML):

$rss_feed = file_get_contents('http://stackoverflow.com/feeds/question/4187945');
$sxml = new SimpleXMLElement($rss_feed);

$title = $sxml->entry[0]->title;
echo $title;
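For comparison, here is a rough DOM-based sketch of the same idea. It assumes an inline, made-up Atom document instead of a live feed so that it runs without network access; the feed content and variable names are illustrative only.

```php
<?php
// Inline Atom document so the sketch runs offline.
// The feed content is invented for illustration.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First entry</title>
  </entry>
  <entry>
    <title>Second entry</title>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$titles = [];
foreach ($doc->getElementsByTagName('entry') as $entry) {
    // Each entry in this sample has exactly one <title> child.
    $titles[] = $entry->getElementsByTagName('title')->item(0)->textContent;
}

print_r($titles);
```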

PHP Regex, capture repetition matches

As discussed in the comments (and to stop people posting rules that merely match the text; seriously, read the question), I'll post the "solution" here.

I use this rule:

^([a-z]+)>>(.*)::([a-z]+)$

(Or something to that effect)

Then I can run preg_match_all() on the middle capture and extract the data that way. Annoyingly, this doesn't check for the commas, but I can drop that requirement.

So something like this, applied to that middle capture:

 preg_match_all('/([a-z]+)\(([a-z]+)\)/',...
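The two-pass idea can be sketched as follows. The subject string is an assumption about what the input looks like (the original question text is not shown here), and `$pairs` is just an illustrative name:

```php
<?php
// Hypothetical input in the shape the rule above expects.
$subject = 'start>>foo(bar),baz(qux)::end';

$pairs = [];

// First pass: isolate the middle section between ">>" and "::".
if (preg_match('/^([a-z]+)>>(.*)::([a-z]+)$/', $subject, $outer)) {
    // Second pass: pull every name(argument) pair out of the middle section.
    preg_match_all('/([a-z]+)\(([a-z]+)\)/', $outer[2], $inner, PREG_SET_ORDER);
    foreach ($inner as $set) {
        $pairs[] = [$set[1], $set[2]];
    }
}

print_r($pairs);
```

As noted above, the second pass does not verify that the pairs are comma-separated; it simply collects whatever pairs it finds.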

How to capture multiple repeated groups?

With one group in the pattern, you can only get one exact result in that group. If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.

To get every value, use your language's find-all function (preg_match_all() in PHP) and remove the anchors and the quantifier on the non-capturing group; the non-capturing group itself can then be dropped as well.

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
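Both options might look like this in PHP. The input string is a made-up example, and the simplified find-all pattern is an assumption about what remains once the anchors and the repeated group are removed:

```php
<?php
// Hypothetical input: three comma-separated uppercase words.
$subject = 'AB,CD,EF';

// Option 1: no anchors, no repetition; let preg_match_all() find each item.
preg_match_all('/[A-Z]+/', $subject, $found);
$items = $found[0];

// Option 2: one capturing group per expected item (works for a fixed count only).
preg_match('/^([A-Z]+),([A-Z]+),([A-Z]+)$/', $subject, $fixed);
$groups = array_slice($fixed, 1); // drop the full match at index 0

print_r($items);
print_r($groups);
```

Option 1 handles any number of items but no longer validates the overall format; option 2 validates the format but only for exactly three items.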

Highlight match result in subject string from preg_match_all()

This seems to behave right for all the examples I've thrown at it so far. Note that I've broken the abstract highlighting part from the HTML-mangling part for reusability in other situations:

<?php

/**
 * Runs a regex against a string and returns a version of that string with matches highlighted.
 * The outermost match is marked with [0]...[/0], the first sub-group with [1]...[/1], etc.
 *
 * @param string $regex Regular expression ready to be passed to preg_match_all
 * @param string $input
 * @return string
 */
function highlight_regex_matches($regex, $input)
{
    $matches = array();
    preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

    // Arrange matches into groups based on their starting and ending offsets
    $matches_by_position = array();
    foreach ( $matches as $sub_matches )
    {
        foreach ( $sub_matches as $match_group => $match_data )
        {
            $start_position = $match_data[1];

            // Groups that did not take part in the match are reported with offset -1
            if ( $start_position < 0 )
            {
                continue;
            }

            $end_position = $start_position + strlen($match_data[0]);

            $matches_by_position[$start_position]['START'][] = $match_group;
            $matches_by_position[$end_position]['END'][] = $match_group;
        }
    }

    // Now proceed through that array, annotating the original string
    // Note that we have to pass through BACKWARDS, or we break the offset information
    $output = $input;
    krsort($matches_by_position);
    foreach ( $matches_by_position as $position => $groups )
    {
        $insertion = '';

        // First, assemble any ENDING groups, nested highest-group first
        // (isset avoids an undefined-index warning when nothing ends here)
        if ( isset($groups['END']) )
        {
            krsort($groups['END']);
            foreach ( $groups['END'] as $ending_group )
            {
                $insertion .= "[/$ending_group]";
            }
        }

        // Then, any STARTING groups, nested lowest-group first
        if ( isset($groups['START']) )
        {
            ksort($groups['START']);
            foreach ( $groups['START'] as $starting_group )
            {
                $insertion .= "[$starting_group]";
            }
        }

        // Insert into output
        $output = substr_replace($output, $insertion, $position, 0);
    }

    return $output;
}

/**
 * Given a regex and a string containing unescaped HTML, return a blob of HTML
 * with the original string escaped, and matches highlighted using <span> tags
 *
 * @param string $regex Regular expression ready to be passed to preg_match_all
 * @param string $raw_html Unescaped input string
 * @return string HTML ready to display :)
 */
function highlight_regex_as_html($regex, $raw_html)
{
    // Add the (deliberately non-HTML) highlight tokens
    $highlighted = highlight_regex_matches($regex, $raw_html);

    // Escape the HTML from the input
    $highlighted = htmlspecialchars($highlighted);

    // Substitute the match tokens with desired HTML
    $highlighted = preg_replace('#\[([0-9]+)\]#', '<span class="match\\1">', $highlighted);
    $highlighted = preg_replace('#\[/([0-9]+)\]#', '</span>', $highlighted);

    return $highlighted;
}

NOTE: As hakra has pointed out to me on chat, if a sub-group in the regex can occur multiple times within one overall match (e.g. '/a(b|c)+/'), preg_match_all will only tell you about the last of those matches - so highlight_regex_matches('/a(b|c)+/', 'abc') returns '[0]ab[1]c[/1][/0]' not '[0]a[1]b[/1][1]c[/1][/0]' as you might expect/want. All matching groups outside that will still work correctly though, so highlight_regex_matches('/a((b|c)+)/', 'abc') gives '[0]a[1]b[2]c[/2][/1][/0]' which is still a pretty good indication of how the regex matched.

Can you retrieve multiple regex matches in JavaScript?

No, this is not possible in JavaScript, nor in most other regex flavors; Perl 6 and .NET are the exceptions. Repeated capturing groups always store only the last value that was matched. Only .NET and Perl 6 let you access the individual captures (match.Groups(i).Captures in .NET, for example).

You need two passes, the first to find the strings, the second to iterate over the matches and scan those for their sub-values.

Or make the regex explicit:

/^([0-9]{1,2}:)?([0-9]{1,2}:)?([0-9]{1,2}:)?([0-9]{0,2})?$/

Regex quantified capture

Try this one out:

preg_match_all("@(?:/m)?/([^/]+)(?:/t)?@", "/m/part/other-part/another-part/t", $m);
var_dump($m);

It gives:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(7) "/m/part"
    [1]=>
    string(11) "/other-part"
    [2]=>
    string(15) "/another-part/t"
  }
  [1]=>
  array(3) {
    [0]=>
    string(4) "part"
    [1]=>
    string(10) "other-part"
    [2]=>
    string(12) "another-part"
  }
}

Edit:

IMO the best way to do what you want is to use the preg_match() call from @stema's answer and explode() the result on / to get the list of parts you want.
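A rough sketch of that preg_match() plus explode() approach. @stema's actual pattern is not reproduced on this page, so the pattern below is an assumption: it captures everything between the leading "/m/" and the trailing "/t".

```php
<?php
$subject = '/m/part/other-part/another-part/t';

$parts = [];

// Assumed pattern (not @stema's original): grab the section between "/m/" and "/t".
if (preg_match('@^/m/(.+)/t$@', $subject, $m)) {
    // Split the captured section on the path separator.
    $parts = explode('/', $m[1]);
}

print_r($parts);
```

Unlike the preg_match_all() version above, this also checks that the string really starts with "/m/" and ends with "/t".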

Generate all possible matches for regex pattern in PHP

Method

  1. You need to strip out the variable patterns; you can use preg_match_all to do this

    preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

    /* Regex:

    /(\[\w+\]|\([\w|]+\))/
    / : Pattern delimiter
    ( : Start of capture group
    \[\w+\] : Character class pattern
    | : OR operator
    \([\w|]+\) : Capture group pattern
    ) : End of capture group
    / : Pattern delimiter

    */
  2. You can then expand the capture groups to letters or words (depending on type)

    $array = str_split($cleanString, 1); // For a character class
    $array = explode("|", $cleanString); // For a capture group
  3. Recursively work your way through each $array

Code

function printMatches($pattern, $array, $matchPattern)
{
    // Take the options for the first remaining variable part of the pattern
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Substitute only the first variable part with the current option
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }
        // Negative string offsets require PHP 7.1 or later
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}

$regex = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);


Additional functionality

Expanding nested groups

In use you would put this before the "preg_match_all".

$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;

Output:

This happen(s|ed) to (become|be|have|having) test case 1?

Matching single letters

The bones of this would be to update the regex:

$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

and add an else to the prepOptions function:

} else {
$array = [$cleanString];
}

Full working example

function printMatches($pattern, $array, $matchPattern)
{
    // Take the options for the first remaining variable part of the pattern
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Substitute only the first variable part with the current option
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            // Single optional letter such as "1?"
            $array = [$cleanString];
        }
        // Negative string offsets require PHP 7.1 or later
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}

$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

// Expand nested groups first, e.g. (be(come)?|hav(e|ing)) becomes (become|be|have|having)
$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

Output:

This happens to become test case 1
This happens to become test case
This happens to be test case 1
This happens to be test case
This happens to have test case 1
This happens to have test case
This happens to having test case 1
This happens to having test case
This happened to become test case 1
This happened to become test case
This happened to be test case 1
This happened to be test case
This happened to have test case 1
This happened to have test case
This happened to having test case 1
This happened to having test case

