How to Return Only Named Groups with Preg_Match or Preg_Match_All

How to return only named groups with preg_match or preg_match_all?

I don't think the preg_* functions can do this on their own, but a simple loop over the result can. That said, I don't see why the numeric elements pose a problem.
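A minimal sketch of such a loop, assuming a made-up sample string and pattern (neither comes from the original question). It copies only the entries whose key is a string, since preg_match stores named groups under string keys and positional groups under integer keys:

```php
<?php
// Hypothetical sample input and pattern, for illustration only.
$str = 'foobar: 2008';
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

// Keep only the entries stored under a string key (the named groups).
$named = [];
foreach ($matches as $key => $value) {
    if (is_string($key)) {
        $named[$key] = $value;
    }
}

print_r($named);
```

The original $matches array is left untouched, so the full match in $matches[0] remains available if it is needed later.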

How can I get only named captures from preg_match?

preg_match always returns the numeric indexes as well, regardless of named capturing groups.
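To see this, it can help to list the keys of the result array. The pattern and subject below are assumptions chosen only for the demonstration:

```php
<?php
// Hypothetical pattern and subject, for illustration only.
preg_match('/(?P<year>\d{4})-(?P<month>\d{2})/', '2008-06', $matches);

// Each named group appears twice: once under its name and once
// under its numeric position. Key 0 holds the full match.
$keys = array_keys($matches);
print_r($keys);
```

Each named group shows up under its name and again under its position, which is why the numeric entries have to be filtered out afterwards if they are unwanted.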

PHP preg_match_all named groups issue

/user/(?P<user>[^/]+)/(?P<action>[^/]+)

http://regex101.com/r/gL1aS2

Just to explain a couple problems with your original regex:

  • [.*]+ matches one or more characters that are each a literal dot or an asterisk, for example *.*.* or . or ......; [^/]+ matches one or more characters of any kind except slashes.
  • No need to escape slashes, as they're not special characters when you're using ~ as delimiters.
  • Your regex also required /app at the beginning, which wasn't present in your string.
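As a rough sketch, the corrected pattern could be used like this with ~ as the delimiter. The sample path in $url is an assumption, since the original subject string is not shown here:

```php
<?php
// Hypothetical subject string; the original one is not shown above.
$url = '/user/alice/edit';

// With ~ as the delimiter, the slashes need no escaping.
$pattern = '~/user/(?P<user>[^/]+)/(?P<action>[^/]+)~';

if (preg_match($pattern, $url, $m)) {
    echo $m['user'], ' ', $m['action'], PHP_EOL;
}
```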

Return matches with preg_match without items with numerical keys

There is no "simple" way, as preg_match has no option to output only named groups.

If you must remove the items with numerical keys from the array and do not want to use explicit loops you may use this array_filter workaround:

$str = 'foobar: 2008';
if (preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches)) {
    print_r(
        // Keep only the entries whose key is not an integer, i.e. the named groups.
        array_filter($matches, function ($key) { return !is_int($key); }, ARRAY_FILTER_USE_KEY)
    );
} // => Array ( [name] => foobar [digit] => 2008 )

See the PHP demo
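The same filter should carry over to preg_match_all in its default PREG_PATTERN_ORDER mode, because the named groups are again top-level string keys of the result. The two-line sample input below is an assumption made for this sketch:

```php
<?php
// Hypothetical multi-line input, for illustration only.
$str = "foobar: 2008\nbaz: 2012";

preg_match_all('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

// Keep only the top-level entries stored under a string key.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

print_r($named);
```

With PREG_SET_ORDER the named keys sit one level deeper, inside each match, so the filter would have to be applied to every element of $matches instead.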

PHP regex, how can I make my regex only return one group?

preg_match_all

If you want a different captured string, you need to change your regex. Here I'm looking for one or more characters that are not a double quote, enclosed in double quotes, where the opening quote is immediately preceded by a colon.

<?php

$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!(?<=:")[^"]+(?=")!';

preg_match_all($pattern,$string,$matches);

echo $matches[0][0];

?>

Output

abcdefg

If you were to print_r($matches) you would see that you have the default array and the matches in their own additional arrays. So to access the string you would need to use $matches[0][0] which provides the two keys to access the data. But you're always going to have to deal with arrays when you're using preg_match_all.

Array
(
[0] => Array
(
[0] => abcdefg
)

)

preg_replace

Alternatively, if you were to use preg_replace instead, you could replace all of the contents of the string except for your capture group, and then you wouldn't need to deal with arrays (but you need to know a little more about regex).

<?php

$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!^[^:]+:"([^"]+)".+$!s';

$new_string = preg_replace($pattern,"$1",$string);

echo $new_string;

?>

Output

abcdefg

PHP preg_match_all subpattern names in a pattern

You may get a list of all valid named capture group names using

"~(?<!\\\\)(?:\\\\{2})*\(\?(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})')~"

See the regex and an online PHP demo.

The point is to match an unescaped ( that is followed with a ? that is then followed with either P< or < and then has a group name pattern ending with > or ' followed with the group name pattern and then '.

$rx = "~(?<!\\\\)(?:\\\\{2})*\(\?(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})')~";
$s = "(?P<name>\w+): (?<name2>\w+): (?'digit'\d+)";
preg_match_all($rx, $s, $res);
print_r($res[1]);

yields

Array
(
[0] => name
[1] => name2
[2] => digit
)

Pattern details

  • (?<!\\) - no \ immediately to the left of the current location
  • (?:\\\\)* - 0+ double backslashes (to allow any number of escaped backslashes before the opening parenthesis)
  • \( - a (
  • \? - a ?
  • (?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})') - a branch reset group:

    • P?<([_A-Za-z]\w{0,31})> - an optional P, <, a _ or an ASCII letter, 0 to 31 word chars (digits/letters/_) (captured into Group 1), and >
    • | - or
    • '([_A-Za-z]\w{0,31})' - ', a _ or an ASCII letter, 0 to 31 word chars (digits/letters/_) (also captured into Group 1), and then '

The group names are all captured into Group 1, so you just need to read $res[1].
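One possible use of this name list, sketched under assumptions: extract the group names from a pattern, then keep only those keys from a preg_match result. The helper name namedGroupsOnly and the sample pattern and subject are hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: returns only the named groups of a match.
function namedGroupsOnly(string $pattern, string $subject): array
{
    // The name-extraction regex from above, unchanged.
    $rx = "~(?<!\\\\)(?:\\\\{2})*\(\?(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})')~";
    preg_match_all($rx, $pattern, $names);

    if (!preg_match($pattern, $subject, $matches)) {
        return [];
    }

    // Keep only the entries whose key is one of the extracted names.
    return array_intersect_key($matches, array_flip($names[1]));
}

$result = namedGroupsOnly('/(?P<name>\w+): (?P<digit>\d+)/', 'foobar: 2008');
print_r($result);
```

For a single call, filtering on string keys is simpler. Extracting the names first may be more useful when the list of names is needed independently of any particular match.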

RegEx Capture Group with PHP preg_match Not Returning Values

The problem is that you require a CR character \r. Also you should make the search lazy inside the capturing group and use print_r to output the array. Like this:

$pattern = "/<\/th>.*<td>(.*?)<\/td>$/";

You can see it in action here: http://codepad.viper-7.com/djRJ0e
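For a quick local check, the pattern can be tried on a small fragment. The one-line HTML string below is an assumption, since the original input is not reproduced here:

```php
<?php
// Hypothetical single-line HTML fragment; the original input is not shown.
$html = '<th>Name</th><td>Alice</td>';

$pattern = '/<\/th>.*<td>(.*?)<\/td>$/';

if (preg_match($pattern, $html, $m)) {
    // $m[0] is the full match, $m[1] is the content of the capturing group.
    print_r($m);
}
```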

Note that it's recommended to parse html with a proper html parser rather than using regex.

preg_match_all to return pattern matches rather than subject matches

I believe the original answer, although it solves the sample problem stated by the OP, doesn't address the primary question formulated in the title.

Imagine someone desperately looking up an answer to this problem and finding this page on Stack Overflow, only to discover that the answer does not address the problem stated in the title but offers an "alternative solution" to an isolated and thus reduced version of it.

Anyway...

This is how I would go about this.

<?php

$subject = "The quick brown fox jumps over the lazy dog";

$needles = [
    'Dog',
    'Clown',
    'Brown',
    'Fox',
    'Dude',
];

// Optional: If you want to search for the needles as they are,
// literally, let's escape possible control characters.
$needles = array_map('preg_quote', $needles);

// Build our regular expression with matching groups which we can then evaluate.
$pattern = sprintf("#(%s)#i", implode(')|(', $needles));

// In this case the result regexp. would be:
// #(Dog)|(Clown)|(Brown)|(Fox)|(Dude)#i

// So let's match it!
$pregMatchCount = preg_match_all($pattern, $subject, $m);

// Get rid of the first item as it represents all matches.
array_shift($m);

// Start with an empty result so print_r() below works even if nothing matched.
$foundNeedles = [];

// Go through each of the matched sub-groups...
foreach ($m as $i => $group) {

    // ...and if this sub-group is not empty, we know that the needle
    // with the index of this sub-group is present in the results.
    if (array_filter($group)) {
        $foundNeedles[] = $needles[$i];
    }

}

print_r($foundNeedles);

The result being:

Array
(
[0] => Dog
[1] => Brown
[2] => Fox
)

How can I use groups with preg_match?

In order to match the strings with the format you described, you need

preg_match_all('/^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$/im', $game, $info);

See the regex demo

IDEONE demo:

$re = '~^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$~im'; 
$game = "Word 123 Word 456\nWord 1234 Word Word 3456\nWord Word 3455 Word 4566\nWord Word 4434 Word Word 44332";
preg_match_all($re, $game, $info);
print_r($info);

The regex explanation:

  • ^ - start of string
  • ([a-z]+(?:\s+[a-z]+)?) - Group 1 for Word Word or Word pattern
  • \s+ - one or more whitespaces
  • ([0-9]+) - Group 2 for Number
  • \s+ - one or more whitespaces
  • ([a-z]+(?:\s+[a-z]+)?) - Group 3 for Word Word or Word pattern
  • \s+ - one or more whitespaces
  • ([0-9]+) - Group 4 for Number pattern
  • $ - end of string

The /i modifier makes the pattern case-insensitive. /m modifier is used for testing only (it makes ^ and $ match start and end of a line, not the whole string).

The [a-z]+(?:\s+[a-z]+)? subpattern means: match one or more letters with [a-z]+, then match zero or one occurrence of a sequence of one or more whitespaces (\s+) followed by one or more letters ([a-z]+). Thus, this pattern effectively matches 1 or 2 words separated by whitespace.
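As a small sketch of how the four groups might be consumed for a single line, the snippet below unpacks them into variables. The variable names ($first, $firstNumber, and so on) are hypothetical, and the sample line is taken from the demo input above:

```php
<?php
$re = '~^([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)\s+([a-z]+(?:\s+[a-z]+)?)\s+([0-9]+)$~i';
$line = 'Word Word 3455 Word 4566';

if (preg_match($re, $line, $info)) {
    // Skip index 0 (the full match) and unpack Groups 1 to 4.
    // The variable names are illustrative only.
    [, $first, $firstNumber, $second, $secondNumber] = $info;
    echo "$first=$firstNumber, $second=$secondNumber", PHP_EOL;
}
```

The /m modifier is dropped here because only one line is matched at a time.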


