Can Someone Explain the /e Regex Modifier

Can someone explain the /e regex modifier?

The e Regex Modifier in PHP with example vulnerability & alternatives

What e does, with an example...

The e modifier is a deprecated regex modifier which allows you to use PHP code in the replacement string passed to preg_replace. This means that whatever you pass in will be evaluated as a part of your program.

For example, we can use something like this:

$input = "Bet you want a BMW.";
echo preg_replace("/([a-z]*)/e", "strtoupper('\\1')", $input);

This will output BET YOU WANT A BMW.

Without the e modifier, we get this very different output:

strtoupper('')Bstrtoupper('et')strtoupper('') strtoupper('you')strtoupper('') strtoupper('want')strtoupper('') strtoupper('a')strtoupper('') strtoupper('')Bstrtoupper('')Mstrtoupper('')Wstrtoupper('').strtoupper('')

Potential security issues with e...

The e modifier is deprecated for security reasons. Here's an example of an issue you can run into very easily with e:

$password = 'secret';
...
$input = $_GET['input'];
echo preg_replace('|^(.*)$|e', '"\1"', $input);

If I submit my input as "$password", the output of this call will be secret. It's therefore very easy for me to access session variables and any other variables in scope on the back end, and even to take deeper control of your application (running a shell command such as cat /etc/passwd, for example) through this simple piece of poorly written code.

Like the similarly deprecated mysql libraries, this doesn't mean that you cannot write secure code using e, just that it's more difficult to do so.
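
For comparison, here is a minimal sketch of the same replacement done with preg_replace_callback. The hard-coded $input is a stand-in for $_GET['input'] (an assumption, so the snippet runs on its own); the submitted text reaches the callback as plain data and is never evaluated:

```php
<?php
$password = 'secret';

// Stand-in for $_GET['input']: the attacker submits the literal text $password.
$input = '$password';

// The match is handed to the callback as a plain string, not as PHP code.
$output = preg_replace_callback('|^(.*)$|', function ($matches) {
    return $matches[1];
}, $input);

echo $output; // prints $password, not secret
```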

What you should use instead...

You should use preg_replace_callback in nearly all places you would consider using the e modifier. The code is definitely not as brief in this case but don't let that fool you -- it's twice as fast:

$input = "Bet you want a BMW.";
echo preg_replace_callback(
    "/([a-z]*)/",
    function ($matches) {
        // $matches[0] is the full text of the current match
        return strtoupper($matches[0]);
    },
    $input
);

On performance, there's no reason to use e...

Unlike the mysql libraries (which were also deprecated for security purposes), e is not quicker than its alternatives for most operations. For the example given, it's twice as slow: preg_replace_callback took 0.14 sec for 50,000 operations, versus 0.32 sec for the e modifier.

Can anyone explain how this PHP code gets executed as preg_replace is the only function

The /e is a PCRE modifier (PREG_REPLACE_EVAL), which evaluates the replacement string as PHP code before the replacement is made. As no replacement takes place, it evaluates exactly what you copied. As of PHP 5.5 it triggers an E_DEPRECATED notice, and as of PHP 7.0 it has been removed because of its security issues.

You can find the corresponding documentation in the PHP manual's section on PCRE pattern modifiers.

PHP preg_replace 'e' modifier dilemma

Use preg_replace_callback instead! The 'e' option can be dangerous. Here is an example with preg_replace_callback:

preg_replace_callback(
    '`\[pdf\]([^\[]+)\[/pdf\]`i',
    function ($matches) {
        return "<a href=\"{$matches[1]}\">PDF</a> " . filesize($matches[1]);
    },
    $string
);

You can obviously make a more elaborate function that checks if the file exists or whatever. This code uses a PHP 5.3 anonymous function, so if you aren't running PHP 5.3 or later, you have to use the older syntax.
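
As a rough sketch of such a more elaborate function (pdf_link is a made-up name, the sample string is invented, and the htmlspecialchars call is my own addition), the callback below only appends the size when the file actually exists:

```php
<?php
// Hypothetical callback: links to the file and appends its size only if it exists.
function pdf_link($matches)
{
    $path = $matches[1];
    $link = '<a href="' . htmlspecialchars($path) . '">PDF</a>';
    if (is_file($path)) {
        $link .= ' ' . filesize($path);
    }
    return $link;
}

// Made-up input; the file name is assumed not to exist on disk.
$string = 'Read [pdf]no-such-file-for-demo.pdf[/pdf] first.';
$result = preg_replace_callback('`\[pdf\]([^\[]+)\[/pdf\]`i', 'pdf_link', $string);
echo $result;
```

Passing the function name as a string also works on PHP versions older than 5.3, where anonymous functions are not available.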

Edit: I wrote this answer before seeing the comments and OP's edited post, so I may have missed something.

Edit2: What is your size function? It looks like it prints the value instead of returning it.

function test() {
    echo "world";
}

echo "hello " . test(); // worldhello
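
If that is what is happening, a sketch of the fix is to return the value rather than echo it (test_returning is just an illustrative name):

```php
<?php
function test_returning() {
    return "world"; // hand the value back instead of printing it
}

$greeting = "hello " . test_returning();
echo $greeting; // hello world
```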

How can I disable the 'e' PREG_REPLACE_EVAL modifier in PHP?

The Suhosin extension provides an option to disable the /e modifier.

By the way, setting disable_functions = eval won't do what you expect (eval is a language construct, not a function). Again, the Suhosin extension provides an option to disable eval.

Replace preg_replace() e modifier with preg_replace_callback

In a regular expression, you can "capture" parts of the matched string with (brackets); in this case, you are capturing the (^|_) and ([a-z]) parts of the match. These are numbered starting at 1, so you have back-references 1 and 2. Match 0 is the whole matched string.

The /e modifier takes a replacement string, and substitutes backslash followed by a number (e.g. \1) with the appropriate back-reference - but because you're inside a string, you need to escape the backslash, so you get '\\1'. It then (effectively) runs eval to run the resulting string as though it was PHP code (which is why it's being deprecated, because it's easy to use eval in an insecure way).

The preg_replace_callback function instead takes a callback function and passes it an array containing the matched back-references. So where you would have written '\\1', you instead access element 1 of that parameter - e.g. if you have an anonymous function of the form function($matches) { ... }, the first back-reference is $matches[1] inside that function.

So a /e argument of

'do_stuff(\\1) . "and" . do_stuff(\\2)'

could become a callback of

function($m) { return do_stuff($m[1]) . "and" . do_stuff($m[2]); }

Or in your case

'strtoupper("\\2")'

could become

function($m) { return strtoupper($m[2]); }
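
Put together, a runnable sketch might look like the following. The pattern is assembled from the two captured parts mentioned earlier, and the input string is made up for illustration:

```php
<?php
$result = preg_replace_callback(
    '/(^|_)([a-z])/',
    function ($m) {
        // $m[1] is the start-of-string or underscore, $m[2] is the letter after it
        return strtoupper($m[2]);
    },
    'get_user_name'
);

echo $result; // GetUserName
```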

Note that $m and $matches are not magic names, they're just the parameter name I gave when declaring my callback functions. Also, you don't have to pass an anonymous function, it could be a function name as a string, or something of the form array($object, $method), as with any callback in PHP, e.g.

function stuffy_callback($things) {
    return do_stuff($things[1]) . "and" . do_stuff($things[2]);
}
$foo = preg_replace_callback('/([a-z]+) and ([a-z]+)/', 'stuffy_callback', 'fish and chips');
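
The array($object, $method) form works the same way. In this sketch the Shouter class and its shout method are invented for illustration, with strtoupper standing in for do_stuff:

```php
<?php
// Hypothetical class; any object with a suitable public method would do.
class Shouter
{
    public function shout($m)
    {
        return strtoupper($m[1]) . ' and ' . strtoupper($m[2]);
    }
}

$shouter = new Shouter();
$result = preg_replace_callback(
    '/([a-z]+) and ([a-z]+)/',
    array($shouter, 'shout'), // callback given as object plus method name
    'fish and chips'
);

echo $result; // FISH and CHIPS
```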

As with any function, you can't access variables outside your callback (from the surrounding scope) by default. When using an anonymous function, you can use the use keyword to import the variables you need to access, as discussed in the PHP manual. e.g. if the old argument was

'do_stuff(\\1, $foo)'

then the new callback might look like

function($m) use ($foo) { return do_stuff($m[1], $foo); }
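
A self-contained sketch of the same idea, with a made-up $suffix variable standing in for $foo and simple string concatenation standing in for do_stuff:

```php
<?php
$suffix = '!'; // variable from the surrounding scope (illustrative)

$result = preg_replace_callback(
    '/([a-z]+)/',
    function ($m) use ($suffix) {
        // $suffix is only visible here because of the use clause
        return $m[1] . $suffix;
    },
    'fish and chips'
);

echo $result; // fish! and! chips!
```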

Gotchas

  • preg_replace_callback replaces the /e modifier on the regex, so you need to remove that flag from your "pattern" argument. A pattern like /blah(.*)blah/mei would become /blah(.*)blah/mi.
  • The /e modifier used a variant of addslashes() internally on the arguments, so some replacements used stripslashes() to undo it; in most cases, you probably want to remove the call to stripslashes from your new callback.
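
To illustrate the second point, here is a small sketch (pattern and input are invented): the callback receives the matched text exactly as it appears in the subject, quotes included, so there is nothing for stripslashes to undo.

```php
<?php
$subject = 'He said "it\'s fine" twice.';

$result = preg_replace_callback(
    '/"([^"]*)"/',
    function ($m) {
        // $m[1] holds it's fine, with no backslash added before the apostrophe
        return strtoupper($m[1]);
    },
    $subject
);

echo $result; // He said IT'S FINE twice.
```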

What Raku regex modifier makes a dot match a newline (like Perl's /s)?

TL;DR The Raku equivalent for "Perl dot matches newline" is ., and for \Q...\E it's '...'.

There are ways to get better answers (more authoritative, comprehensive, etc than SO ones) to most questions like these more easily (typically just typing the search term of interest) and quickly (typically seconds, couple minutes tops). I address that in this answer.

What is Raku equivalent for "Perl dot matches newline"?

Just .

If you run the following Raku program:

/./s

you'll see the following error message:

Unsupported use of /s.  In Raku please use: .  or \N.

If you type . in the doc site's search box it lists several entries. One of them is . (regex). Clicking it provides examples and says:

An unescaped dot . in a regex matches any single character. ...

Notably . also matches a logical newline \n

My guess is you either didn't look for answers before asking here on SO (which is fair enough -- I'm not saying don't; that said you can often easily get good answers nearly instantly if you look in the right places, which I'll cover in this answer) or weren't satisfied by the answers you got (in which case, again, read on).

In case I've merely repeated what you've already read, or it's not enough info, I will provide a better answer below, after I write up an initial attempt to give a similar answer for your \Q...\E question -- and fail when I try the doc step.

What is Raku equivalent for Perl \Q...\E?

'...', or $foo if the ... was metasyntax for a variable name.

If you run the following Raku program:

/\Qfoo\E/

you'll see the following error message:

Unsupported use of \Q as quotemeta.  In Raku please use: quotes or
literal variable match.

If you type \Q...\E in the doc site's search box it lists just one entry: Not in Index (try site search). If you go ahead and try the search as suggested, you'll get matching pages according to google. For me the third page/match listed (Perl to Raku guide - in a nutshell: "using String::ShellQuote (because \Q…\E is not completely right) ...") is the only true positive match of \Q...\E among 27 matches. And it's obviously not what you're interested in.

So, searching the doc for \Q...\E appears to be a total bust.


How does one get answers to a question like "what is the Raku equivalent of Perl's \Q...\E?" if the doc site ain't helpful (and one doesn't realize Rakudo happens to have a built in error message dedicated to the exact thing of interest and/or isn't sure what the error message means)? What about questions where neither Rakudo nor the doc site are illuminating?

SO is one option, but what lets folk interested in Raku frequently get good/great answers to their questions easily and quickly when they can't get them from the doc site because the answer is hard to find or simply doesn't exist in the docs?

Easily get better answers more quickly than asking SO Qs

The docs website doesn't always yield a good answer to simple questions. Sometimes, as we clearly see with the \Q...\E case, it doesn't yield any answer at all for the relevant search term.

Fortunately there are several other easily searchable sources of rich and highly relevant info that often work when the doc site does not for certain kinds of info/searches. This is especially likely if you've got precise search terms in mind such as /s or \Q...\E and/or are willing to browse info provided it's high signal / low noise. I'll introduce two of these resources in the remainder of this answer.

Archived "spec" docs

Raku's design was written up in a series of "spec" docs written principally by Larry Wall over a 2 decade period.

(The word "specs" is short for "specification speculations". It's both ultra authoritative detailed and precise specifications of the Raku language, authored primarily by Larry Wall himself, and mere speculations -- because it was all subject to implementation. And the two aspects are left entangled, and now out-of-date. So don't rely on them 100% -- but don't ignore them either.)

The "specs", aka design docs, are a fantastic resource. You can search them using google by entering your search terms in the search box at design.raku.org.


A search for /s lists 25 pages. The only useful match is Synopsis 5: Regexes and Rules ("24 Jun 2002 - There are no /s or /m modifiers (changes to the meta-characters replace them - see below)."). Click it. Then do an in-page search for /s (note the space). You'll see 3 matches:

There are no /s or /m modifiers (changes to the meta-characters replace them - see below)

A dot . now matches any character including newline. (The /s modifier is gone.)

. matches an anything, while \N matches an anything except what \n matches. (The /s modifier is gone.) In particular, \N matches neither carriage return nor line feed.


A search for \Q...\E lists 7 pages. The only useful match is again Synopsis 5: Regexes and Rules ("24 Jun 2002 — \Q$var\E / ..."). Click it. Then do an in-page search for \Q. You'll see 2 matches:

In Raku / $var / is like a Perl / \Q$var\E /

\Q...\E sequences are gone.

Chat logs

I've expanded the Quicker answers section of my answer to one of your earlier Qs to discuss searching the Raku "chat logs". They are an incredibly rich mine of info with outstanding search features. Please read that section of my prior answer for clear general guidance. The rest of this answer will illustrate for /s and \Q...\E.


A search for the regex / newline . ** ^200 '/s' / in the old Raku channel from 2010 thru 2015 found this match:

. matches an anything, while \N matches an anything except what \n matches. (The /s modifier is gone.) In particular, \N matches neither carriage return nor line feed.

Note the shrewdness of my regex. The pattern is the word "newline" (which is hopefully not too common) followed within 200 characters by the two character sequence /s (which I suspect is more common than newline). And I constrained to 2010-2014 because a search for that regex of the entire 15 years of the old Raku channel would tax Liz's server and time out. I got that hit I've quoted above within a couple minutes of trying to find some suitable match of /s (not end-of-sarcasm!).


A search for \Q in the old Raku channel was an immediate success. Within 30 seconds of the thought "I could search the logs" I had a bunch of useful matches.

what does || symbol means in Preg_replace() PHP

| separates alternatives in the regexp; e.g. /abc|def|ghi/ matches either abc, def, or ghi.

When you write 1st=||//e the resulting regexp will be /site||//e. Two of the alternatives are empty strings, which will match the empty strings before and after each character. So this will call phpinfo() for each character in $ka.

Actually, you should get an error because you have two / at the end of the regexp. It should be 1st=/e or 1st=||/e.
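
A harmless way to see the empty alternatives at work is to replace with a visible marker instead of evaluating code. This sketch drops the /e flag and uses a made-up subject string:

```php
<?php
// 'site' never matches here, so only the empty alternative does:
// once before each character and once at the end of the string.
$result = preg_replace('/site||/', '-', 'ab');

echo $result; // -a-b-
```

With /e in place, each of those marker positions would instead be a point where the replacement code runs.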


