Another Way Instead of Escaping Regex Patterns

Another way instead of escaping regex patterns?

Regexp.new(Regexp.quote('http://www.microsoft.com/'))

Regexp.quote simply escapes any characters that have special regexp meaning; it takes and returns a string. Note that . is also special, so the dots in the URL are escaped as well. After quoting, you can append to the string as needed before passing it to the constructor. A simple example:

Regexp.new(Regexp.quote('http://www.microsoft.com/') + '(.*)')

This adds a capturing group for the rest of the path.
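
The same quote-then-append idea carries over to other languages. As a rough sketch in Python, where re.escape plays the role of Regexp.quote (the sample URL path below is made up for illustration):

```python
import re

base = "http://www.microsoft.com/"

# Escape the literal prefix, then append an unescaped capturing group.
pattern = re.compile(re.escape(base) + "(.*)")

match = pattern.match("http://www.microsoft.com/en-us/windows")
rest = match.group(1) if match else None

# Because the dots were escaped, they no longer match arbitrary characters.
lookalike = pattern.match("http://wwwXmicrosoft.com/")
```

Here rest holds the part of the URL after the prefix, and lookalike is None because the escaped dot only matches a real dot.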

Handling regex escape replacement text that contains the dollar character

From MSDN:

The replacement parameter specifies the string that is to replace each match in input. replacement can consist of any combination of literal text and substitutions.

The following substitutions are defined:

  • $number
  • ${name}
  • $$
  • $&
  • $`
  • $'
  • $+
  • $_

Substitutions are the only special constructs recognized in a replacement pattern. None of the other regular expression language elements, including character escapes and the period (.), which matches any character, are supported. Similarly, substitution language elements are recognized only in replacement patterns and are never valid in regular expression patterns.

So it looks like it's only the $ character that needs to be escaped.
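
Since $$ is the substitution that produces a literal dollar sign, doubling every $ should be enough. A minimal sketch of that string transformation, written in Python purely for illustration (the helper name escape_dotnet_replacement is hypothetical, not a .NET API; the result would be passed as the replacement argument on the .NET side):

```python
def escape_dotnet_replacement(text: str) -> str:
    """Double every '$' so a .NET replacement pattern reads it as a literal dollar."""
    return text.replace("$", "$$")


escaped = escape_dotnet_replacement("Price: $5 (was $10)")
```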

Escaping special characters in Java Regular Expressions

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with "\\" should work but there is no nice Pattern.escape('.') function to help with this.

So if you are trying to match "\\d" (the literal text \d instead of a decimal digit) then you would do:

// this will match the literal text \d as opposed to a decimal digit
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";

The 4 backslashes in the Java string turn into 2 backslashes in the regex pattern. 2 backslashes in a regex pattern match the backslash itself. Prepending a backslash to any special character turns it into a normal character instead of a special one.

String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
...

In your post you use the Pattern.quote(string) method. This method wraps your string between "\\Q" and "\\E" so you can match it literally even if it happens to have a special regex character in it (+, ., \\d, etc.).
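
The two layers (string literal, then regex) are easier to see in a language with raw strings. A small sketch in Python, offered only as an analogy to the Java strings above; the variable names mirror the Java ones and are otherwise arbitrary:

```python
import re

match_backslash_d = r"\\d"    # regex \\d: a literal backslash followed by the letter d
match_decimal_digit = r"\d"   # regex \d: any decimal digit

# Without a raw string the backslashes must be doubled again, as in Java.
same_pattern = ("\\\\d" == match_backslash_d)

literal_hit = re.fullmatch(match_backslash_d, r"\d") is not None   # text is backslash + d
digit_hit = re.fullmatch(match_decimal_digit, "7") is not None
cross_hit = re.fullmatch(match_backslash_d, "7") is not None       # a digit is not "\d"
```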

Escaping a String from getting regex parsed in Java

String.contains does not use regex, so there isn't a problem in this case.

Where a regex is required, rather than rejecting strings that contain regex special characters, use java.util.regex.Pattern.quote to escape them.
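
To illustrate the difference between a plain substring test and a regex search, here is a sketch in Python, where the in operator stands in for String.contains and re.escape for Pattern.quote (the sample strings are made up):

```python
import re

needle = "$5"
haystack = "The total is $5 today"

# Plain substring test: no regex involved, so nothing needs escaping.
plain_hit = needle in haystack

# Unescaped, '$' is an end-of-string anchor, so this cannot match.
unescaped = re.search(needle, haystack)

# Escaped, the needle is matched as the literal text "$5".
escaped = re.search(re.escape(needle), haystack)
```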

How to escape text for regular expression in Java

Since Java 1.5, yes:

Pattern.quote("$5");

What special characters must be escaped in regular expressions?

Which characters you must and which you mustn't escape indeed depends on the regex flavor you're working with.

For PCRE, and most other so-called Perl-compatible flavors, escape these outside character classes:

.^$*+?()[{\|

and these inside character classes:

^-]\
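
As a rough sketch, the two lists above can be turned into a small escaping helper. This uses Python's re module as a stand-in for a Perl-style flavor; the names OUTSIDE_CLASS, INSIDE_CLASS and escape_with are made up for this example:

```python
import re

OUTSIDE_CLASS = set(".^$*+?()[{\\|")   # metacharacters outside character classes
INSIDE_CLASS = set("^-]\\")            # metacharacters inside character classes


def escape_with(text: str, specials: set) -> str:
    """Prefix each character found in `specials` with a backslash."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


body = escape_with("1+1=2", OUTSIDE_CLASS)
char_class = "[" + escape_with("^-]\\", INSIDE_CLASS) + "]+"

body_match = re.fullmatch(body, "1+1=2")
class_match = re.fullmatch(char_class, "]-^\\")
```

Both matches succeed: the escaped body matches its own source text, and the class matches any run of the four characters it was built from.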

For POSIX extended regexes (ERE), escape these outside character classes (same as PCRE):

.^$*+?()[{\|

Escaping any other characters is an error with POSIX ERE.

Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.:

[]^-]
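
Python's re module is not a POSIX engine, but it accepts the same placement rules, so it can serve as a quick check that this class really contains the three literal characters. A minimal sketch:

```python
import re

# ']' first, '^' not first, '-' last: all three are literals in this class.
bracket_class = re.compile(r"[]^-]")

found = bracket_class.findall("a]b^c-d")
```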

In POSIX basic regular expressions (BRE), these are metacharacters that you need to escape to suppress their meaning:

.^$*[\

Escaping parentheses and curly brackets in BREs gives them the special meaning their unescaped versions have in EREs. Some implementations (e.g. GNU) also give special meaning to other characters when escaped, such as \? and \+. Escaping a character other than .^$*(){} is normally an error with BREs.

Inside character classes, BREs follow the same rule as EREs.

If all this makes your head spin, grab a copy of RegexBuddy. On the Create tab, click Insert Token, and then Literal. RegexBuddy will add escapes as needed.

Escaping partial regex pattern

Many regex flavors have a utility method that automatically escapes meta characters. Java does this using Pattern.quote(String) and PHP has a similar function: preg_quote(string). Many PCRE implementations also support the \Q and \E escape sequences. \Q will let the regex engine interpret all characters after it as plain literals until the next \E.

Example:

a\Q+*\Eb+

will match the string a+*bbb.
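
Python's re module does not support \Q and \E, but the same partial escaping can be sketched by quoting only the literal piece and concatenating it with the rest of the pattern:

```python
import re

# Equivalent of a\Q+*\Eb+ : only the "+*" part is treated as literal text.
pattern = "a" + re.escape("+*") + "b+"

hit = re.fullmatch(pattern, "a+*bbb")
miss = re.fullmatch(pattern, "aaabbb")
```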

How can Python's regular expressions work with patterns that have escaped special characters?

Why on earth are you applying re.escape to the string?! You want to find the "special" characters in that! If you just apply it to the pattern, you'll get a match:

>>> import re
>>> string = r'This a string with ^g\.$s'
>>> pattern = r'^g\.$s'
>>> re.search(re.escape(pattern), re.escape(string)) # nope
>>> re.search(re.escape(pattern), string) # yep
<_sre.SRE_Match object at 0x025089F8>

For bonus points, notice that you just need to re.escape the pattern one more time than the string:

>>> re.search(re.escape(re.escape(pattern)), re.escape(string))
<_sre.SRE_Match object at 0x025D8DE8>

Is there a RegExp.escape function in JavaScript?

The function linked in another answer is insufficient. It fails to escape ^ or $ (start and end of string), or -, which in a character group is used for ranges.

Use this function:

function escapeRegex(string) {
    return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, '\\$&');
}

While it may seem unnecessary at first glance, escaping - (as well as ^) makes the function suitable for escaping characters to be inserted into a character class as well as the body of the regex.

Escaping / makes the function suitable for escaping characters to be used in a JavaScript regex literal for later evaluation.

As there is no downside to escaping either of them, it makes sense to escape to cover wider use cases.

And yes, it is a disappointing failing that this is not part of standard JavaScript.
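
For comparison, here is a sketch of the same character set ported to Python. Python already ships re.escape, so this port (the name escape_regex is arbitrary) is only meant to show that the escaped output works both in the body of a pattern and inside a character class:

```python
import re

# Same characters as the JavaScript function above.
_SPECIALS = re.compile(r"[/\-\\^$*+?.()|\[\]{}]")


def escape_regex(text: str) -> str:
    """Prefix every special character with a backslash."""
    return _SPECIALS.sub(r"\\\g<0>", text)


sample = "1+1=2? (yes/no)"
body_match = re.fullmatch(escape_regex(sample), sample)

# The escaped text can also be dropped into a character class.
class_match = re.fullmatch("[" + escape_regex("^-]") + "]+", "-^]")
```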


