Escape Special Character in Regex

How to escape special characters in regular expressions


SOLUTION

You can add and use a function escapeRegExp() that escapes special
characters, as given in the MDN Regular Expressions article:

function escapeRegExp(string) {
  // Prefix every regex metacharacter with a backslash; "$&" inserts the matched character.
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// ... skipped ...

table.column('myColumn:name').search(escapeRegExp(searchString), true, false, true).draw();
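A minimal sketch of what the escaping buys you outside DataTables: a user-supplied string is escaped before being passed to new RegExp(), so characters such as '+', '(' and '.' are matched literally. The search string and text are made up for illustration.

```javascript
// escapeRegExp() as given above (MDN version).
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical user input containing regex metacharacters.
const searchString = "1+1=2 (approx.)";

const unsafe = searchString;             // used raw: "+", "(", ")" and "." are operators
const safe = escapeRegExp(searchString); // 1\+1=2 \(approx\.\)

const text = "Result: 1+1=2 (approx.)";
console.log(new RegExp(unsafe).test(text)); // false: "1+1" means "one or more 1s, then a 1"
console.log(new RegExp(safe).test(text));   // true: every character is matched literally
```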


Java Regular Expression special character escape

You only need to escape ^ when you want to match it literally, that is, you want to look for text containing the ^ character.

If you intend to use the ^ with its special meaning (the start of a line/string) then there is no need to escape it. Simply type

"^[a-zA-Z0-9!~`@#$%\\^]"

in your source code. The backslashes towards the end of this regular expression do not matter. You need to type 2 backslashes because of the special meaning of the backslash in Java string literals, but that has nothing to do with its treatment in regular expressions. The regular expression engine receives a single backslash, which it uses to read the following character as a literal, but ^ is a literal within brackets anyway (as long as it is not the first character inside them).
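JavaScript string literals consume backslashes the same way Java's do, so the point can be checked with a quick sketch (written in JavaScript rather than Java purely for brevity; the test strings are made up). "\\^" and "^" produce equivalent character classes, while the leading ^ outside the brackets remains an anchor.

```javascript
// Inside [...] a "^" that is not the first character is already literal,
// so the escaped and unescaped forms build equivalent character classes.
const withEscape = new RegExp("^[a-zA-Z0-9!~`@#$%\\^]");  // engine sees \^
const withoutEscape = new RegExp("^[a-zA-Z0-9!~`@#$%^]"); // engine sees ^

// The string literal "\\^" is parsed into the two characters \ and ^.
console.log("\\^".length); // 2

for (const input of ["^caret first", "abc", " leading space"]) {
  // The outer ^ anchors at the start, so only the first character is checked.
  console.log(input, withEscape.test(input), withoutEscape.test(input));
}
```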

To elaborate on your comment about [ and ]:

The brackets have a special meaning in regular expressions: they form the boundaries of the list of characters given by a pattern (such a list is called a character class). Let's decompose the regular expression from above to make things clear.

^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Numbers from 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\ Backslash. The regular expression engine receives only a single backslash, as the other backslash is consumed by Java's syntax for Strings. It would mark the following character as literal, but ^ is a literal in character class definitions anyway (unless it comes first), so this backslash has no effect.
^ Caret, literally
] Closing boundary of your character class

The order of patterns within the character class definition is irrelevant.
The expression above matches if the first character of the examined text is part of your character class definition. Whether the other characters in the examined text matter depends on how you use the regular expression.

When you start with regular expressions, you should always test them against multiple sample texts and verify the behaviour. It is also advisable to turn these test cases into unit tests to gain high confidence in the correct behaviour of your program.

A simple code sample to test the expression is as follows:

public class Test {
    public static void main(String[] args) {
        String regexp = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";
        String[] testdata = new String[] {
            "abc",
            "2332",
            "some@test",
            "test [ and ] test end",
            // Following sample will not match the pattern.
            "äöüßµøł"
        };
        for (String toExamine : testdata) {
            if (toExamine.matches(regexp)) {
                System.out.println("Match: " + toExamine);
            } else {
                System.out.println("No match: " + toExamine);
            }
        }
    }
}

Note that I use a modified pattern here. It ensures that all characters in the examined string match your character class. I also extended the character class to allow for \, space, [ and ].
The decomposed description is:

^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Numbers from 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\\\ Backslash, literally. The regular expression engine receives only 2 backslashes, as every other backslash is consumed by Java's syntax for Strings. The first backslash marks the second backslash as occurring literally in the string.
^ Caret, literally
\\[ Opening bracket, literally. The backslash makes the bracket lose its meaning as the opening of a character class definition.
\\] Closing bracket, literally. The backslash makes the bracket lose its meaning as the closing of a character class definition.
] Closing boundary of your character class
+ One or more characters matching your character class definition; at least 1 such character needs to be present for a match
$ Matches the end of the text
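For comparison, a sketch of the same test in JavaScript (an illustration only; the answer itself uses Java). Written as a regex literal, the pattern needs no doubled backslashes because no string literal is involved: \\ is the literal backslash and \[ and \] are the literal brackets. Built from a string, every backslash has to be doubled, exactly as in Java.

```javascript
// Regex literal: the engine receives exactly these characters.
const literal = /^[ a-zA-Z0-9!~`@#$%\\^\[\]]+$/;
// Same pattern built from a string: each backslash is doubled.
const fromString = new RegExp("^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$");

const testdata = ["abc", "2332", "some@test", "test [ and ] test end", "äöüßµøł"];
for (const toExamine of testdata) {
  // ^...+$ forces every character of the string to be in the class.
  const ok = literal.test(toExamine);
  console.log((ok ? "Match: " : "No match: ") + toExamine);
}
```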

One thing I don't get though is why one would use the characters of American keyboards as criteria for validation.

What does `escape a string` mean in Regex? (Javascript)

Many characters in regular expressions have special meanings. For instance, the dot character '.' means "any one character". There are many of these specially-defined characters, and sometimes you want to search for one literally instead of using its special meaning.

See this example to search for any filename that contains a '.':

/^[^.]+\..+/

In the example, there are 3 dots, but our description says that we're only looking for one. Let's break it down by the dots:

  • Dot #1 is used inside a "character class" (the characters inside the square brackets), where it has no special meaning. Because the class starts with ^, it tells the regex engine to search for "any one character" that is not a '.', and the "+" says to keep going until there are no more characters or the next character is the '.' that we're looking for.
  • Dot #2 is preceded by a backslash, which says that we're looking for a literal '.' in the string (without the backslash, it would be using its special meaning, which is looking for "any one character"). This dot is said to be "escaped", because its special meaning is not being used in this context: the backslash immediately before it made that happen.
  • Dot #3 is simply looking for "any one character" again, and the '+' following it says to keep doing that until it runs out of characters.
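The breakdown can be checked with a short sketch; the file names are made up. Names that start or end with a dot are rejected, because [^.]+ and .+ each require at least one character.

```javascript
// ^        start of the string
// [^.]+    one or more characters that are not a dot (dot #1, inside a class)
// \.       a literal dot (dot #2, escaped)
// .+       one or more of any character (dot #3, unescaped)
const hasExtension = /^[^.]+\..+/;

for (const name of ["report.pdf", "archive.tar.gz", "README", ".bashrc", "notes."]) {
  console.log(name, hasExtension.test(name));
}
```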

So, the backslash is used to "escape" the character immediately following it; as such, it's called the "escape character". That just means that the character's special meaning is taken away in that one place.

Now, escaping a string (in regex terms) means finding all of the characters with special meaning and putting a backslash in front of them, including in front of other backslash characters. When you've done this one time on the string, you have officially "escaped the string".

Is there a RegExp.escape function in JavaScript?

The function linked in another answer is insufficient. It fails to escape ^ or $ (start and end of string), or -, which in a character group is used for ranges.

Use this function:

function escapeRegex(string) {
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

While it may seem unnecessary at first glance, escaping - (as well as ^) makes the function suitable for escaping characters to be inserted into a character class as well as the body of the regex.
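A sketch of the character-class case, with a made-up set of allowed characters. Without escaping, a leading ^ negates the class and - turns a-z into a range, which inverts the intended meaning; escapeRegex() is repeated from above so the block runs on its own.

```javascript
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical set of characters a user wants to allow, literally.
const allowed = "^a-z";

// Escaped: the class contains exactly "^", "a", "-" and "z".
const escapedClass = new RegExp("^[" + escapeRegex(allowed) + "]+$");
// Unescaped: [^a-z] is a negated range, i.e. "anything except a lowercase letter".
const rawClass = new RegExp("^[" + allowed + "]+$");

console.log(escapedClass.test("^a-z"), escapedClass.test("m")); // true false
console.log(rawClass.test("m"), rawClass.test("-"));            // false true
```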

Escaping / makes the function suitable for escaping characters to be used in a JavaScript regex literal for later evaluation.

As there is no downside to escaping either of them, it makes sense to escape them to cover wider use cases.

And yes, it is a disappointing failing that this is not part of standard JavaScript.

How can I escape special characters in regex and Javascript

On this line:

if (userInput.toLowerCase().replace(/[.,\/#!$%\^&\*;:{}=\-_`~ ()]/g,"").match(new RegExp('(^|\\s)'+key.toLowerCase()+'(\\s|$)')))  {

you remove spaces (there is a space in the character class in /[.,\/#!$%\^&\*;:{}=\-_`~ ()]/g). That is why your regex (the one you build with new RegExp('(^|\\s)'+key.toLowerCase()+'(\\s|$)')) matches only when the string is equal to key: it expects the key to sit between whitespace characters or at the start/end of the string, and after the replacement no whitespace is left.

You need to remove the space from the replacement pattern and apply this operation to both the input and the key:

if (userInput.replace(/[.,\/#!$%^&*;:{}=\-_`~()\\]/g,"").match(new RegExp('(^|\\s)'+key.replace(/[.,\/#!$%^&*;:{}=\-_`~()\\]/g,"")+'(\\s|$)', 'i')))  {

Note that ^ (when it is not the first character) and * need no escaping inside a character class. I also added a backslash to the special-character class.

Note that there is no need to convert the strings to lowercase; you can simply pass the i (case-insensitive) flag to the RegExp constructor.
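A possible variant, sketched under the assumption that userInput and key are plain strings: the stripping class above does not cover characters such as + or ?, so a key like c++ would still make new RegExp() throw a SyntaxError ("Nothing to repeat"). Escaping what remains of the key with the escapeRegex() function from the previous answer (repeated here so the block runs on its own) avoids that. The helper name containsKeyword is hypothetical.

```javascript
function escapeRegex(string) {
  return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Same punctuation class as in the answer above (no space in it).
const PUNCT = /[.,\/#!$%^&*;:{}=\-_`~()\\]/g;

// Hypothetical helper: true if key occurs as a whitespace-delimited word.
function containsKeyword(userInput, key) {
  const input = userInput.replace(PUNCT, "");
  const word = escapeRegex(key.replace(PUNCT, ""));
  // The "i" flag makes the match case-insensitive, so no toLowerCase() is needed.
  return new RegExp("(^|\\s)" + word + "(\\s|$)", "i").test(input);
}

console.log(containsKeyword("I like C++, really!", "c++")); // true
console.log(containsKeyword("Hello, World!", "world"));     // true
console.log(containsKeyword("Helloworld", "world"));        // false
```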

Escape special characters for regex pattern in Powershell

Doing this with a single regex makes for a complex and hard to read regex. Make several smaller tests, and they are easier to read - and you can provide a good error message because you can tell which one failed:

at least 7 characters

$pass.Length -ge 7

at least 1 upper case letter

$pass -cmatch '[A-Z]' (cmatch is case sensitive)

at least 1 lower case letter

$pass -cmatch '[a-z]'

and a special character to include white space.

$pass -match '\W' (\W matches any non-word character: anything that is not a letter, digit or underscore, so white space counts)
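The same idea of several small checks, each with its own message, sketched in JavaScript rather than PowerShell (to keep one language across the added examples). JavaScript regexes are case-sensitive by default, like -cmatch. One difference worth knowing: in this code \W treats only A-Z, a-z, 0-9 and _ as word characters, whereas .NET's \W is Unicode-aware, so a letter such as ä counts as "special" here but not in PowerShell. The function name and messages are hypothetical.

```javascript
// Hypothetical helper: returns the list of failed rules (empty if the password passes).
function passwordProblems(pass) {
  const problems = [];
  if (pass.length < 7) problems.push("at least 7 characters");
  if (!/[A-Z]/.test(pass)) problems.push("at least 1 upper case letter");
  if (!/[a-z]/.test(pass)) problems.push("at least 1 lower case letter");
  // \W: any character that is not A-Z, a-z, 0-9 or _ (white space included).
  if (!/\W/.test(pass)) problems.push("a special character (white space counts)");
  return problems;
}

console.log(passwordProblems("Secret 1")); // []
// Fails the length, upper-case and special-character rules:
console.log(passwordProblems("secret"));
```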


There is also [regex]::Escape($Text), which escapes characters in a string that the regex engine could otherwise interpret as patterns. You still need to handle quotes and backticks when writing the $Text variable so that the PowerShell string processor does not get confused; use a single-quoted string, and you only need to escape single quotes inside it (by doubling them).


Do note that NIST password guidelines recommend against this kind of password complexity testing, and instead recommend only:

  • at least 8 characters (the minimum in NIST SP 800-63B, 2017).
  • checked against a list of passwords found in breaches, and rejected if it is one of those.

RegEx for escaping special characters


My doubt is whether [ \ ^ $ . | ? * + ( ) all need to be escaped before being passed to new RegExp(), or whether only backslashes (\) need to be escaped. It is not clear to me which ones need to be escaped and which do not.

Your question is answered right at the start of the document section you refer to. Read that again:

If you need to use any of the special characters literally (actually searching for a '*', for instance), you must escape it …

Conversely, if you need any of the special characters to have its special meaning, you must not escape it.

Besides the above, any backslash which is to be placed in the string has to be doubled if assigned from a string literal.
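A short sketch of that last point, with a made-up pattern (a literal * followed by one or more digits). In a regex literal the backslashes are written once; in a string passed to new RegExp() each one is doubled, because JavaScript's string parsing consumes one level of backslashes before the regex engine sees the pattern.

```javascript
// Goal: match a literal "*" followed by one or more digits.
const fromLiteral = /\*\d+/;              // regex literal: single backslashes
const fromString = new RegExp("\\*\\d+"); // string: each backslash doubled

// Both end up with the same pattern source.
console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test("price*42"));              // true

// Without doubling, the string escapes are consumed by JavaScript itself:
// "\*\d+" is just "*d+", and a leading "*" has nothing to repeat.
let error = null;
try {
  new RegExp("\*\d+");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```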


