What Is the Use of the Pattern.quote Method

What is the use of Pattern.quote method?

\Q means "start of literal text" (i.e. regex "open quote")

\E means "end of literal text" (i.e. regex "close quote")

Calling the Pattern.quote() method wraps the string in \Q...\E, which turns the text into a regex literal. For example, Pattern.quote(".*") produces a pattern that matches a literal dot followed by a literal asterisk:

System.out.println("foo".matches(".*")); // true
System.out.println("foo".matches(Pattern.quote(".*"))); // false
System.out.println(".*".matches(Pattern.quote(".*"))); // true

The method's purpose is to spare the programmer from having to remember the special sequences \Q and \E, and to make the code a bit more readable; regex is hard enough to read already. Compare:

someString.matches(Pattern.quote(someLiteral));
someString.matches("\\Q" + someLiteral + "\\E");

Referring to the javadoc:

Returns a literal pattern String for the specified String.

This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.

Metacharacters or escape sequences in the input sequence will be given no special meaning.
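
A common place where this matters is String.split, which treats its argument as a regex. The following is a minimal sketch; the class name QuoteSplitDemo and the helper splitLiteral are made up for illustration and are not part of the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class QuoteSplitDemo {
    // Hypothetical helper: splits on the delimiter taken literally,
    // even if it contains regex metacharacters such as "." or "|".
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Unquoted, "." matches every character, so all pieces are empty
        // and split drops them.
        System.out.println(Arrays.toString("a.b.c".split(".")));         // []
        System.out.println(Arrays.toString(splitLiteral("a.b.c", "."))); // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("x|y|z", "|"))); // [x, y, z]
    }
}
```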

Difference between Pattern.quote() and its String concatenation equivalent?

The statement in the answer that:

Calling the Pattern.quote() method wraps the string in \Q...\E, which turns the text into a regex literal.

is, strictly speaking, not correct. Plain wrapping would give weird results if the original string already contains \E.

For instance, Pattern.quote("\\Q[r.e.g.e.x]\\E") produces "\\Q\\Q[r.e.g.e.x]\\E\\\\E\\Q\\E".

So simply wrapping the string in "\\Q" and "\\E" yourself is incorrect for some edge cases. Use Pattern.quote if you want to be safe.

Wrapping with "\\Q" and "\\E" yourself will be marginally faster (you save a method call, an indexOf(..) and an if statement when there is no "\\E"), but it is usually better to use library methods: they tend to contain fewer bugs, and the bugs that do exist get fixed eventually.

The source code looks like this:

public static String quote(String s) {
    int slashEIndex = s.indexOf("\\E");
    if (slashEIndex == -1)
        return "\\Q" + s + "\\E";

    StringBuilder sb = new StringBuilder(s.length() * 2);
    sb.append("\\Q");
    slashEIndex = 0;
    int current = 0;
    while ((slashEIndex = s.indexOf("\\E", current)) != -1) {
        sb.append(s.substring(current, slashEIndex));
        current = slashEIndex + 2;
        sb.append("\\E\\\\E\\Q");
    }
    sb.append(s.substring(current, s.length()));
    sb.append("\\E");
    return sb.toString();
}

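So as long as there is no "\\E" in the string, plain wrapping is fine. Otherwise, every "\\E" has to be replaced with "\\E\\\\E\\Q": this closes the quoted section, matches a literal \E (an escaped backslash followed by E), and reopens the quote.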
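
The difference can be seen in a small experiment. This is only a sketch: naiveQuote and matchesWhole are hypothetical helpers written for this comparison, and a regex that fails to compile is simply counted as "no match".

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class NaiveQuoteDemo {
    // Hand-rolled wrapping, i.e. the string concatenation variant.
    static String naiveQuote(String literal) {
        return "\\Q" + literal + "\\E";
    }

    // True if the regex compiles and matches the whole input; a regex that
    // does not compile is reported as "no match" instead of throwing.
    static boolean matchesWhole(String regex, String input) {
        try {
            return Pattern.matches(regex, input);
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "1+1=2";    // no \E inside: both approaches agree
        String tricky = "a\\Eb.c"; // the characters a \ E b . c

        System.out.println(matchesWhole(naiveQuote(plain), plain));       // true
        System.out.println(matchesWhole(Pattern.quote(plain), plain));    // true
        // The embedded \E ends the quoted section early, so the rest is no
        // longer treated as literal text.
        System.out.println(matchesWhole(naiveQuote(tricky), tricky));     // false
        System.out.println(matchesWhole(Pattern.quote(tricky), tricky));  // true
    }
}
```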

What is the equivalent of Pattern.quote() for MessageFormat?

There isn’t any method for it, but enclosing the entire text in ASCII single-quote characters will accomplish the same thing. You can replace each ' with '' as you’ve described, then surround the text with '. From the MessageFormat documentation:

For example, pattern string "'{''}'" is interpreted as a sequence of '{ (start of quoting and a left curly brace), '' (a single quote), and }' (a right curly brace and end of quoting), not '{' and '}' (quoted left and right curly braces): representing string "{'}", not "{}".
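
As a rough sketch of that approach, a helper could look like the following. The name quoteForMessageFormat is hypothetical (the JDK offers no such method), and the special case for the empty string is an addition of this sketch: wrapping an empty text would yield '', which MessageFormat reads as one literal single quote.

```java
import java.text.MessageFormat;

final class MessageFormatQuoteDemo {
    // Hypothetical helper, not part of the JDK: doubles every single quote
    // and wraps the result in single quotes so MessageFormat takes it literally.
    static String quoteForMessageFormat(String text) {
        if (text.isEmpty()) {
            return ""; // "''" would be interpreted as a literal single quote
        }
        return "'" + text.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String userText = "{name}'s cart";
        String pattern = quoteForMessageFormat(userText) + ": {0}";
        // The braces in userText are not treated as a format element.
        System.out.println(MessageFormat.format(pattern, "three items"));
        // {name}'s cart: three items
    }
}
```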

Java Pattern.quote

The expression ".*/live/.*" matches paths with the pattern you describe. You can create a Pattern with that.

Alternatively, as Peter said, you could simply ask path.contains("/live/");
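
Both variants might be compared like this. The sample paths and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class LivePathDemo {
    // Compiled once and reused; matches any path containing "/live/".
    private static final Pattern LIVE = Pattern.compile(".*/live/.*");

    static boolean isLiveByRegex(String path) {
        return LIVE.matcher(path).matches();
    }

    static boolean isLiveByContains(String path) {
        return path.contains("/live/");
    }

    public static void main(String[] args) {
        String[] paths = {"/var/www/live/index.html", "/var/www/alive/index.html"};
        for (String path : paths) {
            System.out.println(path + " -> regex=" + isLiveByRegex(path)
                    + ", contains=" + isLiveByContains(path));
        }
    }
}
```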

Pattern.quote adds \\Q and \\E to the string java

Adding \Q and \E is exactly what Pattern.quote() does! Why would you not want that?

If you need to quote only some characters of that string, then you must do so manually.
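
One way to do that manually is to quote the literal pieces separately and join them with the regex parts you want to keep. The sketch below assumes a made-up convention in which * in the input means "any text" and everything else is literal; globToPattern is a hypothetical name.

```java
import java.util.regex.Pattern;

final class PartialQuoteDemo {
    // Hypothetical helper: "*" becomes ".*", every other piece is quoted.
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        // Limit -1 keeps trailing empty pieces, so a trailing "*" is not lost.
        String[] pieces = glob.split("\\*", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            if (!pieces[i].isEmpty()) {
                regex.append(Pattern.quote(pieces[i]));
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = globToPattern("*.txt");
        System.out.println(p.matcher("notes.txt").matches()); // true
        System.out.println(p.matcher("notestxt").matches());  // false: the dot is literal
    }
}
```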

Does including Pattern.LITERAL flag as part of the Pattern.compile(String regex, int flags) method in Java mitigate String regex injection?

Check the Pattern.LITERAL documentation:

When this flag is specified then the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning.

So this flag turns any pattern into plain text: \s will match the two characters \s, not whitespace.
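
A short sketch of that behaviour, with helper names chosen only for this example:

```java
import java.util.regex.Pattern;

final class LiteralFlagDemo {
    static boolean matchesAsRegex(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).matches();
    }

    static boolean matchesAsLiteral(String pattern, String input) {
        return Pattern.compile(pattern, Pattern.LITERAL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String pattern = "\\s"; // the two characters \ and s
        System.out.println(matchesAsRegex(pattern, " "));      // true: whitespace class
        System.out.println(matchesAsLiteral(pattern, " "));    // false
        System.out.println(matchesAsLiteral(pattern, "\\s"));  // true: literal \s
    }
}
```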

What you need to make sure of is:

  • Try to write patterns where each subsequent part cannot match the same text as the preceding part, to avoid excessive backtracking.
  • Escape the user-written literal parts of the pattern using Pattern.quote.

In your case, you can use

Pattern patternCheck = Pattern.compile("check\\s+test\\s+([\\w\\s-]+)cd(\\s+" + Pattern.quote(variable1) + "|\\s+abc\\s+" + Pattern.quote(variable2) + ")\\s+to\\s+(abc|xyz)\\s+test\\s+ab\\s+xyz", Pattern.CASE_INSENSITIVE);
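
To see what the quoting buys you, here is a reduced sketch with a much simpler pattern than the one above. The "user" prefix and the method names unsafe and safe are assumptions made for the demonstration.

```java
import java.util.regex.Pattern;

final class QuoteUserInputDemo {
    // User input is spliced in as regex: metacharacters keep their meaning.
    static Pattern unsafe(String userName) {
        return Pattern.compile("user\\s+" + userName, Pattern.CASE_INSENSITIVE);
    }

    // User input is quoted first, so it can only match itself.
    static Pattern safe(String userName) {
        return Pattern.compile("user\\s+" + Pattern.quote(userName), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        String hostile = ".*"; // input that tries to match everything
        System.out.println(unsafe(hostile).matcher("user anybody").matches()); // true
        System.out.println(safe(hostile).matcher("user anybody").matches());   // false
        System.out.println(safe(hostile).matcher("USER .*").matches());        // true
    }
}
```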

Does using Pattern.LITERAL mean the same as Pattern.quote?

Given the question as is, the answer is no: setting x = Pattern.LITERAL leads to s being quoted twice in the second expression. With double quoting and s = "A", the String "A" won't be matched, but the String "\\QA\\E" will. However,

Pattern.compile(s, x | Pattern.LITERAL)

seems to be equivalent to

Pattern.compile(Pattern.quote(s), x & ~Pattern.LITERAL)
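
The following sketch checks both observations on a few sample inputs. It is a spot check rather than a proof of equivalence, and the helper names are made up.

```java
import java.util.regex.Pattern;

final class LiteralVsQuoteDemo {
    static Pattern viaLiteralFlag(String s, int flags) {
        return Pattern.compile(s, flags | Pattern.LITERAL);
    }

    static Pattern viaQuote(String s, int flags) {
        return Pattern.compile(Pattern.quote(s), flags & ~Pattern.LITERAL);
    }

    public static void main(String[] args) {
        String s = "a.b";
        int flags = Pattern.CASE_INSENSITIVE;
        for (String input : new String[] {"a.b", "A.B", "axb"}) {
            System.out.println(input + ": "
                    + viaLiteralFlag(s, flags).matcher(input).matches() + " / "
                    + viaQuote(s, flags).matcher(input).matches());
        }

        // Quoting twice: the \Q and \E markers themselves become literal text.
        Pattern twice = Pattern.compile(Pattern.quote("A"), Pattern.LITERAL);
        System.out.println(twice.matcher("A").matches());        // false
        System.out.println(twice.matcher("\\QA\\E").matches());  // true
    }
}
```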

When replacing backslashes with Java regex, why does the Pattern class not recognize single backslashes?

The problem is that \ is also used as the escape character within the regular expression. To match a single \ you need the regular expression \\, which must be written as the Java string literal "\\\\". Ugly, I know, but that's how it is.
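
For example, converting backslashes to forward slashes could be sketched as follows. The method names are invented; String.replace is shown as an alternative because it treats its arguments as plain text rather than as a regex.

```java
final class BackslashDemo {
    // Regex version: "\\\\" in Java source is the regex \\, i.e. one literal backslash.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Non-regex version: "\\" in Java source is simply one backslash character.
    static String toForwardSlashesLiteral(String path) {
        return path.replace("\\", "/");
    }

    public static void main(String[] args) {
        String path = "C:\\dir\\file.txt"; // the text C:\dir\file.txt
        System.out.println(toForwardSlashes(path));        // C:/dir/file.txt
        System.out.println(toForwardSlashesLiteral(path)); // C:/dir/file.txt
    }
}
```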


