Escaping special characters in Java Regular Expressions
Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?
If you are looking for a way to create constants that you can use in your regex patterns, then prepending "\\" to the special character should work, but there is no convenient Pattern.escape('.') function to help with this.
So if you are trying to match "\\d" (the literal string \d instead of a digit), then you would do:
// this will match the literal text \d as opposed to a digit
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";
The 4 backslashes in the Java string literal turn into 2 backslashes in the regex pattern, and 2 backslashes in a regex pattern match a single literal backslash. Prepending a backslash to any special character turns it into a normal character instead of a special one.
String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
...
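As a minimal sketch of the difference between these escaped constants, the following compiles three of them and checks what each one matches. The class name BackslashEscapeDemo and the constant names are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashEscapeDemo {
    // Regex \\d : an escaped backslash followed by the letter d.
    static final Pattern LITERAL_BACKSLASH_D = Pattern.compile("\\\\d");
    // Regex \d : the predefined digit class.
    static final Pattern DIGIT = Pattern.compile("\\d");
    // Regex \. : a literal period rather than "any character".
    static final Pattern PERIOD = Pattern.compile("\\.");

    public static void main(String[] args) {
        // The Java literal "\\d" is the two-character text \d.
        System.out.println(LITERAL_BACKSLASH_D.matcher("\\d").matches()); // true
        System.out.println(LITERAL_BACKSLASH_D.matcher("7").matches());   // false
        System.out.println(DIGIT.matcher("7").matches());                 // true
        System.out.println(DIGIT.matcher("\\d").matches());               // false
        System.out.println(PERIOD.matcher(".").matches());                // true
        System.out.println(PERIOD.matcher("a").matches());                // false
    }
}
```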
In your post you use the Pattern.quote(string) method. This method wraps your pattern between "\\Q" and "\\E" so you can match a string even if it happens to have a special regex character in it (+, ., \\d, etc.)
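To illustrate what Pattern.quote does, here is a small sketch that prints the quoted form and compares a quoted search with an unquoted one. The helper containsLiteral and the class name QuoteDemo are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Searches text for the given literal, with every regex meta-character neutralised.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        // The quoted form is the input wrapped in \Q and \E.
        System.out.println(Pattern.quote("1+1=2"));           // \Q1+1=2\E
        System.out.println(containsLiteral("1+1=2", "1+1"));  // true
        // Quoted, "1+1" is not found in "111" because + is a plain character.
        System.out.println(containsLiteral("111", "1+1"));    // false
        // Unquoted, + is a quantifier, so the same text matches "111".
        System.out.println(Pattern.compile("1+1").matcher("111").find()); // true
    }
}
```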
List of all special characters that need to be escaped in a regex
You can look at the javadoc of the Pattern class: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
You need to escape any char listed there if you want the regular char and not the special meaning.
As a possibly simpler solution, you can put the text between \Q and \E; everything between them is treated as literal.
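A short sketch of \Q and \E used inside a larger pattern: only the quoted span is literal, while the part after \E keeps its regex meaning. The sample text and the class name QuoteSpanDemo are assumptions made for this example.

```java
import java.util.regex.Pattern;

class QuoteSpanDemo {
    // "$(a.b)" is taken literally; \d+ after \E still means "one or more digits".
    static final Pattern PATTERN = Pattern.compile("\\Q$(a.b)\\E\\d+");

    public static void main(String[] args) {
        System.out.println(PATTERN.matcher("$(a.b)42").matches()); // true
        // The dot inside the quoted span does not match an arbitrary character.
        System.out.println(PATTERN.matcher("$(aXb)42").matches()); // false
        // The digits are still required.
        System.out.println(PATTERN.matcher("$(a.b)").matches());   // false
    }
}
```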
Java Regular Expression special character escape
You only need to escape ^ when you want to match it literally, that is, you want to look for text containing the ^ character.
If you intend to use the ^ with its special meaning (the start of a line/string) then there is no need to escape it. Simply type
"^[a-zA-Z0-9!~`@#$%\\^]"
in your source code. The backslashes towards the end of this regular expression do not matter. You need to type 2 backslashes because of the special meaning of the backslash in Java string literals, but that has nothing to do with its treatment in regular expressions. The regular expression engine receives a single backslash, which it uses to read the following character as a literal, but ^ is a literal within brackets anyway (as long as it is not the first character after the opening bracket).
To elaborate on your comment about [ and ]:
The brackets have a special meaning in regular expressions, as they form the boundaries of the list of characters a pattern accepts (the listed characters form a so-called character class). Let's decompose the regular expression from above to make things clear.
^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Digits 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\ Backslash. The regular expression engine only receives a single backslash, as the other backslash is consumed by Java's syntax for string literals. It would be used to mark the following character as literal, but ^ is a literal at this position in a character class anyway, so the backslash has no effect.
^ Caret, literally
] Closing boundary of your character class
The order of patterns within the character class definition is irrelevant, except that a ^ placed directly after the opening bracket would negate the class.
The expression above matches if the first character of the examined text is part of your character class definition. Whether the other characters in the examined text matter depends on how you use the regular expression.
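The following sketch compares the pattern with and without the backslash before ^ and shows how the choice between find() and String.matches() changes which characters matter. The class name CaretInClassDemo and the helper firstCharAllowed are illustrative names only.

```java
import java.util.regex.Pattern;

class CaretInClassDemo {
    // Same character class, once with the caret escaped and once without.
    static final Pattern ESCAPED = Pattern.compile("^[a-zA-Z0-9!~`@#$%\\^]");
    static final Pattern UNESCAPED = Pattern.compile("^[a-zA-Z0-9!~`@#$%^]");

    // find() only needs the anchored first character to be in the class.
    static boolean firstCharAllowed(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = { "^abc", "abc def", " abc" };
        for (String sample : samples) {
            // Both patterns always agree with each other.
            System.out.println(sample + " -> "
                    + firstCharAllowed(ESCAPED, sample) + " / "
                    + firstCharAllowed(UNESCAPED, sample));
        }
        // String.matches() requires the whole text to match, so a
        // three-character text fails against a one-character pattern.
        System.out.println("abc".matches(ESCAPED.pattern())); // false
    }
}
```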
When you start with regular expressions, you should always match against multiple test texts and verify the behaviour. It is also advisable to turn these test cases into unit tests to gain high confidence in the correct behaviour of your program.
A simple code sample to test the expression is as follows:
public class Test {
    public static void main(String[] args) {
        String regexp = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";
        String[] testdata = new String[] {
            "abc",
            "2332",
            "some@test",
            "test [ and ] test end",
            // Following sample will not match the pattern.
            "äöüßµøł"
        };
        for (String toExamine : testdata) {
            if (toExamine.matches(regexp)) {
                System.out.println("Match: " + toExamine);
            } else {
                System.out.println("No match: " + toExamine);
            }
        }
    }
}
Note that I use a modified pattern here. It ensures that all characters in the examined string match your character class. I also extended the character class to allow a \, a space, [ and ].
The decomposed description is:
^ Matches the start of the text
[ Opening boundary of your character class
(space) Space, literally
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Digits 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\\\ Backslash, literally. The regular expression engine only receives 2 backslashes, as every other backslash is consumed by Java's syntax for string literals. The first backslash marks the second backslash as occurring literally in the text.
^ Caret, literally
\\[ Opening bracket, literally. The backslash makes the bracket lose its meaning as opening a character class definition.
\\] Closing bracket, literally. The backslash makes the bracket lose its meaning as closing a character class definition.
] Closing boundary of your character class
+ Means any number of characters matching your character class definition can occur, but at least 1 such character needs to be present for a match
$ Matches the end of the text
One thing I don't get though is why one would use the characters of American keyboards as criteria for validation.
Java Regex double backslash escaping special characters
Use 4 backslashes:
Pattern.compile("((([a-zA-Z0-9])([a-zA-Z0-9 ]*)\\\\?)+)")
                                               ^^^^
- You need to match a backslash char: \.
- A backslash is a special char for regexps (used for predefined classes such as \d, for example), which needs to be escaped by another backslash: \\.
- As Java uses string literals for regexps, and a backslash is also a special char for string literals (used for the line feed char \n, for example), each backslash needs to be escaped by another backslash: \\\\.
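The two layers of escaping can be checked with a small sketch like the one below. It reuses the pattern from this answer; the class name FourBackslashesDemo and the sample inputs are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class FourBackslashesDemo {
    // The pattern from the answer: runs of letters, digits and spaces,
    // each optionally followed by one literal backslash.
    static final Pattern WORDS =
            Pattern.compile("((([a-zA-Z0-9])([a-zA-Z0-9 ]*)\\\\?)+)");

    public static void main(String[] args) {
        // Four backslashes in source are two characters at runtime ...
        System.out.println("\\\\".length());                      // 2
        // ... and as a regex those two characters match one backslash.
        System.out.println(Pattern.matches("\\\\", "\\"));        // true
        // The runtime text here is abc\def.
        System.out.println(WORDS.matcher("abc\\def").matches());  // true
        // A leading backslash is rejected: each run must start with a letter or digit.
        System.out.println(WORDS.matcher("\\abc").matches());     // false
    }
}
```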
How to escape special characters in a regex pattern in java?
You can use this regex with a capturing group:
String myString = "Patient:\n${ss.patient.howard.firstName} ${ss.patient.howard.lastName}\nGender: ${ss.patient.howard.sex}\nBirthdate: ${ss.patient.howard.dob}\n${ss.patient.howard.addressLine1}\nPhone: (801)546-4765";
myString = myString.replaceAll("\\$\\{[^}]+?\\.([^.}]+)}", "$1");
System.err.println(myString);
([^.}]+) is the capturing group before } and after the last DOT.
Output:
Patient:
firstName lastName
Gender: sex
Birthdate: dob
addressLine1
Phone: (801)546-4765
Regex pattern including all special characters
Please don't do that... little Unicode BABY ANGELs like this one are dying! ◕◡◕ (← these are not images) (nor is the arrow!)
☺
And you are killing 20 years of DOS :-) (the last smiley is called WHITE SMILING FACE... Now it's at 263A... But in ancient times it was ALT-1)
and his friend
☻
BLACK SMILING FACE... Now it's at 263B... But in ancient times it was ALT-2
Try a negative match:
Pattern regex = Pattern.compile("[^A-Za-z0-9]");
(This matches any character that is not a "standard" A-Z or a-z letter or a "standard" 0-9 digit, so only those letters and digits are treated as OK.)
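A minimal sketch of how such a negated class might be used, both to detect and to strip everything outside the plain ASCII letters and digits. The class name NegatedClassDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Matches any single character that is NOT an ASCII letter or digit.
    static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    // True if the text contains at least one character outside A-Z, a-z, 0-9.
    static boolean hasSpecial(String text) {
        return NON_ALNUM.matcher(text).find();
    }

    // Removes every character outside A-Z, a-z, 0-9.
    static String strip(String text) {
        return NON_ALNUM.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(hasSpecial("abc123"));   // false
        // \u263A is WHITE SMILING FACE, so it counts as special here.
        System.out.println(hasSpecial("a\u263Ab")); // true
        System.out.println(strip("a\u263Ab!"));     // ab
    }
}
```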
How to escape [] chars in regular expressions
You escape special characters with \. Note that \ is itself a special character, so it has to be doubled in a Java string literal. So something like
.map(l -> l.replaceAll("[,.!?\\[\\]:;]", ""))
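Since the snippet above is only a fragment of a stream pipeline, here is one way it could look in a complete program. The surrounding method strip, the class name StripPunctuationDemo, and the sample lines are assumptions, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StripPunctuationDemo {
    // Removes , . ! ? [ ] : ; from every line. Inside the character class
    // only [ and ] need a backslash, doubled for the Java string literal.
    static List<String> strip(List<String> lines) {
        return lines.stream()
                .map(l -> l.replaceAll("[,.!?\\[\\]:;]", ""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hi, [there]!", "a.b;c:d?");
        System.out.println(strip(lines)); // [Hi there, abcd]
    }
}
```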
Skip some special character
Yes, you could do this with a Pattern and a regular expression. Like,
// Note that the literal [ and ] have to be escaped below.
String specialCharacters = "[!#$%&'()*+,.:;=?@\\[\\]^`{|}~]";
String val = "a{b}c";
Pattern p = Pattern.compile(specialCharacters);
System.out.println(p.matcher(val).replaceAll(""));
Which outputs
abc