Why String.replaceAll() in Java Requires 4 Slashes "\\\\" in Regex to Actually Replace "\"

Why String.replaceAll() in java requires 4 slashes \\\\ in regex to actually replace \?

@Peter Lawrey's answer describes the mechanics. The "problem" is that backslash is an escape character both in Java string literals and in the mini-language of regexes. So when you use a string literal to represent a regex, there are two levels of escaping to consider, depending on what you want the regex to mean.
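As a minimal sketch of the two levels, the following shows what each layer receives. The class and method names are made up for illustration.

```java
class BackslashLayers {
    // Replaces every single backslash in the input with a forward slash.
    static String toForwardSlashes(String input) {
        // Source literal "\\\\" -> String value of two backslash characters
        // -> regex meaning: one literal backslash.
        return input.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file.txt";          // actual value: C:\temp\file.txt
        System.out.println("\\\\".length());          // 2: the regex engine receives two characters
        System.out.println(toForwardSlashes(path));   // C:/temp/file.txt
    }
}
```

The compiler removes one level of escaping and the regex engine removes the other, which is why four backslashes in source code end up matching one.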

But why is it like that?

It is a historical thing. Java originally didn't have regexes at all. The syntax rules for Java String literals were borrowed from C / C++, which also didn't have built-in regex support. The awkwardness of double escaping didn't become apparent in Java until regex support was added, in the form of the Pattern class, in Java 1.4.

So how do other languages manage to avoid this?

They do it by providing direct or indirect syntactic support for regexes in the programming language itself. For instance, Perl, Ruby, JavaScript and many other languages have a syntax for patterns / regexes (e.g. /pattern/) where string literal escaping rules do not apply. C# and Python provide an alternative "raw" string literal syntax in which backslashes are not escapes. (But note that if you use the normal C# / Python string syntax, you have the Java problem of double escaping.)


Why do text.replaceAll("\n","/"), text.replaceAll("\\n","/"), and text.replaceAll("\\\n","/") all give the same output?

The first case is a newline character at the String level. The Java regex language treats all non-special characters as matching themselves.

The second case is a backslash followed by an "n" at the String level. The Java regex language interprets a backslash followed by an "n" as a newline.

The final case is a backslash followed by a newline character at the String level. The Java regex language doesn't recognize this as a specific (regex) escape sequence. However in the regex language, a backslash followed by any non-alphabetic character means the latter character. So, a backslash followed by a newline character ... means the same thing as a newline.
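The three cases can be checked side by side with a small sketch like this one. The class and method names are hypothetical.

```java
class NewlineForms {
    // Applies the three regex spellings discussed above to the same text.
    static String[] replaceThreeWays(String text) {
        return new String[] {
            text.replaceAll("\n", "/"),    // regex is a raw newline character
            text.replaceAll("\\n", "/"),   // regex is backslash + n, the regex escape for newline
            text.replaceAll("\\\n", "/")   // regex is backslash + raw newline (an escaped non-alphabetic character)
        };
    }

    public static void main(String[] args) {
        for (String result : replaceThreeWays("a\nb\nc")) {
            System.out.println(result);    // each line prints a/b/c
        }
    }
}
```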

Java: replaceAll doesn't work well with backslash?

You need to double each backslash (again) as the Pattern class that is used by replaceAll() treats it as a special character:

String jarPath = "\\\\xyz\\abc\\wtf\\lame\\";
jarPath = jarPath.replaceAll("\\\\\\\\xyz\\\\abc", "z:");

A Java string treats backslash as an escape character so what replaceAll sees is: \\\\xyz\\abc. But replaceAll also treats backslash as an escape character so the regular expression becomes the characters: \ \ x y z \ a b c
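If hand-escaping the regex gets hard to read, one alternative (not part of the answer above) is to let Pattern.quote do the regex-level escaping. This is only a sketch: the helper name mapToDrive is invented, and it assumes the replacement text contains no backslashes or dollar signs.

```java
import java.util.regex.Pattern;

class QuotedPrefix {
    // Replaces a literal prefix without hand-escaping it for the regex engine.
    static String mapToDrive(String path, String literalPrefix, String drive) {
        // Pattern.quote makes every character of literalPrefix, including '\', match itself.
        // Assumption: drive contains no '\' or '$', since it is still a replacement string.
        return path.replaceAll(Pattern.quote(literalPrefix), drive);
    }

    public static void main(String[] args) {
        String jarPath = "\\\\xyz\\abc\\wtf\\lame\\";   // one level of escaping, for the compiler only
        System.out.println(mapToDrive(jarPath, "\\\\xyz\\abc", "z:"));
    }
}
```

With this approach the prefix is written with only the string-literal level of escaping, the same way jarPath itself is written.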

String replace a Backslash

sSource = sSource.replace("\\/", "/");
  • String is immutable - each method you invoke on it does not change its state. It returns a new instance holding the new state instead. So you have to assign the new value to a variable (it can be the same variable)
  • replaceAll(..) uses regex. You don't need that.
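Both points can be seen in a short sketch. The class name and the sample URL are made up for illustration.

```java
class ImmutableReplace {
    // Turns every "\/" sequence into a plain "/" using literal (non-regex) replacement.
    static String unescapeSlashes(String sSource) {
        return sSource.replace("\\/", "/");
    }

    public static void main(String[] args) {
        String sSource = "http:\\/\\/example.com\\/a";  // backslash before each slash
        sSource.replace("\\/", "/");                     // result is discarded, sSource is unchanged
        System.out.println(sSource.contains("\\"));      // true
        sSource = unescapeSlashes(sSource);              // assign the returned value
        System.out.println(sSource);                     // http://example.com/a
    }
}
```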

String's replaceAll() method and escape characters

When replacing with regular expressions, you're allowed to refer to captured groups: \1 is a backreference inside the pattern, and $1 inserts a group into the replacement string.

This, however, means that the backslash is a special character, so if you actually want to use a backslash it needs to be escaped.

Which means it needs to actually be escaped twice when using it in a Java string. (First for the string parser, then for the regex parser.)
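A small sketch of group references and of an escaped backslash in the replacement follows. Note that in Java the replacement string refers to a group as $1, while \1 is the form used inside the pattern. All names here are hypothetical.

```java
class ReplacementEscapes {
    // Pattern-side backreference: (\w)\1 matches a character followed by itself.
    static String collapseDoubles(String s) {
        return s.replaceAll("(\\w)\\1", "$1");
    }

    // Replacement-side group references: swaps the two sides of key=value.
    static String swapPair(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    // Literal backslash in the replacement: four in source, two for the
    // replacement parser, one in the output, followed by group 1.
    static String backslashBeforeDigits(String s) {
        return s.replaceAll("(\\d+)", "\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubles("aabbc"));        // abc
        System.out.println(swapPair("key=value"));           // value=key
        System.out.println(backslashBeforeDigits("a1b22"));  // a\1b\22
    }
}
```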

How to replace \ with \\ in java

Don't use String.replaceAll in this case - that's specified in terms of regular expressions, which means you'd need even more escaping. This should be fine:

String escaped = original.replace("\\", "\\\\");

Note that the backslashes are doubled due to being in Java string literals - so the actual strings involved here are "single backslash" and "double backslash" - not double and quadruple.

replace works on simple strings - no regexes involved.
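For comparison, here is a sketch of both routes to the same result. The class and method names are invented for illustration.

```java
class DoubleBackslashes {
    // Literal replacement: only string-literal escaping is involved.
    static String viaReplace(String original) {
        return original.replace("\\", "\\\\");
    }

    // Regex replacement: the pattern and the replacement each need a second level of escaping.
    static String viaReplaceAll(String original) {
        return original.replaceAll("\\\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        String original = "a\\b";                      // three characters: a, backslash, b
        System.out.println(viaReplace(original));      // a\\b
        System.out.println(viaReplaceAll(original));   // a\\b
    }
}
```

Both methods return the same string. The replaceAll version just needs twice as many backslashes in source code to say it.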

Replacing all dots in a string with backslashes in Java

Java Strings are made of characters. To let Java programmers enter strings as 'constants' in the Java code, the language allows you to type them in as characters surrounded by double quotes:

 String str = "this is a string";

Some characters are hard to type into the program, like a newline or tab character. Java provides an escape mechanism to let the programmer enter these characters into a String. The escape character is the '\' backslash.

 String str = "this contains a tab\t and newline\n";

The problem is that now there is no easy way to enter a backslash itself, so the backslash has to be escaped with another backslash:

 String str = "this contains a backslash \\";

The next problem is that Regular Expressions are complicated things, and they also use the backslash \ as an escape character.

Now in Perl, for example, the regular expression \. matches the literal character '.', because '.' is special in regular expressions and has to be escaped with a '\'. To write that sequence \. as a string constant in a Java program, we need to escape the '\' as \\, so the Java equivalent is "\\.". Likewise, the Perl regular expression that matches an actual backslash character is \\; in Java source both of those backslashes must be escaped, giving "\\\\".

So, the significance here is that the file-separator character on Windows is the backslash \. This single character is stored in the field File.separator. If we wanted to type the same character into a Java program we would have to escape it as \\, but the '\' is already stored in the field, so we do not need to re-escape it for the Java compiler. We DO still need to escape it for the regular expression.

There are two ways to escape it for the regular expression. You can elect to add a backslash before it with:

"\\" + File.separator 

But this is a fragile way to do it, because it hard-codes the assumption that the separator needs escaping (on Unix the separator is / and does not). It is even worse to do what you have done, which is to double the file separator:

File.separator+File.separator

The right way to do it is to correctly escape the replacement side of the regular expression with Matcher.quoteReplacement(...)

System.out.println(Test.class.getName().replaceAll("\\.",
        Matcher.quoteReplacement(File.separator)) + ".class");
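To try this on any platform, the separator can be passed in as a parameter instead of being read from File.separator. This is a sketch with invented names (ClassNameToPath, toPath, toPathDoubled); it also shows why doubling the separator only appears to work on Windows.

```java
import java.util.regex.Matcher;

class ClassNameToPath {
    // Quotes the separator so that '\' and '$' are inserted literally.
    static String toPath(String className, String separator) {
        return className.replaceAll("\\.", Matcher.quoteReplacement(separator)) + ".class";
    }

    // The doubled-separator approach, for comparison.
    static String toPathDoubled(String className, String separator) {
        return className.replaceAll("\\.", separator + separator) + ".class";
    }

    public static void main(String[] args) {
        System.out.println(toPath("com.example.Test", "\\"));        // com\example\Test.class
        System.out.println(toPath("com.example.Test", "/"));         // com/example/Test.class
        System.out.println(toPathDoubled("com.example.Test", "\\")); // com\example\Test.class
        System.out.println(toPathDoubled("com.example.Test", "/"));  // com//example//Test.class
    }
}
```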

How Java replaceAll operation works with backslashes?

The documentation of String.replaceAll(regex, replacement) states:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll.

The documentation of Matcher.replaceAll(replacement) then states:

backslashes are used to escape literal characters in the replacement string

To put this more clearly: when you replace with "\," (a backslash followed by a comma), the backslash is read as escaping the comma, so only the comma is inserted. What you really want is a literal \ character, so it has to be escaped in the replacement as \\ followed by the comma. Since \ must also be escaped in a Java string literal, the replacement in source code becomes "\\\\,".

If you are having a hard time remembering all this, you can use the method Matcher.quoteReplacement(s), whose goal is to correctly escape the replacement part. Your code would become:

String replacedValue = neName.replaceAll(",", Matcher.quoteReplacement("\\,"));
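The three variants can be compared with a sketch like the following. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;

class EscapeCommas {
    // Replacement is backslash + comma: the backslash only escapes the comma and is dropped.
    static String naive(String neName) {
        return neName.replaceAll(",", "\\,");
    }

    // Replacement is two backslashes + comma: one literal backslash, then the comma.
    static String manual(String neName) {
        return neName.replaceAll(",", "\\\\,");
    }

    // quoteReplacement adds the replacement-level escaping for us.
    static String quoted(String neName) {
        return neName.replaceAll(",", Matcher.quoteReplacement("\\,"));
    }

    public static void main(String[] args) {
        System.out.println(naive("a,b"));   // a,b
        System.out.println(manual("a,b"));  // a\,b
        System.out.println(quoted("a,b"));  // a\,b
    }
}
```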

