String.replaceAll(regex) makes the same replacement twice

This is not an anomaly: .* can match anything, including the empty string.

You asked to replace all occurrences:

  • the first match consumes the whole string, so the regex engine starts looking for the next match at the end of the input;
  • but .* also matches an empty string! It therefore matches the empty string at the end of the input and replaces it with a as well.

Using .+ instead will not exhibit this problem since this regex cannot match an empty string (it requires at least one character to match).

Or use .replaceFirst() to replace only the first occurrence:

"test".replaceFirst(".*", "a")
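To see the three behaviors side by side, here is a minimal Java sketch, assuming Java 8 or later; the class and method names are hypothetical.

```java
public class ReplaceAllVsReplaceFirst {
    // ".*" matches "test", then matches again: the empty string left at the end
    static String dotStarAll(String s) {
        return s.replaceAll(".*", "a");
    }

    // ".+" needs at least one character, so no empty match is possible at the end
    static String dotPlusAll(String s) {
        return s.replaceAll(".+", "a");
    }

    // replaceFirst stops after the first match, so the empty match is never reached
    static String dotStarFirst(String s) {
        return s.replaceFirst(".*", "a");
    }

    public static void main(String[] args) {
        System.out.println(dotStarAll("test"));   // aa
        System.out.println(dotPlusAll("test"));   // a
        System.out.println(dotStarFirst("test")); // a
    }
}
```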

Now, why .* behaves like this, and why it does not match more than twice (in theory it could keep matching the empty string at the end forever), is worth a closer look:

# Before first run
regex: |.*
input: |whatever
# After first run
regex: .*|
input: whatever|
# Before second run
regex: |.*
input: whatever|
# After second run: since .* can match an empty string, it is satisfied...
regex: .*|
input: whatever|
# However, this means the regex engine matched an empty string.
# A regex engine such as Java's will, in this situation, shift
# one character further in the input.
# So, before the third run, the situation is:
regex: |.*
input: whatever<|ExhaustionOfInput>
# Nothing can ever match here: out
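The trace above can be observed directly with Matcher.find(), which reports where each match starts and ends. A small sketch (the class and method names are hypothetical) that collects every match of .* against "whatever":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotStarMatches {
    // Records each match of ".*" as start-end:'matched text'
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(".*").matcher(input);
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("whatever")); // [0-8:'whatever', 8-8:'']
    }
}
```

The second entry is the empty match at the end of the input; the next find() call returns false because, after stepping one character past that empty match, there is no input left.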

Note that, as @A.H. notes in the comments, not all regex engines behave this way. GNU sed for instance will consider that it has exhausted the input after the first match.

Why does my Regex.Replace string contain the replacement value twice?

There are actually two matches for your regex. You defined the pattern like this:

string match = "(.*)";

It means "match zero or more characters", so there are two matches: your text, and then the empty string at the end of it. To fix this, change the pattern to

string match = "(.+)";

It means "match one or more characters"; in that case you will get only a single match.
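Java's String.replaceAll handles the trailing empty match the same way, which makes the effect easy to reproduce. In this sketch (class and method names are hypothetical) the replacement wraps the captured group in brackets, so the empty second match shows up as []:

```java
public class CaptureTwice {
    // $1 is whatever group 1 captured; for the trailing empty match that is ""
    static String bracket(String input, String regex) {
        return input.replaceAll(regex, "[$1]");
    }

    public static void main(String[] args) {
        System.out.println(bracket("hello", "(.*)")); // [hello][]
        System.out.println(bracket("hello", "(.+)")); // [hello]
    }
}
```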

How can I more efficiently call replaceAll twice on a single string

Because of the first line, output is effectively the result of

Pattern.compile("(\\r|\\n|\\t)").matcher(obj).replaceAll("")

So you could substitute that expression for the variable output in the second line, which would then become

Pattern.compile("[^\\p{Print}]").matcher(Pattern.compile("(\\r|\\n|\\t)").matcher(obj).replaceAll("")).replaceAll(replacement);

However, this does not improve performance, since both patterns are still compiled and applied once each, and it hurts readability. Unless you have a very good reason, it is best to keep the original two lines.
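If these two calls run often, for example inside a loop, one change that can help is compiling each pattern once, since String.replaceAll compiles its regex on every call. A minimal sketch under that assumption; TextCleaner and clean are hypothetical names, obj and replacement mirror the variables used above, and [\r\n\t] is an equivalent, simpler spelling of (\r|\n|\t):

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // Compiled once and reused across calls
    private static final Pattern LINE_BREAKS_AND_TABS = Pattern.compile("[\\r\\n\\t]");
    private static final Pattern NON_PRINTABLE = Pattern.compile("[^\\p{Print}]");

    static String clean(String obj, String replacement) {
        // Order matters: \r, \n and \t are not in \p{Print}, so removing them first
        // keeps the second pass from turning them into `replacement`
        String output = LINE_BREAKS_AND_TABS.matcher(obj).replaceAll("");
        return NON_PRINTABLE.matcher(output).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // \u0007 (BEL) is a non-printable control character
        System.out.println(clean("a\tb\r\nc\u0007d", "?")); // abc?d
    }
}
```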

Why does String.replaceAll( .* , REPLACEMENT ) give unexpected behavior in Java 8?

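Anchor the pattern to the start and end of the input, so that the empty match at the end is no longer possible:

// ^ and $ anchor the pattern to the start and end of the input
String regexStr = "^.*$";
String replacementStr = "REPLACEMENT";
String initialStr = "hello";
String finalStr = initialStr.replaceAll(regexStr, replacementStr); // "REPLACEMENT"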
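One thing worth checking with the anchored pattern: by default ^ and $ anchor to the whole input and . does not match line terminators, so on multi-line input ^.*$ may not match at all. A minimal sketch (hypothetical class and method names) contrasting the default with the (?m) MULTILINE flag:

```java
public class AnchoredReplace {
    // Without flags, ^ and $ anchor to the whole input and . stops at line terminators
    static String replaceWhole(String input) {
        return input.replaceAll("^.*$", "REPLACEMENT");
    }

    // (?m) turns on MULTILINE, so ^ and $ also match at each line boundary
    static String replaceEachLine(String input) {
        return input.replaceAll("(?m)^.*$", "REPLACEMENT");
    }

    public static void main(String[] args) {
        System.out.println(replaceWhole("hello"));   // REPLACEMENT
        System.out.println(replaceWhole("a\nb"));    // no match: prints "a" and "b" unchanged
        System.out.println(replaceEachLine("a\nb")); // REPLACEMENT on each of the two lines
    }
}
```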

Replace multiple characters in one replace call

To replace several different characters in one call, pass a function as the replacement argument of String.prototype.replace(); it is called once for each match. All you need is an object representing the character mapping that you will use in that function.

For example, if you want a replaced with x, b with y, and c with z, you can do something like this:

const chars = {'a':'x','b':'y','c':'z'};
let s = '234abc567bbbbac';
s = s.replace(/[abc]/g, m => chars[m]);
console.log(s);

Output: 234xyz567yyyyxz
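Java 9 and later offer the same idea through Matcher.replaceAll(Function&lt;MatchResult, String&gt;), which calls the function once per match. A sketch of the JavaScript example translated to Java (class, method and constant names are hypothetical):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiCharReplace {
    // Same mapping as the JavaScript example: a->x, b->y, c->z
    private static final Map<String, String> CHARS = Map.of("a", "x", "b", "y", "c", "z");
    private static final Pattern ABC = Pattern.compile("[abc]");

    static String translate(String s) {
        // The function runs once per match; quoteReplacement keeps any
        // '$' or '\' in a mapped value from being read as a group reference
        return ABC.matcher(s).replaceAll(m -> Matcher.quoteReplacement(CHARS.get(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(translate("234abc567bbbbac")); // 234xyz567yyyyxz
    }
}
```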

replaceAll() works once, but not twice?

Because . means "any character" in a regex, you need to escape it.

Since . matches any character, it replaces everything in the string. You can simply use replace, which works on literal text, instead of a more costly regex:

 book = book.replace(",", "");

Or remove both , and . in a single step:

 book = book.replaceAll("[.,]", "");

[.,] is a character class that matches either a comma or a dot (inside a character class, . is a literal dot).

If you prefer replace for removing the characters one at a time, you can chain the calls:

String book = "The .bo..ok of ,eli..";
book = book.replace(",", "").replace(".", ""); // The book of eli
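A short Java sketch (hypothetical class and method names) contrasting the unescaped pattern with the fixes above, run on the same sample string:

```java
public class RemovePunctuation {
    // Unescaped "." matches every character here (anything but a line terminator)
    static String unescaped(String s) {
        return s.replaceAll(".", "");
    }

    // "\\." matches only a literal dot (Pattern.quote(".") would also work)
    static String escapedDot(String s) {
        return s.replaceAll("\\.", "");
    }

    // Character class: removes commas and dots in one pass
    static String dotsAndCommas(String s) {
        return s.replaceAll("[.,]", "");
    }

    // No regex at all: replace() works on literal text
    static String chained(String s) {
        return s.replace(",", "").replace(".", "");
    }

    public static void main(String[] args) {
        String book = "The .bo..ok of ,eli..";
        System.out.println("[" + unescaped(book) + "]"); // []
        System.out.println(escapedDot(book));            // The book of ,eli
        System.out.println(dotsAndCommas(book));         // The book of eli
        System.out.println(chained(book));               // The book of eli
    }
}
```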

How to replace a String twice with oldValues having same text within?

You should use a regex, like this (VB, tested):

Regex.Replace(str, "(Chairman\s+)?(Joe\s+)?Smith", _
"<a href='smithbio.html'>$0</a>")

$0 stands for the entire match; it is one of several substitutions that can be used in the replacement string.

If you only know the names at runtime, make sure to pass them through Regex.Escape before building the pattern.
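Java's String.replaceAll also supports $0 in the replacement, and Pattern.quote serves a similar purpose to Regex.Escape when the names come from outside the program. A sketch under those assumptions; LinkNames, linkSmith and linkName are hypothetical names:

```java
import java.util.regex.Pattern;

public class LinkNames {
    // $0 in a Java replacement string is the entire match, as in .NET
    private static final String LINK = "<a href='smithbio.html'>$0</a>";

    static String linkSmith(String text) {
        return text.replaceAll("(Chairman\\s+)?(Joe\\s+)?Smith", LINK);
    }

    // Names known only at run time are quoted so characters such as '.' stay literal
    static String linkName(String text, String title, String first, String last) {
        String regex = "(" + Pattern.quote(title) + "\\s+)?("
                + Pattern.quote(first) + "\\s+)?"
                + Pattern.quote(last);
        return text.replaceAll(regex, LINK);
    }

    public static void main(String[] args) {
        String text = "Chairman Joe Smith met Joe Smith.";
        System.out.println(linkSmith(text));
        System.out.println(linkName(text, "Chairman", "Joe", "Smith"));
    }
}
```

Because the optional groups are greedy, "Chairman Joe Smith" is linked as one unit instead of producing a nested link around "Smith".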

String.replaceAll single backslashes with double backslashes

String#replaceAll() interprets its first argument as a regular expression, and \ is an escape character both in Java string literals and in regex (it is also special in the replacement string). You therefore need to double-escape it:

string.replaceAll("\\\\", "\\\\\\\\");

But you don't need a regex for this: you want an exact, character-by-character replacement with no patterns involved, so String#replace() suffices:

string.replace("\\", "\\\\");

Update: as per the comments, you appear to want to use the string in a JavaScript context. In that case you would be better off with StringEscapeUtils#escapeEcmaScript() from Apache Commons Lang, which escapes more characters.
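A minimal sketch comparing the two approaches, plus a third that lets Pattern.quote and Matcher.quoteReplacement do the escaping (class and method names are hypothetical). All three double every backslash:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleBackslashes {
    // Literal replacement: no regex involved
    static String withReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    // Regex version: the source literal "\\\\" is the regex \\ (one literal backslash);
    // the replacement also treats '\' specially, hence eight backslashes in source
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Regex version where the library adds the escaping
    static String withQuoting(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file";           // C:\temp\file
        System.out.println(withReplace(path));    // C:\\temp\\file
        System.out.println(withReplaceAll(path)); // C:\\temp\\file
        System.out.println(withQuoting(path));    // C:\\temp\\file
    }
}
```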

Python + Regex + Replace pattern with multiple copies of that pattern

You should use single-quotes, raw strings, and re.sub:

import re

string = r'\\\" asdf \" \ \ \\"'
new_string = re.sub(r'(\\+)"', r'\1\1"', string)
print(new_string)

Output:

\\\\\\" asdf \\" \ \ \\\\"


The Pattern

To explain the pattern, first let's remove the parentheses; they don't affect what's matched, and we'll put them back later. The pattern r'\\+"' means "one or more backslashes followed by a double-quote". Even though it's a raw string, we still have to escape the backslash because backslashes have special meaning in regular expressions; that's why it's r'\\+"' instead of r'\+"'.

The Parentheses

The parentheses around the \\+ in the actual pattern just mean "capture the part of the match inside these parentheses". This puts the run of backslashes from each match into a capture group. We're going to use this capture group in the replacement string.

The Replacement String

The replacement string, r'\1\1"', just means "two copies of the first capture group followed by a double-quote" (in this case there's only one capture group, but there can be more). The replacement string ends with a double-quote because the match included one: the entire match is replaced by the replacement string, so without that double-quote the double-quotes would be removed.
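For Java code, the same capture-group technique carries over almost unchanged; the main difference is that Java's replacement syntax refers to the group as $1 rather than \1. A minimal sketch (hypothetical class and method names):

```java
public class DoubleBeforeQuote {
    static String doubleBackslashesBeforeQuotes(String s) {
        // Regex (\\+)" : capture a run of backslashes followed by a double-quote;
        // replacement $1$1" writes the captured run twice, then the quote
        return s.replaceAll("(\\\\+)\"", "$1$1\"");
    }

    public static void main(String[] args) {
        // 'B' stands in for a backslash to keep the literal readable
        String input = "BBB\" asdf B\" B B BB\"".replace("B", "\\");
        System.out.println(doubleBackslashesBeforeQuotes(input));
        // prints the same output as the Python version: \\\\\\" asdf \\" \ \ \\\\"
    }
}
```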


