How to Replace Captured Groups Only

How to replace captured groups only?

A solution is to add captures for the preceding and following text:

str.replace(/(.*name="\w+)(\d+)(\w+".*)/, "$1!NEW_ID!$3")

Explanation

The parentheses create "groups", each of which is assigned a 1-based index that a replacement string can reference with $.

  • the first group (.*name="\w+) captures everything up to the number, and becomes $1
  • the middle group (\d+) is the second group (it is matched, but left out of the replacement)
  • the third group (\w+".*) becomes $3

So when you give the replacement string "$1!NEW_ID!$3", the $1 and $3 are replaced automagically with the first and third groups, so only the text matched by the 2nd group is swapped for the new string, and the text surrounding it is preserved.
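The same idea can be sketched in Python, where the replacement string refers to groups as \g<1> and \g<3>. The sample input and variable names here are assumptions made for illustration, not taken from the original question:

```python
import re

# Hypothetical sample input; the attribute layout is an assumption.
html = '<input name="some_text_0_some_text" type="text">'

# Group 1: everything up to the digit; group 2: the digit; group 3: the rest.
pattern = re.compile(r'(.*name="\w+)(\d+)(\w+".*)')

# \g<1> is the unambiguous way to reference a group in the replacement.
result = pattern.sub(r'\g<1>!NEW_ID!\g<3>', html)
print(result)
```

Because \w also matches digits, the greedy \w+ leaves only the last digit for (\d+) when the ID has several digits. A stricter prefix such as [A-Za-z_]+ would be one way around that, if it suits the data.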

Using re.sub with capture groups to replace only portion of a match

Use a lookahead to match part of the string without replacing it.

pattern = r'\A\w+(?=[@+\-/*])'

You don't need a capture group when you're just removing the match; it's needed if you need to copy parts of the input text into the result. You also don't need [] around \w. And you should get rid of the * after [@+\-/*], since you want to require one of those characters.

You should generally use raw strings when creating regular expressions, so that the regexp escape sequences won't be confused for Python escape sequences. And you should escape - in a character set, otherwise it's used to create a range of characters.
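A minimal sketch of how that pattern behaves with re.sub. The function name and the sample inputs are hypothetical:

```python
import re

# Raw string: \A, \w and \- reach the regex engine unchanged.
pattern = re.compile(r'\A\w+(?=[@+\-/*])')

def strip_leading_word(text):
    """Remove a leading word only when an operator character follows it."""
    return pattern.sub('', text)

print(strip_leading_word('total+5'))  # the operator itself is kept
print(strip_leading_word('total 5'))  # no operator after the word, so no change
```

The lookahead checks for the operator without consuming it, which is why no capture group or backreference is needed in the replacement.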

Regex in C# How to replace only capture groups and not non-capture groups

Instead of trying to ignore the strings with words and !@#$%^&*()_- in them, I just included them in my search, placed an extra single quote on either end, and then removed all instances of two single quotes like so:

// Find any string of words and !@#$%^&*()_- in and out of quotes.
Regex getwords = new Regex(@"(^(?!and\b)(?!or\b)(?!not\b)(?!empty\b)(?!notempty\b)(?!currentdate\b)([\w!@#$%^&*())_-]+)|((?!and\b)(?!or\b)(?!not\b)(?!empty\b)(?!notempty\b)(?!currentdate\b)(?<=\W)([\w!@#$%^&*()_-]+)|('[\w\s!@#$%^&*()_-]+')))", RegexOptions.IgnoreCase);
// Find all cases of two single quotes
Regex getQuotes = new Regex(@"('')");

// Get string from user
Console.WriteLine("Type in a string");
string search = Console.ReadLine();

// Execute Expressions.
search = getwords.Replace(search, "'$1'");
search = getQuotes.Replace(search, "'");

Replace only capturing group - regex

Update:
Final addition: while the expressions below will work in most cases, they can't cope with markup like this:

var example = "I want to <strong>replace</strong> all strong tags with the <i class='strong-text stronger'>better</i> b tag";

The tags will be replaced just fine, but note the class attribute: strong-text stronger will be replaced with "b-text ber". As far as stronger is concerned: adding word-boundaries will fix that issue, but strong-text will still cause problems.

We have to make sure that the matched substring "strong" is not an attribute of any kind. Thankfully, this is an easy fix: attribute values are preceded by an equal sign, and 99% of the time, single or double quotes. Using the following pattern, then, prevents us replacing attribute values:

example.replace(/(<[^>="']*?)\bstrong\b([^>]*>)/gi, "$1b$2");
//result:
//"I want to <b>replace</b> all strong tags with the <i class='strong-text stronger'>better</i> b tag"

Pattern explanation:
- (<[^>="']*?) same as below, but we've excluded =, ' and " from being the match, meaning <p class="strong"> won't match, as there is a = char between the opening < and strong.
- \bstrong\b: added word-boundaries (see below)
- The rest of the pattern remains unchanged.
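
The pattern carries over to Python's re module more or less unchanged. This is only a sketch, and the constant name is chosen here for illustration:

```python
import re

# re.IGNORECASE stands in for the i flag; re.sub already replaces every
# match, which corresponds to the g flag.
SAFE_TAG = re.compile(r'''(<[^>="']*?)\bstrong\b([^>]*>)''', re.IGNORECASE)

example = ("I want to <strong>replace</strong> all strong tags with the "
           "<i class='strong-text stronger'>better</i> b tag")

converted = SAFE_TAG.sub(r'\g<1>b\g<2>', example)
print(converted)
```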

Anyway, that's probably as close as you're going to get to a reliable pattern. Still: look into using an XML parser if you're planning on consuming a lot of markup, because RegExp is not the best tool for the job.


Initial answer

You want to replace "strong" with "b", and leave everything else as-is, right? Well in that case, you should group everything except that which you are trying to replace, and use back-references to the groups in your replacement string:

"My name is <strong>Tariq</strong>".replace(/(<\s*\/?\s*)strong(\s*>)/gi,'$1b$2');

As ever: RegEx is not the best tool for consuming markup languages, and your pattern is not perfect: it can't handle tags with attributes, for example. Change the pattern to match in an "everything-except" way, rather than "match this or that":

/(<[^>]*?)strong([^>]*>)/gi

How it works:

  • (<[^>]*?): Match and capture < followed by any char (0 or more) that is not >. Non greedy, match will end as soon as the rest of the pattern is found
  • strong: literal match for string
  • ([^>]*>): Match zero or more non > chars, and a closing >. This match is also captured

  • Replace the entire match with $1b$2, i.e. <group1>b<group2>. This preserves any attributes and/or spaces the markup contained.

As a result, markup like this is processed correctly:

"My name is <strong id='someId'>Tariq</strong>".replace(/(<[^>]*?)strong([^>]*>)/gi,'$1b$2');
//output:
//My name is <b id='someId'>Tariq</b>

Inspired by Harpeet's (somewhat flawed) regex, you could also opt to use this pattern:

str.replace(/\bstrong\b(?=[^<>]*>)/gi, 'b')

If nothing else, it is a more elegant looking pattern.

Explained:

  • \bstrong\b: matches string literal, if it is not part of a word (\b are word boundaries)
  • (?=[^<>]*>): only if it is followed by 0 or more chars that aren't < or >, and a closing >. If we omit the < from the exclusion group, we risk replacing the word strong when it's not part of a tag: 'a strong sense<br>'.replace(/strong(?=[^>]*>)/gi, 'b'); results in "a b sense<br>".
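
For comparison, a Python sketch of the lookahead variant. The function and constant names are hypothetical:

```python
import re

# Port of the lookahead pattern; re.IGNORECASE mirrors the i flag.
TAG_NAME = re.compile(r'\bstrong\b(?=[^<>]*>)', re.IGNORECASE)

def strong_to_b(markup):
    """Rename the word "strong" wherever it sits inside a tag."""
    return TAG_NAME.sub('b', markup)

print(strong_to_b("My name is <strong id='someId'>Tariq</strong>"))
print(strong_to_b('a strong sense<br>'))  # plain text before a tag is left alone
```

Since nothing is captured, the replacement is just the new tag name. Note that this variant still rewrites the word inside attribute values such as class='strong-text'.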

Replacing only the captured group using re.sub and multiple replacements

You can use a lookbehind and lookahead based regex and then a lambda function to iterate through replacements words:

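>>> import re
>>> string = 'I love X. I love X. I love X.'  # assumed sample input; the original is not shown
>>> words = ['Swimming', 'Eating', 'Jogging']
>>> pattern = re.compile(r'(?<=I love )\w+(?=\.)')
>>> print(pattern.sub(lambda m: words.pop(0), string))
I love Swimming. I love Eating. I love Jogging.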
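
A possible variation, shown only as a sketch: drawing the replacements from an iterator leaves the words list intact and falls back to the matched text if the list runs out. The sample input is an assumption:

```python
import re

# Assumed sample input; the original question's string is not shown.
text = 'I love X. I love X. I love X.'
words = ['Swimming', 'Eating', 'Jogging']

pattern = re.compile(r'(?<=I love )\w+(?=\.)')

# The iterator hands out one replacement per match; when it is exhausted,
# next() returns the original match text instead of raising.
replacements = iter(words)
result = pattern.sub(lambda m: next(replacements, m.group(0)), text)
print(result)
```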


Replace capture group of dynamic size

You can use the sticky flag y (but Internet Explorer doesn't support it):

s = s.replace(/(^https?:\/\/.*?\/path1\/?|(?!^))./gy, '$1*')

But the simplest approach is to pass a function as the replacement parameter:

s = s.replace(/^(https?:\/\/.+\/path1\/?)(.*)/, function (_, m1, m2) {
    return m1 + '*'.repeat(m2.length);
});
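
The function-as-replacement approach looks much the same in Python. In this sketch the function name and the example URL are assumptions:

```python
import re

def mask_after_path(url):
    """Keep everything up to /path1/ and star out the remainder."""
    return re.sub(
        r'^(https?://.+/path1/?)(.*)',
        # Group 1 is copied back; group 2 becomes one * per character.
        lambda m: m.group(1) + '*' * len(m.group(2)),
        url,
    )

print(mask_after_path('https://example.com/path1/secret'))
```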

For the second case, you can simply check if there's an @ after the current position:

s = s.replace(/.(?=.*@)/g, '*');
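
A Python sketch of the same lookahead, assuming the input is an email-like string (the function name is hypothetical):

```python
import re

def mask_local_part(address):
    # Replace each character that still has an @ somewhere after it.
    return re.sub(r'.(?=.*@)', '*', address)

print(mask_local_part('user@example.com'))
```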

sed - capture a group and replace only one character

$ cat ip.txt
foo 1XYZ00
xyz 1 2 3
hi 3XYZ00
1XYZ0A
cool 3ABC23

$ # matches any digit followed by 3 uppercase and 2 digit characters
$ sed -E 's/[0-9]([A-Z]{3}[0-9]{2})/9\1/' ip.txt
foo 9XYZ00
xyz 1 2 3
hi 9XYZ00
1XYZ0A
cool 9ABC23

$ # matches digit '1' followed by 3 uppercase and 2 digit characters
$ sed -E 's/1([A-Z]{3}[0-9]{2})/9\1/' ip.txt
foo 9XYZ00
xyz 1 2 3
hi 3XYZ00
1XYZ0A
cool 3ABC23

Issue with OP's attempts:

  • 1{1}[A-Z]{3}[0-9]{2} is same as 1[A-Z]{3}[0-9]{2}
  • Using 9{1}[A-Z]{3}[0-9]{2} in replacement section will give you those characters literally. They don't have any special meaning.
  • s/1\(1{1}[A-Z]{3}[0-9]{2}\)/9\1/ does use a capture group, but () shouldn't be escaped when the -E option is active, and 1{1} shouldn't be part of the capture group
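
For readers more at home in Python, the two sed commands above correspond roughly to the following sketch. The list simply mirrors the contents of ip.txt:

```python
import re

lines = ['foo 1XYZ00', 'xyz 1 2 3', 'hi 3XYZ00', '1XYZ0A', 'cool 3ABC23']

# count=1 mirrors sed's default of replacing only the first match per line.
# Only the leading digit sits outside the group, so only it gets replaced.
any_digit = [re.sub(r'[0-9]([A-Z]{3}[0-9]{2})', r'9\1', s, count=1) for s in lines]
only_one = [re.sub(r'1([A-Z]{3}[0-9]{2})', r'9\1', s, count=1) for s in lines]

print(any_digit)
print(only_one)
```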

How to replace the captured group in Ruby

sub! replaces only the first match and modifies part_number in place. Because part_number is assigned outside the loop, each iteration works on the result of the previous one.

What happens is:

In the first iteration, the first A is replaced with A, leaving the string unchanged:

R1L16SB#AA
        ^

In the second iteration, the first A is replaced by B, giving:

R1L16SB#BA
        ^

In the third iteration, the first B is replaced by C, giving:

R1L16SC#BA
      ^

One way to get the desired output is to put part_number = 'R1L16SB#AA' inside the loop.


Replace capturing group with return value from passing capturing group to a function

Anyway, just try:

return p.sub(lambda match: translateWord(match.group(1)), sentence)

It looks like you got confused about what to pass as the second parameter to re.sub: you pass the actual function (in this case, the lambda expression), no need to try to embed that in a string.

If you want to change just a group, though, the re methods don't support that directly. Instead, you have to rebuild a single string for the whole match yourself, replacing the groups you want to change.

The easier way is to expand your "lambda" function into a regular multi-line function that does that mangling for you. It can use the start() and end() methods of the match object it receives to find the limits of the group, and build the replacement string. Those offsets (like the ones in the undocumented .regs attribute) are relative to the whole input string, so they have to be shifted by the start of the match:


def replace_group(match):
    # Translate only group 1, then splice it back into the full match.
    translated = translateWord(match.group(1))
    matched = match.group(0)
    # Convert the group's absolute offsets into offsets within the match.
    start = match.start(1) - match.start(0)
    end = match.end(1) - match.start(0)
    return matched[:start] + translated + matched[end:]
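
A self-contained sketch of how such a function might be used. The translateWord stand-in, the pattern p and the sample sentence are all hypothetical, and the group offsets are taken relative to the start of the match:

```python
import re

def translateWord(word):
    # Hypothetical stand-in for the real translation function.
    return word.upper()

def replace_group(match):
    # Offsets of group 1 relative to the start of the whole match.
    start = match.start(1) - match.start(0)
    end = match.end(1) - match.start(0)
    matched = match.group(0)
    return matched[:start] + translateWord(match.group(1)) + matched[end:]

# Hypothetical pattern: translate only the word that follows "say ".
p = re.compile(r'say (\w+)!')
output = p.sub(replace_group, 'Please say hello! Then say bye!')
print(output)
```

The text around the group ("say " and "!") is part of each match, but it is copied back unchanged.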

