Handling Backreferences to Capturing Groups in re.sub Replacement Pattern


You should be using raw strings for regex, try the following:

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

With your current code, the backslashes in your replacement string escape the digits, so you are replacing every match with the equivalent of chr(1) + "," + chr(2):

>>> '\1,\2'
'\x01,\x02'
>>> print('\1,\2')
,
>>> print(r'\1,\2')  # this is what you actually want
\1,\2

Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
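As a quick check of the difference, here is a minimal sketch. The sample coords value is made up for illustration and is not taken from the original question.

```python
import re

# Hypothetical sample input; the original question's data is not shown.
coords = "1, 2 and 3, 4"

# Non-raw replacement: Python turns '\1' and '\2' into control characters
# before re.sub ever sees them.
wrong = re.sub(r"(\d), (\d)", "\1,\2", coords)

# Raw replacement: re.sub receives the backslashes and expands the groups.
right = re.sub(r"(\d), (\d)", r"\1,\2", coords)

print(repr(wrong))  # '\x01,\x02 and \x01,\x02'
print(repr(right))  # '1,2 and 3,4'
```

Printing with repr() makes the control characters visible, which a plain print would hide.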

Can't use '\1' backreference to capture-group in a function call in re.sub() repr expression

re.sub(r'([0-9])',A[int(r'\g<1>')],S) does not work because the \g<1> backreference (an unambiguous way of writing the first backreference, \1) is only expanded when it appears in the string replacement pattern. If you pass it to another function, that function sees just the literal string \g<1>, because the re module has had no chance to evaluate it yet. The re engine expands backreferences only once a match has been found, but the A[int(r'\g<1>')] expression is evaluated before re.sub is even called.
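The evaluation order can be seen directly in a small sketch that reuses the S and A values from this answer. The outcome variable is only there for illustration.

```python
import re

S = '02143'
A = ['a', 'b', 'c', 'd', 'e']

# Python evaluates the replacement argument before calling re.sub,
# so int() receives the literal text '\g<1>' and fails.
try:
    re.sub(r'([0-9])', A[int(r'\g<1>')], S)
    outcome = 'no error'
except ValueError:
    outcome = 'ValueError'

print(outcome)  # ValueError
```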

That is why re.sub accepts a callback function as the replacement argument: it lets you pass the matched group values to any other function for further processing.

See the re documentation:

re.sub(pattern, repl, string, count=0, flags=0)

If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.

Use

import re
S = '02143'
A = ['a','b','c','d','e']
print(re.sub(r'[0-9]',lambda x: A[int(x.group())],S))


Note that you do not need to wrap the whole pattern in capturing parentheses; x.group() returns the whole match.

How to use python regex to replace using captured group?

You need to escape your backslash:

p.sub('gray \\1', s)

Alternatively, you can use a raw string, as you already did for the regex:

p.sub(r'gray \1', s)

Python re.sub() is replacing the full match even when using non-capturing groups

The general solution for such problems is using a lambda in the replacement:

import re

string = 'aBCDeFGH'

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))

However, as bro-grammer has commented, you can use backreferences in this case:

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))
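The underlying point is that re.sub always replaces the full match, whether or not parts of it are captured. A minimal sketch, using a made-up 'id=42' input rather than anything from the original question:

```python
import re

# Hypothetical input used only for illustration.
text = 'id=42'

# The non-capturing prefix is still part of the match, so it is replaced too.
dropped = re.sub(r'(?:id=)(\d+)', r'<\1>', text)

# Capturing the prefix and writing it back keeps it in the result.
kept = re.sub(r'(id=)(\d+)', r'\1<\2>', text)

print(dropped)  # <42>
print(kept)     # id=<42>
```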

Backreferences to capturing groups in re.sub not working as expected

I think you are looking for something like this, as you requested in this part of your question:

I want to remove any "\" followed by digits

import re
s = r"somecharacters\15othercharacters"
s = re.sub(r"\\\d+", '', s)
print(s)

When run, this outputs:

somecharactersothercharacters

Replacing named capturing groups with re.sub

import re

def repl(matchobj):
    # Group 1 is "text", group 2 is "tag", group 3 is "content".
    if matchobj.group(3):
        return matchobj.group(1) + matchobj.group(3)
    else:
        # No tag was matched, so keep only the leading text.
        return matchobj.group(1)

my_str = "Here's some <first>sample stuff</first> in the " \
         "<second>middle</second> of some other text."

pattern = r'(?P<text>.*?)(?:<(?P<tag>\w+)>(?P<content>.*)</(?P=tag)>|$)'
print(re.sub(pattern, repl, my_str))

You can pass a callback function as the replacement argument of re.sub.

Edit:
cleaned = re.sub(pattern, r'\g<text>\g<content>', my_str) will not work on Python versions before 3.5. When the last part of the string, " of some other text.", is matched, \g<text> is defined but \g<content> is not, because there is no content. re.sub is still asked to substitute it, so it raises an error. (Since Python 3.5, unmatched groups are replaced with an empty string instead.) If you use the string "Here's some <first>sample stuff</first> in the <second>middle</second>", then print(re.sub(pattern, r"\g<text>\g<content>", my_str)) works, because \g<content> is defined every time.
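A minimal sketch of the two options, using a made-up pattern in which one named group is always unmatched. The callback guards against None and should behave the same on any version. The template form relies on the Python 3.5+ behaviour of substituting an empty string for unmatched groups.

```python
import re

# Hypothetical pattern: each match fills exactly one of the two groups.
pattern = r'(?P<word>[a-z]+)|(?P<num>\d+)'

def repl(m):
    # m.group(...) returns None for a group that did not participate,
    # so fall back to '' before concatenating.
    return '[' + (m.group('word') or '') + (m.group('num') or '') + ']'

via_callback = re.sub(pattern, repl, 'abc 123')

# Python 3.5+ only: unmatched groups expand to an empty string.
via_template = re.sub(pattern, r'[\g<word>\g<num>]', 'abc 123')

print(via_callback)  # [abc] [123]
print(via_template)  # [abc] [123]
```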

python re.sub group: number after \number

The answer is:

re.sub(r'(foo)', r'\g<1>123', 'foobar')

Relevant excerpt from the docs:

In addition to character escapes and
backreferences as described above,
\g<name> will use the substring
matched by the group named name, as
defined by the (?P<name>...) syntax.
\g<number> uses the corresponding
group number; \g<2> is therefore
equivalent to \2, but isn't ambiguous
in a replacement such as \g<2>0. \20
would be interpreted as a reference to
group 20, not a reference to group 2
followed by the literal character '0'.
The backreference \g<0> substitutes in
the entire substring matched by the
RE.
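The ambiguity described in the excerpt can be reproduced with a one-group pattern. This is a small sketch; the variable names are only for illustration.

```python
import re

# \g<1> ends the group reference explicitly, so the following '0' is literal.
ok = re.sub(r'(foo)', r'\g<1>0', 'foobar')

# r'\10' is read as a reference to group 10, which this pattern does not have.
try:
    re.sub(r'(foo)', r'\10', 'foobar')
    ambiguous = 'no error'
except re.error:
    ambiguous = 're.error'

print(ok)         # foo0bar
print(ambiguous)  # re.error
```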

Can patterns be used for replacement of a group in re.sub in python

A lambda function does the job here. The solution looks like this:

import re
x = "Insert into SEC_DATA (HEADER_ID,TIMESTMP,BOMLABEL,TI_TO_EX,HOLD_TI,STRIKE,STRIKE_FORM,SDVALUE,VALUE1,VALUE2) values ('Swaption-Volatilitäten',to_date('02/03/2016 00:00:00','DD/MM/YYYY HH24:MI:SS'),'BID','5400','10800','0','D','0,595','0','0');"
var = re.sub(r"(\d),(\d)", lambda match: "%s.%s" % (match.group(1), match.group(2)), x)

Contributor:
User pts gave an accurate answer to a similar question


re.sub using replacement text from dict

You can use lambda or functions in re.sub:

import re

d = {
    '1': 'a',
    '2': 'b'
}

# Raw string for the pattern so that \d is not treated as a string escape.
print(re.sub(r'(\d), (\d)', lambda g: d[g.group(1)] + ', ' + d[g.group(2)], '1, 2'))

Prints:

a, b
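If some matched keys might be missing from the dict, d[...] would raise KeyError. One possible variation, sketched here as an assumption rather than as part of the original answer, falls back to the matched text:

```python
import re

d = {'1': 'a', '2': 'b'}

# Hypothetical variation: keep the matched digit when it has no dict entry.
result = re.sub(r'\d', lambda m: d.get(m.group(), m.group()), '1, 2, 3')

print(result)  # a, b, 3
```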

From the documentation:

re.sub(pattern, repl, string, count=0, flags=0)

If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.

Python's re.sub returns data in wrong encoding from unicode

This happens because '\1' is the character with code point 1 (its repr is '\x01'). Under the rules for string literals, re.sub never saw your backslash. Even if you did escape it, as in r'\1' or '\\1', reference 1 is not the right number: you need parentheses to define groups. r'\g<0>', which refers to the whole match, would work, as described in the re.sub documentation.
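A short sketch of both cases, with a made-up input string. The pattern has no parentheses, so only group 0 (the whole match) exists.

```python
import re

# Hypothetical input used only for illustration.
text = 'a1b22'

# \g<0> refers to the entire match, so no capturing group is needed.
wrapped = re.sub(r'\d+', r'<\g<0>>', text)

# \1 refers to a group the pattern never defines.
try:
    re.sub(r'\d+', r'<\1>', text)
    missing_group = 'no error'
except re.error:
    missing_group = 're.error'

print(wrapped)        # a<1>b<22>
print(missing_group)  # re.error
```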


