Python re.sub group: number after \number

python re.sub group: number after \number

The answer is:

re.sub(r'(foo)', r'\g<1>123', 'foobar')

Relevant excerpt from the docs:

In addition to character escapes and
backreferences as described above,
\g<name> will use the substring
matched by the group named name, as
defined by the (?P<name>...) syntax.
\g<number> uses the corresponding
group number; \g<2> is therefore
equivalent to \2, but isn’t ambiguous
in a replacement such as \g<2>0. \20
would be interpreted as a reference to
group 20, not a reference to group 2
followed by the literal character '0'.
The backreference \g<0> substitutes in
the entire substring matched by the
RE.
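
The ambiguity the docs describe is easy to reproduce. The following is a minimal sketch with illustrative inputs: the first call uses \g<1> so the trailing digit is a literal, and the second shows that \10 is read as a reference to group 10, which this pattern does not define.

```python
import re

# \g<1> closes the group reference explicitly, so the following "0" is a literal.
unambiguous = re.sub(r'(foo)', r'\g<1>0', 'foobar')
print(unambiguous)  # foo0bar

# r'\10' is parsed as a reference to group 10, which this pattern does not have.
try:
    re.sub(r'(foo)', r'\10', 'foobar')
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = exc
print('raised:', ambiguous_error is not None)  # raised: True
```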

Python re.sub group of number after word

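Try the pattern r'\b\w{11}\b', which matches any standalone run of exactly 11 word characters (letters, digits, or underscores).

Example: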

import re

tweet = "any character like as123456789 or 12345678912"
print(re.sub(r'\b\w{11}\b', 'resi', tweet))

Output:

any character like resi or resi

re.sub group value

It seems that re.sub(r'(%s\.Value\s*=\s*)([^;]+)' %name, r'\g<1>' + value, content) will do. Two changes are needed:

  • Remove the space after the backreference, as you already capture all whitespace before the value into Group 1, and
  • Use an unambiguous backreference (\g<n>), which allows any digits to follow it.

See the Python demo:

import re
name = 'XYZ'
value = '5'
content = 'XYZ.Value = 5'
print(re.sub(r'(%s\.Value\s*=\s*)([^;]+)' %name, r'\g<1>' + value, content))
# => XYZ.Value = 5
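
To see why the plain \1 form breaks here, the sketch below runs both replacement strings against the same input. The starting content 'XYZ.Value = 3' is an assumption made for illustration, and re.escape(name) is an extra precaution that is not part of the original answer.

```python
import re

name = 'XYZ'
value = '5'
content = 'XYZ.Value = 3'  # assumed starting content, for illustration only
pattern = r'(%s\.Value\s*=\s*)([^;]+)' % re.escape(name)

# r'\1' + value becomes r'\15', i.e. a reference to a non-existent group 15.
try:
    re.sub(pattern, r'\1' + value, content)
    failed = False
except re.error:
    failed = True
print(failed)  # True

# r'\g<1>' + value keeps the group number and the appended digits apart.
updated = re.sub(pattern, r'\g<1>' + value, content)
print(updated)  # XYZ.Value = 5
```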

Why does re.sub replace the entire pattern, not just a capturing group within it?

Because it's supposed to replace the whole occurrence of the pattern:

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.

If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:

  1. Specify the pattern in full: re.sub('ab', 'ad', 'abc') - my favorite, as it's very readable and explicit.
  2. Capture the groups you want to preserve and then refer to them in the replacement (note that it should be a raw string, so the backslash does not need escaping): re.sub('(a)b', r'\1d', 'abc')
  3. Similar to the previous option: provide a callback function as the repl argument, and have it process the Match object and return the required result.
  4. Use lookbehinds/lookaheads, which are not included in the match but affect matching: re.sub('(?<=a)b', r'd', 'abxb') yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".
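
The four options can be compared side by side on the same input. This is a minimal sketch; the callback name keep_a is hypothetical.

```python
import re

# 1. Spell out the whole pattern and the whole replacement.
full = re.sub('ab', 'ad', 'abc')

# 2. Capture the part to keep and put it back with a backreference.
backref = re.sub('(a)b', r'\1d', 'abc')

# 3. Build the replacement in a callback that receives the Match object.
def keep_a(match):
    return match.group(1) + 'd'

callback = re.sub('(a)b', keep_a, 'abc')

# 4. Use a lookbehind so that only "b" is part of the match.
lookbehind = re.sub('(?<=a)b', 'd', 'abc')

print(full, backref, callback, lookbehind)  # adc adc adc adc
```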

Getting the match number when passing a function in re.sub

Based on @Barmar's answer, I tried this:

import re

def custom_replace(match, matchcount):
    result = 'a' + str(matchcount.i)
    matchcount.i += 1
    return result

def any_request():
    matchcount = lambda: None  # an empty "object", see https://stackoverflow.com/questions/19476816/creating-an-empty-object-in-python/37540574#37540574
    matchcount.i = 0  # benefit: it's a local variable that we pass to custom_replace "by reference"
    print(re.sub(r'o', lambda match: custom_replace(match, matchcount), "oh hello wow"))
    # a0h hella1 wa2w

any_request()

and it seems to work.

Reason: I was a bit reluctant to use a global variable for this, because I'm using this inside a web framework, in a route function (called any_request() here).

If many requests run in parallel (in threads), I don't want a global variable to get "mixed" between different calls, since the operations are probably not atomic.
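
As an alternative sketch of the same idea (not taken from the answer above), itertools.count can serve as the per-call counter, so nothing is shared between requests. The function name number_matches is hypothetical.

```python
import itertools
import re

def number_matches(text):
    # The counter is created inside the call, so each call starts again at 0.
    counter = itertools.count()
    return re.sub(r'o', lambda match: 'a' + str(next(counter)), text)

print(number_matches("oh hello wow"))  # a0h hella1 wa2w
```

Because the counter lives in the function's local scope, two calls running at the same time each number their own matches independently.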

Using re.sub to replace numeric part of string, with arithmetic manipulation of that number in Python?

You may use a lambda in replacement:

>>> import re
>>> mytext = "testing-1-6-180"
>>> s = re.sub(r'^(\D*\d+\D+)(\d+)', lambda m: m.group(1) + str(int(m.group(2)) + 5), mytext)
>>> print(s)
testing-1-11-180

Python re.sub grab a single character from a group

You can pass a function as the replacement to re.sub. The function will be called with a match object as its argument, which you can use to build the replacement string. For your situation, I'd try something like this:

re.sub(regex_pattern, lambda m: "{} {}".format(m.group(1), m.group(2)[0]), text)

Note that I've renamed your str variable to text, as it's a bad idea to use str as a variable name since it's also a builtin type.
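
The question's pattern and input are not shown here, so the sketch below substitutes a hypothetical two-group pattern and sample text, purely to illustrate how the lambda picks a single character out of a group.

```python
import re

# Hypothetical inputs: two words, each captured in its own group.
regex_pattern = r'(\w+) (\w+)'
text = 'John Smith'

# m.group(2)[0] is ordinary string indexing on the captured text.
shortened = re.sub(regex_pattern, lambda m: "{} {}".format(m.group(1), m.group(2)[0]), text)
print(shortened)  # John S
```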

re.sub after matching. all instances of a repeated matching group, python

I don't see any reason you can't just use re.sub instead of re.finditer here. Your repl gets applied once for each match, and the result of substituting each pattern with repl in string is returned, which is exactly what you want.

I can't actually run your example, because copying and pasting test gives me a SyntaxError, and copying and pasting ANY_NUMBER_SRCH gives me an error compiling the regex, and I don't want to go down a rabbit hole trying to fix all of your bugs, most of which probably aren't even in your real code. So let me give a simpler example:

>>> import re
>>> test = '3,254,236,948,348.884423 cold things and 8d523c'
>>> pattern = re.compile(r'[\d,]+')
>>> pattern.findall(test) # just to verify that it works
['3,254,236,948,348', '884423', '8', '523']
>>> pattern.sub(lambda match: match.group().replace(',', ''), test)
'3254236948348.884423 cold things and 8d523c'

Obviously your repl function will be a bit more complicated than just removing all of the commas, and you'll probably want to def it out-of-line rather than try to cram it into a lambda. But whatever your rule is, if you can write it as a function that takes a match object and returns the string you want in place of that match, you can just pass that function to sub.
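
As a sketch of what an out-of-line repl might look like, the function below applies a slightly stricter rule: it strips commas only when the match looks like a number with thousands separators. The rule, the GROUPED pattern, and the sample text are hypothetical stand-ins for whatever the real rule is.

```python
import re

NUMBER = re.compile(r'[\d,]+')
# Hypothetical rule: 1-3 digits followed by one or more ",ddd" groups.
GROUPED = re.compile(r'\d{1,3}(?:,\d{3})+')

def strip_commas(match):
    """Remove commas only from matches that look like grouped thousands."""
    text = match.group()
    if GROUPED.fullmatch(text):
        return text.replace(',', '')
    return text  # leave anything else untouched

sample = '1,234 items, list 1,2,3'
cleaned = NUMBER.sub(strip_commas, sample)
print(cleaned)  # 1234 items, list 1,2,3
```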

Using name group in re search and replace

Try this alternative syntax:

re.sub(r"(\d+) (\d+)", r"\g<2>2 \g<1>1", "23 24")

More here: https://docs.python.org/3.7/library/re.html#re.sub
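
For reference, a minimal sketch of what that call returns, together with the same substitution written with named groups. The group names first and second are chosen for illustration.

```python
import re

# Numbered groups: \g<2>2 is "group 2, then a literal 2".
swapped = re.sub(r"(\d+) (\d+)", r"\g<2>2 \g<1>1", "23 24")
print(swapped)  # 242 231

# The same substitution with named groups.
named = re.sub(r"(?P<first>\d+) (?P<second>\d+)", r"\g<second>2 \g<first>1", "23 24")
print(named)  # 242 231
```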

Python - re.sub return pattern rather than replacing

If you just want to extract the numbers, you need to find them, not to replace:

re.findall("GraphImages_([0-9]{2,})", yourstring)[0]
#'99'

In fact, in your case a split may be a better choice:

yourstring.split("_")[1]
#'99'
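
Both approaches can be tried on a sample value. The original string is not shown here, so "GraphImages_99" below is an assumed input; a string with extra text after the number would make split return more than the digits.

```python
import re

yourstring = "GraphImages_99"  # assumed input; the original string is not shown

# findall returns the captured group for every match, or an empty list.
found = re.findall("GraphImages_([0-9]{2,})", yourstring)
number = found[0] if found else None
print(number)  # 99

# split works here only because the digits are everything after the underscore.
print(yourstring.split("_")[1])  # 99
```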

