re.sub replace with matched content
Simply use \1 instead of $1:
In [1]: import re
In [2]: method = 'images/:id/huge'
In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'
Also note the use of raw strings (r'...') for regular expressions. It is not mandatory, but it removes the need to escape backslashes, arguably making the code slightly more readable.
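As a small sketch of why the raw string matters (the variable names are only for illustration): without the r prefix, Python itself turns '\1' into the character with code 1 before re ever sees it, so the backslash has to be doubled. The \g<1> form is an equivalent, more explicit spelling of the same backreference.

```python
import re

method = 'images/:id/huge'

# Raw string: the backslash reaches the regex engine untouched.
raw = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Ordinary string: the backslash must be doubled, otherwise '\1' is chr(1).
escaped = re.sub('(:[a-z]+)', '<span>\\1</span>', method)

# \g<1> refers to group 1 as well, without any ambiguity about the digits.
explicit = re.sub(r'(:[a-z]+)', r'<span>\g<1></span>', method)

print(raw)                         # images/<span>:id</span>/huge
print(raw == escaped == explicit)  # True
```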
Replace matched substring using re sub
According to the documentation, re.sub is defined as
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping occurrence of pattern.
That said, if you pass a lambda function, you can keep the code on one line. Also remember that the full matched text is available on the match object as x[0].
I removed _ from the regex to reach the desired output.
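import re
txt = "/J&L/LK/Tac1_1/shareloc.pdf"
# Replace '_' with '.' and drop every other non-digit character (compare with ==, not is)
x = re.sub(r"[^0-9]", lambda x: '.' if x[0] == '_' else '', txt)
print(x)  # 1.1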
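The same thing can be written with a named function instead of a lambda, which also makes it easy to try the count parameter from the signature above. This is only a sketch; the helper name dot_or_drop is made up for the example.

```python
import re

txt = "/J&L/LK/Tac1_1/shareloc.pdf"

def dot_or_drop(match):
    # match[0] is the whole matched text (here always a single character).
    return '.' if match[0] == '_' else ''

all_replaced = re.sub(r"[^0-9]", dot_or_drop, txt)

# count=1 stops after the first replacement, so only the leading '/' is dropped.
first_only = re.sub(r"[^0-9]", dot_or_drop, txt, count=1)

print(all_replaced)  # 1.1
print(first_only)    # J&L/LK/Tac1_1/shareloc.pdf
```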
Getting the match number when passing a function in re.sub
Based on @Barmar's answer, I tried this:
import re
def custom_replace(match, matchcount):
    result = 'a' + str(matchcount.i)
    matchcount.i += 1
    return result
def any_request():
    matchcount = lambda: None  # an empty "object", see https://stackoverflow.com/questions/19476816/creating-an-empty-object-in-python/37540574#37540574
    matchcount.i = 0  # benefit: it's a local variable that we pass to custom_replace "by reference"
    print(re.sub(r'o', lambda match: custom_replace(match, matchcount), "oh hello wow"))
    # a0h hella1 wa2w
any_request()
and it seems to work.
Reason: I was a bit reluctant to use a global variable for this, because I'm using this inside a web framework, in a route function (called any_request() here).
If many requests run in parallel (in threads), I don't want a global variable to be "mixed" between different calls (since the operations are probably not atomic?).
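If all you need is a per-call counter, one possible alternative (just a sketch, not what the code above does) is itertools.count created inside the function, so that every call gets its own counter and nothing is shared between requests. The function name number_matches is hypothetical.

```python
import itertools
import re

def number_matches(text):
    # The counter is local to this call, so parallel calls cannot interfere.
    counter = itertools.count()
    # next(counter) yields 0, 1, 2, ... once per match.
    return re.sub(r'o', lambda match: 'a' + str(next(counter)), text)

print(number_matches("oh hello wow"))  # a0h hella1 wa2w
```

Each call starts again from 0, which is the same behaviour the matchcount.i version has.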
re.sub() - Replace with text from match without using capture groups?
To answer your first question, re.sub allows you to use a function instead of a fixed replacement string. E.g.
>>> s = "omglolwtfbbq"
>>> regex = r"l[\w]"
>>> re.sub(regex, lambda x: "!%s!" % x.group(), s)
'omg!lo!!lw!tfbbq'
Note that the .group method of a match object, called without arguments, returns the whole match (whether or not capture groups are present). If you have capture groups, then .groups returns those captured groups as a tuple.
To answer your question about colouring specifically, I would recommend taking a look at colorama.
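To make the difference between .group and .groups concrete, here is a short sketch on the same input string; the second pattern with two capture groups is made up for the comparison.

```python
import re

s = "omglolwtfbbq"

# No capture groups: .group() is the whole match, .groups() is empty.
plain = re.search(r"l\w", s)
plain_whole = plain.group()      # 'lo'
plain_groups = plain.groups()    # ()

# Two capture groups: .group() is still the whole match.
grouped = re.search(r"(l)(\w)", s)
grouped_whole = grouped.group()    # 'lo'
grouped_first = grouped.group(1)   # 'l'
grouped_groups = grouped.groups()  # ('l', 'o')

print(plain_whole, plain_groups, grouped_whole, grouped_first, grouped_groups)
```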
Python re.sub() is replacing the full match even when using non-capturing groups
The general solution for such problems is using a lambda in the replacement:
import re
string = 'aBCDeFGH'
print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))
However, as bro-grammer has commented, you can use backreferences in this case:
print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))
python regex sub replace the whole string
Match the rest of the string with a .*
import re
s = 'abcdefg'
s = re.sub(r'^abc.*', 'replacement', s)
print(s)
output:
replacement
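One caveat: by default . does not match a newline, so on multi-line input ^abc.* only reaches the end of the first line. A sketch with the re.DOTALL flag, using a made-up two-line string:

```python
import re

s = 'abcdefg\nsecond line'

# Without DOTALL, '.*' stops at the newline and the second line survives.
first_line_only = re.sub(r'^abc.*', 'replacement', s)

# With DOTALL, '.' also matches '\n', so the whole string is replaced.
whole_string = re.sub(r'^abc.*', 'replacement', s, flags=re.DOTALL)

print(repr(first_line_only))  # 'replacement\nsecond line'
print(repr(whole_string))     # 'replacement'
```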
Why re.sub() adds not matched string by default in Python?
You seem to have a misunderstanding of what sub does. It substitutes the text matched by the regex. This regex, r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)', matches only part of your string, so ONLY THE MATCHING PART will be substituted; the capture groups do not affect this.
What you can do (if you don't want to add .* at the beginning and the end) is use re.findall like this:
re.findall(
r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)',
'START, size: 100Х200 x 50, END'
)
which will return [('size:', '100', '200', '50')]; you can then format it as you wish.
One way to do it, as a one-liner with no error handling, is like this:
'{1}x{2}x{3}'.format(
*re.findall(
r'(size:)\D+(\d+)\D+(\d+)\D+(\d+)',
'START, size: 100Х200 x 50, END')[0]
)
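For completeness, a sketch of the same idea with the error handling the one-liner skips, assuming only the first match is of interest. re.search returns None when nothing matches; the helper name format_size and the sample strings are made up for the example.

```python
import re

# 'size:' is matched but not captured, so groups() holds just the three numbers.
SIZE_RE = re.compile(r'size:\D+(\d+)\D+(\d+)\D+(\d+)')

def format_size(text):
    match = SIZE_RE.search(text)
    if match is None:
        return None  # no size information in the input
    return 'x'.join(match.groups())

print(format_size('START, size: 100x200 x 50, END'))  # 100x200x50
print(format_size('no dimensions here'))              # None
```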
Using re.sub with capture groups to replace only portion of a match
Use a lookahead to match part of the string without replacing it.
pattern = r'\A\w+(?=[@+\-/*])'
You don't need a capture group when you're just removing the match; it's only needed if you want to copy parts of the input text into the result. You also don't need [] around \w. And you should get rid of the * after [@+\-/*], since you want to require one of those characters.
You should generally use raw strings when creating regular expressions, so that the regexp escape sequences won't be confused with Python escape sequences. And you should escape - in a character set, otherwise it's used to create a range of characters.
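A quick sketch of the lookahead pattern in action; the input strings are made up. The leading word is removed only when one of the listed characters follows, and that character itself stays in the result because the lookahead does not consume it.

```python
import re

pattern = r'\A\w+(?=[@+\-/*])'

# The operator is checked by the lookahead but not part of the match.
with_operator = re.sub(pattern, '', 'total+tax')

# No operator after the leading word, so nothing matches and nothing changes.
without_operator = re.sub(pattern, '', 'plain')

print(with_operator)     # +tax
print(without_operator)  # plain
```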
How to replace only part of the match with python re.sub
re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', "long.file.name.jpg")
?: starts a non-capturing group (SO answer), so (?:_a) matches the _a without numbering it as a group; the following question mark makes it optional.
So in English, this says: match the ending .<anything>, whether or not it is preceded by _a.
Another way to do this would be to use a lookbehind (see here). Mentioning this because they're super useful, yet I didn't know about them for 15 years of doing REs.
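To see what the regex does, here is the same call applied to two made-up file names, one without and one with the _a part. In both cases only the tail of the name is rewritten and the extension is copied back through \1.

```python
import re

pattern = r'(?:_a)?\.([^.]*)$'
names = ["long.file.name.jpg", "long.file.name_a.jpg"]

# The optional _a is consumed by the match, so it is replaced along with the dot.
results = [re.sub(pattern, r'_suff.\1', name) for name in names]

for result in results:
    print(result)  # long.file.name_suff.jpg (for both inputs)
```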
Why does re.sub replace the entire pattern, not just a capturing group within it?
Because it's supposed to replace the whole occurrence of the pattern:
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.
If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:
- Specify the pattern in full: re.sub('ab', 'ad', 'abc'). This is my favorite, as it's very readable and explicit.
- Capture the groups you want to preserve and then refer to them in the replacement (note that it should be a raw string to avoid escaping): re.sub('(a)b', r'\1d', 'abc')
- Similar to the previous option: provide a callback function as the repl argument and make it process the Match object and return the required result.
- Use lookbehinds/lookaheads, which are not included in the match but affect matching: re.sub('(?<=a)b', r'd', 'abxb') yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".
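The options above, side by side as a small sketch. The callback variant is one possible way to write it; the variable names are only for illustration.

```python
import re

# 1. Full pattern, full replacement.
full = re.sub('ab', 'ad', 'abc')

# 2. Capture what should be kept and put it back with a backreference.
backref = re.sub('(a)b', r'\1d', 'abc')

# 3. Callback: build the replacement from the Match object.
callback = re.sub('(a)b', lambda m: m.group(1) + 'd', 'abc')

# 4. Lookbehind: 'a' must precede 'b' but is not part of the match,
#    so the second 'b' (preceded by 'x') is left alone.
lookbehind = re.sub('(?<=a)b', 'd', 'abxb')

print(full, backref, callback, lookbehind)  # adc adc adc adxb
```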