How to replace only part of the match with python re.sub
re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', "long.file.name.jpg")
?: starts a non-capturing group (SO answer), so (?:_a) matches the _a without assigning it a group number; the following question mark makes it optional.
So in English, this says: match the ending .<anything> that follows (or doesn't follow) the pattern _a.
Another way to do this would be to use a lookbehind (see here). I mention this because lookbehinds are super useful, yet I didn't know about them for 15 years of doing REs.
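As a quick check of the pattern above, the sketch below runs it on a name with and without the _a suffix. The helper name add_suffix and the second file name are assumptions made for illustration.

```python
import re

def add_suffix(filename):
    # Hypothetical helper: replace an optional trailing "_a" (just before
    # the extension) with "_suff", keeping the extension via group 1.
    return re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', filename)

print(add_suffix("long.file.name.jpg"))
print(add_suffix("long.file.name_a.jpg"))
```

Both calls should give the same result, because the non-capturing group consumes _a when it is present and matches nothing when it is not.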
python re.sub, only replace part of match
You can use substitution groups:
>>> my_string = '<cross_sell id="123" sell_type="456"> --> <cross_sell>'
>>> re.sub(r'(\<[A-Za-z0-9_]+)(\s[A-Za-z0-9_="\s]+)', r"\1", my_string)
'<cross_sell> --> <cross_sell>'
Notice I put the first group (the one you want to keep) in parentheses and then kept it in the output by using the "\1" backreference (first group) in the replacement string.
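If numbered groups become hard to follow, the same substitution can probably be written with a named group and the \g<name> replacement syntax. This is only a sketch; the group name tag is an assumption and does not come from the answer.

```python
import re

my_string = '<cross_sell id="123" sell_type="456"> --> <cross_sell>'

# "tag" is an illustrative group name; the attributes after it are matched
# but not captured, so they are dropped from the output.
pattern = r'(?P<tag><[A-Za-z0-9_]+)\s[A-Za-z0-9_="\s]+'
result = re.sub(pattern, r'\g<tag>', my_string)
print(result)
```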
Why does re.sub replace the entire pattern, not just a capturing group within it?
Because it's supposed to replace the whole occurrence of the pattern:
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.
If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:
- Specify pattern in full:
re.sub('ab', 'ad', 'abc')
My favorite, as it's very readable and explicit.
- Capture the groups you want to preserve and then refer to them in the replacement (note that it should be a raw string to avoid escaping):
re.sub('(a)b', r'\1d', 'abc')
- Similar to the previous option: provide a callback function as the repl argument and make it process the Match object and return the required result.
- Use lookbehinds/lookaheads, which are not included in the match but affect matching:
re.sub('(?<=a)b', r'd', 'abxb')
yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".
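The four options above can be compared side by side. This is a minimal sketch on the same 'abc' input; the variable names are illustrative only.

```python
import re

# 1. Spell out the full pattern and the full replacement.
full = re.sub('ab', 'ad', 'abc')

# 2. Capture what should survive and reinsert it with a backreference.
backref = re.sub('(a)b', r'\1d', 'abc')

# 3. Callback: build the replacement text from the Match object.
callback = re.sub('(a)b', lambda m: m.group(1) + 'd', 'abc')

# 4. Lookbehind: "a" must precede "b" but is not part of the match.
lookbehind = re.sub('(?<=a)b', 'd', 'abc')

print(full, backref, callback, lookbehind)
```

All four should produce the same string; they differ only in how much of the input the match itself covers.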
Using re.sub with capture groups to replace only portion of a match
Use a lookahead to match part of the string without replacing it.
pattern = r'\A\w+(?=[@+\-/*])'
You don't need a capture group when you're just removing the match; it's needed if you need to copy parts of the input text into the result. You also don't need [] around \w. And you should get rid of the * after [@+\-/*], since you want to require one of those characters.
You should generally use raw strings when creating regular expressions, so that the regexp escape sequences won't be confused for Python escape sequences. And you should escape - in a character set, otherwise it's used to create a range of characters.
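A small sketch of how this pattern behaves when used to delete the leading word. The two input strings are made up for illustration, since the question's input is not shown here.

```python
import re

pattern = r'\A\w+(?=[@+\-/*])'

# The operator survives because the lookahead does not consume it.
with_operator = re.sub(pattern, '', 'total+tax')

# No operator follows the leading word, so nothing is replaced.
without_operator = re.sub(pattern, '', 'total tax')

print(with_operator)
print(without_operator)
```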
Replacing only the captured group using re.sub and multiple replacements
You can use a regex based on a lookbehind and a lookahead, and then a lambda function to iterate through the replacement words:
>>> import re
>>> words = ['Swimming', 'Eating', 'Jogging']
>>> pattern = re.compile(r'(?<=I love )\w+(?=\.)')
>>> print(pattern.sub(lambda m: words.pop(0), string))  # string is the input text from the question
I love Swimming. I love Eating. I love Jogging.
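For a version that runs on its own, here is a sketch with a hypothetical input string (the question's original string is not shown above). It uses an iterator instead of words.pop(0) so the list is left unmodified; that is a design choice of this sketch, not part of the answer.

```python
import re

# Hypothetical input; only the "I love <word>." shape is assumed.
text = 'I love apples. I love pears. I love plums.'
words = ['Swimming', 'Eating', 'Jogging']

pattern = re.compile(r'(?<=I love )\w+(?=\.)')
replacements = iter(words)  # consumed one item per match, in order

# Only the word between the lookbehind and the lookahead is replaced.
result = pattern.sub(lambda m: next(replacements), text)
print(result)
```

If the text contains more matches than there are words, next() will raise StopIteration, so real code would need a fallback for that case.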
Python re.sub() is replacing the full match even when using non-capturing groups
The general solution for such problems is using a lambda in the replacement:
string = 'aBCDeFGH'
print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))
However, as bro-grammer has commented, you can use backreferences in this case:
print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))
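To confirm that the two forms agree, the sketch below wraps each in a function and runs them on the answer's string plus one extra input without the optional letters. The function names and the second input are assumptions for illustration.

```python
import re

pattern = r'(a)?([A-Z]{3})(e)?([A-Z]{3})'

def with_callback(s):
    # Build the replacement from groups 2 and 4 of each match.
    return re.sub(pattern, lambda m: '+%s+%s' % (m.group(2), m.group(4)), s)

def with_backrefs(s):
    # Same result expressed as a replacement template.
    return re.sub(pattern, r'+\2+\4', s)

for sample in ('aBCDeFGH', 'BCDFGH'):
    print(with_callback(sample), with_backrefs(sample))
```

The optional groups 1 and 3 are part of the match and are therefore dropped; only what the replacement explicitly puts back survives.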
Python replace only part of a re.sub match
Use this instead: re.sub("(?<=[^a-zA-Z])pi(?=[^a-zA-Z])", "(math.pi)", "2pi3 + supirse")
Visualization: http://regex101.com/r/fX5wX3
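One thing to be aware of: (?<=[^a-zA-Z]) and (?=[^a-zA-Z]) each require an actual character to be present, so pi at the very start or end of the string is not matched. The sketch below contrasts the answer's pattern with a variant using negative lookarounds; the variant and the extra inputs are assumptions, not part of the answer.

```python
import re

strict = r"(?<=[^a-zA-Z])pi(?=[^a-zA-Z])"  # pattern from the answer
loose = r"(?<![a-zA-Z])pi(?![a-zA-Z])"      # assumed variant: "no letter here"

def replace_pi(pattern, expr):
    return re.sub(pattern, "(math.pi)", expr)

print(replace_pi(strict, "2pi3 + supirse"))
print(replace_pi(strict, "pi*2"))  # unchanged: nothing precedes "pi"
print(replace_pi(loose, "pi*2"))
```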
Python Regular Expression; replacing a portion of match
If you want to only remove zeros after letters, you may use:
([a-zA-Z])0+
Replace with the \1 backreference. See the regex demo.
The ([a-zA-Z]) will capture a letter and 0+ will match 1 or more zeros.
Python demo:
import re
s = 'e004_n07'
res = re.sub(r'([a-zA-Z])0+', r'\1', s)
print(res)
Note that re.sub will find and replace all non-overlapping matches (it performs a global search and replace). If there is no match, the string is returned as is, without modifications. So there is no need to use an additional re.match/re.search.
UPDATE
To keep 1 zero if the numbers only contain zeros, you may use:
import re
s = ['e004_n07','e000_n00']
res = [re.sub(r'(?<=[a-zA-Z])0+(\d*)', lambda m: m.group(1) if m.group(1) else '0', x) for x in s]
print(res)
See the Python demo
Here, the r'(?<=[a-zA-Z])0+(\d*)' regex matches one or more zeros (0+) that are after an ASCII letter ((?<=[a-zA-Z])), and then any other digits (0 or more) are captured into Group 1 with (\d*). Then, in the replacement, we check if Group 1 is empty: if it is, we insert 0 (there are only zeros); otherwise, we insert the Group 1 contents (the remaining digits after the leading zeros).
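The same logic can be wrapped in a small function to make the cases easier to check. The name strip_zeros and the third input are assumptions for illustration.

```python
import re

def strip_zeros(code):
    # Hypothetical helper: drop zeros that directly follow a letter,
    # but keep a single 0 when no other digits remain.
    return re.sub(r'(?<=[a-zA-Z])0+(\d*)',
                  lambda m: m.group(1) or '0',
                  code)

for sample in ('e004_n07', 'e000_n00', 'e100'):
    print(sample, '->', strip_zeros(sample))
```

Zeros that do not directly follow a letter, as in e100, are left alone because the lookbehind fails there.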
How can I replace a string match with part of itself in Python?
Instead of directly using the re.sub() method, you can use the re.findall() method to find all substrings (in a non-greedy fashion) that begin and end with the proper square brackets.
Then, iterate through the matches and use the str.replace() method to replace each match in the string with the second character in the match:
import re
s = "alEhos[cr@e]sjt"
for m in re.findall(r"\[.*?\]", s):
    s = s.replace(m, m[1])
print(s)
Output:
alEhoscsjt
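For comparison, the same result can likely be obtained in a single re.sub() call by capturing the first character inside the brackets and using it as the replacement. This is a sketch of an alternative, not the answer's method, and it assumes every bracketed run contains at least one character.

```python
import re

s = "alEhos[cr@e]sjt"

# Group 1 is the first character after "["; the rest of the bracketed
# text is matched non-greedily and discarded.
result = re.sub(r'\[(.).*?\]', r'\1', s)
print(result)
```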