Use Python's string.replace vs re.sub
As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
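To illustrate the escaping pitfall mentioned above, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the original question.

```python
import re

# Hypothetical sample text, chosen so that "." matters.
text = "version 1.0 and build 1x0"

# str.replace treats its first argument as a literal substring.
literal = text.replace("1.0", "2.0")

# In a regex, an unescaped "." matches any character, so "1x0" is replaced too.
unescaped = re.sub("1.0", "2.0", text)

# re.escape turns the pattern back into a literal match.
escaped = re.sub(re.escape("1.0"), "2.0", text)

print(literal)    # version 2.0 and build 1x0
print(unescaped)  # version 2.0 and build 2.0
print(escaped)    # version 2.0 and build 1x0
```

With str.replace there is nothing to escape, which is the main reason to prefer it for literal substitutions.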
Should I continue to use str.replace over re.sub if the string manipulation becomes complicated?
This question is difficult to answer because it is opinion-based. In the test below, str.replace is faster than re.sub. Using timeit in IPython with Python 3.4.2:
In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2.04 µs per loop
In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 2.83 µs per loop
As Padraic Cunningham pointed out, the difference is even greater in Python 2:
In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2 µs per loop
In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 3.11 µs per loop
Which one is better depends on the program. Generally, for Python, readability is more important than speed (PEP 8 is built on the idea that code is read more often than it is written). If speed is vital for the program, the faster option, str.replace, would be better. Otherwise, the more readable option, re.sub, would be better.
EDIT
As Anony-Mousse pointed out, using re.compile is both faster and more readable than either of the options above. (You added that you're using Python 2, but I'll put the Python 3 test first to reflect the order of my other tests above.)
With Python 3:
In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
1000000 loops, best of 3: 1.36 µs per loop
With Python 2:
In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
100000 loops, best of 3: 1.68 µs per loop
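Outside IPython, a comparison like this can be reproduced with the standard timeit module. The sketch below is only an approximation of the tests above: the original zz string is not shown, so the URL here is a hypothetical stand-in, the str.replace call uses a known literal instead of the slice expression, and absolute timings will differ by machine and Python version.

```python
import re
import timeit

# Hypothetical stand-in for zz; the real string is not given in the answer.
zz = "http://example.com/search?q=1&dealer=42&Radius=10&Log=0"

pattern = "dealer.+Radius=10"
compiled = re.compile(pattern)

def with_replace():
    # Only possible because the literal substring is known in advance.
    return zz.replace("dealer=42&Radius=10", "")

def with_sub():
    return re.sub(pattern, "", zz)

def with_compiled():
    return compiled.sub("", zz)

# All three approaches should agree on this sample.
assert with_replace() == with_sub() == with_compiled()

for fn in (with_replace, with_sub, with_compiled):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s per 100,000 calls")
```

It is worth measuring with a string that resembles the real input, since the relative cost of the three approaches depends on the pattern and on the length of the text.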
str.replace() or re.sub() continually until substring no longer present
While the looping solutions are probably the simplest, you can actually write a re.sub call with a custom function to do all the transformations at once.
The key insight for this is that your rule (changing "st" to "ts") will end up moving all the "s" characters in a block of mixed "s" and "t" characters to the right of all the "t" characters. We can simply count the "s" and "t" characters and make an appropriate replacement:
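import re
def sub_func(match):
    # group(1) is one block that starts with "s" and ends with "t"
    block = match.group(1)
    return "t" * block.count("t") + "s" * block.count("s")
# text is the input string from the question
re.sub(r'(s[st]*t)', sub_func, text)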
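As a sanity check on the reasoning above, the single-pass version can be compared with a plain loop that keeps replacing "st" with "ts" until none is left. This is a sketch only: the names single_pass and looping and the sample string are hypothetical, not from the original answer.

```python
import re

def sub_func(match):
    block = match.group(1)
    return "t" * block.count("t") + "s" * block.count("s")

def single_pass(text):
    # One re.sub call: each block s...t is rewritten as all t's, then all s's.
    return re.sub(r"(s[st]*t)", sub_func, text)

def looping(text):
    # The straightforward approach: repeat until no "st" remains.
    while "st" in text:
        text = text.replace("st", "ts")
    return text

sample = "xssttsty stat"  # made-up input for illustration
print(single_pass(sample))  # xtttsssy tsat
print(looping(sample))      # xtttsssy tsat
```

Both functions give the same result on this input, with the regex version scanning the string once instead of looping.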
Can we use re.sub instead of using str.replace many times
Since you import the re package, I assume you want to do this with regular expressions.
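to_replace = ['%!', '%', '~', '_x000D_', '__', '\\']
# Escape each token and join them into a single alternation pattern
pattern = "|".join(map(re.escape, to_replace))
# a is the list of strings from the question; empty strings are skipped
p = [re.sub(pattern, '', s) for s in a if s != '']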
It's recommended to use re.escape so that special characters in the tokens are matched literally instead of being interpreted as regex syntax.
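For a self-contained illustration, the following sketch runs the same idea on a made-up input list (the real contents of a are in the question, not here). It also compiles the pattern once and sorts the tokens longest first, which is an added precaution so that, for example, '%!' is tried before '%'.

```python
import re

# Hypothetical input; the original list comes from the question.
a = ["50%!", "", "a~b_x000D_", "x__y\\z"]

tokens = ['%!', '%', '~', '_x000D_', '__', '\\']

# Longest tokens first, each escaped so it is matched literally.
ordered = sorted(tokens, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(t) for t in ordered))

# Strip every token from each non-empty string.
cleaned = [pattern.sub("", s) for s in a if s != ""]
print(cleaned)  # ['50', 'ab', 'xyz']
```

Without re.escape, the lone backslash in the token list would produce an invalid pattern.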
Replace matched substring using re.sub
According to the documentation, re.sub is defined as
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping occurrence of pattern.
That said, if you pass a lambda function, you can keep the code on one line. Furthermore, remember that the full match can be read from the match object with x[0].
I removed _ from the regex to get the desired output.
import re
txt = "/J&L/LK/Tac1_1/shareloc.pdf"
# Replace every non-digit: "_" becomes ".", everything else is dropped
x = re.sub("[^0-9]", lambda x: '.' if x[0] == '_' else '', txt)
print(x)
python regex sub replace the whole string
Match the rest of the string with a .*
import re
s = 'abcdefg'
s = re.sub(r'^abc.*', 'replacement', s)
print(s)
output:
replacement
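One caveat: by default "." does not match a newline, so ".*" stops at the end of the first line. The sketch below uses a made-up multi-line string to show the difference when the re.DOTALL flag is passed.

```python
import re

s = "abcdefg\nsecond line"  # hypothetical multi-line input

# Without DOTALL, ".*" stops at the newline.
first_line_only = re.sub(r'^abc.*', 'replacement', s)

# With DOTALL, "." also matches "\n", so the whole string is replaced.
whole_string = re.sub(r'^abc.*', 'replacement', s, flags=re.DOTALL)

print(repr(first_line_only))  # 'replacement\nsecond line'
print(repr(whole_string))     # 'replacement'
```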
python re.sub : replace substring with string
You mean this?
>>> import re
>>> m = re.sub(r'10', r'20', "hello number 10, Agosto 19")
>>> m
'hello number 20, Agosto 19'
Or, using a lookbehind:
>>> number = "20"
>>> number
'20'
>>> m = re.sub(r'(?<=number )\d+', number, "hello number 10, Agosto 19")
>>> m
'hello number 20, Agosto 19'
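If a lookbehind feels awkward, a capture group is another option. This is an alternative sketch, not part of the original answer. Note the \g<1> form: writing \1 directly in front of the digits "20" would not be read as group 1 followed by "20".

```python
import re

number = "20"
text = "hello number 10, Agosto 19"

# Capture the prefix and put it back with \g<1>, then append the new number.
result = re.sub(r'(number )\d+', r'\g<1>' + number, text)
print(result)  # hello number 20, Agosto 19
```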
Python re.sub() is replacing the full match even when using non-capturing groups
The general solution for such problems is using a lambda in the replacement:
import re
string = 'aBCDeFGH'
print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))
However, as bro-grammer has commented, you can use backreferences in this case:
print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))