Use Python's string.replace vs re.sub

As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
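To illustrate the escaping pitfall, here is a minimal sketch. The price string is a made-up example, not something from the question; the point is only that $ and . are regex metacharacters, so the unescaped pattern silently fails to match.

```python
import re

# Hypothetical input, chosen because it contains regex metacharacters
price = "cost: $5.00 (approx.)"

# str.replace treats its argument as a literal substring
literal = price.replace("$5.00", "$6.00")

# re.sub treats "$" as an end-of-string anchor, so this pattern never matches
unescaped = re.sub("$5.00", "$6.00", price)

# re.escape makes the pattern literal again
escaped = re.sub(re.escape("$5.00"), "$6.00", price)

print(literal)    # cost: $6.00 (approx.)
print(unescaped)  # cost: $5.00 (approx.)
print(escaped)    # cost: $6.00 (approx.)
```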

Should I continue to use str.replace over re.sub if the string manipulation becomes complicated?

This question is difficult to answer because it is opinion-based. In the tests below, str.replace is faster than re.sub. Using timeit in IPython with Python 3.4.2:

In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2.04 µs per loop

In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 2.83 µs per loop

As Padraic Cunningham pointed out, the difference is even greater in Python 2:

In []: %timeit zz.replace(zz[36:zz.strip('&Log=0').rfind('&')],'')
100000 loops, best of 3: 2 µs per loop

In []: %timeit re.sub('dealer.+Radius=10','',zz)
100000 loops, best of 3: 3.11 µs per loop

Which one is better depends on the program. In Python, readability generally matters more than speed (PEP 8 is built on the notion that code is read more often than it is written). If speed is vital for the program, the faster option, str.replace, is the better choice. Otherwise, the more readable option, re.sub, is better.

EDIT

As Anony-Mousse pointed out, using re.compile is both faster and more readable than either of the options above. (You added that you're using Python 2, but I'll put the Python 3 test first to match the order of my other tests above.)

With Python 3:

In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
1000000 loops, best of 3: 1.36 µs per loop

With Python 2:

In []: z_match = re.compile('dealer.+Radius=10')
In []: %timeit z_match.sub('', zz)
100000 loops, best of 3: 1.68 µs per loop
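If you want to repeat this comparison outside IPython, something like the following sketch should work with the standard timeit module. The zz value here is a hypothetical stand-in, since the original string is not shown, and the function names are mine. Absolute numbers will differ by machine and Python version.

```python
import re
import timeit

# Hypothetical stand-in for the zz string used in the timings above
zz = "http://example.com/cars?make=x&dealer=42&zip=90210&Radius=10&Log=0"

# Compile once, reuse for every call
z_match = re.compile(r"dealer.+Radius=10")


def with_re_sub(s):
    # Module-level call: the pattern is looked up in re's cache each time
    return re.sub(r"dealer.+Radius=10", "", s)


def with_compiled(s):
    # Precompiled pattern: no per-call lookup
    return z_match.sub("", s)


# Both approaches must produce the same string
assert with_re_sub(zz) == with_compiled(zz)

if __name__ == "__main__":
    for fn in (with_re_sub, with_compiled):
        seconds = timeit.timeit(lambda: fn(zz), number=100000)
        print("%s: %.3f s per 100000 calls" % (fn.__name__, seconds))
```

The re module caches recently used patterns, so the gap between the two is mostly the cost of that cache lookup, not of recompiling the pattern on every call.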

str.replace() or re.sub() continually until substring no longer present

While the looping solutions are probably the simplest, you can actually write a re.sub call with a custom function to do all the transformations at once.

The key insight is that your rule (changing st to ts) ends up moving every s in a block of mixed s and t characters to the right of all the t characters. We can simply count the s and t characters in each block and make the appropriate replacement:

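import re

def sub_func(match):
    # Each match is one block that starts with s and ends with t
    text = match.group(1)
    # Emit every t first, then every s
    return "t"*text.count("t") + "s"*text.count("s")

re.sub(r'(s[st]*t)', sub_func, text)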
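As a sanity check, here is a small sketch comparing the one-pass version with a plain loop. The looped function is my guess at what the looping solutions look like (repeat str.replace until no "st" remains), and the sample strings are invented.

```python
import re


def sub_func(match):
    text = match.group(1)
    return "t" * text.count("t") + "s" * text.count("s")


def one_pass(text):
    # Rewrite each s...t block in a single re.sub call
    return re.sub(r"(s[st]*t)", sub_func, text)


def looped(text):
    # Assumed looping approach: keep swapping until no "st" is left
    while "st" in text:
        text = text.replace("st", "ts")
    return text


# Hypothetical sample inputs
for sample in ("sstt", "astbsstc", "tsst", "no match"):
    assert one_pass(sample) == looped(sample)

print(one_pass("sstt"))      # ttss
print(one_pass("astbsstc"))  # atsbtssc
```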

Can we use re.sub instead of using str.replace many times

Since you import the re package, I assume you want to do this with a regular expression.

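import re

to_replace = ['%!', '%', '~', '_x000D_', '__', '\\']
# Escape each literal so regex metacharacters (such as the backslash) match as-is
pattern = "|".join(map(re.escape, to_replace))
# a is the list of strings from the question; empty strings are skipped
p = [re.sub(pattern, '', s) for s in a if s != '']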

It's recommended to use re.escape so that special characters in the substrings are not interpreted as regex syntax.

Replace matched substring using re.sub
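For a version you can run as-is, here is a sketch with a made-up input list standing in for a (the real list is not shown in the question):

```python
import re

to_replace = ['%!', '%', '~', '_x000D_', '__', '\\']

# One alternation of escaped literals, compiled once
pattern = re.compile("|".join(map(re.escape, to_replace)))

# Hypothetical input list standing in for a
a = ['50%!', '', 'a~b__c', 'line_x000D_', 'back\\slash']

# Strip every listed substring and skip empty strings
p = [pattern.sub('', s) for s in a if s != '']
print(p)  # ['50', 'abc', 'line', 'backslash']
```

Alternatives are tried from left to right, so the order of the list matters: '%!' has to come before '%', otherwise only the '%' would be removed and the '!' would be left behind.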

According to the documentation, re.sub is defined as

re.sub(pattern, repl, string, count=0, flags=0)

If repl is a function, it is called for every non-overlapping occurrence of pattern.

That said, if you pass a lambda function, you can keep the code on one line. Also, remember that the matched text is easy to get at inside the function: x[0] is the whole match (shorthand for x.group(0)).

I removed _ from the regex to reach the desired output.

import re

txt = "/J&L/LK/Tac1_1/shareloc.pdf"
# Compare with ==, not is: identity checks on strings are unreliable
x = re.sub("[^0-9]", lambda x: '.' if x[0] == '_' else '', txt)
print(x)

python regex sub replace the whole string

Match the rest of the string with a .*

import re

s = 'abcdefg'

s = re.sub(r'^abc.*', 'replacement', s)
print(s)

output:

replacement
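One caveat, sketched with a hypothetical two-line string: by default . does not match a newline, so .* stops at the end of the first line. Passing flags=re.DOTALL lets it run to the end of the string.

```python
import re

# Hypothetical multi-line input
s = 'abcdefg\nsecond line'

# Default: .* stops at the newline, so only the first line is replaced
first_line_only = re.sub(r'^abc.*', 'replacement', s)

# With re.DOTALL, . also matches newlines, so the whole string is replaced
whole_string = re.sub(r'^abc.*', 'replacement', s, flags=re.DOTALL)

print(repr(first_line_only))  # 'replacement\nsecond line'
print(repr(whole_string))     # 'replacement'
```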

python re.sub : replace substring with string

You mean this?

>>> import re
>>> m = re.sub(r'10', r'20', "hello number 10, Agosto 19")
>>> m
'hello number 20, Agosto 19'

OR

Using lookbehind,

>>> number = "20"
>>> number
'20'
>>> m = re.sub(r'(?<=number )\d+', number, "hello number 10, Agosto 19")
>>> m
'hello number 20, Agosto 19'

Python re.sub() is replacing the full match even when using non-capturing groups

The general solution for such problems is using a lambda in the replacement:

import re

string = 'aBCDeFGH'

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))

However, as bro-grammer has commented, you can use backreferences in this case:

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))
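To see why non-capturing groups do not help here, consider this small sketch. The non-capturing pattern is my assumption about what the question tried; (?:...) only stops a group from being numbered, while the text it matches is still part of the match and is replaced along with everything else.

```python
import re

string = 'aBCDeFGH'

# Assumed pattern from the question: the optional a and e are non-capturing,
# but they are still consumed, so the whole string is replaced
non_capturing = re.sub(r'(?:a)?[A-Z]{3}(?:e)?[A-Z]{3}', '+', string)

# Capture the parts to keep and write them back in the replacement
with_backrefs = re.sub(r'(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string)

print(non_capturing)  # +
print(with_backrefs)  # +BCD+FGH
```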

