How to Replace Two Things at Once in a String

How to replace multiple substrings of a string?

Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.iteritems())
#Python 3 renamed dict.iteritems to dict.items so use rep.items() for latest versions
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

Replace multiple strings with multiple other strings

As an answer to:

looking for an up-to-date answer

If you are using "words" as in your current example, you might extend the answer of Ben McCormick using a non capture group and add word boundaries \b at the left and at the right to prevent partial matches.

\b(?:cathy|cat|catch)\b
  • \b A word boundary to prevent a partial match
  • (?: Non capture group
    • cathy|cat|catch match one of the alternatives
  • ) Close non capture group
  • \b A word boundary to prevent a partial match

Example for the original question:

let str = "I have a cat, a dog, and a goat.";
const mapObj = {
cat: "dog",
dog: "goat",
goat: "cat"
};
str = str.replace(/\b(?:cat|dog|goat)\b/gi, matched => mapObj[matched]);
console.log(str);

How to use str.replace to replace multiple pairs at once?

You can create a dictionary and pass it to the function replace() without needing to chain or name the function so many times.

replacers = {',':'','.':'','-':'','ltd':'limited'} #etc....
df1['CompanyA'] = df1['CompanyA'].replace(replacers)

How to replace two things at once in a string?

When you need to swap variables, say x and y, a common pattern is to introduce a temporary variable t to help with the swap: t = x; x = y; y = t.

The same pattern can also be used with strings:

>>> # swap a with b
>>> 'obama'.replace('a', '%temp%').replace('b', 'a').replace('%temp%', 'b')
'oabmb'

This technique isn't new. It is described in PEP 378 as a way to convert between American and European style decimal separators and thousands separators (for example from 1,234,567.89 to 1.234.567,89. Guido has endorsed this as a reasonable technique.

Best way to replace multiple characters in a string?

Replacing two characters

I timed all the methods in the current answers along with one extra.

With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').

Timings for each function:

  • a) 1000000 loops, best of 3: 1.47 μs per loop
  • b) 1000000 loops, best of 3: 1.51 μs per loop
  • c) 100000 loops, best of 3: 12.3 μs per loop
  • d) 100000 loops, best of 3: 12 μs per loop
  • e) 100000 loops, best of 3: 3.27 μs per loop
  • f) 1000000 loops, best of 3: 0.817 μs per loop
  • g) 100000 loops, best of 3: 3.64 μs per loop
  • h) 1000000 loops, best of 3: 0.927 μs per loop
  • i) 1000000 loops, best of 3: 0.814 μs per loop

Here are the functions:

def a(text):
chars = "&#"
for c in chars:
text = text.replace(c, "\\" + c)

def b(text):
for ch in ['&','#']:
if ch in text:
text = text.replace(ch,"\\"+ch)

import re
def c(text):
rx = re.compile('([&#])')
text = rx.sub(r'\\\1', text)

RX = re.compile('([&#])')
def d(text):
text = RX.sub(r'\\\1', text)

def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
esc(text)

def f(text):
text = text.replace('&', '\&').replace('#', '\#')

def g(text):
replacements = {"&": "\&", "#": "\#"}
text = "".join([replacements.get(c, c) for c in text])

def h(text):
text = text.replace('&', r'\&')
text = text.replace('#', r'\#')

def i(text):
text = text.replace('&', r'\&').replace('#', r'\#')

Timed like this:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"

Replacing 17 characters

Here's similar code to do the same but with more characters to escape (\`*_{}>#+-.!$):

def a(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
text = text.replace(c, "\\" + c)

def b(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace(ch,"\\"+ch)

import re
def c(text):
rx = re.compile('([&#])')
text = rx.sub(r'\\\1', text)

RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
text = RX.sub(r'\\\1', text)

def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
esc(text)

def f(text):
text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')

def g(text):
replacements = {
"\\": "\\\\",
"`": "\`",
"*": "\*",
"_": "\_",
"{": "\{",
"}": "\}",
"[": "\[",
"]": "\]",
"(": "\(",
")": "\)",
">": "\>",
"#": "\#",
"+": "\+",
"-": "\-",
".": "\.",
"!": "\!",
"$": "\$",
}
text = "".join([replacements.get(c, c) for c in text])

def h(text):
text = text.replace('\\', r'\\')
text = text.replace('`', r'\`')
text = text.replace('*', r'\*')
text = text.replace('_', r'\_')
text = text.replace('{', r'\{')
text = text.replace('}', r'\}')
text = text.replace('[', r'\[')
text = text.replace(']', r'\]')
text = text.replace('(', r'\(')
text = text.replace(')', r'\)')
text = text.replace('>', r'\>')
text = text.replace('#', r'\#')
text = text.replace('+', r'\+')
text = text.replace('-', r'\-')
text = text.replace('.', r'\.')
text = text.replace('!', r'\!')
text = text.replace('$', r'\$')

def i(text):
text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')

Here's the results for the same input string abc&def#ghi:

  • a) 100000 loops, best of 3: 6.72 μs per loop
  • b) 100000 loops, best of 3: 2.64 μs per loop
  • c) 100000 loops, best of 3: 11.9 μs per loop
  • d) 100000 loops, best of 3: 4.92 μs per loop
  • e) 100000 loops, best of 3: 2.96 μs per loop
  • f) 100000 loops, best of 3: 4.29 μs per loop
  • g) 100000 loops, best of 3: 4.68 μs per loop
  • h) 100000 loops, best of 3: 4.73 μs per loop
  • i) 100000 loops, best of 3: 4.24 μs per loop

And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):

  • a) 100000 loops, best of 3: 7.59 μs per loop
  • b) 100000 loops, best of 3: 6.54 μs per loop
  • c) 100000 loops, best of 3: 16.9 μs per loop
  • d) 100000 loops, best of 3: 7.29 μs per loop
  • e) 100000 loops, best of 3: 12.2 μs per loop
  • f) 100000 loops, best of 3: 5.38 μs per loop
  • g) 10000 loops, best of 3: 21.7 μs per loop
  • h) 100000 loops, best of 3: 5.7 μs per loop
  • i) 100000 loops, best of 3: 5.13 μs per loop

Adding a couple of variants:

def ab(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
text = text.replace(ch,"\\"+ch)

def ba(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
if c in text:
text = text.replace(c, "\\" + c)

With the shorter input:

  • ab) 100000 loops, best of 3: 7.05 μs per loop
  • ba) 100000 loops, best of 3: 2.4 μs per loop

With the longer input:

  • ab) 100000 loops, best of 3: 7.71 μs per loop
  • ba) 100000 loops, best of 3: 6.08 μs per loop

So I'm going to use ba for readability and speed.

Addendum

Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let's test them against two more variants:

def ab_with_check(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
text = text.replace(c, "\\" + c)

Times in μs per loop on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so cannot be compared directly.

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input ║ ab │ ab_with_check │ ba │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │ 4.22 │ 3.45 │ 8.01 │
│ Py3, short ║ 5.54 │ 1.34 │ 1.46 │ 5.34 │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long ║ 9.3 │ 7.15 │ 6.85 │ 8.55 │
│ Py3, long ║ 7.43 │ 4.38 │ 4.41 │ 7.02 │
└────────────╨──────┴───────────────┴──────┴──────────────────┘

We can conclude that:

  • Those with the check are up to 4x faster than those without the check

  • ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2

  • However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and fastest on Python 2!

Replace multiple characters in one replace call

If you want to replace multiple characters you can call the String.prototype.replace() with the replacement argument being a function that gets called for each match. All you need is an object representing the character mapping that you will use in that function.

For example, if you want a replaced with x, b with y, and c with z, you can do something like this:

const chars = {'a':'x','b':'y','c':'z'};
let s = '234abc567bbbbac';
s = s.replace(/[abc]/g, m => chars[m]);
console.log(s);

Output: 234xyz567yyyyxz

How can I replace multiple characters in a string?

Doesn't seem this even needs regex. Just 2 chained replacements would do.

var str = '[T] and [Z] but not [T] and [Z]';var result = str.replace('T',' ').replace('Z','');console.log(result);

Java Replacing multiple different substring in a string at once (or in the most efficient way)

If the string you are operating on is very long, or you are operating on many strings, then it could be worthwhile using a java.util.regex.Matcher (this requires time up-front to compile, so it won't be efficient if your input is very small or your search pattern changes frequently).

Below is a full example, based on a list of tokens taken from a map. (Uses StringUtils from Apache Commons Lang).

Map<String,String> tokens = new HashMap<String,String>();
tokens.put("cat", "Garfield");
tokens.put("beverage", "coffee");

String template = "%cat% really needs some %beverage%.";

// Create pattern of the format "%(cat|beverage)%"
String patternString = "%(" + StringUtils.join(tokens.keySet(), "|") + ")%";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(template);

StringBuffer sb = new StringBuffer();
while(matcher.find()) {
matcher.appendReplacement(sb, tokens.get(matcher.group(1)));
}
matcher.appendTail(sb);

System.out.println(sb.toString());

Once the regular expression is compiled, scanning the input string is generally very quick (although if your regular expression is complex or involves backtracking then you would still need to benchmark in order to confirm this!)

Most Efficient Way to Replace Multiple Characters in a String

Apart from the str.translate method, you could simply build a translation dict and run it yourself.

s1 = 'ACDCADBCDBABDCBDAACDCADCDAB'

def str_translate_method(s1):
try:
translationdict = str.maketrans("BC","CB")
except AttributeError: # python2
import string
translationdict = string.maketrans("BC","CB")
result = s1.translate(translationdict)
return result

def dict_method(s1):
from, to = "BC", "CB"
translationdict = dict(zip(from, to))
result = ' '.join([translationdict.get(c, c) for c in s1])
return result

replace multiple values at the same time - in order to convert a string to a number

I can think of two approaches.

The first is to use a bunch of nested replace() statements:

select replace(replace(replace(col, '$', ''), '£', ''), 'n/a', '')

and so on.

The second is to find the first digit and try converting from there. This requires complicated logic with patindex(). Here is an example:

select cast(left(substring(col, patindex('%[0-9]%', col), 1000),
patindex('%[^0-9]%', substring(col, patindex('%[0-9]%', col), 1000)) - 1
) as int)


Related Topics



Leave a reply



Submit