Replacing instances of a character in a string
Strings in Python are immutable, so you cannot treat them as a list and assign to indices.
Use .replace() instead:
line = line.replace(';', ':')
If you need to replace only certain semicolons, you'll need to be more specific. You could use slicing to isolate the section of the string to replace in:
line = line[:10].replace(';', ':') + line[10:]
That'll replace all semicolons in the first 10 characters of the string.
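As a quick check of the two forms above, here is a small sketch. The sample string is made up for illustration. It also shows the optional count argument of str.replace, which is another way to limit how many replacements are made.

```python
# Hypothetical sample line, for illustration only.
line = "a;b;c;d;e;f;g;h"

# Replace every semicolon.
all_replaced = line.replace(';', ':')

# Replace only the semicolons within the first 10 characters.
first_ten = line[:10].replace(';', ':') + line[10:]

# Replace only the first two semicolons, using the optional count argument.
first_two = line.replace(';', ':', 2)

print(all_replaced)  # a:b:c:d:e:f:g:h
print(first_ten)     # a:b:c:d:e:f;g;h
print(first_two)     # a:b:c;d;e;f;g;h
```

Because strings are immutable, each call returns a new string; line itself is unchanged unless you assign the result back to it.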
Replace specific characters within strings
With a regular expression and the function gsub():
group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"
gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"
What gsub does here is replace each occurrence of "e" with an empty string "".
See ?regexp or ?gsub for more help.
Best way to replace multiple characters in a string?
Replacing two characters
I timed all the methods in the current answers along with one extra.
With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').
Timings for each function:
- a) 1000000 loops, best of 3: 1.47 μs per loop
- b) 1000000 loops, best of 3: 1.51 μs per loop
- c) 100000 loops, best of 3: 12.3 μs per loop
- d) 100000 loops, best of 3: 12 μs per loop
- e) 100000 loops, best of 3: 3.27 μs per loop
- f) 1000000 loops, best of 3: 0.817 μs per loop
- g) 100000 loops, best of 3: 3.64 μs per loop
- h) 1000000 loops, best of 3: 0.927 μs per loop
- i) 1000000 loops, best of 3: 0.814 μs per loop
Here are the functions:
def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')
Timed like this:
python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
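The same kind of measurement can also be run from inside Python with the timeit module. This is only a sketch: the names chained and looped are made up here, they return the result (unlike the benchmark functions above), and the lambda adds a little call overhead, so the absolute numbers will differ from the command-line figures.

```python
import timeit

def chained(text):
    # Same approach as f/i above: chain the two replacements.
    return text.replace('&', r'\&').replace('#', r'\#')

def looped(text):
    # Same approach as b above: loop with a membership check.
    for ch in ['&', '#']:
        if ch in text:
            text = text.replace(ch, '\\' + ch)
    return text

sample = 'abc&def#ghi'
loops = 100_000
for func in (chained, looped):
    # repeat() returns one total time per run; take the best of three.
    best = min(timeit.repeat(lambda: func(sample), number=loops, repeat=3))
    print(f"{func.__name__}: {best / loops * 1e6:.3f} μs per loop")
```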
Replacing 17 characters
Here's similar code to do the same but with more characters to escape (\`*_{}[]()>#+-.!$):
def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    # Note: this pattern still covers only & and #.
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


# Caution: as written, the unescaped ']' ends the character class early, so this
# pattern does not cover the full set; r'([\\`*_{}\[\]()>#+\-.!$])' escapes the
# brackets and the hyphen.
RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')
Here are the results for the same input string, abc&def#ghi:
- a) 100000 loops, best of 3: 6.72 μs per loop
- b) 100000 loops, best of 3: 2.64 μs per loop
- c) 100000 loops, best of 3: 11.9 μs per loop
- d) 100000 loops, best of 3: 4.92 μs per loop
- e) 100000 loops, best of 3: 2.96 μs per loop
- f) 100000 loops, best of 3: 4.29 μs per loop
- g) 100000 loops, best of 3: 4.68 μs per loop
- h) 100000 loops, best of 3: 4.73 μs per loop
- i) 100000 loops, best of 3: 4.24 μs per loop
And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):
- a) 100000 loops, best of 3: 7.59 μs per loop
- b) 100000 loops, best of 3: 6.54 μs per loop
- c) 100000 loops, best of 3: 16.9 μs per loop
- d) 100000 loops, best of 3: 7.29 μs per loop
- e) 100000 loops, best of 3: 12.2 μs per loop
- f) 100000 loops, best of 3: 5.38 μs per loop
- g) 10000 loops, best of 3: 21.7 μs per loop
- h) 100000 loops, best of 3: 5.7 μs per loop
- i) 100000 loops, best of 3: 5.13 μs per loop
Adding a couple of variants:
def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)
With the shorter input:
- ab) 100000 loops, best of 3: 7.05 μs per loop
- ba) 100000 loops, best of 3: 2.4 μs per loop
With the longer input:
- ab) 100000 loops, best of 3: 7.71 μs per loop
- ba) 100000 loops, best of 3: 6.08 μs per loop
So I'm going to use ba for readability and speed.
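The benchmark functions above assign to text but never return it, so to use ba in practice it needs a return. A minimal sketch, assuming the same character set; the name escape_chars and the chars parameter are made up for this example.

```python
def escape_chars(text, chars="\\`*_{}[]()>#+-.!$"):
    """Prefix each character from chars that occurs in text with a backslash."""
    # The backslash comes first in chars on purpose: escaping it later would
    # also double the backslashes already added for the other characters.
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)
    return text

print(escape_chars("abc&def#ghi"))  # abc&def\#ghi
print(escape_chars("[a] *b*"))      # \[a\] \*b\*
print(escape_chars("a\\b"))         # a\\b
```

Note that & is left alone here because it is not in this character set. Passing the characters in a different order, such as chars="*\\", would turn * into \\* because the backslash added for * gets escaped again.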
Addendum
Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let's test them against two more variants:
def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)
Times are in μs per loop on Python 2.7.14 and 3.6.3, measured on a different machine from the earlier set, so they cannot be compared directly with the earlier numbers.
╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input ║ ab │ ab_with_check │ ba │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │ 4.22 │ 3.45 │ 8.01 │
│ Py3, short ║ 5.54 │ 1.34 │ 1.46 │ 5.34 │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long ║ 9.3 │ 7.15 │ 6.85 │ 8.55 │
│ Py3, long ║ 7.43 │ 4.38 │ 4.41 │ 7.02 │
└────────────╨──────┴───────────────┴──────┴──────────────────┘
We can conclude that:
- Those with the check are up to 4x faster than those without the check
- ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2
However, the biggest lesson here is that Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and the fastest on Python 2!
How do you replace a specific character in a string in R with an existing character?
You can use str_replace() from the stringr package:
str_replace(string=sec, pattern = "^(\\d)-$", "0\\1")
The regex matches:
- ^ - start of string
- (\d) - capturing group #1 (\1 in the replacement pattern refers to the string captured in this group): a digit
- - - a hyphen
- $ - end of string.
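For readers more at home in Python, the same pattern can be tried with re.sub. This is an analogue for illustration, not part of the R answer, and the sample values are invented.

```python
import re

# Hypothetical sample values; only strings that are exactly "<digit>-" change.
sec = ["5-", "12-", "7", "3-"]

# ^(\d)-$ captures the single digit; \1 puts it back after a leading zero.
padded = [re.sub(r"^(\d)-$", r"0\1", s) for s in sec]
print(padded)  # ['05', '12-', '7', '03']
```

The anchors are what keep "12-" untouched: with two digits before the hyphen, the whole string no longer matches.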
How to replace specific characters within a string in JavaScript?
You need to move your alert line out of the for loop to the end, and use Array.join() to convert newa into a string without commas.
Also, it would make sense to move the newa array declaration inside the swit function.
Lastly, you should also consider adding an else condition where you just push the current character unchanged onto the newa array if it is not an a or b, so the output string length is the same as the original string.
function swit(x) {
  var newa = [];
  for (var i = 0; i < x.length; i++) {
    if (x[i] === 'a') {
      newa.push('b');
    } else if (x[i] === 'b') {
      newa.push('a');
    } else {
      newa.push(x[i]);
    }
  }
  alert(newa.join(""));
}

swit("aaab");
swit("aasdfcvbab");
How to replace set or group of characters with string in Python
My knowledge tells me there are 3 different ways of doing this, all of which are shorter than your method:
- Using a for-loop
- Using a generator-comprehension
- Using regular expressions
First, using a for-loop. This is probably the most straightforward improvement to your code and essentially just reduces the 5 lines with .replace down to 2:
def replace_all(text, repl):
    for c in "aeiou":
        text = text.replace(c, repl)
    return text
You could also do it in one line using a generator-comprehension, combined with the str.join method. This makes a single pass over text, evaluating each character once, so it is O(n). The first method loops through text five times, once per replace; that is five passes rather than one, but still linear, not O(n^5). Which is faster in practice is worth measuring, since in CPython str.replace is implemented in C.
So, this method is simply:
def replace_all(text, repl):
    return ''.join(repl if c in 'aeiou' else c for c in text)
Finally, we can use re.sub to substitute all of the characters in the set [aeiou] with the text repl. This is the shortest of the solutions and probably what I would recommend:
import re
def replace_all(text, repl):
    return re.sub('[aeiou]', repl, text)
As I said at the start, all these methods complete the task, so there is no point in my providing individual test cases, but they do work, as seen in this test:
>>> replace_all('hello world', 'x')
'hxllx wxrld'
Update
A new method has been brought to my attention: str.translate.
>>> {c:'x' for c in 'aeiou'}
{'a': 'x', 'e': 'x', 'i': 'x', 'o': 'x', 'u': 'x'}
>>> 'hello world'.translate({ord(c):'x' for c in 'aeiou'})
'hxllx wxrld'
This method is also O(n), so just as efficient as the previous two.
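To compare the four approaches side by side, here is a small sketch. The function names are made up for this example, and str.maketrans is used to build the translation table instead of the ord() dict shown above.

```python
import re

VOWELS = "aeiou"

def with_loop(text, repl):
    # One str.replace pass per vowel.
    for c in VOWELS:
        text = text.replace(c, repl)
    return text

def with_join(text, repl):
    # Single pass, deciding character by character.
    return "".join(repl if c in VOWELS else c for c in text)

def with_regex(text, repl):
    return re.sub("[aeiou]", repl, text)

def with_translate(text, repl):
    # str.maketrans accepts a dict mapping single characters to strings.
    table = str.maketrans({c: repl for c in VOWELS})
    return text.translate(table)

results = {f.__name__: f("hello world", "x")
           for f in (with_loop, with_join, with_regex, with_translate)}
print(results)
```

The four agree for a simple replacement like 'x'. They can differ if repl itself contains a vowel, because the loop version re-scans its own output, or a backslash, because re.sub treats it as an escape in the replacement string.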
Java ArrayList - Replace a specific letter or character within an ArrayList of Strings?
newList.get(i).replace(".", "");
This doesn't update the list element - you construct the replaced string, but then discard that string.
You could use set:
newList.set(i, newList.get(i).replace(".", ""));
Or you could use a ListIterator:
ListIterator<String> it = newList.listIterator();
while (it.hasNext()) {
    String s = it.next();
    if (s.contains(".")) {
        it.set(s.replace(".", ""));
    }
}
but a better way would be to use replaceAll, with no need to loop explicitly:
newList.replaceAll(s -> s.replace(".", ""));
There's no need to check contains either: replace won't do anything if the search string is not present.