Remove specific characters from a string in Python
Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...)
is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line
in order to have that variable take the new value, with those characters removed.
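As a minimal sketch of the rebinding point above (the sample string and the names chars_to_remove, unchanged and cleaned are illustrative assumptions, not taken from the question):

```python
line = "Hello!! W@orld#$"
chars_to_remove = "!@#$"

# Calling replace() without assigning the result leaves `line` untouched,
# because a brand-new string is returned and immediately discarded.
line.replace("!", "")
unchanged = line

# Rebinding the name on every pass keeps each intermediate result.
cleaned = line
for ch in chars_to_remove:
    cleaned = cleaned.replace(ch, "")

print(unchanged)  # Hello!! W@orld#$
print(cleaned)    # Hello World
```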
Also, the way you are doing it is going to be relatively slow. It's also likely to be a bit confusing to experienced Python programmers, who will see a doubly-nested structure and think for a moment that something more complicated is going on.
Starting in Python 2.6 and newer Python 2.x versions *, you can instead use str.translate (see the Python 3 answer below):
line = line.translate(None, '!@#$')
or regular expression replacement with re.sub:
import re
line = re.sub('[!@#$]', '', line)
The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.
Python 3 answer
In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.
When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or, usefully to us, None to indicate that they should be deleted.
So to do the above dance with a Unicode string you would call something like
translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)
Here dict.fromkeys and map are used to succinctly generate a dictionary containing
{ord('!'): None, ord('@'): None, ...}
Even simpler, as another answer puts it, create the translation table in place:
unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})
Or, as brought up by Joseph Lee, create the same translation table with str.maketrans:
unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))
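To see that the three Python 3 approaches above build the same table, here is a small sketch (the sample string and variable names are made up for illustration):

```python
chars = "!@#$"

# Three ways to build a "delete these characters" table for str.translate.
table_fromkeys = dict.fromkeys(map(ord, chars), None)
table_comprehension = {ord(c): None for c in chars}
table_maketrans = str.maketrans("", "", chars)

sample = "a!b@c#d$e"
results = [
    sample.translate(table)
    for table in (table_fromkeys, table_comprehension, table_maketrans)
]
print(results)  # ['abcde', 'abcde', 'abcde']
```

If the same set of characters is removed many times, it is probably worth building the table once and reusing it rather than rebuilding it on every call.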
* For compatibility with earlier Pythons, you can create a "null" translation table to pass in place of None:
import string
line = line.translate(string.maketrans('', ''), '!@#$')
Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.
How to remove all characters before a specific character in Python?
Use re.sub. Just match all the chars up to the first I, then replace the matched chars with I.
re.sub(r'^.*?I', 'I', stri)
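A slightly more general sketch of the same idea, assuming the marker may be any substring (the helper names drop_before and drop_before_find are hypothetical). re.escape is there in case the marker contains regex metacharacters, and the second helper shows a non-regex alternative using str.find:

```python
import re


def drop_before(text, marker):
    """Remove everything before the first occurrence of marker, keeping marker."""
    # re.escape protects markers such as "." or "(" from being read as regex syntax.
    # re.DOTALL lets "." also match newlines; the lambda avoids backslash
    # processing in the replacement string.
    pattern = r"^.*?" + re.escape(marker)
    return re.sub(pattern, lambda m: marker, text, count=1, flags=re.DOTALL)


def drop_before_find(text, marker):
    """Same result without a regex; returns text unchanged if marker is absent."""
    idx = text.find(marker)
    return text if idx == -1 else text[idx:]


print(drop_before("xxIyyIzz", "I"))       # IyyIzz
print(drop_before_find("xxIyyIzz", "I"))  # IyyIzz
```

Both versions leave the string untouched when the marker does not occur.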
Remove specific characters from String List - Python
This can be done much more simply by reading the file directly and filtering out the unwanted characters as its content is stored in a variable.
For example, here is the 'file1.txt' file with the content:
Hello how are you? Very good!
Then we can do the following:
def main():
    characters = '!?¿-.:;'
    with open('file1.txt') as f:
        aux = ''.join(c for c in f.read() if c not in characters)
    # print(aux)  # Hello how are you Very good
As we see, aux is the file's content without the unwanted chars, and it can easily be edited based on the desired output format.
For example, if we want a list of words, we can do this:
def main():
    characters = '!?¿-.:;'
    with open('file1.txt') as f:
        aux = ''.join(c for c in f.read() if c not in characters)
    aux = aux.split()
    # print(aux)  # ['Hello', 'how', 'are', 'you', 'Very', 'good']
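The same filtering can also be sketched with str.translate instead of a generator expression. This is a swapped-in technique, not the approach used above, and it works on a string rather than a file so that it is easy to try out (the function name words_without is hypothetical):

```python
def words_without(text, unwanted='!?¿-.:;'):
    """Delete every character in `unwanted`, then split on whitespace."""
    # str.maketrans with a third argument maps those characters to None (deletion).
    table = str.maketrans('', '', unwanted)
    return text.translate(table).split()


print(words_without('Hello how are you? Very good!'))
# ['Hello', 'how', 'are', 'you', 'Very', 'good']
```

To use it on the file from the example, pass f.read() as the text argument.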
How to remove certain characters from a string? [Python]
Since strings are immutable, call replace and assign the result back to cool:
cool = "cool°"
cool = cool.replace("°","")
cool
'cool'
How to remove all characters after a specific character in python?
Split on your separator at most once, and take the first piece:
sep = '...'
stripped = text.split(sep, 1)[0]
You didn't say what should happen if the separator isn't present. Both this and Alex's solution will return the entire string in that case.
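A short sketch of both cases, with a made-up sample string and a hypothetical helper name truncate_at. If you need to know whether the separator was found at all, str.partition reports that in its second return value:

```python
def truncate_at(text, sep):
    """Return the part of text before the first occurrence of sep."""
    return text.split(sep, 1)[0]


with_sep = truncate_at("keep this...drop this...and this", "...")
without_sep = truncate_at("no separator here", "...")

# partition() returns (head, sep, tail); the middle item is '' when sep is absent.
head, found, _tail = "no separator here".partition("...")

print(with_sep)      # keep this
print(without_sep)   # no separator here
print(repr(found))   # ''
```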
How to remove special characters from a string before specific character?
You can use
df['NEW_EMAIL'] = df['EMAIL'].str.replace(r'[._-](?=[^@]*@)', '', regex=True)
See the regex demo. Details:
[._-] - a ., _ or - char
(?=[^@]*@) - a positive lookahead that requires the presence of any zero or more chars other than @ and then a @ char immediately to the right of the current location.
If you need to replace/remove any special char, you should use
df['NEW_EMAIL'] = df['EMAIL'].str.replace(r'[\W_](?=[^@]*@)', '', regex=True)
See a Pandas test:
>>> import pandas as pd
>>> df = pd.DataFrame({'EMAIL':['ab_cd_123@email.com', 'ab_cd.12-3@email.com']})
>>> df['EMAIL'].str.replace(r'[._-](?=[^@]*@)', '', regex=True)
0 abcd123@email.com
1 abcd123@email.com
Name: EMAIL, dtype: object
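The same pattern can be tried without pandas using the re module directly. This is only a sketch; the helper name clean_local_part and the extra sample addresses are assumptions added for illustration:

```python
import re

# Match ., _ or - only when an @ still follows somewhere to the right,
# i.e. only inside the part of the address before the @.
LOCAL_PART_SPECIALS = re.compile(r'[._-](?=[^@]*@)')


def clean_local_part(email):
    return LOCAL_PART_SPECIALS.sub('', email)


print(clean_local_part('ab_cd.12-3@email.com'))    # abcd123@email.com
print(clean_local_part('ab_cd_123@e-mail.co.uk'))  # abcd123@e-mail.co.uk
```

Characters after the @ are left alone because the lookahead fails there, and a string with no @ at all is returned unchanged.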
How do I remove a substring from the end of a string?
strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from both ends of x.
On Python 3.9 and newer you can use the removeprefix and removesuffix methods to remove an entire substring from either side of the string:
url = 'abcdc.com'
url.removesuffix('.com') # Returns 'abcdc'
url.removeprefix('abcdc.') # Returns 'com'
The relevant Python Enhancement Proposal is PEP-616.
On Python 3.8 and older you can use endswith and slicing:
url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]
Or a regular expression:
import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
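To make the difference concrete, here is a small sketch contrasting strip with a suffix-removal helper that also runs on Python 3.8 and older. The helper name remove_suffix is hypothetical; it generalises the endswith-and-slice approach by using len(suffix) instead of a hard-coded 4:

```python
def remove_suffix(text, suffix):
    """Return text without suffix if it ends with it, otherwise unchanged."""
    # The `suffix and` guard matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


url = 'abcdc.com'

# strip() treats '.com' as the character set {'.', 'c', 'o', 'm'},
# so it also eats the trailing 'c' of 'abcdc'.
stripped = url.strip('.com')
trimmed = remove_suffix(url, '.com')

print(stripped)  # abcd
print(trimmed)   # abcdc
```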
Pythonic way to remove specific characters from a string
Use re.sub with a character class containing the individual characters you want to remove:
words = re.sub(r'[\[\]_(/]', '', words)
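If the set of characters is not known in advance, one way to avoid hand-escaping the class is to build it with re.escape. The following is a sketch under that assumption, with a hypothetical helper name remove_chars:

```python
import re


def remove_chars(text, chars):
    """Remove every occurrence of each character in chars from text."""
    if not chars:
        return text  # an empty class '[]' would not be a valid pattern
    # re.escape backslash-escapes characters such as [ ] ^ - that are
    # special inside a character class.
    pattern = '[' + re.escape(chars) + ']'
    return re.sub(pattern, '', text)


print(remove_chars('a[b]_c(d/e', '[]_(/'))  # abcde
```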
How to remove a character from every string in a list, based on the position where a specific character occurs in the first member of said list?
Solution using zip()
>>> shortened = [*zip(*[t for t in zip(*list_strings) if t[0] != "-"])]
>>> shortened
[('A', 'C', 'T', 'G'), ('A', 'C', 'T', 'A'), ('A', 'G', 'G', 'A'), ('A', 'G', 'G', 'G')]
>>>
>>> new_strings = ["".join(t) for t in shortened]
>>> new_strings
['ACTG', 'ACTA', 'AGGA', 'AGGG']
So, there are plenty of ways to do this, but this particular method zips the gene strings together and filters out the tuples which start with a "-". Think of stacking the four gene strings on top of each other: zip() takes the "columns" of that stack:
>>> [*zip(*list_strings)]
[('A', 'A', 'A', 'A'), ('-', 'T', 'T', 'T'), ('C', 'C', 'G', 'G'), ('-', 'G', 'C', 'C'), ('T', 'T', 'G', 'G'), ('G', 'A', 'A', 'G'), ('-', 'G', 'T', 'T'), ('-', 'C', 'C', 'C')]
After removing the tuples that start with "-", the tuples are zipped back together the other way (think now of taking these tuples and stacking them vertically, then in the same way as before, zip() takes the columns of that stack). Finally, "".join() turns the tuples of characters into strings.
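Put together as a self-contained function, the zip approach might look like the sketch below. The function name drop_gap_columns and the gap parameter are hypothetical, and list_strings is the list from the question:

```python
def drop_gap_columns(strings, gap="-"):
    """Drop every column in which the first string has the gap character."""
    # zip(*strings) yields one tuple per column of the stacked strings.
    kept = [col for col in zip(*strings) if col[0] != gap]
    # Zipping the kept columns transposes them back into rows.
    return ["".join(row) for row in zip(*kept)]


list_strings = ["A-C-TG--", "ATCGTAGC", "ATGCGATC", "ATGCGGTC"]
print(drop_gap_columns(list_strings))
# ['ACTG', 'ACTA', 'AGGA', 'AGGG']
```

One caveat of this sketch: if every column is dropped, the second zip has nothing to transpose and the function returns an empty list rather than a list of empty strings. Also, zip stops at the shortest string, so the inputs are assumed to have equal length.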
"What am I doing wrong?"
To answer the question "what am I doing wrong?", I've added print statements to your code. Try running this and interpreting the output:
list_strings=["A-C-TG--","ATCGTAGC","ATGCGATC","ATGCGGTC"]
new_list_strings=[]
positions=[i for i, letter in enumerate(list_strings[0]) if letter == "-"]
for string in list_strings:
    print(f"string: {string}")
    for i in range(len(string)):
        print(f" i: {i}")
        for pos in positions:
            print(f" pos: {pos}")
            if i==pos:
                string2=string[:i]+string[i+1:]
                print(f" match! string2 result: {string2}")
                new_list_strings.append(string2)
    print()
Notice that for each string, multiple string2 objects are created.
Solution using a plain-Jane accumulator pattern
The barebones accumulator pattern does work for this problem:
list_strings = ["A-C-TG--","ATCGTAGC","ATGCGATC","ATGCGGTC"]
positions = [i for i, letter in enumerate(list_strings[0]) if letter == "-"]
new_list_strings = []
for string in list_strings:
    new_str = ""
    # enumerate() is needed to get (index, character) pairs
    for idx, char in enumerate(string):
        if idx not in positions:
            new_str += char
    new_list_strings.append(new_str)