Remove Specific Characters from a String in Python

Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.
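A minimal sketch of that point, assuming a made-up sample string and the character set '!@#$' used in this answer: calling replace without assigning the result leaves line untouched, while rebinding on every pass keeps each removal.

```python
line = "he!!o @wor#ld$"  # hypothetical sample input

# replace() returns a new string; without an assignment the result is discarded.
line.replace("!", "")
unchanged = line

# Rebinding the name on each pass keeps the result of every removal.
cleaned = line
for char in "!@#$":
    cleaned = cleaned.replace(char, "")

print(unchanged)  # he!!o @wor#ld$
print(cleaned)    # heo world
```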

Also, the way you are doing it is relatively slow. It is also likely to confuse experienced Python programmers, who will see a doubly nested structure and think for a moment that something more complicated is going on.

In Python 2.6 and later 2.x versions*, you can instead use str.translate (see the Python 3 answer below):

line = line.translate(None, '!@#$')

or regular expression replacement with re.sub:

import re
line = re.sub('[!@#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.
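One caveat that may be worth a sketch: inside the brackets, characters such as ], ^, - and \ have special meaning. Assuming the characters to delete arrive as an arbitrary string, re.escape can be applied before building the class. The helper name remove_chars is made up for illustration.

```python
import re

def remove_chars(text, chars):
    """Delete every occurrence of any character in `chars` from `text`."""
    if not chars:
        return text  # an empty class "[]" would not be a valid pattern
    # re.escape neutralises ], ^, - and \ so they are treated literally.
    pattern = "[" + re.escape(chars) + "]"
    return re.sub(pattern, "", text)

print(remove_chars("a!b@c#d$", "!@#$"))  # abcd
print(remove_chars("x[1]-y^2", "[]-^"))  # x1y2
```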

Python 3 answer

In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or—usefully to us—None to indicate that they should be deleted.

So to do the above dance with a Unicode string, you would call something like:

translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})

Or, as brought up by Joseph Lee, create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))
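As a quick check, the three ways of building the table shown above should produce the same dictionary. The sample string below is a made-up input, and the variable names are only for illustration.

```python
chars = "!@#$"

# Three ways to build a "delete these characters" table for str.translate.
table_fromkeys = dict.fromkeys(map(ord, chars), None)
table_comprehension = {ord(c): None for c in chars}
table_maketrans = str.maketrans("", "", chars)

sample = "he!!o @wor#ld$"  # hypothetical input
results = [
    sample.translate(table)
    for table in (table_fromkeys, table_comprehension, table_maketrans)
]
print(results)  # ['heo world', 'heo world', 'heo world']
```

If the same characters are removed from many strings, building the table once and reusing it avoids rebuilding the dictionary on every call.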

* For compatibility with Python versions earlier than 2.6, you can create a "null" translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '!@#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.

How to remove all characters before a specific character in Python?

Use re.sub. Match everything up to and including the first I (the non-greedy .*? stops at the first occurrence), then replace the match with I:

re.sub(r'^.*?I', 'I', stri)
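For comparison, here is a sketch of the same operation without a regular expression, using str.find and slicing. The helper name drop_before and the sample values are assumptions for illustration. One difference to be aware of: without re.DOTALL, . does not match a newline, so the regex leaves the string unchanged when a line break precedes the first I, whereas find scans across lines.

```python
import re

def drop_before(text, marker):
    """Return `text` starting at the first `marker`; unchanged if absent."""
    index = text.find(marker)
    return text if index == -1 else text[index:]

stri = "abcIdefIghi"  # hypothetical input
print(re.sub(r'^.*?I', 'I', stri))  # IdefIghi
print(drop_before(stri, "I"))       # IdefIghi

# With a newline before the marker, the two approaches differ.
multiline = "a\nbI"
print(repr(re.sub(r'^.*?I', 'I', multiline)))  # 'a\nbI'
print(repr(drop_before(multiline, "I")))       # 'I'
```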

Remove specific characters from String List - Python

This can be done more simply by reading the file's content directly into a variable while filtering out the unwanted characters.

For example, here is the 'file1.txt' file with the content:

Hello how are you? Very good!

Then we can do the following:

def main():

    characters = '!?¿-.:;'

    with open('file1.txt') as f:
        aux = ''.join(c for c in f.read() if c not in characters)

    # print(aux) # Hello how are you Very good

As we can see, aux holds the file's content without the unwanted characters, and it can easily be reshaped into the desired output format.

For example, if we want a list of words, we can do this:

def main():

    characters = '!?¿-.:;'

    with open('file1.txt') as f:
        aux = ''.join(c for c in f.read() if c not in characters)
    aux = aux.split()

    # print(aux) # ['Hello', 'how', 'are', 'you', 'Very', 'good']
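The same filtering could also be done with str.translate instead of a generator expression. The sketch below is self-contained: it writes the sample content to a temporary file (an assumption made only so the example runs anywhere) and then reads it back.

```python
import os
import tempfile

characters = '!?¿-.:;'
# A table that maps every unwanted character to None, i.e. deletes it.
table = str.maketrans('', '', characters)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file1.txt')
    # Recreate the sample file from the example above.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Hello how are you? Very good!\n')
    with open(path, encoding='utf-8') as f:
        words = f.read().translate(table).split()

print(words)  # ['Hello', 'how', 'are', 'you', 'Very', 'good']
```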

How to remove certain characters from a string? [Python]

Since strings are immutable, call replace and assign the result back to cool:

>>> cool = "cool°"
>>> cool = cool.replace("°", "")
>>> cool
'cool'

How to remove all characters after a specific character in python?

Split on your separator at most once, and take the first piece:

sep = '...'
stripped = text.split(sep, 1)[0]

You didn't say what should happen if the separator isn't present. Both this and Alex's solution will return the entire string in that case.
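If the missing-separator case needs different handling, str.partition reports whether the separator was found. The following is only a sketch: the helper name cut_after and its default parameter are hypothetical.

```python
def cut_after(text, sep, default=None):
    """Return the part of `text` before the first `sep`.

    If `sep` is absent, return `default` when one is given,
    otherwise the whole text (matching the split-based version).
    """
    head, found, _tail = text.partition(sep)
    if found:
        return head
    return text if default is None else default

sep = '...'
print(cut_after('keep this...drop this...and this', sep))  # keep this
print(cut_after('nothing to cut here', sep))               # nothing to cut here
print(repr(cut_after('nothing to cut here', sep, default='')))  # ''
```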

How to remove special characters from a string before specific character?

You can use

df['NEW_EMAIL'] = df['EMAIL'].str.replace(r'[._-](?=[^@]*@)', '', regex=True)

Details:

  • [._-] - a ., _ or - character
  • (?=[^@]*@) - a positive lookahead that requires zero or more characters other than @, followed by an @, immediately to the right of the current location.

If you need to remove any special character (anything that is not a letter or digit), use:

df['NEW_EMAIL'] = df['EMAIL'].str.replace(r'[\W_](?=[^@]*@)', '', regex=True)

See a Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'EMAIL':['ab_cd_123@email.com', 'ab_cd.12-3@email.com']})
>>> df['EMAIL'].str.replace(r'[._-](?=[^@]*@)', '', regex=True)
0    abcd123@email.com
1    abcd123@email.com
Name: EMAIL, dtype: object
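The same pattern can be tried on single strings with the plain re module, which makes the lookahead's effect easy to see. The helper name clean_local_part and the second sample string are assumptions for illustration.

```python
import re

# Same pattern as above: '.', '_' or '-' only when an '@' still lies ahead.
LOCAL_PART_PUNCT = re.compile(r'[._-](?=[^@]*@)')

def clean_local_part(email):
    """Remove '.', '_' and '-' from the part of `email` before the '@'."""
    return LOCAL_PART_PUNCT.sub('', email)

print(clean_local_part('ab_cd.12-3@email.com'))  # abcd123@email.com
# Without an '@' the lookahead never succeeds, so nothing is removed.
print(clean_local_part('no-at-sign.example'))    # no-at-sign.example
```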

How do I remove a substring from the end of a string?

strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from both ends of x.
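A short demonstration of the set-of-characters behaviour, using the URL from this answer plus two made-up strings:

```python
url = 'abcdc.com'

# '.com' is read as the set {'.', 'c', 'o', 'm'}, so the trailing 'c'
# of 'abcdc' is stripped as well.
print(url.strip('.com'))  # abcd

# The order of the characters in the argument does not matter.
print(url.strip('moc.'))  # abcd

# rstrip and lstrip behave the same way, on one side only.
print('comic.com'.rstrip('.com'))       # comi
print('www.example.com'.lstrip('w.'))   # example.com
```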

On Python 3.9 and newer you can use the removeprefix and removesuffix methods to remove an entire substring from either side of the string:

url = 'abcdc.com'
url.removesuffix('.com') # Returns 'abcdc'
url.removeprefix('abcdc.') # Returns 'com'

The relevant Python Enhancement Proposal is PEP 616.

On Python 3.8 and older you can use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]
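The endswith-and-slice approach can be wrapped in a small helper so the slice length is not hard-coded. This is a sketch for older Python versions, not the standard-library implementation; the name remove_suffix is made up.

```python
def remove_suffix(text, suffix):
    """Return `text` without a trailing `suffix`, if it has one."""
    # The check on `suffix` matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(remove_suffix('abcdc.com', '.com'))  # abcdc
print(remove_suffix('abcdc.org', '.com'))  # abcdc.org
print(remove_suffix('abcdc.com', ''))      # abcdc.com
```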

Or a regular expression:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)

Pythonic way to remove specific characters from a string

Use re.sub with a character class containing the individual characters you want to remove:

words = re.sub(r'[\[\]_(/]', '', words)

How to remove a character from every string in a list, based on the position where a specific character occurs in the first member of said list?

Solution using zip()

>>> shortened = [*zip(*[t for t in zip(*list_strings) if t[0] != "-"])]
>>> shortened
[('A', 'C', 'T', 'G'), ('A', 'C', 'T', 'A'), ('A', 'G', 'G', 'A'), ('A', 'G', 'G', 'G')]
>>>
>>> new_strings = ["".join(t) for t in shortened]
>>> new_strings
['ACTG', 'ACTA', 'AGGA', 'AGGG']

So, there are plenty of ways to do this, but this particular method zips the gene strings together and filters out the tuples which start with a "-". Think of stacking the four gene strings on top of each other: zip() takes the "columns" of that stack:

>>> [*zip(*list_strings)]
[('A', 'A', 'A', 'A'), ('-', 'T', 'T', 'T'), ('C', 'C', 'G', 'G'), ('-', 'G', 'C', 'C'), ('T', 'T', 'G', 'G'), ('G', 'A', 'A', 'G'), ('-', 'G', 'T', 'T'), ('-', 'C', 'C', 'C')]

After removing the tuples that start with "-", the tuples are zipped back together the other way (think now of taking these tuples and stacking them vertically, then in the same way as before, zip() takes the columns of that stack). Finally, "".join() turns the tuples of characters into strings.
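The steps described above can be written out one at a time. The sketch below uses the same four gene strings as the rest of this answer; the intermediate variable names are only illustrative.

```python
list_strings = ["A-C-TG--", "ATCGTAGC", "ATGCGATC", "ATGCGGTC"]

# Step 1: transpose, giving one tuple per character position ("column").
columns = list(zip(*list_strings))

# Step 2: keep only the columns where the first string has no gap.
kept = [column for column in columns if column[0] != "-"]

# Step 3: transpose back, giving one tuple of characters per string.
rows = list(zip(*kept))

# Step 4: join each tuple of characters into a string.
new_strings = ["".join(row) for row in rows]
print(new_strings)  # ['ACTG', 'ACTA', 'AGGA', 'AGGG']
```

Note that zip stops at the shortest input, so this assumes all strings have the same length.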

"What am I doing wrong?"

To answer the question "what am I doing wrong?", I've added print statements to your code. Try running this and interpreting the output:

list_strings=["A-C-TG--","ATCGTAGC","ATGCGATC","ATGCGGTC"]
new_list_strings=[]
positions=[i for i, letter in enumerate(list_strings[0]) if letter == "-"]

for string in list_strings:
    print(f"string: {string}")
    for i in range(len(string)):
        print(f" i: {i}")
        for pos in positions:
            print(f" pos: {pos}")
            if i==pos:
                string2=string[:i]+string[i+1:]
                print(f" match! string2 result: {string2}")
                new_list_strings.append(string2)
    print()

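Notice that for each string, multiple string2 objects are created. Each one is built from the original string with only a single character removed, so none of them has all the gap positions removed.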

Solution using a plain-Jane accumulator pattern

The barebones accumulator pattern does work for this problem:

list_strings = ["A-C-TG--","ATCGTAGC","ATGCGATC","ATGCGGTC"]
positions = [i for i, letter in enumerate(list_strings[0]) if letter == "-"]

new_list_strings = []
for string in list_strings:
    new_str = ""
    # enumerate() yields (index, character) pairs
    for idx, char in enumerate(string):
        if idx not in positions:
            new_str += char
    new_list_strings.append(new_str)
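A more compact variant of the same idea, offered as a sketch: the gap positions are stored in a set (so each membership test does not scan a list) and each new string is built with "".join. The name gap_positions is an assumption for illustration.

```python
list_strings = ["A-C-TG--", "ATCGTAGC", "ATGCGATC", "ATGCGGTC"]

# Indices where the first string has a gap.
gap_positions = {i for i, letter in enumerate(list_strings[0]) if letter == "-"}

# Rebuild every string, skipping the characters at those indices.
new_list_strings = [
    "".join(char for idx, char in enumerate(string) if idx not in gap_positions)
    for string in list_strings
]
print(new_list_strings)  # ['ACTG', 'ACTA', 'AGGA', 'AGGG']
```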

