Removing a List of Characters in String

Removing a list of characters in string

If you're using python2 and your inputs are strings (not unicodes), the absolutely best method is str.translate:

>>> chars_to_remove = ['.', '!', '?']
>>> subj = 'A.B!C?'
>>> subj.translate(None, ''.join(chars_to_remove))
'ABC'

Otherwise, there are following options to consider:

A. Iterate the subject char by char, omit unwanted characters and join the resulting list:

>>> sc = set(chars_to_remove)
>>> ''.join([c for c in subj if c not in sc])
'ABC'

(Note that the generator version ''.join(c for c ...) will be less efficient).

B. Create a regular expression on the fly and re.sub with an empty string:

>>> import re
>>> rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
>>> re.sub(rx, '', subj)
'ABC'

(re.escape ensures that characters like ^ or ] won't break the regular expression).

C. Use the mapping variant of translate:

>>> chars_to_remove = [u'δ', u'Γ', u'ж']
>>> subj = u'AжBδCΓ'
>>> dd = {ord(c):None for c in chars_to_remove}
>>> subj.translate(dd)
u'ABC'

Full testing code and timings:

#coding=utf8

import re

def remove_chars_iter(subj, chars):
sc = set(chars)
return ''.join([c for c in subj if c not in sc])

def remove_chars_re(subj, chars):
return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_re_unicode(subj, chars):
return re.sub(u'(?u)[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_translate_bytes(subj, chars):
return subj.translate(None, ''.join(chars))

def remove_chars_translate_unicode(subj, chars):
d = {ord(c):None for c in chars}
return subj.translate(d)

import timeit, sys

def profile(f):
assert f(subj, chars_to_remove) == test
t = timeit.timeit(lambda: f(subj, chars_to_remove), number=1000)
print ('{0:.3f} {1}'.format(t, f.__name__))

print (sys.version)
PYTHON2 = sys.version_info[0] == 2

print ('\n"plain" string:\n')

chars_to_remove = ['.', '!', '?']
subj = 'A.B!C?' * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)
profile(remove_chars_re)

if PYTHON2:
profile(remove_chars_translate_bytes)
else:
profile(remove_chars_translate_unicode)

print ('\nunicode string:\n')

if PYTHON2:
chars_to_remove = [u'δ', u'Γ', u'ж']
subj = u'AжBδCΓ'
else:
chars_to_remove = ['δ', 'Γ', 'ж']
subj = 'AжBδCΓ'

subj = subj * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)

if PYTHON2:
profile(remove_chars_re_unicode)
else:
profile(remove_chars_re)

profile(remove_chars_translate_unicode)

Results:

2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]

"plain" string:

0.637 remove_chars_iter
0.649 remove_chars_re
0.010 remove_chars_translate_bytes

unicode string:

0.866 remove_chars_iter
0.680 remove_chars_re_unicode
1.373 remove_chars_translate_unicode

---

3.4.2 (v3.4.2:ab2c023a9432, Oct 5 2014, 20:42:22)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

"plain" string:

0.512 remove_chars_iter
0.574 remove_chars_re
0.765 remove_chars_translate_unicode

unicode string:

0.817 remove_chars_iter
0.686 remove_chars_re
0.876 remove_chars_translate_unicode

(As a side note, the figure for remove_chars_translate_bytes might give us a clue why the industry was reluctant to adopt Unicode for such a long time).

Remove specific characters from a string in Python

Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.

Also, the way you are doing it is going to be kind of slow, relatively. It's also likely to be a bit confusing to experienced pythonators, who will see a doubly-nested structure and think for a moment that something more complicated is going on.

Starting in Python 2.6 and newer Python 2.x versions *, you can instead use str.translate, (see Python 3 answer below):

line = line.translate(None, '!@#$')

or regular expression replacement with re.sub

import re
line = re.sub('[!@#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.

Python 3 answer

In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or—usefully to us—None to indicate that they should be deleted.

So to do the above dance with a Unicode string you would call something like

translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})

Or, as brought up by Joseph Lee, create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))

* for compatibility with earlier Pythons, you can create a "null" translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '!@#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.

Removing a character from a string in a list of lists

strings are immutable , and as such item.replace('*','') returns back the string with the replaced characters, it does not replace them inplace (it cannot , since strings are immutable) . you can enumerate over your lists, and then assign the returned string back to the list -

Example -

for lst in testList:
for j, item in enumerate(lst):
lst[j] = item.replace('*', '')

You can also do this easily with a list comprehension -

testList = [[item.replace('*', '') for item in lst] for lst in testList]

Remove specific characters from String List - Python

It can be implemented much simpler by directly traversing the file and writing its content to a variable with filtering out unwanted characters.

For example, here is the 'file1.txt' file with the content:

Hello how are you? Very good!

Then we can do the following:

def main():

characters = '!?¿-.:;'

with open('file1.txt') as f:
aux = ''.join(c for c in f.read() if c not in characters)

# print(aux) # Hello how are you Very good

As we see aux is the file's content without unwanted chars and it can be easily edited based on the desired output format.

For example, if we want a list of words, we can do this:

def main():

characters = '!?¿-.:;'

with open('file1.txt') as f:
aux = ''.join(c for c in f.read() if c not in characters)
aux = aux.split()

# print(aux) # ['Hello', 'how', 'are', 'you', 'Very', 'good']

Removing character in list of strings

Try this:

lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
print([s.strip('8') for s in lst]) # remove the 8 from the string borders
print([s.replace('8', '') for s in lst]) # remove all the 8s

How can I remove special characters from a list of elements in python?

Use the str.translate() method to apply the same translation table to all strings:

removetable = str.maketrans('', '', '@#%')
out_list = [s.translate(removetable) for s in my_list]

The str.maketrans() static method is a helpful tool to produce the translation map; the first two arguments are empty strings because you are not replacing characters, only removing. The third string holds all characters you want to remove.

Demo:

>>> my_list = ["on@3", "two#", "thre%e"]
>>> removetable = str.maketrans('', '', '@#%')
>>> [s.translate(removetable) for s in my_list]
['on3', 'two', 'three']

How to remove a character from every string in a list, based on the position where a specific character occurs in the first member of said list?

Solution using zip()

>>> shortened = [*zip(*[t for t in zip(*list_strings) if t[0] != "-"])]
>>> shortened
[('A', 'C', 'T', 'G'), ('A', 'C', 'T', 'A'), ('A', 'G', 'G', 'A'), ('A', 'G', 'G', 'G')]
>>>
>>> new_strings = ["".join(t) for t in shortened]
>>> new_strings
['ACTG', 'ACTA', 'AGGA', 'AGGG']

So, there are plenty of ways to do this, but this particular method zips the gene strings together and filters out the tuples which start with a "-". Think of stacking the four gene strings on top of each other: zip() takes the "columns" of that stack:

>>> [*zip(*list_strings)]
[('A', 'A', 'A', 'A'), ('-', 'T', 'T', 'T'), ('C', 'C', 'G', 'G'), ('-', 'G', 'C', 'C'), ('T', 'T', 'G', 'G'), ('G', 'A', 'A', 'G'), ('-', 'G', 'T', 'T'), ('-', 'C', 'C', 'C')]

After removing the tuples that start with "-", the tuples are zipped back together the other way (think now of taking these tuples and stacking them vertically, then in the same way as before, zip() takes the columns of that stack). Finally, "".join() turns the tuples of characters into strings.

"What am I doing wrong?"

To answer the question "what am I doing wrong?", I've added print statements to your code. Try running this and interpreting the output:

list_strings=["A-C-TG--","ATCGTAGC","ATGCGATC","ATGCGGTC"]
new_list_strings=[]
positions=[i for i, letter in enumerate(list_strings[0]) if letter == "-"]

for string in list_strings:
print(f"string: {string}")
for i in range(len(string)):
print(f" i: {i}")
for pos in positions:
print(f" pos: {pos}")
if i==pos:
string2=string[:i]+string[i+1:]
print(f" match! string2 result: {string2}")
new_list_strings.append(string2)
print()

Notice that for each string, multiple string2 objects are created.

Solution using a plain-Jane accumulator pattern

The barebones accumulator pattern does work for this problem:

list_strings = ["A-C-TG--","ATCGTAGC","ATGCGATC","ATGCGGTC"]
positions = [i for i, letter in enumerate(list_strings[0]) if letter == "-"]

new_list_strings = []
for string in list_strings:
new_str = ""
for idx, char in string:
if idx not in positions:
new_str += char
new_list_strings.append(new_str)

how to remove first x characters from a string in a list

When doing i [7:] you are not actually editing the element of the list, you are just computing a new string, without the first 7 characters, and not doing anything with it.

You can do this instead :

>>> lst = [e[7:] for e in lst]
>>> lst
['something', 'smthelse']

This will loop on the elements of your array, and remove the characters from the beginning, as expected.



Related Topics



Leave a reply



Submit