Python String.Strip Stripping Too Many Characters

strip() removes all the leading and trailing characters from the input string that match one of the characters in the parameter string:

>>> "abcdefabcdefabc".strip("cba")
'defabcdef'

Instead, use a regex, for example table_name = re.sub(r"\.csv$", "", name), or the path manipulation functions in os.path:

>>> import os
>>> table_name, extension = os.path.splitext("movies.csv")
>>> table_name
'movies'
>>> extension
'.csv'
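
For comparison, here is a minimal sketch that runs the three approaches on the same file name. The variable names are illustrative and not part of the original answer.

```python
import os
import re

name = "movies.csv"

# strip() treats ".csv" as the character set {'.', 'c', 's', 'v'},
# so it also eats the trailing "s" of "movies".
with_strip = name.strip(".csv")

# The anchored regex removes ".csv" only when it ends the string.
with_regex = re.sub(r"\.csv$", "", name)

# os.path.splitext splits off the last extension, whatever it is.
with_splitext, extension = os.path.splitext(name)

print(with_strip)     # movie
print(with_regex)     # movies
print(with_splitext)  # movies
```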

Strip removing more characters than expected

str.lstrip removes all the characters in its argument from the string, starting at the left. Since all the characters in the left prefix "REFPROP-MIX:ME" are in the argument "REFPROP-MIX:", all those characters are removed. Likewise:

>>> s = 'abcadef'
>>> s.lstrip('abc')
'def'
>>> s.lstrip('cba')
'def'
>>> s.lstrip('bacabacabacabaca')
'def'
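
The behavior shown above can be modeled with a short loop. This is only an illustrative sketch of the semantics, not how CPython implements it, and lstrip_sketch is a made-up name:

```python
def lstrip_sketch(text, chars):
    """Illustrative model of str.lstrip(chars): chars is a set, not a prefix."""
    unwanted = set(chars)  # order and repetition in chars do not matter
    index = 0
    # Advance past every leading character that belongs to the set.
    while index < len(text) and text[index] in unwanted:
        index += 1
    return text[index:]


s = 'REFPROP-MIX:METHANOL&WATER'
print(lstrip_sketch(s, 'REFPROP-MIX:'))  # THANOL&WATER
print(s.lstrip('REFPROP-MIX:'))          # THANOL&WATER
print(lstrip_sketch('abcadef', 'cba'))   # def
```

The "M" and "E" of METHANOL are consumed because both letters occur in 'REFPROP-MIX:'; the scan stops at "T", the first character outside the set.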

str.lstrip does not remove whole strings (of length greater than 1) from the left. If you want to do that, use a regular expression with an anchor ^ at the beginning:

>>> import re
>>> s = 'REFPROP-MIX:METHANOL&WATER'
>>> re.sub(r'^REFPROP-MIX:', '', s)
'METHANOL&WATER'

Python strip() multiple characters?

I did a time test here, using each method 100000 times in a loop. The results surprised me. (The results still surprise me after editing them in response to valid criticism in the comments.)

Here's the script:

import timeit

bad_chars = '(){}<>'

# Setup runs once per timer; every timed statement below sees these names.
setup = """import re
s = 'Barack (of Washington)'
bad_chars = '(){}<>'
rgx = re.compile('[%s]' % bad_chars)"""

timer = timeit.Timer('o = "".join(c for c in s if c not in bad_chars)', setup=setup)
print("List comprehension: ", timer.timeit(100000))

timer = timeit.Timer("o = rgx.sub('', s)", setup=setup)
print("Regular expression: ", timer.timeit(100000))

# Work on a copy so that every iteration starts from the original string.
timer = timeit.Timer('o = s\nfor c in bad_chars: o = o.replace(c, "")', setup=setup)
print("Replace in loop: ", timer.timeit(100000))

# Python 3: the third argument of str.maketrans lists the characters to delete.
timer = timeit.Timer('o = s.translate(str.maketrans("", "", bad_chars))', setup=setup)
print("string.translate: ", timer.timeit(100000))

Here are the results from my original run under Python 2 (absolute numbers vary by machine and Python version):

List comprehension:  0.631745100021
Regular expression: 0.155561923981
Replace in loop: 0.235936164856
string.translate: 0.0965719223022

Results on other runs follow a similar pattern. If speed is not the primary concern, however, I still think string.translate is not the most readable; the other three are more obvious, though slower to varying degrees.
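
As a sanity check, the following sketch (Python 3, variable names chosen for illustration) confirms that the four approaches produce the same string. It makes no claim about their relative speed on current interpreters.

```python
import re

s = 'Barack (of Washington)'
bad_chars = '(){}<>'

# re.escape guards against characters that are special inside a character class.
rgx = re.compile('[%s]' % re.escape(bad_chars))

by_join = "".join(c for c in s if c not in bad_chars)
by_regex = rgx.sub('', s)

by_replace = s
for c in bad_chars:
    by_replace = by_replace.replace(c, "")

# In Python 3 the characters to delete go in the third argument of str.maketrans.
by_translate = s.translate(str.maketrans("", "", bad_chars))

print(by_join)  # Barack of Washington
print(by_join == by_regex == by_replace == by_translate)  # True
```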

How to strip multiple unwanted characters from a list of strings in python?

Use the re module; its re.sub function will let you do that.
Here we need to collapse runs of multiple \n into a single \n and remove the -- string:

import re

code='''Although never is often better than right now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!'''

result = re.sub('\n{2,}', '\n', code)
result = re.sub(' -- ', ' ', result)

print(result)

After that split() your text.
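
If the input is already a list of strings, the same two substitutions can be applied to each element. A minimal sketch, assuming a hypothetical list named lines (the data from the original question is not shown here):

```python
import re

# Hypothetical input list, used only for illustration.
lines = [
    "Now is better than never.\n\n\nAlthough never is often better.",
    "one honking great idea -- let's do more of those!",
]

# Two or more consecutive newlines; compiled once and reused for every element.
blank_runs = re.compile(r"\n{2,}")

cleaned = [blank_runs.sub("\n", line).replace(" -- ", " ") for line in lines]

for line in cleaned:
    print(repr(line))
```

str.replace is enough for the fixed " -- " text; a regex is only needed where the match has variable length.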

lstrip unexpected output: removes additional character

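lstrip takes a set of characters (passed as a string) to remove from the left end of the string, not a prefix. Since c is one of the characters in 'http://twitter.com/', the leading c of c_renwick gets removed as well.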

To achieve what you actually want, use replace:

'http://twitter.com/c_renwick'.replace('http://twitter.com/','')
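
Note that replace removes every occurrence of the substring, not only a leading one. The sketch below shows the lstrip pitfall and two alternatives that touch only the start of the string; the variable names are illustrative, and using urllib.parse is a suggestion that goes beyond the original answer.

```python
from urllib.parse import urlparse

url = 'http://twitter.com/c_renwick'
prefix = 'http://twitter.com/'

# lstrip() strips characters from a set, so the "c" of "c_renwick" goes too.
print(url.lstrip(prefix))  # _renwick

# Slice only when the string really starts with the prefix.
name = url[len(prefix):] if url.startswith(prefix) else url
print(name)  # c_renwick

# Let the URL parser find the path; stripping a single "/" is a proper use of lstrip.
path_name = urlparse(url).path.lstrip('/')
print(path_name)  # c_renwick
```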

Python strip() removing extra (unwanted) character

As per the comments, .strip() takes a set of characters and strips all of them; it does not strip one particular string. It works inward from both ends of the string and stops at the first character that is not part of that set.

An alternative that will do what you want (at least in this case) is regex replacement:

>>> import re
>>> word = 'mixed4d_pre_relu'  # assumed input; the original value is not shown here
>>> re.sub(r'_pre_relu$', '', word)
'mixed4d'

This simply looks for the text _pre_relu at the very end of the string, so it behaves like an rstrip() that removes a whole substring. The equivalent lstrip() replacement would use r'^_pre_relu', which removes that text at the very beginning of the string instead.
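
When the text to remove is not a fixed literal typed into the pattern, it should be escaped so that characters such as "." or "+" are not read as regex syntax. A minimal sketch with two hypothetical helpers (remove_suffix_re and remove_prefix_re are made-up names):

```python
import re


def remove_suffix_re(text, suffix):
    """Remove suffix from the end of text, treating suffix as literal text."""
    # \Z matches only at the very end; $ would also match before a trailing newline.
    return re.sub(re.escape(suffix) + r'\Z', '', text)


def remove_prefix_re(text, prefix):
    """Remove prefix from the start of text, treating prefix as literal text."""
    return re.sub(r'\A' + re.escape(prefix), '', text)


print(remove_suffix_re('mixed4d_pre_relu', '_pre_relu'))  # mixed4d
print(remove_prefix_re('a.b.c', 'a.'))                    # b.c
print(remove_prefix_re('axb.c', 'a.'))                    # axb.c (the "." is literal)
```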

How do I remove a substring from the end of a string?

strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from both ends of x.

On Python 3.9 and newer you can use the removeprefix and removesuffix methods to remove an entire substring from either side of the string:

url = 'abcdc.com'
url.removesuffix('.com') # Returns 'abcdc'
url.removeprefix('abcdc.') # Returns 'com'

The relevant Python Enhancement Proposal is PEP 616.

On Python 3.8 and older you can use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or a regular expression:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
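
The endswith-and-slice approach can be wrapped in a small function for code that must run on Python 3.8 and older. This is a sketch; remove_suffix is a hypothetical helper name, not a standard library function:

```python
def remove_suffix(text, suffix):
    """Hypothetical helper mirroring str.removesuffix on older Python versions."""
    # The "suffix and" check matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


url = 'abcdc.com'
print(url.rstrip('.com'))          # abcd
print(remove_suffix(url, '.com'))  # abcdc
print(remove_suffix(url, ''))      # abcdc.com
```

Using len(suffix) instead of a hard-coded 4 keeps the slice correct if the suffix changes.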

