Python string.strip stripping too many characters
strip() removes all the leading and trailing characters from the input string that match one of the characters in the parameter string:
>>> "abcdefabcdefabc".strip("cba")
'defabcdef'
You want to use a regex:
table_name = re.sub(r"\.csv$", "", name)
or os.path's path manipulation functions:
>>> table_name, extension = os.path.splitext("movies.csv")
>>> table_name
'movies'
>>> extension
'.csv'
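To make the difference concrete, here is a minimal sketch comparing strip with os.path.splitext. The filename scores.csv is a hypothetical example, not taken from the original question.

```python
import os.path

name = "scores.csv"  # hypothetical filename chosen to trigger the problem

# strip() treats ".csv" as the character set {'.', 'c', 's', 'v'}
# and keeps removing matching characters from both ends.
stripped = name.strip(".csv")

# splitext() splits off only the final extension.
table_name, extension = os.path.splitext(name)

print(stripped)    # ore
print(table_name)  # scores
print(extension)   # .csv
```

The leading "sc" and the trailing "s.csv" all belong to the character set, so strip removes them too.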
Strip removing more characters than expected
str.lstrip removes all the characters in its argument from the string, starting at the left. Since all the characters in the left prefix "REFPROP-MIX:ME" are in the argument "REFPROP-MIX:", all those characters are removed. Likewise:
>>> s = 'abcadef'
>>> s.lstrip('abc')
'def'
>>> s.lstrip('cba')
'def'
>>> s.lstrip('bacabacabacabaca')
'def'
str.lstrip does not remove whole strings (of length greater than 1) from the left. If you want to do that, use a regular expression with an anchor ^ at the beginning:
>>> import re
>>> s = 'REFPROP-MIX:METHANOL&WATER'
>>> re.sub(r'^REFPROP-MIX:', '', s)
'METHANOL&WATER'
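If you would rather avoid a regular expression, one possible alternative is a startswith check followed by slicing. The helper name strip_prefix below is hypothetical; it is a sketch, not part of the standard library.

```python
def strip_prefix(text, prefix):
    """Remove prefix from the start of text if present (hypothetical helper)."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

s = 'REFPROP-MIX:METHANOL&WATER'

# lstrip also eats the "ME" of METHANOL, because M and E are in the argument.
print(s.lstrip('REFPROP-MIX:'))        # THANOL&WATER

# The helper removes the exact prefix and nothing more.
print(strip_prefix(s, 'REFPROP-MIX:'))  # METHANOL&WATER
```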
Python strip() multiple characters?
I did a time test here, using each method 100000 times in a loop. The results surprised me. (The results still surprise me after editing them in response to valid criticism in the comments.)
Here's the script (Python 2 syntax):
import timeit
bad_chars = '(){}<>'
setup = """import re
import string
s = 'Barack (of Washington)'
bad_chars = '(){}<>'
rgx = re.compile('[%s]' % bad_chars)"""
timer = timeit.Timer('o = "".join(c for c in s if c not in bad_chars)', setup=setup)
print "List comprehension: ", timer.timeit(100000)
timer = timeit.Timer("o= rgx.sub('', s)", setup=setup)
print "Regular expression: ", timer.timeit(100000)
timer = timeit.Timer('for c in bad_chars: s = s.replace(c, "")', setup=setup)
print "Replace in loop: ", timer.timeit(100000)
timer = timeit.Timer('s.translate(string.maketrans("", ""), bad_chars)', setup=setup)
print "string.translate: ", timer.timeit(100000)
Here are the results:
List comprehension: 0.631745100021
Regular expression: 0.155561923981
Replace in loop: 0.235936164856
string.translate: 0.0965719223022
Results on other runs follow a similar pattern. If speed is not the primary concern, however, I still think string.translate is not the most readable; the other three are more obvious, though slower to varying degrees.
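The script above uses Python 2 syntax (print statements, string.maketrans). As a rough sketch, the same four approaches might look like this in Python 3, where str.maketrans with a third argument builds a deletion table. No timings are claimed here; the variable names are illustrative only.

```python
import re

s = 'Barack (of Washington)'
bad_chars = '(){}<>'

# 1. Generator expression joined back into a string
joined = ''.join(c for c in s if c not in bad_chars)

# 2. Regular expression character class (re.escape guards special characters)
rgx = re.compile('[' + re.escape(bad_chars) + ']')
subbed = rgx.sub('', s)

# 3. str.replace in a loop, one character at a time
replaced = s
for c in bad_chars:
    replaced = replaced.replace(c, '')

# 4. str.translate with a table that maps each bad character to None
table = str.maketrans('', '', bad_chars)
translated = s.translate(table)

print(translated)  # Barack of Washington
```

All four variables end up holding the same string, so they can be dropped into timeit for a comparison on your own interpreter.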
How to strip multiple unwanted characters from a list of strings in python?
Use the re module; its re.sub function will allow you to do that.
We need to replace multiple \n occurrences with a single \n and remove the -- string:
import re
code='''Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!'''
result = re.sub('\n{2,}', '\n', code)
result = re.sub(' -- ', ' ', result)
print(result)
After that, split() your text.
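As a small end-to-end sketch of those steps, the example below uses a made-up input string with explicit blank lines (an assumption, since the sample text above is shown without them):

```python
import re

# Hypothetical input: runs of blank lines and a ' -- ' separator
text = "first line\n\n\nsecond line -- with a dash\n\nthird line"

# Collapse two or more consecutive newlines into one
collapsed = re.sub(r'\n{2,}', '\n', text)

# Remove the ' -- ' separator, leaving a single space
cleaned = collapsed.replace(' -- ', ' ')

# Split into individual lines
lines = cleaned.split('\n')
print(lines)  # ['first line', 'second line with a dash', 'third line']
```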
lstrip unexpected output: removes additional character
lstrip takes a set of characters to remove from the left of the string, not a prefix. As c is in the set you provided, it gets removed.
To achieve what you actually want, use replace:
'http://twitter.com/c_renwick'.replace('http://twitter.com/','')
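One caveat worth sketching: replace removes every occurrence of the substring, not only a leading one. The second URL below is a hypothetical example made up to show this; passing a count of 1 limits replace to the first occurrence.

```python
url = 'http://twitter.com/c_renwick'
prefix = 'http://twitter.com/'

# lstrip removes the leading 'c' as well, since 'c' appears in the prefix.
print(url.lstrip(prefix))       # _renwick
print(url.replace(prefix, ''))  # c_renwick

# Hypothetical URL that contains the prefix twice
nested = 'http://twitter.com/share?via=http://twitter.com/c_renwick'
print(nested.replace(prefix, ''))     # share?via=c_renwick
print(nested.replace(prefix, '', 1))  # share?via=http://twitter.com/c_renwick
```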
Python strip() removing extra (unwanted) character
As per the comments, .strip() treats its argument as a set of characters and strips all of them; it does not strip one particular string. It works at both ends of the string, moving inwards until it finds a character that is not part of that set, at which point it stops.
An alternative that will do what you want (at least in this case) is regex replacement:
>>> import re
>>> re.sub(r'_pre_relu$', '', word)
'mixed4d'
This simply looks for the text _pre_relu at the very end of the string, thus serving as a rstrip(). The equivalent lstrip() replacement would be r'^_pre_relu', which would remove that text at the very beginning of the string instead.
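When the text to remove may contain regex metacharacters such as a dot, it is safer to escape it before building the pattern. The sketch below wraps this in a hypothetical helper, remove_suffix_re, and assumes the input was 'mixed4d_pre_relu', which is consistent with the output shown above.

```python
import re

def remove_suffix_re(text, suffix):
    """Remove a literal suffix using an end-anchored regex (hypothetical helper)."""
    # re.escape makes characters like '.' match literally.
    pattern = re.escape(suffix) + r'$'
    return re.sub(pattern, '', text)

print(remove_suffix_re('mixed4d_pre_relu', '_pre_relu'))  # mixed4d

# Without escaping, '.com$' would also match 'Xcom'; with escaping it does not.
print(remove_suffix_re('abcdcXcom', '.com'))  # abcdcXcom
```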
How do I remove a substring from the end of a string?
strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from both ends of x.
On Python 3.9 and newer you can use the removeprefix and removesuffix methods to remove an entire substring from either side of the string:
url = 'abcdc.com'
url.removesuffix('.com') # Returns 'abcdc'
url.removeprefix('abcdc.') # Returns 'com'
The relevant Python Enhancement Proposal is PEP-616.
On Python 3.8 and older you can use endswith and slicing:
url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]
Or a regular expression:
import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
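For code that must run on both old and new Python versions, the endswith-and-slicing approach can be wrapped in a small function. This is only a sketch; remove_suffix is a hypothetical name, and the empty-suffix guard is there because text[:-0] would return an empty string.

```python
def remove_suffix(text, suffix):
    """Remove suffix from the end of text if present (hypothetical helper)."""
    # An empty suffix must be skipped: text[:-0] evaluates to ''.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

url = 'abcdc.com'
print(url.strip('.com'))            # abcd  (the trailing 'c' of 'abcdc' is lost)
print(remove_suffix(url, '.com'))   # abcdc
print(remove_suffix(url, ''))       # abcdc.com
```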