How to remove all the escape sequences from a list of strings?
Something like this?
>>> from ast import literal_eval
>>> s = r'Hello,\nworld!'
>>> print(literal_eval("'%s'" % s))
Hello,
world!
Edit: ok, that's not what you want. What you want can't be done in general, because, as @Sven Marnach explained, strings don't actually contain escape sequences. Those are just notation in string literals.
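A quick way to see the difference between the notation and the string's actual contents is to compare a normal literal with a raw one. This is only an illustrative sketch; the variable names are made up for the example.

```python
# '\n' in a normal literal is notation for ONE character (a newline).
escaped = '\n'
# In a raw literal the same two keystrokes stay as TWO characters: backslash and 'n'.
raw = r'\n'

print(len(escaped), len(raw))    # lengths of the two strings
print(repr(escaped), repr(raw))  # repr() shows the notation again
```

Once the literal has been parsed, the first string contains no backslash at all, so there is nothing left to "remove".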
You can filter all strings containing non-ASCII characters out of your list with the following (Python 2, where str has a decode method):
def is_ascii(s):
    try:
        s.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False

[s for s in ['william', 'short', '\x80', 'twitter', '\xaa',
             '\xe2', 'video', 'guy', 'ray']
 if is_ascii(s)]
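In Python 3, str has no decode method, so the check has to encode instead. The following is a minimal sketch of the same filter under that assumption; the names words and ascii_words are only for illustration.

```python
def is_ascii(s: str) -> bool:
    """Return True if every character of s is in the ASCII range."""
    try:
        s.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False

words = ['william', 'short', '\x80', 'twitter', '\xaa',
         '\xe2', 'video', 'guy', 'ray']
# Keep only the strings made up entirely of ASCII characters.
ascii_words = [w for w in words if is_ascii(w)]
print(ascii_words)
```

On Python 3.7 and later, the built-in str.isascii() method should give the same answer without the try/except.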
How to remove escape characters from string in python?
It seems you have a unicode string. In Python 2.x, unicode strings look like this:
inp_str = u'\xd7\nRecord has been added successfully, record id: 92'
If by removing escape characters you mean dropping the special (non-ASCII) characters, here is one way to keep only the ASCII characters without using a regex or any hard-coded values:
inp_str = u'\xd7\nRecord has been added successfully, record id: 92'
print inp_str.encode('ascii',errors='ignore').strip('\n')
Result: 'Record has been added successfully, record id: 92'
I encode first because the string is already unicode; when encoding to ASCII, any character outside the ASCII range is ignored. Then you just strip the '\n'.
Hope this helps you :)
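For Python 3, where encode returns bytes, a rough equivalent might look like the sketch below. It assumes you want a str back, so it decodes again after dropping the non-ASCII characters; the name cleaned is only for illustration.

```python
inp_str = '\xd7\nRecord has been added successfully, record id: 92'

# encode(..., errors='ignore') drops anything outside ASCII (here '\xd7'),
# decode('ascii') turns the bytes back into str, strip('\n') trims newlines.
cleaned = inp_str.encode('ascii', errors='ignore').decode('ascii').strip('\n')
print(cleaned)
```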
How do I remove escape character (\) from a list in python?
You can convert the string to bytes and then use the bytes.decode method with unicode_escape as the encoding to un-escape a given string:
cmd = [bytes(s, 'utf-8').decode('unicode_escape') for s in cmd]
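To make the effect concrete, here is a small sketch with made-up contents for cmd (the original list is not shown, so these values are assumptions). Each string starts out containing a literal backslash followed by a letter and ends up containing the real control character.

```python
# Hypothetical input: raw strings holding a literal backslash plus 't' or 'n'.
cmd = [r'echo\tfoo', r'line1\nline2']

# Encode to bytes, then let the unicode_escape codec interpret the backslashes.
unescaped = [bytes(s, 'utf-8').decode('unicode_escape') for s in cmd]
print(unescaped)
```

Note that unicode_escape treats the bytes as Latin-1, so this is only safe when the strings are pure ASCII.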
Python how to remove escape characters from a string
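Maybe the re module is the way to go. Note that this pattern removes every character that is not an ASCII letter or digit, not only the control characters: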
>>> s = 'test\x06\x06\x06\x06'
>>> s1 = 'test2\x04\x04\x04\x04'
>>> import re
>>> re.sub('[^A-Za-z0-9]+', '', s)
'test'
>>> re.sub('[^A-Za-z0-9]+', '', s1)
'test2'
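If spaces and punctuation should survive, a narrower pattern that targets only the ASCII control characters may be a better fit. This is a sketch, not part of the original answer; the names CONTROL_CHARS and strip_control_chars are made up, and the pattern also removes newlines and tabs, which may or may not be what you want.

```python
import re

# ASCII control characters: 0x00-0x1f plus DEL (0x7f).
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

def strip_control_chars(text: str) -> str:
    """Remove ASCII control characters, leaving all other characters intact."""
    return CONTROL_CHARS.sub('', text)

print(repr(strip_control_chars('test\x06\x06\x06\x06')))
print(repr(strip_control_chars('hello, world!\x04')))
```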
Remove escape character from string
The character '\a' is the ASCII BEL character, chr(7).
To do the conversion in Python 2:
from __future__ import print_function
a = '\\a'
c = a.decode('string-escape')
print(repr(a), repr(c))
output
'\\a' '\x07'
And for future reference, in Python 3:
a = '\\a'
b = bytes(a, encoding='ascii')
c = b.decode('unicode-escape')
print(repr(a), repr(c))
This gives identical output to the above snippet.
In Python 3, if you were working with bytes objects you'd do something like this:
a = b'\\a'
c = bytes(a.decode('unicode-escape'), 'ascii')
print(repr(a), repr(c))
output
b'\\a' b'\x07'
As Antti Haapala mentions, this simple strategy for Python 3 won't work if the source string contains Unicode characters too. In that case, please see his answer for a more robust solution.
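One commonly used way around the caveat above is to decode only the substrings that look like escape sequences and leave everything else alone. The sketch below illustrates that idea; it is not necessarily the solution from the referenced answer, and the names ESCAPE_SEQUENCE_RE and decode_escapes are assumptions made for the example.

```python
import codecs
import re

# Match only things that look like Python escape sequences.
ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U[0-9a-fA-F]{8}   # 8-digit hex escapes
    | \\u[0-9a-fA-F]{4}   # 4-digit hex escapes
    | \\x[0-9a-fA-F]{2}   # 2-digit hex escapes
    | \\[0-7]{1,3}        # octal escapes
    | \\N\{[^}]+\}        # Unicode characters by name
    | \\[\\'"abfnrtv]     # single-character escapes
    )''', re.VERBOSE)

def decode_escapes(s: str) -> str:
    """Interpret backslash escapes in s without touching non-ASCII text."""
    def decode_match(match):
        # Each match is pure ASCII, so unicode-escape decoding is safe here.
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

print(repr(decode_escapes('caf\u00e9 \\a')))
```

Because the codec only ever sees the matched ASCII escape, characters such as 'é' pass through unchanged.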
Remove escaped characters like new line, tabs, carriage returns, etc. inside a string
While, for example, \n is an escape character, \\n is not. This is why you are left with strings like \\n \\\\n \\t\\\\t \\r\\\\r after sentence.split().
This will return the desired output:
result=" ".join(word for word in sentence.split() if not word.startswith("\\"))
It breaks the sentence down into words, stripping any leading or trailing whitespace, and keeps only the words that do not start with a backslash. Remember that things like \\n are not escape characters but the representation of the literal string \n.
Btw, I wouldn't call your attempt "brute force", as string functions like split(), strip(), join(), replace() etc. are intended for solving exactly this type of problem.
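Here is the one-liner applied to a made-up sentence (the original input is not shown, so this sample is an assumption). It mixes real whitespace characters with literal backslash sequences to show which ones split() swallows and which ones the filter drops.

```python
# Real newline and tab, followed by literal backslash sequences (raw string).
sentence = "First line.\nSecond\tline. " + r"\n \\n \t\\t \r\\r" + " done"

# split() treats the real '\n' and '\t' as whitespace; the filter then
# discards the leftover words that begin with a literal backslash.
result = " ".join(word for word in sentence.split() if not word.startswith("\\"))
print(result)
```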
remove the escape character and get part of string
The string looks like a json after unicode-escape decoding:
>>> s = '{"type":"2","question_id":"...","text":"\\u5fcd \\u8b93\\u5c0d\\u65b9"}'
>>> s.encode().decode('unicode-escape') # `encode` is not needed in python 2.x
'{"type":"2","question_id":"對於經營一段感情,妳覺得最重要的關鍵是什麼呢?","text":"忍 讓對方"}'
You can use json.loads to deserialize the json:
>>> import json
>>> print(json.loads(s.encode().decode('unicode-escape'))['text'])
忍 讓對方
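Since JSON parsers decode \uXXXX escapes themselves, it may be enough to pass the string straight to json.loads and skip the unicode-escape step, which can garble any non-ASCII characters already present in the text. A minimal sketch with a shortened payload (the question_id field is left out here because its value is elided in the original):

```python
import json

# The Python literal '\\u5fcd' is a backslash followed by 'u5fcd',
# which is exactly what a JSON \u escape looks like on the wire.
s = '{"type":"2","text":"\\u5fcd \\u8b93\\u5c0d\\u65b9"}'

data = json.loads(s)  # the JSON parser resolves the \u escapes itself
print(data['text'])
```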
how to Remove Escaping character ( Back slash "\") from pandas dataframe
You can try replace -
>>> import pandas as pd
>>>
>>> val = [r"ALTRAN CONSULTING & \NENGINEERING GMBH",r"NANOVO KERESKEDELMI KFT \KENYSZERTORLES ALATT"]
>>>
>>> d = {'name':val}
>>>
>>> df = pd.DataFrame(d)
>>> df['name'] = df['name'].replace(to_replace= r'\\', value= '', regex=True)
>>> df
name
0 ALTRAN CONSULTING & NENGINEERING GMBH
1 NANOVO KERESKEDELMI KFT KENYSZERTORLES ALATT
>>>