Process Escape Sequences in a String in Python

The correct approach is to decode the string with an escape codec: 'string_escape' in Python 2, 'unicode_escape' in Python 3.

>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs

Don't use the AST or eval. Using the string codecs is much safer.
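
One caveat with the Python 3 line above: encoding to UTF-8 and then decoding with unicode_escape garbles non-ASCII characters, because the codec reads the raw bytes as Latin-1. A minimal sketch of a workaround, assuming the input is a str; the helper name decode_escapes is made up for illustration:

```python
def decode_escapes(text):
    """Interpret backslash escapes in a str (hypothetical helper)."""
    # Latin-1 maps code points 0-255 to single bytes, which unicode_escape
    # reads back unchanged. 'backslashreplace' rewrites every other character
    # as a \uXXXX or \UXXXXXXXX escape, which the decode step then restores.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


print(decode_escapes("spam\\neggs"))          # two lines: spam / eggs
print(decode_escapes("caf\u00e9\\t\u20ac"))   # non-ASCII text survives
```

A literal backslash directly followed by a character outside Latin-1 remains an edge case that this sketch does not handle.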

How do you input escape sequences in Python?

The input function returns exactly what the user typed. The \-escaping convention applies only to Python string literals: it is not a universal convention for data stored in variables. If it were, you could never store the two characters \ followed by n in a string variable, because they would always be interpreted as a newline (ASCII 10).
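
A small illustration of the difference, with the typed text hard-coded so it runs without a prompt (the variable names are only for this example):

```python
typed = "one\\ntwo"   # what input() returns when the user types one\ntwo
literal = "one\ntwo"  # the same text written as a string literal

print(len(typed))    # 8: the backslash and the n are two separate characters
print(len(literal))  # 7: the escape collapsed into a single newline character
print("\n" in typed, "\n" in literal)  # False True
```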

You can do what you want this way:

import ast
import shlex
a=input("Input: ")
print(ast.literal_eval(shlex.quote(a)))

If in response to the Input: prompt you type one\ntwo, then this code will print

one
two

This works by turning the contents of a, which is one\ntwo, back into a quoted string that looks like 'one\ntwo', and then evaluating it as if it were a string literal. That brings the \-escaping convention back into play.

But it is very roundabout. Are you sure you want users of your program feeding it control characters?
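
If control characters from users are a concern, one option is to honor only a short whitelist of escapes instead of evaluating the input as a literal. A minimal sketch; the names ALLOWED_ESCAPES and decode_allowed_escapes are hypothetical, as is the choice of which escapes to allow:

```python
import re

# Hypothetical whitelist: only these escapes are interpreted.
ALLOWED_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def decode_allowed_escapes(text):
    """Replace whitelisted backslash escapes; leave everything else as typed."""
    def _replace(match):
        # Fall back to the original two characters for unknown escapes.
        return ALLOWED_ESCAPES.get(match.group(1), match.group(0))

    return re.sub(r"\\(.)", _replace, text)


print(decode_allowed_escapes("one\\ntwo"))  # newline is interpreted
print(decode_allowed_escapes("a\\x41"))     # \x41 is left untouched
```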

Why print returns \\, not an escape character \ in Python

Referring to String and Bytes literals: when Python sees a backslash in a string literal while compiling the program, it looks at the next character to decide how the following characters are to be escaped. In the first case the following character is U, so Python knows it's a Unicode escape. In the final case it sees {, recognizes that this is not a valid escape, and just emits the backslash followed by the { character.

In print('\{}'.format('U0001F602')) there are two different string literals, '\{}' and 'U0001F602'. The fact that the first string is parsed at runtime by .format doesn't make the result a string literal; it's a composite value, so no escape processing is applied to it.
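
The difference can be checked directly. In this sketch the format string is written as "\\{}" rather than '\{}' (same value, but recent Python versions warn about the unrecognized escape \{); the variable names are only illustrative:

```python
literal = "\U0001F602"                  # escape resolved at compile time
built = "\\{}".format("U0001F602")      # backslash + text joined at runtime

print(len(literal))  # 1: a single code point
print(len(built))    # 10: a backslash followed by the 9 characters U0001F602

# Explicitly decoding the runtime value applies the escape rules to it.
decoded = built.encode("ascii").decode("unicode_escape")
print(decoded == literal)  # True
```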

How do I convert a string to an escape sequence in Python?

What you're trying to do is interpret the escape sequences in the original string, to get the corresponding character(s). Don't compute them yourself, call a decode() method. In Python 3 you'll only find it on bytes objects (not str), so you need to convert to a bytes object and back:

>>> bytes("\\xf0\\xfa", "utf-8").decode("unicode_escape")
'ðú'

How to un-escape a backslash-escaped string?

>>> print '"Hello,\\nworld!"'.decode('string_escape')  # Python 2 only
"Hello,
world!"

Python - How do I split a string that includes an escape character as a delimiter?

Make your string a raw string by writing it with the r prefix: r'string'.

Try this:

MyString = r'A\x92\xa4\xbf'
delim = '\\' + 'x' #OR simply: delim = '\\x'
MyList = MyString.split(delim)
print(MyList)

Output:

['A', '92', 'a4', 'bf']

As far as I know, this technique works for any escape sequence; for \x, just set the delimiter to '\\x'. Working sample: https://repl.it/@stupidlylogical/RawStringPython

Works because:

Python raw string treats backslash (\) as a literal character. This is
useful when we want to have a string that contains backslash and don't
want it to be treated as an escape character.

Explanation:

When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string.

More: https://docs.python.org/2/reference/lexical_analysis.html#string-literals
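
To see why the r prefix matters here, compare the raw literal with the same text written without it. This is only an illustrative sketch; the variable names are made up:

```python
raw = r'A\x92\xa4\xbf'    # 13 characters, backslashes kept as typed
cooked = 'A\x92\xa4\xbf'  # 4 characters, each \xNN is already one character

print(len(raw), len(cooked))  # 13 4
print(raw.split('\\x'))       # ['A', '92', 'a4', 'bf']

# Nothing to split on: the cooked string contains no backslash at all.
print(cooked.split('\\x') == [cooked])  # True

# If only the cooked string is available, the hex digits can be
# recovered from the code points instead.
print([format(ord(ch), '02x') for ch in cooked[1:]])  # ['92', 'a4', 'bf']
```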

How to format escape sequences inside a function

Since you aren't working with string literals inside the function, don't try to build escape sequences there. Pass the already-interpreted character in as an argument instead.

def vhf(c):
    # c is already the actual character (e.g. a newline), not an escape sequence
    print("...I want this %s escape sequence" % (c,))

vhf('\n')

How do I .decode('string-escape') in Python 3?

If you want str-to-str decoding of escape sequences, so both input and output are Unicode:

def string_escape(s, encoding='utf-8'):
    return (s.encode('latin1')         # To bytes, required by 'unicode-escape'
             .decode('unicode-escape') # Perform the actual octal-escaping decode
             .encode('latin1')         # 1:1 mapping back to bytes
             .decode(encoding))        # Decode original encoding

Testing:

>>> string_escape('\\123omething special')
'Something special'

>>> string_escape(r's\000u\000p\000p\000o\000r\000t\000@'
...               r'\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000',
...               'utf-16-le')
'support@psiloc.com'

