Quoting Backslashes in Python String Literals

You're being misled by the output: the second approach you're taking actually does what you want, you just aren't believing it. :)

>>> foo = 'baz "\\"'
>>> foo
'baz "\\"'
>>> print(foo)
baz "\"

Incidentally, there's another string form which might be a bit clearer:

>>> print(r'baz "\"')
baz "\"
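
One way to convince yourself is to stop looking at the displayed form and count characters instead. The following is a minimal sketch using the same foo value; the name raw is only illustrative.

```python
# The literal has two backslashes in the source, but the string holds one.
foo = 'baz "\\"'
raw = r'baz "\"'

print(len(foo))          # 7 characters: b, a, z, space, ", \, "
print(foo == raw)        # both spellings produce the same string
print(foo.count('\\'))   # number of actual backslashes in the string
print(repr(foo))         # what the interactive prompt displays
print(str(foo))          # what print() displays
```

The doubled backslash only appears in repr(foo), which is the form the interactive prompt echoes back.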

How can I put an actual backslash in a string literal (not use it for an escape sequence)?

To answer your question directly, put r in front of the string.

final = path + r'\xulrunner.exe ' + path + r'\application.ini'

But a better solution would be os.path.join:

import os

final = os.path.join(path, 'xulrunner.exe') + ' ' + \
        os.path.join(path, 'application.ini')

(the backslash there is escaping a newline, but you could put the whole thing on one line if you want)

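I will mention that you can use forward slashes in file paths on Windows. Python does not rewrite them itself, but the Windows file APIs accept either separator, so functions such as open() handle them fine. (A command shell may be pickier, since cmd.exe uses / to introduce options.) So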

final = path + '/xulrunner.exe ' + path + '/application.ini'

should work. But it's still preferable to use os.path.join because that makes it clear what you're trying to do.
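
As a rough sketch of the os.path.join approach: the example below uses ntpath, which is the Windows flavour of os.path and can be imported on any platform, so the result looks the same wherever it is run. The directory C:\xulrunner is a made-up value for illustration only.

```python
import ntpath  # the Windows implementation of os.path; importable anywhere

# Hypothetical install directory, used only for illustration.
path = r'C:\xulrunner'

exe = ntpath.join(path, 'xulrunner.exe')
ini = ntpath.join(path, 'application.ini')
final = exe + ' ' + ini
print(final)

# normpath applies Windows rules and turns forward slashes into backslashes.
normalized = ntpath.normpath('C:/xulrunner/application.ini')
print(normalized == ini)
```

On Windows itself, os.path.join and os.path.normpath behave the same way, because os.path is ntpath there.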

What does a backslash mean in a string literal?

The backslash is used to escape special (unprintable) characters in string literals. \n is for newline, \t for tab, \f for a form-feed (rarely used) and several more exist.

When you give the string literal "\0" you effectively denote a string with exactly one character which is the (unprintable) NUL character (a 0-byte). You can represent this as \0 in string literals. The same goes for \1 (which is a 1-byte in a string) etc.

Actually, \8 and \9 are different, because after a backslash you have to give the value of the byte in octal notation, i.e. using the digits 0-7 only. So the backslash before the 8 or the 9 has no special meaning, and \8 results in two characters, namely a verbatim backslash and a verbatim digit 8.

When you now print the representation of such a string literal (e.g. by having it in a list you print), the Python interpreter recreates a representation of the internal string that is supposed to look like a string literal. This is not the string contents, but the version of the string as you could write it in a Python program, i.e. enclosed in quotes and using backslashes to escape special characters. The Python interpreter doesn't represent special characters using octal notation, though. It uses hexadecimal notation instead, which introduces each special character with \x followed by exactly two hexadecimal digits.

That means that \0 becomes \x00, \1 becomes \x01 etc. The \8, as mentioned, is in fact the representation of two characters, namely the backslash and the digit 8. The backslash is then escaped by the Python interpreter to a double backslash \\, and the 8 is appended as normal character.

The input \10 is the character with value 8 (because octal 10 is decimal 8 and also hexadecimal 8, look up octal and hexadecimal numbers to learn about that). So the input \10 becomes \x08. The \11 is the character with value 9 which is a tab character for which a special notation exists, that is \t.
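
The mapping described above can be checked with a small sketch. It evaluates the literal text with ast.literal_eval rather than writing '\8' directly in the source, because recent Python versions emit a warning for unrecognized escapes such as \8 (and may reject them in a future version). The names samples and results are only illustrative.

```python
import ast
import warnings

# Source text of each literal, kept raw so the backslashes survive intact.
samples = [r"'\0'", r"'\1'", r"'\10'", r"'\11'", r"'\8'"]
results = {}

with warnings.catch_warnings():
    # '\8' is not a recognized escape; silence the warning for this demo.
    warnings.simplefilter("ignore")
    for source in samples:
        value = ast.literal_eval(source)  # evaluate the literal text
        results[source] = value
        print(source, '->', len(value), 'char(s), repr', repr(value))
```

The first four literals each produce a single character, while the last one produces two: the backslash and the digit 8.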

How does Python interpret backslash in string?

From your follow-up comment:

What puzzled me is in my example, it doesn't escape. Single backslash produces double backslashes. Double backslashes produce Double backslashes. Triple backslashes produce quadruple backslashes.....

To be clear: your first output is a string with one backslash in it. Python displays two backslashes in its representation of the string.

When you input the string with a single backslash, Python does not treat the sequence \] in the input as any special escape sequence, and therefore the \ is turned into an actual backslash in the actual string, and the ] into a closing square bracket. Quoting from the documentation linked by Klaus D.:

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)

When you input the string with a double backslash, the sequence \\ is an escape sequence for a single backslash, and then the ] is just a ].

Either way, when Python displays the string back to you, it uses \\ for the single actual backslash, because it does not look ahead to determine that a single backslash would work - the backslash always gets escaped.
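
To see that both spellings give the same string, here is a small sketch. The strings '[\]' and '[\\]' are stand-ins for the ones in the question, and the literal text is evaluated with ast.literal_eval so that the warning recent Python versions emit for the unrecognized \] escape can be silenced.

```python
import ast
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")       # recent versions warn about '\]'
    single = ast.literal_eval(r"'[\]'")   # source text: '[\]'
    double = ast.literal_eval(r"'[\\]'")  # source text: '[\\]'

print(single == double)  # both are the three characters [, \, ]
print(len(single))
print(repr(single))      # repr always writes the backslash as \\
print(single)            # print shows the actual contents
```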


To go into a little more detail: Python doesn't care about how you specified the string in the first place - it has a specific "normalized" form that depends only on what the string actually contains. We can see this by playing around with the different ways to quote a string:

>>> 'foo'
'foo'
>>> "foo"
'foo'
>>> r'foo'
'foo'
>>> """foo"""
'foo'

The normalized form will use double quotes if that avoids escape sequences for single quotes:

>>> '\'\'\''
"'''"

But it will switch back to single quotes if the string contains both types of quote:

>>> '\'"'
'\'"'
>>> "'\"'
'\'"'

(Exercise: how many characters are actually in this string, and what are they? How many backslashes does the string contain?)


It contains two characters - a single-quote and a double-quote - and no backslashes.

Why can't Python's raw string literals end with a single backslash?

The reason is explained in this part of that section of the documentation:

String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

So raw strings are not 100% raw, there is still some rudimentary backslash-processing.
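
A short sketch of what this means in practice. The invalid literal is passed to compile() as text so the example itself stays runnable, and the path C:\some\dir is a made-up value showing two common workarounds.

```python
# r"\"" is valid: the backslash stays in the string along with the quote.
two_chars = r"\""
print(len(two_chars), repr(two_chars))

# The source text r"\" cannot be compiled: the backslash "escapes" the
# closing quote, so the literal never terminates.
try:
    compile('r"\\"', '<demo>', 'eval')
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)

# Two ways to get a string that ends in a single backslash.
folder_a = r'C:\some\dir' + '\\'
folder_b = 'C:\\some\\dir\\'
print(folder_a == folder_b)
```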

Why do I get a backslash in a Python string?

When you're writing strings in Python, you can either "write them with double quotes" or 'write them with single quotes'. You only need to escape quotes that match the quotes delimiting the string. So 'This "is" a valid string' needs no escapes, but if you change those double quotes to single quotes, you need 'You \'need\' to do this'. In your string, you're correctly escaping the double quotes, because you have a double-quoted string, but you don't need to escape the single quote.

However, it doesn't matter, because \' is still interpreted correctly by Python; the interactive prompt just displays it as an escaped character so you know it's the character ' and not the end of the string.

>>> "\'\""
'\'"'
>>> print("\'\"")
'"

How to quote backslash in Python code (four \ to quote one)?

If you write the pattern as a raw string (r prefix), you only need to escape each character once, for the regex engine:

regex = r"C:\\ghs\\comp_201416\\([a-z]*)\.exe"

Each literal backslash is escaped once for the regex, so it is written \\. In .exe only the . needs escaping, so it becomes \.
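
A minimal sketch of the pattern in use. The file name ccarm.exe is a hypothetical example chosen only to exercise the capture group, and non_raw shows the four-backslash spelling that the raw string avoids.

```python
import re

regex = r"C:\\ghs\\comp_201416\\([a-z]*)\.exe"

# Hypothetical file name, used only to exercise the pattern.
candidate = r"C:\ghs\comp_201416\ccarm.exe"

match = re.match(regex, candidate)
print(match.group(1) if match else None)

# Without the r prefix, every backslash must be doubled again for Python,
# which is where the four-backslash form comes from.
non_raw = "C:\\\\ghs\\\\comp_201416\\\\([a-z]*)\\.exe"
print(non_raw == regex)
```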


