What does preceding a string literal with r mean?
The r prefix means that the string is to be treated as a raw string literal: backslash escape sequences in it are not processed.
For example, '\n' is a single newline character, while r'\n' is the two characters \ followed by n.
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
Source: Python string literals
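The rules quoted above can be checked directly. The following is a minimal sketch, assuming Python 3; the variable names are only illustrative and do not come from the documentation.

```python
# Raw literal: the backslash is kept, so the string has two characters.
raw_newline = r"\n"
assert len(raw_newline) == 2
assert raw_newline == "\\n"

# A quote can be escaped inside a raw literal, but the backslash stays in the string.
raw_quote = r"\""
assert len(raw_quote) == 2
assert raw_quote == '\\"'

# A raw literal cannot end in a single backslash: compiling such source fails.
try:
    compile('r"\\"', "<example>", "eval")  # the source text here is r"\"
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
assert not ends_in_backslash_ok
```

The last check goes through compile() so that the invalid literal can be demonstrated without making the example itself a syntax error.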
What exactly do u and r string prefixes do, and what are raw string literals?
There's not really any "raw string"; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.
A "raw string literal" is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning "just a backslash" (except when it comes right before a quote that would otherwise terminate the literal) -- no "escape sequences" to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.
This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the "except" clause above doesn't matter) and it looks a bit better when you avoid doubling up each of them -- that's all. It also gained some popularity to express native Windows file paths (with backslashes instead of regular slashes like on other platforms), but that's very rarely needed (since normal slashes mostly work fine on Windows too) and imperfect (due to the "except" clause above).
r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting also produces exactly the same types of strings (so for example r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).
Not sure what you mean by "going back": there are no intrinsic back and forward directions, because there's no raw string type. It's just an alternative syntax to express perfectly normal string objects, byte or Unicode as they may be.
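To illustrate that there is no separate raw string type, here is a small sketch, assuming Python 3 (where both literals below produce str objects). The path and the variable names are made up for the example.

```python
plain = "C:\\temp\\new"   # ordinary literal with doubled backslashes
raw = r"C:\temp\new"      # raw literal, backslashes written once

same_value = plain == raw               # both literals describe the same text
same_type = type(plain) is type(raw)    # both are ordinary str objects
print(same_value, same_type)            # prints: True True
```

Once the literal has been parsed, nothing in the resulting object records which syntax was used to write it.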
And yes, in Python 2.*, u'...' is of course always distinct from just '...' -- the former is a Unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.
E.g., consider (Python 2.6):
>>> import sys
>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34
The Unicode object of course takes more memory space (very small difference for a very short string, obviously ;-).
What does the r in Python's re.compile(r' pattern flags') mean?
As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings generally in Python.
Normal strings use the backslash character as an escape character for special characters (like newlines):
>>> print('this is \n a test')
this is
a test
The r prefix tells the interpreter not to do this:
>>> print(r'this is \n a test')
this is \n a test
>>>
This is important in regular expressions, as you need the backslash to reach the re module intact. In particular, \b matches the empty string at the start and end of a word. re expects the two-character string \b, but in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either escape the backslash explicitly ('\\b') or tell Python it is a raw string (r'\b').
>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
What does 'r' mean before a Regex pattern?
Placing r or R before a string literal creates what is known as a raw-string literal. Raw strings do not process escape sequences (\n, \b, etc.) and are thus commonly used for Regex patterns, which often contain a lot of \ characters.
Below is a demonstration:
>>> print('\n') # Prints a newline character
>>> print(r'\n') # Escape sequence is not processed
\n
>>> print('\b') # Prints a backspace character
>>> print(r'\b') # Escape sequence is not processed
\b
>>>
The only other option would be to double every backslash:
>>> re.sub('def\\s+([a-zA-Z_][a-zA-Z_0-9]*)\\s*\\(\\s*\\):',
... 'static PyObject*\\npy_\\1(void)\\n{',
... 'def myfunc():')
which is just tedious.
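For comparison, the same substitution can be written with raw string literals. This is a sketch for Python 3 that checks the two spellings give the same result; the names doubled and raw are just labels for the example.

```python
import re

# Every backslash doubled, as in the call above.
doubled = re.sub('def\\s+([a-zA-Z_][a-zA-Z_0-9]*)\\s*\\(\\s*\\):',
                 'static PyObject*\\npy_\\1(void)\\n{',
                 'def myfunc():')

# The same pattern and replacement written as raw literals.
raw = re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
             r'static PyObject*\npy_\1(void)\n{',
             'def myfunc():')

assert doubled == raw
print(raw)
```

In both versions the \n in the replacement reaches re.sub as a backslash followed by n, and re.sub itself turns it into a newline in the result.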
What does the 'r' mean in this python (django) line?
It is a raw Python string literal; any \n or other character escape is not interpreted.
Since you often use the backslash in regular expressions (where they have their own meaning) it is common practice to use raw string literals for such expression definitions.
Python r Preceding Quoted Windows Registry Key
Windows uses \ as its path delimiter. Python uses \ as its string escape character. Clearly these two uses clash.
A Python raw string, prefixed with either r or R, stops \ being interpreted as an escape character:
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
The reason you encounter raw strings in code that works with paths on Windows is that it allows a string containing Windows path separators to be written using \ rather than \\. So this allows us to write:
r"C:\Python27\Python.exe"
rather than
"C:\\Python27\\Python.exe"
One may consider the raw string to be clearer for the reader, at least once the reader understands raw strings.
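The clash is easiest to see with a path whose components happen to start with letters that form escape sequences. The path below is hypothetical, chosen only because \n and \t are escapes; the sketch assumes Python 3.

```python
# Normal literal: \n and \t are turned into a newline and a tab.
unintended = "C:\new\table.txt"
# Raw literal: the backslashes are kept exactly as written.
raw_path = r"C:\new\table.txt"

assert "\n" in unintended and "\t" in unintended
assert raw_path == "C:\\new\\table.txt"
assert len(unintended) == 14 and len(raw_path) == 16

# A raw literal cannot end in a single backslash, so a trailing
# separator has to be appended from a normal literal.
folder = r"C:\new" + "\\"
assert folder.endswith("\\")
```

The last two lines show one way around the trailing-backslash restriction; concatenating a normal "\\" is a common workaround rather than the only option.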
I don't understand what you mean about the r prefix appearing in the data written to the registry. That won't happen with the code in the question.
>>> print r"C:\Python27\Python.exe"
C:\Python27\Python.exe
>>> print "C:\\Python27\\Python.exe"
C:\Python27\Python.exe
>>> r"C:\Python27\Python.exe" == "C:\\Python27\\Python.exe"
True
My guess is that you have manually added data using regedit that is confusing you. Beware also of the registry redirector. If you use 32-bit Python on 64-bit Windows, then your code modifies the 32-bit registry view, typically stored under HKLM\Software\Wow6432Node.
What does the 'b' character do in front of a string literal?
To quote the Python 2.x documentation:
A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
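The Python 3 behaviour described in the quote can be sketched as follows. The variable names are illustrative, and the non-ASCII check goes through compile() so that the example itself remains valid source.

```python
# A b prefix produces a bytes object; indexing it yields integers.
data = b"abc"
assert type(data) is bytes
assert data[0] == 97

# Values of 128 or greater must be written with escapes.
high = b"\xff"
assert len(high) == 1 and high[0] == 255

# The b and r prefixes can be combined: raw bytes keep the backslash.
raw_bytes = rb"\n"
assert raw_bytes == b"\\n" and len(raw_bytes) == 2

# A non-ASCII character inside a bytes literal is rejected by the compiler.
try:
    compile('b"\u00e9"', "<example>", "eval")
    non_ascii_ok = True
except SyntaxError:
    non_ascii_ok = False
assert not non_ascii_ok
```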