Casting Raw Strings Python

Python 3:

"hurr..\n..durr".encode('unicode-escape').decode()

Python 2:

"hurr..\n..durr".encode('string-escape')
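
As a quick check of what the Python 3 one-liner produces, here is a minimal sketch. The variable names are only for illustration, and the explicit "ascii" argument is my choice (the escaped bytes are always plain ASCII):

```python
# Python 3: escape a string so that control characters become visible text.
original = "hurr..\n..durr"           # contains one real newline character
escaped = original.encode("unicode_escape").decode("ascii")

print(escaped)                         # the newline now shows as a backslash followed by 'n'
print(len(original), len(escaped))     # the escaped form is one character longer
```

The result is an ordinary str; it just contains a backslash and an "n" where the newline used to be.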

Print raw string from variable? (not getting the answers)

I had a similar problem and stumbled upon this question, and now know, thanks to Nick Olson-Harris' answer, that the solution lies in changing the string.

Two ways of solving it:

  1. Get the path you want using native python functions, e.g.:

    import os
    test = os.getcwd() # In case the path in question is your current directory
    print(repr(test))

    This makes it platform independent and it now works with .encode. If this is an option for you, it's the more elegant solution.

  2. If your string is not a path, define it in a way compatible with python strings, in this case by escaping your backslashes:

    test = 'C:\\Windows\\Users\\alexb\\'
    print(repr(test))
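
To see that the doubled backslashes exist only in the source code, a small sketch along these lines may help. The name raw_equiv is hypothetical; it builds the same value from a raw literal, with the trailing backslash added separately because a raw literal cannot end in a single backslash:

```python
# The doubled backslashes are source-code notation; the string holds single ones.
test = 'C:\\Windows\\Users\\alexb\\'

# Hypothetical equivalent spelled with a raw literal plus one escaped backslash.
raw_equiv = r'C:\Windows\Users\alexb' + '\\'

print(test)               # C:\Windows\Users\alexb\
print(repr(test))         # 'C:\\Windows\\Users\\alexb\\'
print(test.count('\\'))   # 4
print(test == raw_equiv)  # True
```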

Make Python string a raw string

The idea behind r' ' is to write raw string literals: the prefix changes the way Python interprets escape sequences inside the literal. If the value comes from a variable, as you stated above, you don't need to use r' ', because you are not explicitly writing a string literal.

Either way, I think this will do:

path = r'%s' % pathToFile

EDIT:
Also, as noted in the comments on the question, you should really make sure the path exists.
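
A quick check suggests the r prefix in the formatting line only applies to the '%s' literal itself, so the substituted value comes through unchanged. In this sketch, path_to_file is a made-up value standing in for pathToFile:

```python
# Hypothetical value standing in for pathToFile from the answer above.
path_to_file = 'C:\\temp\\new.txt'

path = r'%s' % path_to_file    # the r prefix applies only to the '%s' literal

print(path == path_to_file)    # True: formatting did not alter the backslashes
print(path)                    # C:\temp\new.txt
```

In other words, if the variable already holds the right characters, no conversion is needed; if it holds the wrong ones, the fix has to happen where the string was created.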

How to create raw string from string variable in python?

There is no such thing as "raw string" once the string is created in the process. The "" and r"" ways of specifying the string exist only in the source code itself.

That means "\x01" will create a string consisting of one byte 0x01, but r"\x01" will create a string consisting of 4 bytes '0x5c', '0x78', '0x30', '0x31'. (assuming we're talking about python 2 and ignoring encodings for a while).
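
The difference is easy to confirm in an interpreter. A minimal sketch (it behaves the same on Python 3, where the elements are characters rather than bytes):

```python
normal = "\x01"    # escape processed: one character with code point 1
raw = r"\x01"      # escape not processed: backslash, 'x', '0', '1'

print(len(normal), len(raw))          # 1 4
print([hex(ord(c)) for c in raw])     # ['0x5c', '0x78', '0x30', '0x31']
```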

You mentioned in the comment that you're taking the string from the user (either gui or console input will work the same here) - in that case string character escapes will not be processed, so there's nothing you have to do about it. You can check it easily like this (or whatever the windows equivalent is, I only speak *nix):

% cat > test <<EOF                                             
heredoc> \x41
heredoc> EOF
% < test python -c "import sys; print sys.stdin.read()"
\x41
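
The same check can be sketched in Python 3 without a shell. Here io.StringIO stands in for real user input, which is an assumption made only to keep the example self-contained:

```python
import io

# Simulated user input: the four characters \x41 followed by a newline.
fake_stdin = io.StringIO(r"\x41" + "\n")
line = fake_stdin.readline().rstrip("\n")

print(line)         # \x41
print(len(line))    # 4: the escape was not processed into "A"
```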

How to convert a raw string into a normal string?

If your input value is a str string, use codecs.decode() to convert:

import codecs

codecs.decode(raw_unicode_string, 'unicode_escape')

If your input value is a bytes object, you can use the bytes.decode() method:

raw_byte_string.decode('unicode_escape')

Demo:

>>> import codecs
>>> codecs.decode('\\x89\\n', 'unicode_escape')
'\x89\n'
>>> b'\\x89\\n'.decode('unicode_escape')
'\x89\n'

Python 2 byte strings can be decoded with the 'string_escape' codec:

>>> import sys; sys.version_info[:2]
(2, 7)
>>> '\\x89\\n'.decode('string_escape')
'\x89\n'

For Unicode literals (with a u prefix, e.g. u'\\x89\\n'), use 'unicode_escape'.
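
Putting both directions together for Python 3, a round trip might look like the sketch below. The helper names escape and unescape are hypothetical, and the sketch assumes ASCII-only text, since non-ASCII characters may not survive the 'unicode_escape' decoder unchanged:

```python
import codecs

def escape(text):
    """Turn control characters into visible backslash sequences."""
    return text.encode("unicode_escape").decode("ascii")

def unescape(text):
    """Interpret backslash escapes in text (intended for ASCII-only input)."""
    return codecs.decode(text, "unicode_escape")

original = "tab\there\nnew line"
assert unescape(escape(original)) == original   # the round trip is lossless here
print(escape(original))
```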

What exactly do u and r string prefixes do, and what are raw string literals?

There's not really any "raw string"; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.

A "raw string literal" is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning "just a backslash" (except when it comes right before a quote that would otherwise terminate the literal) -- no "escape sequences" to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.

This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the "except" clause above doesn't matter) and it looks a bit better when you avoid doubling up each of them -- that's all. It also gained some popularity to express native Windows file paths (with backslashes instead of regular slashes like on other platforms), but that's very rarely needed (since normal slashes mostly work fine on Windows too) and imperfect (due to the "except" clause above).
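
A short sketch of both points, the regex convenience and the quote exception. The pattern and sample text are made up for illustration:

```python
import re

# The same pattern written two ways: raw literal vs doubled backslashes.
raw_pattern = r"\d+\.\d+"
plain_pattern = "\\d+\\.\\d+"
print(raw_pattern == plain_pattern)                 # True

print(re.findall(raw_pattern, "v3.14 and 2.71"))    # ['3.14', '2.71']

# The "except" clause: a backslash before a quote keeps the literal open,
# but the backslash itself stays in the string.
quoted = r"\""
print(len(quoted))                                  # 2
```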

r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting also produces exactly the same types of strings (so for example r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).

Not sure what you mean by "going back" - there are no intrinsic back and forward directions, because there's no raw string type; it's just an alternative syntax to express perfectly normal string objects, byte or unicode as they may be.

And yes, in Python 2.*, u'...' is of course always distinct from just '...' -- the former is a unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.

E.g., consider (Python 2.6):

>>> import sys
>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

The Unicode object of course takes more memory space (very small difference for a very short string, obviously ;-).
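
For comparison, here is how the prefixes behave on Python 3, where plain literals are already Unicode. This is a minimal sketch, not part of the original answer, and the variable names are illustrative:

```python
# Python 3: the b prefix selects the type; r only changes escape handling.
raw_text = r'ciao\n'      # str, with a literal backslash and 'n'
raw_bytes = rb'ciao\n'    # bytes, with a literal backslash and 'n'

print(type(raw_text) is str)       # True
print(type(raw_bytes) is bytes)    # True
print(u'ciao' == 'ciao')           # True: the u prefix is accepted but redundant
print(raw_text == 'ciao\\n')       # True: same contents as the escaped spelling
```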

How to treat a returned/stored string like a raw string in Python?

x = '\xff\x00'
y = ['%02x' % ord(c) for c in x]
print(y)

Output:

['ff', '00']
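
If the data is a bytes object on Python 3, a variant along these lines should give the same result. Iterating over bytes yields integers there, so ord() is not needed:

```python
x = b'\xff\x00'

# Each element of a bytes object is already an int in Python 3.
y = ['%02x' % b for b in x]
print(y)          # ['ff', '00']

# bytes.hex() gives the same digits as one string.
print(x.hex())    # ff00
```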

