What Does a B Prefix Before a Python String Mean

What does a b prefix before a python string mean?

This is a Python 3 bytes literal. The prefix does not exist in Python 2.5 and older. In Python 2.6+ it is accepted and ignored, so the literal is an ordinary (byte) string there, which keeps code compatible with 3.x. Note that a bytes literal in 3.x corresponds to a plain string in 2.x, while a plain string in 3.x corresponds to a literal with the u prefix in 2.x.
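As a quick illustration for Python 3 (a minimal sketch; the variable names are only for demonstration), the prefix changes the type of object the literal produces:

```python
# Python 3: the b prefix yields a bytes object, a plain literal yields str.
text = 'abc'
data = b'abc'

print(type(text))   # <class 'str'>
print(type(data))   # <class 'bytes'>

# The two types never compare equal, even when the content looks the same.
print(text == data)                  # False
print(text == data.decode('ascii'))  # True
```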

What does the 'b' character do in front of a string literal?

To quote the Python 2.x documentation:

A prefix of 'b' or 'B' is ignored in
Python 2; it indicates that the
literal should become a bytes literal
in Python 3 (e.g. when code is
automatically converted with 2to3). A
'u' or 'b' prefix may be followed by
an 'r' prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
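The ASCII-only rule can be checked with a short sketch. The name euro_sign and the use of compile() are choices made for this illustration; compile() is only there so that the syntax error can be caught at run time instead of stopping the script.

```python
# Bytes with a value of 128 or more must be written as escapes.
euro_sign = b'\xe2\x82\xac'                    # the three UTF-8 bytes of U+20AC
print(len(euro_sign))                          # 3
print(euro_sign.decode('utf-8') == '\u20ac')   # True

# A non-ASCII character inside a bytes literal is rejected by the compiler.
try:
    compile("b'\u20ac'", '<example>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)                                # True
```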

How do I get rid of the b-prefix in a string in python?

Decode the bytes to produce a str:

b = b'1234'
print(b.decode('utf-8')) # prints 1234
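Decoding only gives the right text if the codec matches the one used to produce the bytes. The sketch below assumes UTF-8 input and shows what may happen otherwise; all variable names are illustrative.

```python
raw = b'Zo\xc3\xab'              # the UTF-8 encoding of 'Zoë'

name = raw.decode('utf-8')
print(name == 'Zoë')             # True
print(len(raw), len(name))       # 4 3

# The wrong codec may not raise, yet it yields the wrong text.
wrong = raw.decode('latin-1')
print(len(wrong))                # 4

# Bytes that are invalid for the codec raise UnicodeDecodeError by default.
broken = b'Zo\xeb'               # Latin-1 bytes, not valid UTF-8
try:
    broken.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)                    # True

# errors='replace' substitutes U+FFFD for the undecodable byte instead.
replaced = broken.decode('utf-8', errors='replace')
print(replaced == 'Zo\ufffd')    # True
```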

When creating bytes with b prefix before string, what encoding does python use?

The bytes type can hold arbitrary data.
For example, (the beginning of) a JPEG image:

>>> with open('Bilder/19/01/IMG_3388.JPG', 'rb') as f:
...     head = f.read(10)
...

You should think of it as a sequence of integers.
That's also how the type behaves in many aspects:

>>> list(head)
[255, 216, 255, 225, 111, 254, 69, 120, 105, 102]
>>> head[0]
255
>>> sum(head)
1712

For reasons of convenience (and for historical reasons, I guess), the standard representation of bytes objects, like their literal syntax, resembles that of strings:

>>> head
b'\xff\xd8\xff\xe1o\xfeExif'

It uses ASCII printable characters where applicable, \xNN escapes otherwise.
This is convenient if the bytes object represents text:

>>> 'Zoë'.encode('utf8')
b'Zo\xc3\xab'
>>> 'Zoë'.encode('utf16')
b'\xff\xfeZ\x00o\x00\xeb\x00'
>>> 'Zoë'.encode('latin1')
b'Zo\xeb'

When you type a bytes literal, Python uses ASCII to turn its characters into byte values.
Characters in the ASCII range are encoded the same way in UTF-8, which is why you observed that b'a' == bytes('a', 'utf8').
A slightly less misleading expression would be b'a' == bytes('a', 'ascii').
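The equivalence, and where it stops holding, can be sketched as follows (the variable names utf8 and latin1 are just for this example):

```python
# For ASCII-only text, every ASCII-compatible codec yields the same bytes.
print(b'a' == bytes('a', 'ascii'))    # True
print(b'a' == 'a'.encode('utf-8'))    # True
print(b'a' == 'a'.encode('latin-1'))  # True

# A bytes object behaves like a sequence of integers.
print(list(b'abc'))   # [97, 98, 99]
print(b'abc'[0])      # 97 (indexing gives an int)
print(b'abc'[0:1])    # b'a' (slicing gives bytes)

# Outside the ASCII range the choice of codec matters.
utf8 = '\xeb'.encode('utf-8')
latin1 = '\xeb'.encode('latin-1')
print(utf8, latin1)   # b'\xc3\xab' b'\xeb'
```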

r string b string u string Python 2 / 3 comparison

From the Python docs for literals: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Bytes literals are always prefixed with 'b' or 'B'; they produce an
instance of the bytes type instead of the str type. They may only
contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

Both string and bytes literals may optionally be prefixed with a
letter 'r' or 'R'; such strings are called raw strings and treat
backslashes as literal characters. As a result, in string literals,
'\U' and '\u' escapes in raw strings are not treated specially. Given
that Python 2.x’s raw unicode literals behave differently than Python
3.x’s the 'ur' syntax is not supported.

and

A string literal with 'f' or 'F' in its prefix is a formatted string
literal; see Formatted string literals. The 'f' may be combined with
'r', but not with 'b' or 'u', therefore raw formatted strings are
possible, but formatted bytes literals are not.

So:

  • r means raw
  • b means bytes
  • u means unicode
  • f means format

The r and b prefixes were already available in Python 2 (b from 2.6 on, where it is ignored), and similar prefixes exist in many other languages (they are very handy sometimes).

Since string literals were not Unicode in Python 2, u-strings were created to offer support for internationalization. As of Python 3, u-strings are the default strings, so "..." is semantically the same as u"...".

Finally, of those, the f-string is the only one that isn't supported in Python 2.
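A small side-by-side sketch of the four prefixes, assuming Python 3.6 or newer (needed for the f prefix); the variable names are illustrative only:

```python
name = 'world'

raw = r'\n'              # raw: a backslash and an 'n', two characters
data = b'\x41'           # bytes: a single byte with value 65
text = u'abc'            # u is accepted in Python 3.3+ and changes nothing
greeting = f'hi {name}'  # formatted: the expression in braces is evaluated
raw_bytes = rb'\x41'     # r and b combine: four bytes, no escape processing

print(len(raw))          # 2
print(data)              # b'A'
print(text == 'abc')     # True
print(greeting)          # hi world
print(len(raw_bytes))    # 4
```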

Add b prefix to python variable?

# only an example, you can choose a different encoding
bytes('example', encoding='utf-8')

In Python 3:

Bytes literals are always prefixed with 'b' or 'B'; they produce an
instance of the bytes type instead of the str type. They may only
contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

In Python 2:

A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the
literal should become a bytes literal in Python 3.

More about bytes():

bytes([source[, encoding[, errors]]])

Return a new “bytes” object, which is an immutable sequence of
integers in the range 0 <= x < 256. bytes is an immutable version of
bytearray – it has the same non-mutating methods and the same indexing
and slicing behavior.

Accordingly, constructor arguments are interpreted as for bytearray().

Bytes objects can also be created with literals, see String and Bytes
literals.
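Since a prefix can only be attached to a literal, a str held in a variable has to be converted instead. A minimal sketch, with value standing in for whatever variable you have and UTF-8 chosen only as an example encoding:

```python
value = 'example'   # hypothetical str variable

as_bytes = value.encode('utf-8')        # str method
same = bytes(value, encoding='utf-8')   # constructor form
print(as_bytes)                         # b'example'
print(as_bytes == same)                 # True

# Other constructor forms: an int gives that many zero bytes,
# an iterable of ints gives those byte values.
print(bytes(3))                         # b'\x00\x00\x00'
print(bytes([72, 105]))                 # b'Hi'

# bytes is immutable; bytearray is the mutable counterpart.
mutable = bytearray(as_bytes)
mutable[0] = ord('E')
print(bytes(mutable))                   # b'Example'
```

Calling bytes(value) on a str without an encoding raises TypeError, so the encoding argument is not optional in that case.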

Remove the 'b' character in front of a string literal in Python 3

Decoding is redundant

You only had this "error" in the first place because of a misunderstanding of what's happening.

You get the b because you encoded to utf-8 and now it's a bytes object.

>>> type("text".encode("utf-8"))
<class 'bytes'>

Fixes:

  1. Print the original string, before it is encoded
  2. Decode it again after encoding (which is redundant)
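Both fixes can be seen in a short sketch (the variable names are illustrative):

```python
text = "text"
encoded = text.encode("utf-8")

print(text)                     # text
print(encoded)                  # b'text'
print(encoded.decode("utf-8"))  # text

# str() on a bytes object without an encoding returns its repr,
# so the b and the quotes become part of the resulting string.
print(str(encoded))             # b'text'
print(str(encoded, "utf-8"))    # text
```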

What exactly do u and r string prefixes do, and what are raw string literals?

There's not really any "raw string"; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.

A "raw string literal" is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning "just a backslash" (except when it comes right before a quote that would otherwise terminate the literal) -- no "escape sequences" to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.

This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the "except" clause above doesn't matter) and it looks a bit better when you avoid doubling up each of them -- that's all. It also gained some popularity to express native Windows file paths (with backslashes instead of regular slashes like on other platforms), but that's very rarely needed (since normal slashes mostly work fine on Windows too) and imperfect (due to the "except" clause above).
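The points above can be sketched briefly in Python 3. The sample sentence and the path are made up for this illustration.

```python
import re

# A raw literal keeps the backslash; a normal literal interprets the escape.
print(len('\n'))            # 1 (a newline character)
print(len(r'\n'))           # 2 (a backslash and an 'n')
print(r'\d+' == '\\d+')     # True: same string, two spellings

# The typical use: regular expression patterns.
match = re.search(r'\d+', 'order 42 shipped')
print(match.group())        # 42

# A raw literal cannot end in a single backslash, so a trailing
# backslash has to come from an adjacent normal literal.
path = r'C:\temp' '\\'
print(path)                 # C:\temp\
```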

r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting also produces exactly the same types of strings (so for example r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).

Not sure what you mean by "going back" - there are no intrinsic back and forward directions, because there's no raw string type; it's just an alternative syntax to express perfectly normal string objects, byte or unicode as they may be.

And yes, in Python 2.*, u'...' is of course always distinct from just '...' -- the former is a unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.

E.g., consider (Python 2.6):

>>> import sys
>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

The Unicode object of course takes more memory space (very small difference for a very short string, obviously ;-).

How do I get rid of b prefix and ' ' in a string variable

One easy way is to interpolate the bytestrings into another bytestring rather than mixing bytes and str. Then, at the end, just decode the bytestring if you want a string:

>>> b = b'bytes'
>>> no = "Interpolate string and %s" % b
>>> no
"Interpolate string and b'bytes'"

>>> yes = b"Interpolate bytes and %s" % b
>>> yes
b'Interpolate bytes and bytes'

>>> yes.decode()
'Interpolate bytes and bytes'

So in your example code:

>>> TCi = b"1"
>>> TCo = b"2"
>>> IOtc = b"playrange set: in: %s out: %s\n" % (TCi, TCo)
>>> IOtc
b'playrange set: in: 1 out: 2\n'

And since you need a bytestring at the end to write to the telnet session, this way you don't need to re-encode your resulting string; just use the bytestring as is.
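A few details of bytes interpolation, sketched under the assumption of Python 3.5 or newer (where % formatting on bytes is available); tc_in, tc_out and the counter line are placeholder values, not part of any real protocol:

```python
tc_in, tc_out = b"1", b"2"   # placeholder values, already bytes

command = b"playrange set: in: %s out: %s\n" % (tc_in, tc_out)
print(command)               # b'playrange set: in: 1 out: 2\n'

# Integers can be interpolated directly with %d.
counter = b"count: %d" % 3
print(counter)               # b'count: 3'

# A str cannot be interpolated into bytes; it has to be encoded first.
try:
    b"%s" % "text"
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)              # False
print(b"%s" % "text".encode("ascii"))   # b'text'
```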


