Why do I need 'b' to encode a string with Base64?

base64 encoding takes 8-bit binary byte data and encodes it using only the characters A-Z, a-z, 0-9, +, /* so it can be transmitted over channels that do not preserve all 8 bits of data, such as email.

Hence, it wants a string of 8-bit bytes. You create those in Python 3 with the b'' syntax.

If you remove the b, it becomes a string. A string is a sequence of Unicode characters. base64 has no idea what to do with Unicode data; it's not 8-bit. It's not really any bits, in fact. :-)
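A minimal sketch of the difference, assuming Python 3 and the standard base64 module (the choice of UTF-8 for the text-to-bytes step is an assumption, any codec that can represent the text would do):

```python
import base64

text = "data to be encoded"  # str: a sequence of Unicode characters

# Passing a str fails, because b64encode needs a bytes-like object.
try:
    base64.b64encode(text)
    str_accepted = True
except TypeError:
    str_accepted = False

# Encoding the text to bytes first gives b64encode what it wants.
encoded = base64.b64encode(text.encode("utf-8"))
print(str_accepted, encoded)
```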

In your second example:

>>> encoded = base64.b64encode('data to be encoded')

All the characters fit neatly into the ASCII character set, and base64 encoding is therefore actually a bit pointless. You can convert it to ascii instead, with

>>> encoded = 'data to be encoded'.encode('ascii')

Or simpler:

>>> encoded = b'data to be encoded'

Which would be the same thing in this case.


* Most base64 flavours may also include a = at the end as padding. In addition, some base64 variants may use characters other than + and /. See the Variants summary table at Wikipedia for an overview.
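To see the padding and one of the alternative alphabets in practice, here is a small sketch using the standard library's URL-safe variant. The two input bytes are picked only because they happen to produce +, / and = in the output:

```python
import base64

raw = b"\xfb\xff"  # two bytes chosen so the output contains +, / and padding

standard = base64.b64encode(raw)          # standard alphabet
url_safe = base64.urlsafe_b64encode(raw)  # uses - and _ instead of + and /

print(standard, url_safe)
```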

Why does base64.b64encode return a value of b'somestring' instead of simply 'somestring'?

It is returning a bytes object. Usually base64 is used to make something 7-bit safe, and thus is often used with byte-oriented (rather than character-oriented) data, for example, data you are about to shove out a socket.

You can decode it to a string, just like any other bytes object:

output.decode('ascii')

bytes and str objects are converted to each other using encode() and decode(). It is safe to use the ascii codec here, since base64 is guaranteed to only return 7-bit ASCII.
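A short round trip, as a sketch (the input bytes are made up for illustration; output is the same name used above):

```python
import base64

output = base64.b64encode(b"\x00\xffbinary")  # bytes in, bytes out
as_text = output.decode("ascii")              # bytes -> str
back = as_text.encode("ascii")                # str -> bytes

# b64decode accepts either the bytes or the ASCII str form.
restored = base64.b64decode(as_text)
print(type(output).__name__, type(as_text).__name__, restored)
```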

What is base 64 encoding used for?

When you have some binary data that you want to ship across a network, you generally don't do it by just streaming the bits and bytes over the wire in a raw format. Why? Because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).

So to get around this, people encode the binary data into characters. Base64 is one of these types of encodings.

Why 64?
Because you can generally rely on the same 64 characters being present in many character sets, and you can be reasonably confident that your data's going to end up on the other side of the wire uncorrupted.
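As a rough illustration in Python, the sketch below encodes every possible byte value and checks that the result only contains the 64 alphabet characters plus the = padding. The ALPHABET name is just a helper for this example, not something the base64 module exposes:

```python
import base64
import string

# The 64 characters of the standard alphabet ('=' is used only for padding).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

raw = bytes(range(256))  # every possible byte value, control characters included
encoded = base64.b64encode(raw).decode("ascii")

only_safe_chars = set(encoded) <= set(ALPHABET + "=")
print(len(ALPHABET), only_safe_chars, len(raw), len(encoded))
```

Every 3 input bytes become 4 output characters, so the encoded form is about a third larger than the raw data.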

Why does base64 encode return bytes instead of string directly?

You are correct that base64 is meant to be a textual representation of binary data.

However, you are neglecting constraints on the actual implementation side of things.

>>> import sys
>>> sys.getsizeof("Hello World")
60
>>> sys.getsizeof("Hello World".encode("utf-8"))
44

str objects simply take up more system resources than bytes. This overhead can lead to non-trivial degradation in performance when working with larger bodies of base64 encoded data.

I also suspect that, since the original module was ported from Python 2.7 (which did not distinguish between str and bytes), this might also just be a legacy quirk.

python base64 encode to string

You have a bytes object; decode it to Unicode:

print(four.decode('ascii'))

Base64 only uses ASCII characters, so that's a good codec to use here. If you don't explicitly decode, print() can only use the repr() representation, which produces Python literal syntax, the syntax you'd use to create the same value as a literal.
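For example (the value assigned to four here is a stand-in, since the original value is not shown):

```python
import base64

four = base64.b64encode(b"hello")  # stand-in value for illustration

print(four)                  # prints the repr: b'aGVsbG8='
print(four.decode("ascii"))  # prints the text: aGVsbG8=
```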

How to print without the b'' prefix?

The b prefix indicates that you are dealing with a byte string, which is basically just a sequence of bytes. To turn those into text, you need to decode them using some encoding.

Given that you used base64, all the produced bytes nicely map onto ascii anyway, and you can do something like this:

print(base64.b64encode(os.urandom(24)).decode("ascii"))

Why does base64.b64encode() return a bytes object?

The question states: "The purpose of the base64.b64encode() function is to convert binary data into ASCII-safe 'text'."

Python disagrees with that - base64 has been intentionally classified as a binary transform.

It was a design decision in Python 3 to force the separation of bytes and text and prohibit implicit transformations. Python is now so strict about this that bytes.encode doesn't even exist, and so b'abc'.encode('base64') would raise an AttributeError.

The opinion the language takes is that a bytestring object is already encoded. A codec which encodes bytes into text does not fit into this paradigm, because when you want to go from the bytes domain to the text domain it's a decode. Note that rot13 encoding was also banished from the list of standard encodings for the same reason - it didn't fit properly into the Python 3 paradigm.

There is also a performance argument to make: suppose Python automatically handled decoding of the base64 output, which is an ASCII-encoded binary representation produced by C code from the binascii module, into a Python object in the text domain. If you actually wanted the bytes, you would just have to undo the decoding by encoding into ASCII again. It would be a wasteful round-trip, an unnecessary double-negation. Better to 'opt in' to the decode-to-text step.
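A sketch of how this looks in practice, assuming Python 3. The codecs.encode() call is shown only to illustrate that base64 survives as a bytes-to-bytes transform; most code simply uses the base64 module:

```python
import base64
import codecs

# bytes has no .encode() method in Python 3.
try:
    b"abc".encode("base64")
    has_encode = True
except AttributeError:
    has_encode = False

# The bytes-to-bytes transform is still reachable through the codecs module.
via_codecs = codecs.encode(b"abc", "base64")  # this codec appends a newline
via_module = base64.b64encode(b"abc")

# Going to the text domain is an explicit, opt-in decode.
as_text = via_module.decode("ascii")
print(has_encode, via_codecs, via_module, as_text)
```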

Convert ascii string to base64 without the b and quotation marks

The b prefix denotes that it is a binary string. A binary string is not a string: it is a sequence of bytes (values in the 0 to 255 range). It is simply displayed like a string to make it more compact.

In the case of base64, however, all output characters are valid ASCII characters, so you can simply decode it like:

print(string.decode('ascii'))

So here we decode each byte to its ASCII equivalent. Since base64 guarantees that every byte it produces is an ASCII character (A-Z, a-z, 0-9, + and /, plus = for padding), we will always get a valid string. Mind, however, that this is not guaranteed for an arbitrary binary string.
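To illustrate that last caveat, here is a small sketch with made-up bytes outside the ASCII range: decoding them directly fails, while their base64 form decodes cleanly.

```python
import base64

raw = b"\xff\xfe\x80"  # arbitrary bytes outside the ASCII range

# Decoding the raw bytes as ASCII is not possible.
try:
    raw.decode("ascii")
    raw_is_ascii = True
except UnicodeDecodeError:
    raw_is_ascii = False

string = base64.b64encode(raw)  # same variable name as in the answer above
text = string.decode("ascii")   # always safe for base64 output
print(raw_is_ascii, text)
```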

How can you encode a string to Base64 in JavaScript?

You can use btoa() and atob() to convert to and from base64 encoding.

There appears to be some confusion in the comments regarding what these functions accept/return, so…

  • btoa() accepts a “string” where each character represents an 8-bit byte – if you pass a string containing characters that can’t be represented in 8 bits, it will throw an exception. This isn’t a problem if you’re actually treating the string as a byte array, but if you’re trying to do something else then you’ll have to encode it first.

  • atob() returns a “string” where each character represents an 8-bit byte – that is, its value will be between 0 and 0xff. This does not mean it’s ASCII – presumably if you’re using this function at all, you expect to be working with binary data and not text.

See also:

  • How do I load binary image data using Javascript and XMLHttpRequest?

Most comments here are outdated. You can probably use both btoa() and atob(), unless you support really outdated browsers.

Check here:

  • https://caniuse.com/?search=atob
  • https://caniuse.com/?search=btoa

