Failed to Execute 'Btoa' on 'Window': the String to Be Encoded Contains Characters Outside of the Latin1 Range

Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.

If you have UTF8, use this (actually works with SVG source), like:

btoa(unescape(encodeURIComponent(str)))

example:

 var imgsrc = 'data:image/svg+xml;base64,' + btoa(unescape(encodeURIComponent(markup)));
var img = new Image(1, 1); // width, height values are optional params
img.src = imgsrc;

If you need to decode that base64, use this:

var str2 = decodeURIComponent(escape(window.atob(b64)));
console.log(str2);

Example:

var str = "äöüÄÖÜçéèñ";
var b64 = window.btoa(unescape(encodeURIComponent(str)))
console.log(b64);

var str2 = decodeURIComponent(escape(window.atob(b64)));
console.log(str2);

Note: if you need to get this to work in mobile-safari, you might need to strip all the white-space from the base64 data...

function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(escape(window.atob( str )));
}

2017 Update

This problem has been bugging me again.

The simple truth is, atob doesn't really handle UTF8-strings - it's ASCII only.

Also, I wouldn't use bloatware like js-base64.

But webtoolkit does have a small, nice and very maintainable implementation:

/**
*
* Base64 encode / decode
* http://www.webtoolkit.info
*
**/
var Base64 = {

// private property
_keyStr: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

// public method for encoding
, encode: function (input)
{
var output = "";
var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
var i = 0;

input = Base64._utf8_encode(input);

while (i < input.length)
{
chr1 = input.charCodeAt(i++);
chr2 = input.charCodeAt(i++);
chr3 = input.charCodeAt(i++);

enc1 = chr1 >> 2;
enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
enc4 = chr3 & 63;

if (isNaN(chr2))
{
enc3 = enc4 = 64;
}
else if (isNaN(chr3))
{
enc4 = 64;
}

output = output +
this._keyStr.charAt(enc1) + this._keyStr.charAt(enc2) +
this._keyStr.charAt(enc3) + this._keyStr.charAt(enc4);
} // Whend

return output;
} // End Function encode

// public method for decoding
,decode: function (input)
{
var output = "";
var chr1, chr2, chr3;
var enc1, enc2, enc3, enc4;
var i = 0;

input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
while (i < input.length)
{
enc1 = this._keyStr.indexOf(input.charAt(i++));
enc2 = this._keyStr.indexOf(input.charAt(i++));
enc3 = this._keyStr.indexOf(input.charAt(i++));
enc4 = this._keyStr.indexOf(input.charAt(i++));

chr1 = (enc1 << 2) | (enc2 >> 4);
chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
chr3 = ((enc3 & 3) << 6) | enc4;

output = output + String.fromCharCode(chr1);

if (enc3 != 64)
{
output = output + String.fromCharCode(chr2);
}

if (enc4 != 64)
{
output = output + String.fromCharCode(chr3);
}

} // Whend

output = Base64._utf8_decode(output);

return output;
} // End Function decode

// private method for UTF-8 encoding
,_utf8_encode: function (string)
{
var utftext = "";
string = string.replace(/\r\n/g, "\n");

for (var n = 0; n < string.length; n++)
{
var c = string.charCodeAt(n);

if (c < 128)
{
utftext += String.fromCharCode(c);
}
else if ((c > 127) && (c < 2048))
{
utftext += String.fromCharCode((c >> 6) | 192);
utftext += String.fromCharCode((c & 63) | 128);
}
else
{
utftext += String.fromCharCode((c >> 12) | 224);
utftext += String.fromCharCode(((c >> 6) & 63) | 128);
utftext += String.fromCharCode((c & 63) | 128);
}

} // Next n

return utftext;
} // End Function _utf8_encode

// private method for UTF-8 decoding
,_utf8_decode: function (utftext)
{
var string = "";
var i = 0;
var c, c1, c2, c3;
c = c1 = c2 = 0;

while (i < utftext.length)
{
c = utftext.charCodeAt(i);

if (c < 128)
{
string += String.fromCharCode(c);
i++;
}
else if ((c > 191) && (c < 224))
{
c2 = utftext.charCodeAt(i + 1);
string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
i += 2;
}
else
{
c2 = utftext.charCodeAt(i + 1);
c3 = utftext.charCodeAt(i + 2);
string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}

} // Whend

return string;
} // End Function _utf8_decode

}

https://www.fileformat.info/info/unicode/utf8.htm

  • For any character equal to or below 127 (hex 0x7F), the UTF-8
    representation is one byte. It is just the lowest 7 bits of the full
    unicode value. This is also the same as the ASCII value.

  • For characters equal to or below 2047 (hex 0x07FF), the UTF-8
    representation is spread across two bytes. The first byte will have
    the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The
    second byte will have the top bit set and the second bit clear (i.e.
    0x80 to 0xBF).

  • For all characters equal to or greater than 2048 but less than 65535
    (0xFFFF), the UTF-8 representation is spread across three bytes.

Why won't window.btoa work on – ” characters in Javascript?

btoa is an exotic function in that it requires a "Binary String", which is an 8-bit clean string format. It doesn't work with unicode values above charcode 255, such as used by your em dash and "fancy" quote symbol.

You'll either have to turn your string into a new string that conforms to single byte packing (and then manually reconstitute the result of the associated atob), or you can uri encode the data first, making it safe:

> var str = `Hello – World`;
> window.btoa(encodeURIComponent(str));
"SGVsbG8lMjAlRTIlODAlOTMlMjBXb3JsZA=="

And then remember to decode it again when unpacking:

> var base64= "SGVsbG8lMjAlRTIlODAlOTMlMjBXb3JsZA==";
> decodeURIComponent(window.atob(base64));
"Hello – World"

Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings

The Unicode Problem

Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question's history).

Consider the following example:

const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // 61: occupies < 1 byte

const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte

console.log(btoa(ok)); // YQ==
console.log(btoa(notOK)); // error

Why do we encounter this?

Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.

Source: MDN (2021)

The original MDN article also covered the broken nature of window.btoa and .atob, which have since been mended in modern ECMAScript. The original, now-dead MDN article explained:

The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).



Solution with binary interoperability

(Keep scrolling for the ASCII base64 solution)

Source: MDN (2021)

The solution recommended by MDN is to actually encode to and from a binary string representation:

Encoding UTF8 ⇢ binary

// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
const codeUnits = new Uint16Array(string.length);
for (let i = 0; i < codeUnits.length; i++) {
codeUnits[i] = string.charCodeAt(i);
}
return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}

// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="

Decoding binary ⇢ UTF-8

function fromBinary(encoded) {
const binary = atob(encoded);
const bytes = new Uint8Array(binary.length);
for (let i = 0; i < bytes.length; i++) {
bytes[i] = binary.charCodeAt(i);
}
return String.fromCharCode(...new Uint16Array(bytes.buffer));
}

// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"

Where this fails a little, is that you'll notice the encoded string EycgAOAAIABsAGEAIABtAG8AZABlAA== no longer matches the previous solution's string 4pyTIMOgIGxhIG1vZGU=. This is because it is a binary encoded string, not a UTF-8 encoded string. If this doesn't matter to you (i.e., you aren't converting strings represented in UTF-8 from another system), then you're good to go. If, however, you want to preserve the UTF-8 functionality, you're better off using the solution described below.



Solution with ASCII base64 interoperability

The entire history of this question shows just how many different ways we've had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.

There are two possible methods to solve this problem:

  • the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
  • the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.

A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, this has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize code.

If you're trying to save yourself some time, you could also consider using a library:

  • js-base64 (NPM, great for Node.js)
  • base64-js

Encoding UTF8 ⇢ base64

    function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}

b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="

Decoding base64 ⇢ UTF8

    function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}

b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"

(Why do we need to do this? ('00' + c.charCodeAt(0).toString(16)).slice(-2) prepends a 0 to single character strings, for example when c == \n, the c.charCodeAt(0).toString(16) returns a, forcing a to be represented as 0a).



TypeScript support

Here's same solution with some additional TypeScript compatibility (via @MA-Maddin):

// Encoding UTF8 ⇢ base64

function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode(parseInt(p1, 16))
}))
}

// Decoding base64 ⇢ UTF8

function b64DecodeUnicode(str) {
return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
}).join(''))
}


The first solution (deprecated)

This used escape and unescape (which are now deprecated, though this still works in all modern browsers):

function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"

And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don't know:

function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(escape(window.atob( str )));
}

How to base64 encode emojis in JavaScript?

You can encode it by escaping it first and then calling EncodeUriComponent on it.

This looks like this:

btoa(unescape(encodeURIComponent('')));

Convert binary data to base64 does not work with btoa unescape

You've said you're using axios.get to get the image from the server. What you'll presumably get back will be a Buffer or ArrayBuffer or Blob, etc., but it depends on what you do with the response you get from axios.

I don't use axios (never felt the need to), but you can readily get a data URI for binary data from the server via fetch:

// 1.
const response = await fetch("/path/to/resource");
// 2.
if (!response.ok) {
throw new Error("HTTP error " + response.status);
}
// 3.
const buffer = await response.arrayBuffer();
// 4.
const byteArray = new Uint8Array(buffer);
// 5.
const charArray = Array.from(byteArray, byte => String.fromCharCode(byte));
// 6.
const binaryString = charArray.join("");
// 7.
const theImage = btoa(binaryString);

Or more concisely:

const response = await fetch("/path/to/resource");
if (!response.ok) {
throw new Error("HTTP error " + response.status);
}
const buffer = await response.arrayBuffer();
const binaryString = Array.from(new Uint8Array(buffer), byte => String.fromCharCode(byte)).join("");
const theImage = btoa(binaryString);

Here's how that works:

  1. We request the image data.
  2. We check that the request worked (fetch only rejects its promise on network errors, not HTTP errors; those are reported via the status and ok props.
  3. We read the body of the response into an ArrayBuffer; the buffer will have the binary image data.
  4. We want to convert that buffer data into a binary string. To do that, we need to access the bytes individually, so we create a Uint8Array (using that buffer) to access them.
  5. To convert that byte array into a binary string, we need to convert each byte into its equivalent character, and join those together into a string. Let's do that by using Array.from and in its mapping function (called for each byte), we'll use String.fromCharCode to convert the byte to a character. (It's not really much of a conversion. The byte 25 [for instance] becomes the character with character code 25 in the string.)
  6. Now we create the binary string by joining the characters in that array together into one string.
  7. Finally, we convert that string to Base64.

Looking at the docs, it looks like axios lets you provide the option responseType: "arraybuffer" to get an array buffer. If I'm reading right, you could use axios like this:

const response = await axios.get("/path/to/resource", {responseType: "arraybuffer"});
const binaryString = Array.from(new Uint8Array(response.body), v => String.fromCharCode(v)).join("");
const theImage = btoa(binaryString);


Related Topics



Leave a reply



Submit