Error When Decoding Certain Base64 Strings, But Not Others

Error when decoding certain Base64 strings, but not others

The problem is here:

let base64Decoded: NSString = NSString(data: nsdata, encoding: NSUTF8StringEncoding)!

You convert the decoded data to a string. This fails for
[128] because that does not represent a valid UTF-8 sequence.

Here is a version that avoids the intermediate string:

func base64ToByteArray(base64String: String) -> [UInt8] {
    let nsdata: NSData = NSData(base64EncodedString: base64String, options: NSDataBase64DecodingOptions(rawValue: 0))!
    // Create array of the required size ...
    var bytes = [UInt8](count: nsdata.length, repeatedValue: 0)
    // ... and fill it with the data
    nsdata.getBytes(&bytes)
    return bytes
}

Remarks:

options: NSDataBase64DecodingOptions(rawValue: 0) can be simplified
to options: nil.
There are some unnecessary type annotations and conversions in your code.
Your function crashes if baseString is not a valid Base64 string.
You could change it to return an optional.

Then it would look like this:

func byteArrayToBase64(bytes: [UInt8]) -> String {
    let nsdata = NSData(bytes: bytes, length: bytes.count)
    let base64Encoded = nsdata.base64EncodedStringWithOptions(nil);
    return base64Encoded;
}

func base64ToByteArray(base64String: String) -> [UInt8]? {
    if let nsdata = NSData(base64EncodedString: base64String, options: nil) {
        var bytes = [UInt8](count: nsdata.length, repeatedValue: 0)
        nsdata.getBytes(&bytes)
        return bytes
    }
    return nil // Invalid input
}

Example usage:

let testString = byteArrayToBase64([127, 128, 0, 130]);
println(testString) // Output: f4AAgg==
if let result = base64ToByteArray(testString) {
    println(result) // Output: [127, 128, 0, 130]
} else {
    println("failed")
}

Update for Swift 2 / Xcode 7:

func byteArrayToBase64(bytes: [UInt8]) -> String {
    let nsdata = NSData(bytes: bytes, length: bytes.count)
    let base64Encoded = nsdata.base64EncodedStringWithOptions([]);
    return base64Encoded;
}

func base64ToByteArray(base64String: String) -> [UInt8]? {
    if let nsdata = NSData(base64EncodedString: base64String, options: []) {
        var bytes = [UInt8](count: nsdata.length, repeatedValue: 0)
        nsdata.getBytes(&bytes, length: bytes.count)
        return bytes
    }
    return nil // Invalid input
}

let testString = byteArrayToBase64([127, 128, 0, 130]);
print(testString) // Output: f4AAgg==
if let result = base64ToByteArray(testString) {
    print(result) // Output: [127, 128, 0, 130]
} else {
    print("failed")
}

Update for Swift 3 and later:

func byteArrayToBase64(bytes: [UInt8]) -> String {
    let data = Data(bytes)
    let base64Encoded = data.base64EncodedString()
    return base64Encoded;
}

func base64ToByteArray(base64String: String) -> [UInt8]? {
    guard let data = Data(base64Encoded: base64String)  else {
        return nil
    }
    return Array(data)
}

let testString = byteArrayToBase64(bytes: [127, 128, 0, 130]);
print(testString) // Output: f4AAgg==
if let result = base64ToByteArray(base64String: testString) {
    print(result) // Output: [127, 128, 0, 130]
} else {
    print("failed")
}

Strange Base64 encode/decode problem

Whatever populates params expects the request to be a URL-encoded form (specifically, application/x-www-form-urlencoded, where "+" means space), but you didn't URL-encode it. I don't know what functions your language provides, but in pseudo code, queryString should be constructed from

concat(uri_escape("data"), "=", uri_escape(base64_encode(rawBytes)))

which simplifies to

concat("data=", uri_escape(base64_encode(rawBytes)))

The "+" characters will be replaced with "%2B".

How to handle an error while decoding a Base64 string

My apologies. The RFC's on MIME (1341, 1521, 2045) contain the following paragraph, which I could not find until now:

The output stream (encoded bytes) must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

In any case, it is appropriate that this question and answer be available ono StackOverflow.

P.S. If there are other standards of Base64 with other guidelines, links and quotes are appropriate in other answers.

python error in decoding base64 string

This worked for me (Python 3). The padding is indeed important, as you've seen in other answers:

import base64
import zlib
import json

s = b'H4sIAAAAAAAAA22PW0/CQBCF/8s81wQosdA3TESJhhhb9cHwMN1O6Ybtbt0LhDT97+5yU4yPc+bMnO90YCyyDaSfHRimieQSG4IUaldABC1qbAykHbQsrzWZWokSUumEiMCQ3nJGCy9ADH0EFvWarJ+eHv11v4qgEIptqHyTlovzWes0q9HQ3X87Lh80Msp5gDhqzGlN0or9B1pWU5ldxV72c2/ODg0C7lUXu/U2p8XLpY35+6Mmtsn4WqLILFrnTRUKQxFwk7+fSL23+zX215VD/jE16CeojIzhSi5kpQ6xzVkIz76wuSmHRVINRuVtheMxDuLJJB5Nk5hRMkriaTGJh8MDn5LWv8v3bejzvFjez15/5EsNbuZo7FzpHepyJoTaBWqrHfX9N0/UAJ7qAQAA.bi0I1YDZ3V6AXu6aYTGO1JWi5tE5CoZli7aa6bFtqM4'

decoded = base64.urlsafe_b64decode(s + b'=')
uncompressed = zlib.decompress(decoded, 16 + zlib.MAX_WBITS)
unjsoned = json.loads(uncompressed.decode('utf-8'))

print(unjsoned)

The zlib.decompress(decoded, 16 + zlib.MAX_WBITS) is a slightly more compact way to un-gzip a byte string.

Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings

The Unicode Problem

Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question's history).

Consider the following example:

const ok = "a";
console.log(ok.codePointAt(0).toString(16)); //   61: occupies < 1 byte

const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte

console.log(btoa(ok));    // YQ==
console.log(btoa(notOK)); // error

Why do we encounter this?

Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.

Source: MDN (2021)

The original MDN article also covered the broken nature of window.btoa and .atob, which have since been mended in modern ECMAScript. The original, now-dead MDN article explained:

The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).

Solution with binary interoperability

(Keep scrolling for the ASCII base64 solution)

Source: MDN (2021)

The solution recommended by MDN is to actually encode to and from a binary string representation:

Encoding UTF8 ⇢ binary

// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}

// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="

Decoding binary ⇢ UTF-8

function fromBinary(encoded) {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return String.fromCharCode(...new Uint16Array(bytes.buffer));
}

// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"

Where this fails a little, is that you'll notice the encoded string EycgAOAAIABsAGEAIABtAG8AZABlAA== no longer matches the previous solution's string 4pyTIMOgIGxhIG1vZGU=. This is because it is a binary encoded string, not a UTF-8 encoded string. If this doesn't matter to you (i.e., you aren't converting strings represented in UTF-8 from another system), then you're good to go. If, however, you want to preserve the UTF-8 functionality, you're better off using the solution described below.

Solution with ASCII base64 interoperability

The entire history of this question shows just how many different ways we've had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.

There are two possible methods to solve this problem:

the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.

A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, this has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize code.

If you're trying to save yourself some time, you could also consider using a library:

js-base64 (NPM, great for Node.js)
base64-js

Encoding UTF8 ⇢ base64

    function b64EncodeUnicode(str) {
        // first we use encodeURIComponent to get percent-encoded UTF-8,
        // then we convert the percent encodings into raw bytes which
        // can be fed into btoa.
        return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
            function toSolidBytes(match, p1) {
                return String.fromCharCode('0x' + p1);
        }));
    }
    
    b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
    b64EncodeUnicode('\n'); // "Cg=="

Decoding base64 ⇢ UTF8

    function b64DecodeUnicode(str) {
        // Going backwards: from bytestream, to percent-encoding, to original string.
        return decodeURIComponent(atob(str).split('').map(function(c) {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }
    
    b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
    b64DecodeUnicode('Cg=='); // "\n"

(Why do we need to do this? ('00' + c.charCodeAt(0).toString(16)).slice(-2) prepends a 0 to single character strings, for example when c == \n, the c.charCodeAt(0).toString(16) returns a, forcing a to be represented as 0a).

TypeScript support

Here's same solution with some additional TypeScript compatibility (via @MA-Maddin):

// Encoding UTF8 ⇢ base64

function b64EncodeUnicode(str) {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode(parseInt(p1, 16))
    }))
}

// Decoding base64 ⇢ UTF8

function b64DecodeUnicode(str) {
    return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
    }).join(''))
}

The first solution (deprecated)

This used escape and unescape (which are now deprecated, though this still works in all modern browsers):

function utf8_to_b64( str ) {
    return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
    return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"

And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don't know:

function b64_to_utf8( str ) {
    str = str.replace(/\s/g, '');    
    return decodeURIComponent(escape(window.atob( str )));
}

Error while decoding base64 string to UIImage

So i finally found the problem. Data class was not finding the base64Encoded inializer, and xcode was getting confused. As Data class is in the Foundation Framework. so what i did was.

  if let decodedImage =  Foundation.Data(base64Encoded: 
  "your base64 String", options: .ignoreUnknownCharacters){
            let image = UIImage(data: decodedImage)
   }

i directly referred it to the Foundation Framework, and it's working like a chram.

Handling errors from base64 decode in Go

Look at the error type. For example,

package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    encoded := "XXXXXaGVsbG8=" // corrupt
    decoded, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil {
        if _, ok := err.(base64.CorruptInputError); ok {
            panic("\nbase64 input is corrupt, check service Key")
        }
        panic(err)
    }
    fmt.Println(string(decoded))
}

Output:

panic: 
base64 input is corrupt, check service Key

Unable to decode Base-64 URLs

To recap, I was creating a URL that used a base-64 encoded parameter which was itself a Triple DES encrypted string. So the URL looked like http://[Domain_Name]/SDN1S2JkeXpyVW890 The link referenced a controller action on an MVC web site.

The URL was then inserted into an HTML formatted email. Looking at the error log, we saw that around 5% of the public users that responded to the link were throwing an "invalid base-64 string error". Most, but not all, of these errors were related to the IE6 user agent.

After trying many possible solutions based around character and URL encoding, it was discovered that somewhere in the client's process the url was being converted to lower-case - this, of course, broke the base-64 encoding (as it is uses both upper and lower case encoding characters).

Whether the case corruption was caused by the client's browser, email client or perhaps local anti-virus software, I have not been able to determine.

The Solution
Do not use any of the standard base-64 encoding methods, instead use a base-32 or zBase-32 encoding instead - both of which are case-insensitive.

See the following links for more details

Base-32 - Wikipedia

MyTenPennies Base-32 .NET Implementation

The moral of the story is, Base-64 URL encoding can be unreliable in some public environments. Base-32, whilst slightly more verbose, is a better choice.

Hope this helps.

Error When Decoding Certain Base64 Strings, But Not Others