How to Check for a Valid Base64 Encoded String

How to check for a valid Base64 encoded string

Update: For newer versions of C#, there's a much better alternative; please refer to the answer by Tomas here: https://stackoverflow.com/a/54143400/125981.


It's fairly easy to recognize a Base64 string: it is composed only of the characters 'A'..'Z', 'a'..'z', '0'..'9', '+' and '/', and it is often padded at the end with up to two '=' characters to make the length a multiple of 4. But instead of checking these rules yourself, you'd be better off simply attempting the conversion with Convert.FromBase64String and ignoring the FormatException if it occurs.
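The answer above is about C#. As a rough Java analogue (a sketch, not the answer's own code), java.util.Base64's basic decoder throws IllegalArgumentException on illegal characters or malformed padding. Unlike .NET, though, it accepts input whose '=' padding has been omitted, so this sketch adds an explicit multiple-of-4 length check. The class name TryDecodeCheck is hypothetical.

```java
import java.util.Base64;

// Hypothetical class name; sketch of "just try to decode it and catch the failure".
class TryDecodeCheck {

    static boolean isBase64(String s) {
        // java.util.Base64's basic decoder does not require '=' padding,
        // so enforce the multiple-of-4 length rule explicitly first.
        if (s == null || s.length() % 4 != 0) {
            return false;
        }
        try {
            Base64.getDecoder().decode(s); // result ignored; we only care whether it throws
            return true;
        } catch (IllegalArgumentException e) {
            return false; // illegal character or malformed padding
        }
    }

    public static void main(String[] args) {
        System.out.println(isBase64("SGFwcHkgSG9saWRheXM=")); // true
        System.out.println(isBase64("foo"));                  // false: length 3
        System.out.println(isBase64("ab$d"));                 // false: '$' is not a Base64 character
    }
}
```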

How to check whether a string is Base64 encoded or not

You can use the following regular expression to check if a string constitutes a valid base64 encoding:

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$

In Base64 encoding, the character set is [A-Z, a-z, 0-9, +, /]. If the final group of characters is shorter than 4, it is padded with '=' characters.

^([A-Za-z0-9+/]{4})* means the string starts with zero or more complete four-character Base64 groups.

([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$ means the string optionally ends in one of two padded forms: [A-Za-z0-9+/]{3}= or [A-Za-z0-9+/]{2}==. (An unpadded final group of four characters is already covered by the first part.)
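The same regular expression can be used from Java via java.util.regex. This is a minimal sketch (the class name RegexCheck is hypothetical); note that the pattern also matches the empty string, since both parts are optional.

```java
import java.util.regex.Pattern;

// Hypothetical class name; applies the regular expression from the answer above.
class RegexCheck {
    // matches() requires the whole input to match, so ^ and $ are redundant but harmless.
    private static final Pattern BASE64 =
            Pattern.compile("^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$");

    static boolean isBase64(String s) {
        return BASE64.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBase64("SGFwcHkgSG9saWRheXM=")); // true
        System.out.println(isBase64("foo"));                  // false
        System.out.println(isBase64(""));                     // true: zero groups, no padding
    }
}
```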

Check whether a string is Base64-encoded in PowerShell

The following returns $true if $item contains a valid Base64-encoded string, and $false otherwise:

try { $null=[Convert]::FromBase64String($item); $true } catch { $false }
  • The above uses System.Convert.FromBase64String to try to convert the input string $item to the array of bytes it represents.

  • If the call succeeds, the output byte array is ignored ($null = ...), and $true is output.

  • Otherwise, the catch block is entered and $false is returned.

Caveat: Even regular strings can accidentally be technically valid Base64-encoded strings, namely if they happen to contain only characters from the Base64 character set and the character count is a multiple of 4.
For instance, the above test yields $true for "word" (only Base64 characters, and a length that is a multiple of 4), but not for "words" (5 characters, not a multiple of 4).
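To make the caveat concrete, here is a small Java sketch (hypothetical class name FalsePositiveDemo) that applies the same kind of test, length a multiple of 4 plus a successful decode, and prints the bytes "word" decodes to. The decode succeeds, but the resulting bytes carry no meaning.

```java
import java.util.Base64;

// Hypothetical class name; shows that an ordinary word can pass a Base64 validity test.
class FalsePositiveDemo {

    static boolean looksLikeBase64(String s) {
        if (s.length() % 4 != 0) {
            return false;
        }
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C2 8A DD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"word", "words"}) {
            if (looksLikeBase64(s)) {
                // prints "word -> valid Base64, decodes to bytes C2 8A DD"
                System.out.println(s + " -> valid Base64, decodes to bytes " + hex(Base64.getDecoder().decode(s)));
            } else {
                // prints "words -> not valid Base64"
                System.out.println(s + " -> not valid Base64");
            }
        }
    }
}
```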


For example, in the context of an if statement:

  • Note: In order for a try / catch statement to serve as an expression in the if conditional, $(), the subexpression operator, must be used.
# Process 2 sample strings, one Base64-encoded, the other not.
foreach ($item in 'foo', 'SGFwcHkgSG9saWRheXM=') {

    if ($(try { $null=[Convert]::FromBase64String($item); $true } catch { $false })) {
        'Base64-encoded: [{0}]; decoded as UTF-8: [{1}]' -f
            $item,
            [Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($item))
    }
    else {
        'NOT Base64-encoded: [{0}]' -f $item
    }

}

The above yields:

NOT Base64-encoded: [foo]
Base64-encoded: [SGFwcHkgSG9saWRheXM=]; decoded as UTF-8: [Happy Holidays]

It's easy to wrap the functionality in a custom helper function, Test-Base64:

# Define function.
# Accepts either a single string argument or multiple strings via the pipeline.
function Test-Base64 {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $String
    )
    process {
        try { $null=[Convert]::FromBase64String($String); $true } catch { $false }
    }
}

# Test two sample strings.
foreach ($item in 'foo', 'SGFwcHkgSG9saWRheXM=') {
    if (Test-Base64 $item) {
        "YES: $item"
    }
    else {
        "NO: $item"
    }
}

For information on converting bytes to and from Base64-encoded strings, see this answer.

How to check whether a string is base64 encoded or not?

If you receive the value from an <img src="..." /> attribute, it should be in Data URL format.

A simple regular expression can tell whether the URL is a data URL or a regular one. In Java, it could look like this:

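import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcHandler {
    // Matches a header such as "data:image/png;base64," (case-insensitive), plus any whitespace after the comma.
    private static final Pattern DATA_URL_PATTERN = Pattern.compile("^data:image/(.+?);base64,\\s*", Pattern.CASE_INSENSITIVE);

    static void handleImgSrc(String path) {
        if (path.startsWith("data:")) {
            final Matcher m = DATA_URL_PATTERN.matcher(path);
            if (m.find()) {
                String imageType = m.group(1);           // e.g. "png"
                String base64 = path.substring(m.end()); // the Base64 payload after the header
                // decodeImage(imageType, base64);
            } else {
                // some logging
            }
        } else {
            // downloadImage(path);
        }
    }
}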
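The handler above leaves decoding as a commented-out step. A minimal sketch of how that step might also validate the payload is to actually decode it and treat a decoding failure as "not valid Base64". The class DataUrlDecoder and the method extractImageBytes are hypothetical names; the sketch assumes the payload contains no line breaks, because java.util.Base64's basic decoder rejects whitespace.

```java
import java.util.Base64;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; reuses the DATA_URL_PATTERN idea from the handler above.
class DataUrlDecoder {
    private static final Pattern DATA_URL_PATTERN =
            Pattern.compile("^data:image/(.+?);base64,\\s*", Pattern.CASE_INSENSITIVE);

    /** Returns the decoded image bytes, or empty if src is not a data URL with a valid Base64 payload. */
    static Optional<byte[]> extractImageBytes(String src) {
        Matcher m = DATA_URL_PATTERN.matcher(src);
        if (!m.find()) {
            return Optional.empty(); // a regular URL, or a data URL without the expected header
        }
        String payload = src.substring(m.end());
        try {
            return Optional.of(Base64.getDecoder().decode(payload));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // header looked right, but the payload is not valid Base64
        }
    }
}
```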

Valid Base64 string can't be decoded

Base64 represents every 6 bits of input as one 8-bit character, so every 3 bytes (24 bits) of input become 4 characters.
The length of a Base64 string must therefore be divisible by 4. If the input length is not a multiple of 3, as many '=' characters as needed (one or two) are appended to the end to make the length divisible by 4; these padding characters are not part of the encoded content.

As your Base64 string's length is already divisible by 4, there is no need for extra '=' characters.
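The length arithmetic above can be checked directly against java.util.Base64's encoder. This sketch (hypothetical class name PaddingMath) computes the expected encoded length, 4 * ceil(n / 3), and the number of '=' characters, (3 - n % 3) % 3, for an input of n bytes, then compares them with the encoder's actual output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical class name; illustrates the 3-bytes-to-4-characters rule and '=' padding.
class PaddingMath {

    // Every 3 input bytes become 4 characters; a partial final group is padded up to 4.
    static int encodedLength(int byteCount) {
        return 4 * ((byteCount + 2) / 3);
    }

    // 0, 2 or 1 '=' characters when byteCount % 3 is 0, 1 or 2 respectively.
    static int paddingCount(int byteCount) {
        return (3 - byteCount % 3) % 3;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"Hi", "Hey", "Happy Holidays"}) {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            String encoded = Base64.getEncoder().encodeToString(bytes);
            System.out.printf("%-16s bytes=%2d predicted=%2d actual=%2d padding=%d -> %s%n",
                    text, bytes.length, encodedLength(bytes.length), encoded.length(),
                    paddingCount(bytes.length), encoded);
        }
    }
}
```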

Determine if string is in base64 using JavaScript

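If "valid" means "only has base64 chars in it", then check that the whole string matches /^[A-Za-z0-9+\/=]+$/ (anchored, so that every character is checked rather than just one).

If "valid" means a "legal" base64-encoded string, then you should also check that the length is a multiple of 4 and that '=' appears only as padding at the end (at most two of them).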

If "valid" means it's something reasonable after decoding then it requires domain knowledge.
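The three meanings of "valid" above can be sketched as three increasingly strict checks. The sketch is in Java rather than JavaScript, to keep one language for the added examples; the class ValidityLevels is hypothetical. For the third level it assumes, purely as an example of domain knowledge, that the payload is expected to be UTF-8 text, and it decodes strictly so malformed byte sequences are reported instead of replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

// Hypothetical class name; three increasingly strict notions of "valid Base64".
class ValidityLevels {
    // Level 1: only Base64 alphabet characters (and '='); matches() checks every character.
    private static final Pattern ALPHABET = Pattern.compile("^[A-Za-z0-9+/=]+$");
    // Level 2: legal structure; groups of 4, '=' only as one or two trailing padding characters.
    private static final Pattern LEGAL =
            Pattern.compile("^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$");

    static boolean onlyBase64Chars(String s) {
        return ALPHABET.matcher(s).matches();
    }

    static boolean legalBase64(String s) {
        return LEGAL.matcher(s).matches();
    }

    // Level 3: ASSUMES the decoded payload should be UTF-8 text (domain knowledge).
    static boolean decodesToUtf8Text(String s) {
        if (!legalBase64(s)) {
            return false;
        }
        try {
            byte[] bytes = Base64.getDecoder().decode(s);
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (IllegalArgumentException | CharacterCodingException e) {
            return false;
        }
    }
}
```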

Is there a bulletproof way to detect base64 encoding in a string in php?

I will post Yoshi's comment as the final conclusion:

I think you're out of luck. The false positives you mention, still are valid base64 encodings. You'd need to judge whether the decoded version makes any sense, but that will probably be a never ending story, and ultimately would probably also result in false positives. – Yoshi

How to detect true base64 on PHP

Since Base64 maps data from an 8-bit to a 6-bit representation, you have only the following options:

  • Look for characters outside the Base64 alphabet (anything other than A-Z, a-z, 0-9, +, /) and check the '=' padding (at most two, and only at the end).
  • Check the number of characters (it must be divisible by 4).

This way, you can detect data that is definitely not Base64-encoded. But you cannot confirm that the data really is Base64, since a normal string can meet all the requirements of Base64 encoding.

On the other hand, if you know the structure of the data, you can check whether the decoded Base64 text fits that structure.
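As one example of checking the decoded data against a known structure, suppose (an assumption for illustration) the Base64 text is expected to contain a PNG image. Every PNG file starts with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A, so a sketch can decode the text and compare its first bytes. The class StructureCheck and the method isBase64Png are hypothetical names.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical class name; "does the decoded data fit the expected structure?" for PNG images.
class StructureCheck {
    // The fixed 8-byte signature at the start of every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    /** True only if s is decodable Base64 AND the decoded bytes start with the PNG signature. */
    static boolean isBase64Png(String s) {
        byte[] bytes;
        try {
            bytes = Base64.getDecoder().decode(s);
        } catch (IllegalArgumentException e) {
            return false; // not Base64 at all
        }
        return bytes.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(bytes, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }
}
```

For instance, "iVBORw0KGgo=" (the Base64 form of the signature alone) passes, while "SGFwcHkgSG9saWRheXM=" is valid Base64 but fails the structural check.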


