Maximum Length of Generated Hash When Using Password_Hash

Maximum length of generated hash when using password_hash?

From the password_hash documentation:

The following algorithms are currently supported:

PASSWORD_DEFAULT - Use the bcrypt algorithm (default as of PHP 5.5.0).
Note that this constant is designed to change over time as new and
stronger algorithms are added to PHP. For that reason, the length of
the result from using this identifier can change over time. Therefore,
it is recommended to store the result in a database column that can
expand beyond 60 characters (255 characters would be a good choice).

PASSWORD_BCRYPT - Use the CRYPT_BLOWFISH algorithm to create the hash.
This will produce a standard crypt() compatible hash using the "$2y$"
identifier. The result will always be a 60 character string, or FALSE
on failure.

Therefore, using PASSWORD_BCRYPT, the result of password_hash will be a 60 character string.

PHP password_hash() password_verify() maximum password length?

Ok, let's go through this.

The function does have a password length limit. Just like all strings in PHP, it is limited to 2^31-1 bytes.

To be clear, there's no way for PHP to deal with anything larger than that (today at least).

So the function itself is limited. But what about the underlying crypto algorithms.

BCrypt is limited to processing the first 72 characters of password. However, this is not commonly a problem as explained in this answer.

So in short, yes it does have an effective limit (it will only "use" the first 72 chars with the default and only algorithm), And no this is not a problem and nor should you try to "fix" or "mitigate" it.

What is the maximum length of hashed passwords using the HMACSHA1 algorithm

The output of PBKDF2 can be specified. A PBKDF is a password based key derivation function. Generally those have a key expansion phase that allows the output to be specified.

However, if PBKDF2 is used as password hash rather than for key derivation the size of the configured hash is kept; that provides the maximum security that can be retrieved from the algorithm. In this case that's SHA-1 that generates 160 bits / 20 bytes.

Unless you really need text, the output can be stored as static binary of 20 bytes. In your case you should be storing it as base 64 version of the 20 bytes. That should amount to a fixed 28 bytes: ((20 + 2) / 3) * 4 = 28 to calculate the base 64 expansion. However, your code explicitly specifies the output size to be 256 / 8 = 64 bytes. A quick calculation suggests that it always uses 88 base 64 characters for that size.

Producing 64 bytes while using SHA-1 is not a good setting because it requires the inner function of PBKDF2 to run 4 times, giving you no advantage of running it only once to produce 20 bytes, giving advantage to an attacker. An attacker only has to check the first 20 bytes to make sure a password matches, after all, and for that only one of the four runs is required. The method that PBKDF2 uses to expand the key size over the hash size is really inefficient and may be considered a design flaw.

On the other hand, 10.000 iterations is not very high. You should, for PBKDF2:

specify the output size of the underlying hash as output size (20 bytes instead of 64 bytes for SHA-1) and
use a higher number of iterations (limited by how much CPU time you can spend in PBKDF2).

The size of the password doesn't have any influence on the size of the password hash.

Beware that some password hashes on other runtimes create a password hash string themselves, more compatible with crypt on Unix systems. So they would have a larger output that is not directly compatible.

How to hash long passwords (72 characters) with blowfish

The problem here is basically a problem of entropy. So let's start looking there:

Entropy Per Character

The number of bits of entropy per byte are:

Hex Characters
- Bits: 4
- Values: 16
- Entropy In 72 Chars: 288 bits
Alpha-Numeric
- Bits: 6
- Values: 62
- Entropy In 72 Chars: 432 bits
"Common" Symbols
- Bits: 6.5
- Values: 94
- Entropy In 72 Chars: 468 bits
Full Bytes
- Bits: 8
- Values: 255
- Entropy In 72 Chars: 576 bits

So, how we act depends on what type of characters we expect.

The First Problem

The first problem with your code, is that your "pepper" hash step is outputting hex characters (since the fourth parameter to hash_hmac() is not set).

Therefore, by hashing your pepper in, you're effectively cutting the maximum entropy available to the password by a factor of 2 (from 576 to 288 possible bits).

The Second Problem

However, sha256 only provides 256 bits of entropy in the first place. So you're effectively cutting a possible 576 bits down to 256 bits. Your hash step * immediately*, by very definition loses
at least 50% of the possible entropy in the password.

You could partially solve this by switching to SHA512, where you'd only reduce the available entropy by about 12%. But that's still a not-insignificant difference. That 12% reduces the number of permutations by a factor of 1.8e19. That's a big number... And that's the factor it reduces it by...

The Underlying Issue

The underlying issue is that there are three types of passwords over 72 characters. The impact that this style system has on them will be very different:

Note: from here on out I'm assuming we're comparing to a pepper system which uses SHA512 with raw output (not hex).

High entropy random passwords
These are your users using password generators which generate what amount to large keys for passwords. They are random (generated, not human chosen), and have high entropy per character. These types are using high-bytes (characters > 127) and some control characters.
For this group, your hashing function will significantly reduce their available entropy into bcrypt.
Let me say that again. For users who are using high entropy, long passwords, your solution significantly reduces the strength of their password by a measurable amount. (62 bits of entropy lost for a 72 character password, and more for longer passwords)
Medium entropy random passwords
This group is using passwords containing common symbols, but no high bytes or control characters. These are your typable passwords.
For this group, you are going to slightly unlock more entropy (not create it, but allow more entropy to fit into the bcrypt password). When I say slightly, I mean slightly. The break-even occurs when you max out the 512 bits that SHA512 has. Therefore, the peak is at 78 characters.
Let me say that again. For this class of passwords, you can only store an additional 6 characters before you run out of entropy.
Low entropy non-random passwords
This is the group who are using alpha-numeric characters that are probably not randomly generated. Something like a bible quote or such. These phrases have approximately 2.3 bits of entropy per character.
For this group, you can significantly unlock more entropy (not create it, but allow more to fit into the bcrypt password input) by hashing. The breakeven is around 223 characters before you run out of entropy.
Let's say that again. For this class of passwords, pre-hashing definitely increases security significantly.

Back To The Real World

These kinds of entropy calculations don't really matter much in the real world. What matters is guessing entropy. That's what directly effects what attackers can do. That's what you want to maximize.

While there's little research that's gone into guessing entropy, there are some points that I'd like to point out.

The chances of randomly guessing 72 correct characters in a row are extremely low. You're more likely to win the Powerball lottery 21 times, than to have this collision... That's how big of a number we're talking about.

But we may not stumble on it statistically. In the case of phrases the chance of the first 72 characters being the same is a whole lot higher than for a random password. But it's still trivially low (you're more likely to win the Powerball lottery 5 times, based on 2.3 bits per character).

Practically

Practically, it doesn't really matter. The chances of someone guessing the first 72 characters right, where the latter ones make a significant difference are so low that it's not worth worrying about. Why?

Well, let's say you're taking a phrase. If the person can get the first 72 characters right, they are either really lucky (not likely), or it's a common phrase. If it's a common phrase, the only variable is how long to make it.

Let's take an example. Let's take a quote from the bible (just because it's a common source of long text, not for any other reason):

You shall not covet your neighbor's house. You shall not covet your neighbor's wife, or his manservant or maidservant, his ox or donkey, or anything that belongs to your neighbor.

That's 180 characters. The 73rd character is the g in the second neighbor's. If you guessed that much, you're likely not stopping at nei, but continuing with the rest of the verse (since that's how the password is likely to be used). Therefore, your "hash" didn't add much.

BTW: I am ABSOLUTELY NOT advocating using a bible quote. In fact, the exact opposite.

Conclusion

You're not really going to help people much who use long passwords by hashing first. Some groups you can definitely help. Some you can definitely hurt.

But in the end, none of it is overly significant. The numbers we are dealing with are just WAY too high. The difference in entropy isn't going to be much.

You're better off leaving bcrypt as it is. You're more likely to screw up the hashing (literally, you've done it already, and you're not the first, or last to make that mistake) than the attack you're trying to prevent is going to happen.

Focus on securing the rest of the site. And add a password entropy meter to the password box on registration to indicate password strength (and indicate if a password is overlong that the user may wish to change it)...

That's my $0.02 at least (or possibly way more than $0.02)...

As Far As Using A "Secret" Pepper:

There is literally no research into feeding one hash function into bcrypt. Therefore, it's unclear at best if feeding a "peppered" hash into bcrypt will ever cause unknown vulnerabilities (we know doing hash1(hash2($value)) can expose significant vulnerabilities around collision resistance and preimage attacks).

Considering that you're already considering storing a secret key (the "pepper"), why not use it in a way that's well studied and understood? Why not encrypt the hash prior to storing it?

Basically, after you hash the password, feed the entire hash output into a strong encryption algorithm. Then store the encrypted result.

Now, an SQL-Injection attack will not leak anything useful, because they don't have the cipher key. And if the key is leaked, the attackers are no better off than if you used a plain hash (which is provable, something with the pepper "pre-hash" doesn't provide).

Note: if you choose to do this, use a library. For PHP, I strongly recommend Zend Framework 2's Zend\Crypt package. It's actually the only one I'd recommend at this current point in time. It's been strongly reviewed, and it makes all the decisions for you (which is a very good thing)...

Something like:

use Zend\Crypt\BlockCipher;

public function createHash($password) {
    $hash = password_hash($password, PASSWORD_BCRYPT, ["cost"=>$this->cost]);

    $blockCipher = BlockCipher::factory('mcrypt', array('algo' => 'aes'));
    $blockCipher->setKey($this->key);
    return $blockCipher->encrypt($hash);
}

public function verifyHash($password, $hash) {
    $blockCipher = BlockCipher::factory('mcrypt', array('algo' => 'aes'));
    $blockCipher->setKey($this->key);
    $hash = $blockCipher->decrypt($hash);

    return password_verify($password, $hash);
}

And it's beneficial because you're using all of the algorithms in ways that are well understood and well studied (relatively at least). Remember:

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.

Bruce Schneier

PHP password_hash function salt length 21 or 22?

First, please don't provide your own salt. You're not going to do a better job generating it than the library does. And using static salts (like you did in the example) will compromise security. Just let it generate its own salt (incidentally, I believe letting a salt in is the biggest mistake I made in that API).

As far as 21 vs 22 characters, give this answer a read.

Basically, the salt is base64 encoded. This means that every 6 bits of the salt is encoded into 8 bits. So every byte of encoded salt is 6 bits.

21 characters is 126 bits. That means that only part of the 22nd character is used (the first 2 decoded bits). The reason you get the same hash with A and B, is that the leading 2 bits are the same for both characters.

In fact, there are only 4 unique hashes for the 22nd byte.

How to hash a token of 1000 characters or more using password_hash function in php?

If you read the documentation of password_hash, you will see this:

CAUTION Using the PASSWORD_BCRYPT as the algorithm, will result in the password parameter being truncated to a maximum length of 72 characters.

This only applies for PASSWORD_BCRYPT, which is the default hashing algo for password_hash. If you use PASSWORD_ARGON2I, this limitation doesn't exists.

But you should keep in mind, that PASSWORD_ARGON2I only exists in PHP7.2 and newer.

How to use PHP's password_hash to hash and verify passwords

Using password_hash is the recommended way to store passwords. Don't separate them to DB and files.

Let's say we have the following input:

$password = $_POST['password'];

You first hash the password by doing this:

$hashed_password = password_hash($password, PASSWORD_DEFAULT);

Then see the output:

var_dump($hashed_password);

As you can see it's hashed. (I assume you did those steps).

Now you store this hashed password in your database, ensuring your password column is large enough to hold the hashed value (at least 60 characters or longer). When a user asks to log them in, you check the password input with this hash value in the database, by doing this:

// Query the database for username and password
// ...

if(password_verify($password, $hashed_password)) {
    // If the password inputs matched the hashed password in the database
    // Do something, you know... log them in.
} 

// Else, Redirect them back to the login page.

Official Reference

Is PHP's password_hash() Backwards Compatible?

The short answer is yes, it is backwards compatible, providing your storage allows for longer password hash lengths.

When you generate a password you will get it in a specific format.

I've generated two passwords using bcrypt and the newer argon2i hash. (Coming in PHP 7.2 with the introduction of libsodium)

FYI: My code here is based upon the libsodium extension since I haven't got php 7.2 downloaded, and wasn't going to install sodium_compat just for the aliases. The php 7.2 syntax will not contain any namespaces.

$bcrypt = password_hash('insecurepassword', PASSWORD_BCRYPT);
$argon2i = \Sodium\crypto_pwhash_str('insecurepassword', \Sodium\CRYPTO_PWHASH_OPSLIMIT_INTERACTIVE, \Sodium\CRYPTO_PWHASH_MEMLIMIT_INTERACTIVE);

var_dump($bcrypt, $argon2i);

The output I got when I ran that was this

string(60) "$2y$10$jiT0NF3u426kguHes8ZBputRE/n9OSdPi5HhHvEWW4mX1XDwKwy1e"
string(96) "$argon2i$v=19$m=32768,t=4,p=1$Ho4Vzgp5nzQkLlp99P+ViA$bDqX8UUlSnfLRCfFBzBnFhWr/hzHzuuUCfZ0LSIns64"

Both passwords are in the same format more or less. If you explode the $ you end up with each piece of the puzzle required to verify the password.

The first section contains the algorithm. 2y for bcrypt, argon2i for... argon2i. (surprise!)

The next bits contain the configuration options, so in bcrypt's case the cost. In argon2i's case, there's some extra configuration.

The final section contains the actual password and salt for the algorithm to check against.

For a bit more of a visual breakdown checkout the php docs.

Even though I've used some different functions to demo some output, there is an rfc that was accepted that introduced support for the argon2i hash to the password_hash functions in 7.2.

So when that time comes, password_verify will "just work" regardless of if you give it a bcrypt or argon2i hash to verify a password against.

Password max length with bcrypt, blowfish

Question 1: There is simply no reason to limit the password length, BCrypt will work with much longer passwords, though only 72 characters are used. You would have no advantage at all, but theoretically you would hinder people with password managers using longer passwords. If you switch to another algorithm in future, the limit would probably be different, so no reason to limit with 72 characters.

Question 2: Instead of peppering you could better use another approach: Encrypt the password-hash with a server-side key. The reason to add a pepper is, that an attacker must gain privileges on the server, because without the key he cannot start brute-forcing the hashes (SQL-injection or thrown away backups of the database won't do). The same advantage you get with encrypting (two-way) the hash.

This way you do not need to reserve characters for the pepper, you can use all 72 characters from the password.
In contrast to the pepper, the server side key can be exchanged whenever this is necessary. A pepper actually becomes part of the password and cannot be changed until the next login.
Another discussed point is, that a pepper could theoretically interfere with the hash-algorithm.

What column type/length should I use for storing a Bcrypt hashed password in a Database?

The modular crypt format for bcrypt consists of

$2$ , $2a$ or $2y$ identifying the hashing algorithm and format
a two digit value denoting the cost parameter, followed by $
a 53 characters long base-64-encoded value (they use the alphabet ., /, 0–9, A–Z, a–z that is different to the standard Base 64 Encoding alphabet) consisting of:
- 22 characters of salt (effectively only 128 bits of the 132 decoded bits)
- 31 characters of encrypted output (effectively only 184 bits of the 186 decoded bits)

Thus the total length is 59 or 60 bytes respectively.

As you use the 2a format, you’ll need 60 bytes. And thus for MySQL I’ll recommend to use the CHAR(60) BINARYor BINARY(60) (see The _bin and binary Collations for information about the difference).

CHAR is not binary safe and equality does not depend solely on the byte value but on the actual collation; in the worst case A is treated as equal to a. See The _bin and binary Collations for more information.

Maximum Length of Generated Hash When Using Password_Hash