Bcrypt Says Long, Similar Passwords Are Equivalent - Problem with Me, the Gem, or the Field of Cryptography

BCrypt says long, similar passwords are equivalent - problem with me, the gem, or the field of cryptography?

The good news is, the mathematical foundations of encryption haven't been dissolved. :)

The bad news is that there's an 8-bit key length limit in bcrypt.c which is silently failing:

uint8_t key_len, salt_len, logr, minor;

Then later:

key_len = strlen(key) + (minor >= 'a' ? 1 : 0);

What you're passing in for encryption is 263 characters, but it winds up thinking it's only 8. So you're getting comparisons on only the very first part of the strings.

However, it works fine for me when I pare down the length of the long_strings, so if you actually do get a problem in the sub-255-total range that may be related to something else.

jBCrypt serious issue with checkpw (return true when it shouldn't?)

Ok, so wording the question gave me enough to actually figure out what I was looking for (hurray for rubber ducking). The field of cryptography is safe for now!

BCrypt implementation XOR using P_orig which is 18 4 bytes integer until it gets to the end, which limits your encryption "key" to 72 bytes. Eveyrything after 72 bytes is ignored (a warning would have been nice).

What appears to be the accepted compromise is not to limit the user's password to 72 characters or less but simply let it pass silently. The idea behind this being that a 72 characters bCrypted password is better than the fast hashing alternative anyway.

Source: BCrypt says long, similar passwords are equivalent - problem with me, the gem, or the field of cryptography?

How to hash long passwords ( 72 characters) with blowfish

The problem here is basically a problem of entropy. So let's start looking there:

Entropy Per Character

The number of bits of entropy per byte are:

Hex Characters
- Bits: 4
- Values: 16
- Entropy In 72 Chars: 288 bits
Alpha-Numeric
- Bits: 6
- Values: 62
- Entropy In 72 Chars: 432 bits
"Common" Symbols
- Bits: 6.5
- Values: 94
- Entropy In 72 Chars: 468 bits
Full Bytes
- Bits: 8
- Values: 255
- Entropy In 72 Chars: 576 bits

So, how we act depends on what type of characters we expect.

The First Problem

The first problem with your code, is that your "pepper" hash step is outputting hex characters (since the fourth parameter to hash_hmac() is not set).

Therefore, by hashing your pepper in, you're effectively cutting the maximum entropy available to the password by a factor of 2 (from 576 to 288 possible bits).

The Second Problem

However, sha256 only provides 256 bits of entropy in the first place. So you're effectively cutting a possible 576 bits down to 256 bits. Your hash step * immediately*, by very definition loses
at least 50% of the possible entropy in the password.

You could partially solve this by switching to SHA512, where you'd only reduce the available entropy by about 12%. But that's still a not-insignificant difference. That 12% reduces the number of permutations by a factor of 1.8e19. That's a big number... And that's the factor it reduces it by...

The Underlying Issue

The underlying issue is that there are three types of passwords over 72 characters. The impact that this style system has on them will be very different:

Note: from here on out I'm assuming we're comparing to a pepper system which uses SHA512 with raw output (not hex).

High entropy random passwords
These are your users using password generators which generate what amount to large keys for passwords. They are random (generated, not human chosen), and have high entropy per character. These types are using high-bytes (characters > 127) and some control characters.
For this group, your hashing function will significantly reduce their available entropy into bcrypt.
Let me say that again. For users who are using high entropy, long passwords, your solution significantly reduces the strength of their password by a measurable amount. (62 bits of entropy lost for a 72 character password, and more for longer passwords)
Medium entropy random passwords
This group is using passwords containing common symbols, but no high bytes or control characters. These are your typable passwords.
For this group, you are going to slightly unlock more entropy (not create it, but allow more entropy to fit into the bcrypt password). When I say slightly, I mean slightly. The break-even occurs when you max out the 512 bits that SHA512 has. Therefore, the peak is at 78 characters.
Let me say that again. For this class of passwords, you can only store an additional 6 characters before you run out of entropy.
Low entropy non-random passwords
This is the group who are using alpha-numeric characters that are probably not randomly generated. Something like a bible quote or such. These phrases have approximately 2.3 bits of entropy per character.
For this group, you can significantly unlock more entropy (not create it, but allow more to fit into the bcrypt password input) by hashing. The breakeven is around 223 characters before you run out of entropy.
Let's say that again. For this class of passwords, pre-hashing definitely increases security significantly.

Back To The Real World

These kinds of entropy calculations don't really matter much in the real world. What matters is guessing entropy. That's what directly effects what attackers can do. That's what you want to maximize.

While there's little research that's gone into guessing entropy, there are some points that I'd like to point out.

The chances of randomly guessing 72 correct characters in a row are extremely low. You're more likely to win the Powerball lottery 21 times, than to have this collision... That's how big of a number we're talking about.

But we may not stumble on it statistically. In the case of phrases the chance of the first 72 characters being the same is a whole lot higher than for a random password. But it's still trivially low (you're more likely to win the Powerball lottery 5 times, based on 2.3 bits per character).

Practically

Practically, it doesn't really matter. The chances of someone guessing the first 72 characters right, where the latter ones make a significant difference are so low that it's not worth worrying about. Why?

Well, let's say you're taking a phrase. If the person can get the first 72 characters right, they are either really lucky (not likely), or it's a common phrase. If it's a common phrase, the only variable is how long to make it.

Let's take an example. Let's take a quote from the bible (just because it's a common source of long text, not for any other reason):

You shall not covet your neighbor's house. You shall not covet your neighbor's wife, or his manservant or maidservant, his ox or donkey, or anything that belongs to your neighbor.

That's 180 characters. The 73rd character is the g in the second neighbor's. If you guessed that much, you're likely not stopping at nei, but continuing with the rest of the verse (since that's how the password is likely to be used). Therefore, your "hash" didn't add much.

BTW: I am ABSOLUTELY NOT advocating using a bible quote. In fact, the exact opposite.

Conclusion

You're not really going to help people much who use long passwords by hashing first. Some groups you can definitely help. Some you can definitely hurt.

But in the end, none of it is overly significant. The numbers we are dealing with are just WAY too high. The difference in entropy isn't going to be much.

You're better off leaving bcrypt as it is. You're more likely to screw up the hashing (literally, you've done it already, and you're not the first, or last to make that mistake) than the attack you're trying to prevent is going to happen.

Focus on securing the rest of the site. And add a password entropy meter to the password box on registration to indicate password strength (and indicate if a password is overlong that the user may wish to change it)...

That's my $0.02 at least (or possibly way more than $0.02)...

As Far As Using A "Secret" Pepper:

There is literally no research into feeding one hash function into bcrypt. Therefore, it's unclear at best if feeding a "peppered" hash into bcrypt will ever cause unknown vulnerabilities (we know doing hash1(hash2($value)) can expose significant vulnerabilities around collision resistance and preimage attacks).

Considering that you're already considering storing a secret key (the "pepper"), why not use it in a way that's well studied and understood? Why not encrypt the hash prior to storing it?

Basically, after you hash the password, feed the entire hash output into a strong encryption algorithm. Then store the encrypted result.

Now, an SQL-Injection attack will not leak anything useful, because they don't have the cipher key. And if the key is leaked, the attackers are no better off than if you used a plain hash (which is provable, something with the pepper "pre-hash" doesn't provide).

Note: if you choose to do this, use a library. For PHP, I strongly recommend Zend Framework 2's Zend\Crypt package. It's actually the only one I'd recommend at this current point in time. It's been strongly reviewed, and it makes all the decisions for you (which is a very good thing)...

Something like:

use Zend\Crypt\BlockCipher;

public function createHash($password) {
    $hash = password_hash($password, PASSWORD_BCRYPT, ["cost"=>$this->cost]);

    $blockCipher = BlockCipher::factory('mcrypt', array('algo' => 'aes'));
    $blockCipher->setKey($this->key);
    return $blockCipher->encrypt($hash);
}

public function verifyHash($password, $hash) {
    $blockCipher = BlockCipher::factory('mcrypt', array('algo' => 'aes'));
    $blockCipher->setKey($this->key);
    $hash = $blockCipher->decrypt($hash);

    return password_verify($password, $hash);
}

And it's beneficial because you're using all of the algorithms in ways that are well understood and well studied (relatively at least). Remember:

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.

Bruce Schneier

Bcrypt encryption different every time with same input

The bcrypt hash algorithm, by design, generates a different encrypted string every time you call it (it is salted). If you have a plaintext password you want to check, and ciphertext in the database, you should be able to pass those two things to bcrypt.CompareHashAndPassword. Adapting your code:

var result []string
db.Where(&User{Username: strings.ToLower(data.Username)})
        .First(&user)
        .Pluck("password", &result)

encryptionErr := bcrypt.CompareHashAndPassword([]byte(result[0]), []byte(data.Password))

You shouldn't need to call bcrypt.GenerateFromPassword again; as you note, it will generate a different encrypted password and it should be all but impossible to compare the two for equality.

Devise: manually encrypt password and store directly

Good news and bad news.

Good news:

The following works to create your user's password manually.

 pepper = nil
 cost = 10
 encrypted_password = ::BCrypt::Password.create("#{password}#{pepper}", :cost => cost).to_s

You can find your pepper and cost in your devise initializer. This method was confirmed using Devise's "valid_password?" method.

Bad news:

The entire reason I was trying to avoid "User.new(password: password).encrypted_password" was because of speed. It's terribly slow. With all my other pieces of my import task, I've intentionally avoided this.

But as it turns out, the major cost here is not instantiating a User object -- but BCrypt itself. There is very little noticeable speed boost when using BCrypt directly because it's intentionally designed to be slow.

My final answer: suck it up, run the rake script, go find a beverage.

why is password hash different for 2 users with the same password?

What you are looking at here is more than a password hash, there is a lot of metadata about the hash included in those strings. In terms of bcrypt the entire string would be considered the bcrypt hash. Here is what it includes:

$ is the delimiter in bcrypt.

The $2a$ is the bcrypt algorithm that was used.

The $10$ is the cost factor that was used. This is why bcrypt is very popular for storing hashes. Every hash has a complexity/cost associated with it, which you can think of as how quickly it will take a computer to generate this hash. This number is of course relative to the speed of computers, so as computers get faster and faster over the years it will take less and less time to generate a hash with the cost of 10. So next year you increase your cost to 11, then to 12... 13... and so on. This allows your future hashes to remain strong while keeping your older hashes still in valid. Just note that you cannot change the cost of a hash without rehashing the original string.

The $QyrjMQf... is a combination of the salt and the hash. This is a base64 encoded string.

The first 22 characters are the salt.

The remaining characters are the hash when used with the 2a algorithm, cost of 10, and the given salt. The reason for the salt is so an attacker cannot pre compute bcrypt hashes in order to avoid paying the cost of generating them.

In fact this is the answer to your original question: The reason the hashes are different is because if they were the same you would know that anytime you saw the bcrypt string $2a$10$QyrjMQfjgGIb4ymtdKQXI.WObnWK0/CzR6yfb6tlGJy0CsVWY0GzO you would know the password would be abcd. So you could just scan an databases of hashes and quickly find all of the users with the abcd password by looking up that hash.

You cannot do this with bcrypt because $2a$10$dQSPyeQmZCzVUOXQ3rGtZONX6pwvnKSBRmsLnq1t1CsvdOTAMQlem is also abcd. And there are many many many more hashes that will be the result of bcrypt('abcd'). This makes scaning a database for abcd passwords next to impossible.

Bcrypt Says Long, Similar Passwords Are Equivalent - Problem with Me, the Gem, or the Field of Cryptography