What Is Salt and How to Use It

What is SALT and how do i use it?

I am definitely not an expert, but the really short answer is that "salting" a line of text means to stick a few extra characters on the end of it. You could salt "salt" with "abcdefg" to get "saltabcdefg". This might be useful if "salt" happens to be a password that you'd like to make more difficult to guess.

Typically, the password+salt are transformed ('hashed') by some difficult-to-reverse process into a completely different string. This transformed string is then stored as the password, together with the plaintext of the salt, and the original plain text of the password proper is tossed away. When you want to check that someone has input the correct password, you combine whatever they've typed in with the salt that's listed in the password file and then hash the result. If the result matches the password hash you have on record, then you know that they've put in the right password.

Implementing a salt can be as easy as picking a string to serve as the salt and then making sure you keep track of it. But, you could vary the salt with each password, and then you'll have to have a way of keeping track of password+salt combinations as well as generating the variations. Of course, you'll probably also want to hash the password rather than saving the password's plain text, and so you'll have to pick a hash function. At this point, the problem has proceeded from salting proper to implementing a password security scheme.

For PHP, you might want to look at how some of the frameworks have implemented this. Two quick links, for CakePHP and Zend, respectively:

http://www.jotlab.com/2010/04/18/cakephp-rainbow-table-protection-behaviour/

http://www.zimuel.it/blog/2009/07/build-a-secure-login-with-zend-framework/

How does a website know what salt to use when you try to login

This is the usual process of handling passwords:

User registers with a password.
A new data record is created for their user ID (which can come in the form of username or email, etc.)
Their password is handled:
1. A salt is generated for them, and that salt is saved in the data record. It's very common and recommended that the salt is visible right next to the password hash.
2. The password is combined with the salt (typically a library handles even this step for you, as any little nuance like appending salt matters a lot in security).
3. The password is hashed using a function designed for password hashing. This is usually quite computationally expensive to frustrate brute force attempts.
4. The password hash is stored in the data record, and the original password is discarded.

Then when they log in, there's no magic here:

The data record is looked up using their user ID.
The password they give is combined with the salt from the data record and then hashed.
If it matches the hash stored, they are granted access.

Some notes:

Password hashing is expensive. I don't think it's uncommon for it to take more than 1 whole second of computing time. Imagine the processing load if a thousand people are logging in at once.
A lot of things can go wrong if you aren't using a decent library to handle password hashing. Writing any of the steps yourself is not recommended.
It's common for libraries to output a coded string that contains everything you need to verify against a password, such as a version code, the hash, and the salt used.

What is the purpose of salt?

MD5 is a hash as you know, so if you give it an input, like 'PASSWORD', you get a unique (hopefully - however MD5 has collisions these days) output, like '3DE2AF...'.

Now, as you know, it's quite hard to directly reverse that, until somebody thought... wait, why don't I pre-generate all the possible combinations of hashable values until I can reverse the hash. This is called a rainbow table.

The purpose of a salt is to add arbitrary random data to the string being hashed, such that you increase the length of input to hash. This means general rainbow tables that expect to reverse just a password input to a hash won't work. Of course, rainbow tables being just reverse lookups, you could simply generate a rainbow table to compensate for all the possible password+salt outputs. This is where the increase in length comes into its own; because of the nature of reversing hashes, the disk space to generate reverses for very long hash inputs soon becomes infeasible. Alphanumeric rainbow tables for 6-8 characters are already a couple of Gigabytes; increase the length and character classes and you start to talk in multiples of 10GB.

Of course, if you're salting with 'PASSWORD' and you hash 'PASSWORD' you're hashing 'PASSWORDPASSWORD' which isn't that much more secure, so the choice of salt is important too. Ideally, you should use a random salt with each hashed string, but of course, you need to know what it is. A common technique is to derive a salt from the username or some other property unique to this case. Adding arbitrary data isn't in itself useful; having user-determined salt data now adds an additional level of complexity, meaning rainbow tables are needed with specialised searches for each user. The more you make this difficult, the more computational power is needed. That's where the battle is.

However, there are some modern techniques. I am not an expert, so I can't tell you how secure these are, but they are worth a mention. The concept is slow hashing. Basically, through compound hash functions you make it take a while to compute each hash. As such, the ability for each user to check the password now has a constant amount of time added for each password you wish to check. If you're bruteforcing, that is Bad News(tm). Similarly, if the system is well designed, if there are no shortcuts (which probably equate to weaknesses) then generating a rainbow table for a slow hash function should also take a while.

Edit more detail here. See crypt() for the first example of this. @CodeInChaos has referenced PBKDF2 which forms part of PKCS#5. A newer development is scrypt.

As I say, I'm not an expert cryptanalyst. On the latter example, I have no particular specialist knowledge as to its suitability, I'm merely showing you where things are headed.

Edit 2 Clarified my write up of salt - I think I danced around the key issue of disk space before.

Storing Salts for Passwords

Let's go over what the salt really is, then.
It's a way of making sure

Two users with the password aren't obvious.
- "password" with PBKDF2-HMAC-SHA-512, 10000 iterations, keeping only the first 32 bytes of output, stored in Base64, and a salt of empty string is ALWAYS
  
  9MpQfAfQvTG8d5oIdWgmpv2d2X1DrCXkspoJM6vqA/M=
- Thus, if you have 5% of your users with that as their password hash, you can be pretty sure it's either "password" or "12345" or one of the other worst passwords.
Attackers cannot precompute attack lists in advance of leaks and then nearly-instantly match their "rainbow table" of results to the leak to eliminate that entire precomputed list, then get on with cracking the hard passwords without wasting precious time on the easy ones.
- So, if we have an attacker with a password list of "password" and "12345", and you use no salt, they have already figured out that those results with the setup above are:
  
  9MpQfAfQvTG8d5oIdWgmpv2d2X1DrCXkspoJM6vqA/M=
- and
  
  I2bEyBbaxTBvHdJ7rIu7kdR2liwGMCg62lyuoj41NB8=
- Thus, the attacker gets your password list however, and they nearly instantly eliminate spending any computation time on the MANY of your users chose terrible passwords, which means they have more time, and combinations, left to try on the higher difficulty targets.
If you use a 16 byte cryptographically random salt for each userid, then instead of needing to perform the hash algorithm once for each password*rule on their list, they have to perform it 2^128 times for each, which is computationally infeasable at this time, let alone the storage requirements.

There's no point in keeping the salt secret - it serves those two purposes without any need for more secrecy than the password hash itself.

What is salt when relating to MYSQL sha1?

A salt is a value that is added to a password (or other secret) which you want to hash one way. This means it could be before, after, or somewhere inside the password, as long as its position and value is consistent for a given supplied password.

What this does is it mitigates dictionary attacks - basically dictionaries of common passwords pre-hashed with no salt - from being used to "guess" a one way password as long as the attacker does not know the hash. If every password has a different hash then it makes it very difficult for an attacker to create a dictionary optimized for cracking your passwords (they would need a dictionary for each separate salt and they would also need to know where the salt was placed in each password).

Of course for all of this to be applicable an attacker must have the hashes of your passwords in the first place. This has nothing to do with attacking passwords by guessing them through some input prompt.

Regarding MySQL specifically if you provide a salt when hashing a password, make sure you record what that salt was somewhere. Then when a user attempts authentication you combine that recorded salt value with the password (during the call to crypt for example) and if the resulting hash matches then they have entered the correct password. (Note that at no time is the hashing of a password reversed; thus one way.)

What does 'salt' refer to in string-to-key (s2k) specifier?

From Wikipedia:

In cryptography, a salt comprises of random bits that are used as one of the inputs to a key derivation function. The other input is usually a password or passphrase. The output of the key derivation function is stored as the encrypted version of the password.

A salt is just some bits that are used to increase the security of the system. They help prevent pre-computed dictionary attacks.

Reason for salting a password for webservice

Salt is used to prevent a hacker from reversing the password hashes into passwords. So here we assume that somewhow the hacker has access to the database.

Without salt

Let us first assume the scenario without salt. In that case the table looks like:

user | md5 password (first 6 chars)
-------------------------------
   1 | 1932ff
   2 | d3b073

(we here make the situation simpler than it is in reality)

The hacker of course wants to know what the passwords behind d3b073 and 1932ff are. A hash function is one directional in the sense that we can hash a password very fast, but unhashing it will - given it is a good hashing function - take a very long time, after guessing a huge amount of passwords.

So there is not much hope to easily retrieve the possible password(s) behind d3b073. But we can easily find a list of the 100'000 most popular passwords, and calculate the MD5 hash of all these passwords. Such list could look like:

password | md5 (first 6 characters)
--------------------------------------------
foo      | d3b073
bar      | c157a7

So apparently user 2 has used foo as password. The password of user 1 is unknown to us (but we know it is not foo or bar).

Now the point is that we can construct such table once and then use it to crack all passwords of all the users. Constructing such table for 100'000 passwords might perhaps take a few hours, but then we can easily retrieve all passwords. So a hacker can construct (or download) such table (there are more efficient ways, for instance with rainbow tables), and then use it each time he/she hacks a website and then obtains the passwords of all users.

With salt

If we however use salting, the table could look like this:

user | salt   | hashed password
-------------------------------
   1 | a91f40 | 1a604e
   2 | c2a67c | b36232

So here if the password of user 2 is foo, then we calculate the hash of fooc2a67c (or we use another way to combine the salt and the password) and store this into the database.

The point is that it is very hard to guess the password, since b36232 is not the hash of foo, but of fooc2a67c and the salt is typically something (pseudo)-random. We can of course again construct the most popular 100'000 passwords with salt c2a67c appended to it, but since we can not know the salt in advance, we can not create this table only once. Even if we are lucky and already constructed the table for salt c2a67c, it will not help us with hacking the password of user 1, since user 1 has a different salt.

So the only way to resolve this, is by constructing a reverse hash lookup table, for every user. Since it is usually very expensive to construct such table once, it will not be easy to calculate such table for every user.

We might of course decide to calculate all hashes of all possible salts, like for instance:

password  | md5 (first 6 characters)
---------------------------------------------
foo000000 | 367390
foo000001 | eca8ea
foo000002 | 6eb7bf
foo000003 | 7906b1
foo000004 | 0e9f0c
foo000005 | 0bfb11
...       | ...

But as you can see, the size of such table would grow to gigantic sizes. Furthermore it would take thousands of years. Even if we add only one hexadecimal character as salt, the size of the table would scale 16 times. Yes there are some techniques to reduce the amount of time and space for such table, but by increasing the "password space", the problem to hack passwords, will definitely be much harder. Furthermore salt is usally a signifcant amount of characters (or bytes) long making it way more harder than just 16 times more.

Basically salt acts as a way to enlarge the password space. Even if you enter the very same password on two websites, the personal salt of the websites will (close to certainty) be unique, and therefore the hash will be unique as well.

Is it an good idea of Using password itself as an salt

A common mistake is to use the same salt in each hash. Either the salt is hard-coded into the program, or is generated randomly once. This is ineffective because if two users have the same password, they'll still have the same hash. An attacker can still use a reverse lookup table attack to run a dictionary attack on every hash at the same time. They just have to apply the salt to each password guess before they hash it. If the salt is hard-coded into a popular product, lookup tables and rainbow tables can be built for that salt, to make it easier to crack hashes generated by the product.

A new random salt must be generated each time a user creates an account or changes their password.

[…] It's easy to get carried away and try to combine different hash functions, hoping that the result will be more secure. In practice, though, there is very little benefit to doing it. All it does is create interoperability problems, and can sometimes even make the hashes less secure. Never try to invent your own crypto, always use a standard that has been designed by experts. Some will argue that using multiple hash functions makes the process of computing the hash slower, so cracking is slower, but there's a better way to make the cracking process slower as we'll see later.

Here are some examples of poor wacky hash functions I've seen suggested in forums on the internet.
md5(sha1(password))
md5(md5(salt) + md5(password))
sha1(sha1(password))
sha1(str_rot13(password + salt))
md5(sha1(md5(md5(password) + sha1(password)) + md5(password)))
Do not use any of these.

Salt should be generated using a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG). CSPRNGs are very different than ordinary pseudo-random number generators, like the "C" language's rand() function. As the name suggests, CSPRNGs are designed to be cryptographically secure, meaning they provide a high level of randomness and are completely unpredictable. We don't want our salts to be predictable, so we must use a CSPRNG. The following table lists some CSPRNGs that exist for some popular programming platforms. (PHP: mcrypt_create_iv, openssl_random_pseudo_bytes)

The salt needs to be unique per-user per-password. Every time a user creates an account or changes their password, the password should be hashed using a new random salt. Never reuse a salt. The salt also needs to be long, so that there are many possible salts. As a rule of thumb, make your salt is at least as long as the hash function's output. The salt should be stored in the user account table alongside the hash.

To Store a Password

Generate a long random salt using a CSPRNG.

Prepend the salt to the password and hash it with a standard cryptographic hash function such as SHA256.

Save both the salt and the hash in the user's database record.

To Validate a Password

Retrieve the user's salt and hash from the database.

Prepend the salt to the given password and hash it using the same hash function.

Compare the hash of the given password with the hash from the database. If they match, the password is correct. Otherwise, the password is incorrect.

At the bottom of this page, there are implementations of salted password hashing in PHP, C#, Java, and Ruby.

In a Web Application, always hash on the server

If you are writing a web application, you might wonder where to hash. Should the password be hashed in the user's browser with JavaScript, or should it be sent to the server "in the clear" and hashed there?

Even if you are hashing the user's passwords in JavaScript, you still have to hash the hashes on the server. Consider a website that hashes users' passwords in the user's browser without hashing the hashes on the server. To authenticate a user, this website will accept a hash from the browser and check if that hash exactly matches the one in the database. This seems more secure than just hashing on the server, since the users' passwords are never sent to the server, but it's not.

The problem is that the client-side hash logically becomes the user's password. All the user needs to do to authenticate is tell the server the hash of their password. If a bad guy got a user's hash they could use it to authenticate to the server, without knowing the user's password! So, if the bad guy somehow steals the database of hashes from this hypothetical website, they'll have immediate access to everyone's accounts without having to guess any passwords.

This isn't to say that you shouldn't hash in the browser, but if you do, you absolutely have to hash on the server too. Hashing in the browser is certainly a good idea, but consider the following points for your implementation:

Client-side password hashing is not a substitute for HTTPS (SSL/TLS). If the connection between the browser and the server is insecure, a man-in-the-middle can modify the JavaScript code as it is downloaded to remove the hashing functionality and get the user's password.

Some web browsers don't support JavaScript, and some users disable JavaScript in their browser. So for maximum compatibility, your app should detect whether or not the browser supports JavaScript and emulate the client-side hash on the server if it doesn't.

You need to salt the client-side hashes too. The obvious solution is to make the client-side script ask the server for the user's salt. Don't do that, because it lets the bad guys check if a username is valid without knowing the password. Since you're hashing and salting (with a good salt) on the server too, it's OK to use the username (or email) concatenated with a site-specific string (e.g. domain name) as the client-side salt.

source: https://crackstation.net/hashing-security.htm

So, to answer your question, bad idea, very bad idea.