how salt can be implemented to prevent pre-computation dictionary attack on password


A salt makes every user's password hash unique, and adding a salt to a password before hashing protects against a dictionary attack. But how?

The tool you almost certainly want is called PBKDF2 (Password-Based Key Derivation Function 2). It's widely available, either under the name "pbkdf2" or "RFC 2898". PBKDF2 provides both salting (making two otherwise identical passwords different) and stretching (making it expensive to guess passwords).
Whatever system you are developing for probably has a function available that takes a password, a salt, a number of iterations, and an output size. Given those, it will output some string of bytes. There are several ways to actually make use of this depending on your situation (most notably, are you dealing with local authentication or remote authentication?).
Most people are looking for remote authentication, so let's walk through a reasonable way to implement that using a mix of deterministic and random salts. (See further discussion below with @SilverlightFox.)
First, the high-level approach:
Hash on the client against a deterministic salt. The client should never send a bare password to the server. Users reuse their passwords all the time. You don't want to know their actual password. You'd rather never see it.
Salt randomly and stretch on the server and then compare.
Here's the actual breakdown:
Choose an app-specific component for your salt. For example, "net.robnapier.mygreatapp" might be my prefix.
Choose a user-specific component for your salt. The userid is usually ideal here.
Concatenate them to create your salt. For example, my salt might be "net.robnapier.mygreatapp:suejones@example.org". The actual salt does not matter too much. What matters is that it is at least "mostly" unique across all of your users and across all other sites that might also hash passwords from your users. The scheme I've given achieves that.
Choose a local number of iterations for PBKDF2. That number is almost certainly 1000. This is too few iterations, but is about all JavaScript can handle reasonably. The more iterations, the more secure the system, but the worse the performance. It's a tension.
Choose a length for your hash. 32 bytes is generally a good choice.
Choose a "PRF" if your system allows you to pick one. HMAC-SHA-256 is a good choice.
You now have all the basic pieces in place. Let's compute some hashes.
On the client, take the password and pass it through PBKDF2 with the above settings. That will give you 32 bytes to send to the server.
On the server, if this is the account creation, create 8 or 16 bytes of random data as your salt for this account. Save that in the database along with the username. Use that salt, and another set of iterations (usually 10,000 or 100,000 if you're not in Node) and apply PBKDF2 to the data that the user sent. Store that in the database. If you're testing the password, just read the salt from the database and reapply PBKDF2 to validate.
Everywhere I say "PBKDF2" here there are other options, probably the most common of which is scrypt (there is also bcrypt). The other options are technically better than PBKDF2. I don't think anyone would disagree with that. I usually recommend PBKDF2 because it's so ubiquitous and there's nothing really wrong with it. But if you have scrypt available, feel free to use that. The client and server do not have to use the same algorithm (the client can use PBKDF2 and the server can use scrypt if you like).
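The steps above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive implementation: the function names, the example user ID, and the exact iteration counts are assumptions for illustration, and a real client would typically run the first step in the browser or app rather than in Python.

```python
import hashlib
import hmac
import os

APP_PREFIX = "net.robnapier.mygreatapp"  # app-specific salt component from the text
CLIENT_ITERATIONS = 1_000      # low count, as suggested for JavaScript clients
SERVER_ITERATIONS = 100_000    # assumed server-side count
HASH_LEN = 32                  # 32-byte output

def client_hash(user_id: str, password: str) -> bytes:
    """Client side: PBKDF2 with a deterministic app:user salt."""
    salt = f"{APP_PREFIX}:{user_id}".encode("utf-8")
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, CLIENT_ITERATIONS, dklen=HASH_LEN
    )

def server_create_record(client_value: bytes) -> tuple[bytes, bytes]:
    """Server side, account creation: random salt, then stretch again."""
    salt = os.urandom(16)
    stored = hashlib.pbkdf2_hmac(
        "sha256", client_value, salt, SERVER_ITERATIONS, dklen=HASH_LEN
    )
    return salt, stored  # both go into the database next to the username

def server_verify(client_value: bytes, salt: bytes, stored: bytes) -> bool:
    """Server side, login: reapply PBKDF2 with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", client_value, salt, SERVER_ITERATIONS, dklen=HASH_LEN
    )
    return hmac.compare_digest(candidate, stored)
```

The server never sees the bare password, only the 32 bytes produced by `client_hash`, and the comparison uses `hmac.compare_digest` to avoid leaking timing information.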

What's the md5 hash of "superCommonPassword"? That's easy to pre-calculate.
It's b77755edafab848ffcb9580307e97414
If you steal a password database and see that hash value, you know the password is probably "superCommonPassword".
What's the md5 hash ("aStringYouDontKnowUntilYouStealAPasswordDatabase" + "superCommonPassword")? Oh, you can't calculate that until you steal the database.
An unknown salt means pre-calculating hashes of common passwords is useless. An unknown salt per user means you need to calculate hashes of common passwords for each user. This slows down the attacker and increases his costs.
Don't use md5 for password hashing though. Use bcrypt or scrypt or PBKDF2.
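To make the precomputation argument concrete, here is a small Python sketch. It uses MD5 only because the example above does; the word list and function names are made up for illustration, and this is not how passwords should be stored.

```python
import hashlib
import os

# An attacker can build this table long before stealing any database.
COMMON_PASSWORDS = ["123456", "password", "superCommonPassword"]
precomputed = {hashlib.md5(p.encode()).hexdigest(): p for p in COMMON_PASSWORDS}

def unsalted_hash(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()

def salted_hash(password: str, salt: bytes) -> str:
    # The salt is prepended to the password before hashing.
    return hashlib.md5(salt + password.encode()).hexdigest()

# Unsalted: the stolen hash is found in the precomputed table immediately.
stolen_unsalted = unsalted_hash("superCommonPassword")
print(precomputed.get(stolen_unsalted))  # finds the password

# Salted: the table is useless, because the salt was unknown when it was built.
salt = os.urandom(16)
stolen_salted = salted_hash("superCommonPassword", salt)
print(precomputed.get(stolen_salted))  # no match in the table
```

The attacker can still hash the common passwords again with each stolen salt, but that work has to be done after the theft and separately for every user.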

Related

Store passwords safely but determine same passwords

I have a legacy browser game which historically uses a simple hashing function for password storage. I know that it's far from ideal. However, time has proven that most of the cheaters (multi-accounts) use the same password for all of their fake accounts.
In an update of my game I want to store passwords more safely. I already know that passwords should be randomly salted, hashed by safe algorithms, etc. That's all nice.
But is there any way to store passwords properly and still determine that two (or more) users use the same password? I don't want to know the password. I don't want to be able to search by password. I only need to tell that suspect users A, B and C use the same one.
Thanks.

If you store them correctly - no. This is one of the points of a proper password storage.
You could have very long passwords, beyond what is available on rainbow tables (not sure about the current state of the art, but it used to be 10 or 12 characters) and not salt them. In this case two passwords would have the same hash. This is a very bad idea (but a solution nevertheless) - if your passwords leak someone may be able to guess them indirectly (xkcd reference).
You may also look at homomorphic encryption, but this is in the realm of science fiction for now.

Well, if you use salt + hashing, you have all the salts as plain text. When a user enters a password, before storing/verifying it, you can hash it with all the salts available and see if you get the corresponding existing hash. :)
The obvious problem with this is that if you are doing it properly with bcrypt or pbkdf2 for hashing, this would be very slow - that's kind of the point in these functions.
I don't think there is any other way you can tell whether two passwords are the same - you need at least one of them plain text, which is only when the user enters it. And then you want to remove it from memory asap, which contradicts doing all these calculations with the plain text password in memory.
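A rough Python sketch of that check follows. It assumes each account stores a random salt and a PBKDF2 hash; the record layout, function names, and iteration count are assumptions for illustration. Note that the cost grows with the number of accounts, which is exactly the slowness described above.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # assumed; kept low so the sketch runs quickly

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def make_record(password: str) -> tuple[bytes, bytes]:
    """Create the (salt, hash) pair stored for one account."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def users_sharing_password(password: str, records: dict) -> list[str]:
    """While the plaintext is available (at login or signup), hash it with
    every stored salt and collect the accounts whose hash matches."""
    matches = []
    for username, (salt, stored) in records.items():
        if hmac.compare_digest(hash_password(password, salt), stored):
            matches.append(username)
    return matches
```

With N accounts, every login costs N full key-stretching operations, so this is only workable for small user bases or for occasional checks on suspect accounts.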

This will reduce the security of all passwords somewhat, since it leaks information about when two users have the same password. Even so, it is a workable trade-off and is straightforward to secure within that restriction.
The short answer is: use the same salt for all the passwords, but make that salt unique to your site.
Now the long answer:
First, to describe a standard and appropriate way to handle passwords. I'll get to the differences for you afterwards. (You may know all of this already, but it's worth restating.)
Start with a decent key-stretching algorithm, such as PBKDF2 (there are others, some even better, but PBKDF2 is ubiquitous and sufficient for most uses). Select a number of iterations depending on what client-side environment is involved. For JavaScript, you'll want something like 1k-4k iterations. For languages with faster math, you can use 10k-100k.
The key stretcher will need a salt. I'll talk about the salt in a moment.
The client applies the key stretcher to the password and sends the result, not the password itself, to the server. The server applies a fast hash (SHA-256 is nice) and compares that to the stored hash. (For setting the password, the server does the same thing; it accepts a PBKDF2 hash, applies SHA-256, and then stores it.)
All that is standard stuff. The question is the salt. The best salt is random, but no good for this. The second-best salt is built from service_id+user_id (i.e. use a unique identifier for the service and concatenate the username). Both of these make sure that every user's password hash is unique, even if their passwords are identical. But you don't want that.
So now finally to the core of your question. You want to use a per-service, but not per-user, static salt. So something like "com.example.mygreatapp" (obviously don't use that actual string; use a string based on your app). With a constant salt, all passwords on your service that are the same will stretch (PBKDF2) and hash (SHA256) to the same value and you can compare them without having any idea what the actual password is. But if your password database is stolen, attackers cannot compare the hashes in it to hashes in other sites' databases, even if they use the same algorithm (because they'll have a different salt).
The disadvantage of this scheme is exactly its goal: if two people on your site have the same password and an attacker steals your database and knows the password of one user, they know the password of the other user, too. That's the trade-off.
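Here is a minimal Python sketch of the constant-salt scheme, assuming PBKDF2 on the client followed by SHA-256 on the server. The salt string is the placeholder from the text, and the function names and database layout are made up for illustration.

```python
import hashlib
from collections import defaultdict

SERVICE_SALT = b"com.example.mygreatapp"  # placeholder; use a string based on your own app
CLIENT_ITERATIONS = 4_000                 # within the 1k-4k range suggested for JavaScript

def stretch(password: str) -> bytes:
    """Client side: PBKDF2 with the per-service (not per-user) salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), SERVICE_SALT, CLIENT_ITERATIONS
    )

def stored_value(stretched: bytes) -> str:
    """Server side: a fast hash over what the client sent."""
    return hashlib.sha256(stretched).hexdigest()

def groups_with_shared_password(db: dict) -> list[list[str]]:
    """db maps username -> stored hash; return groups of users whose hashes match."""
    by_hash = defaultdict(list)
    for user, value in db.items():
        by_hash[value].append(user)
    return [sorted(users) for users in by_hash.values() if len(users) > 1]
```

Identical passwords produce identical stored values, so suspect accounts can be grouped without the server ever learning the password itself.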

How does every system or server get a different hash function to store passwords

As I understand it, user passwords must be stored as hashes instead of encrypted, because an attacker can't deduce a password from its hash, while he can deduce a password from its encryption if he gets access to the encryption key.
Now, obviously every system must use a different hashing function to hash its keys. My question is, how do they create these different hashing functions? Do they use a standard hashing function and prime it with a big key? If so, wouldn't an attacker be able to deduce the passwords if he got access to this key, making it the same as encryption?

Cryptographic hash functions are always non reversible, this is their purpose. Even discouraged "unsafe" function like MD5 and SHA1 are not reversible and they don't need a key. The problem is that you can find possible matching passwords too fast with brute-forcing (more than 10 Giga MD5/sec).
The "big key" you mentioned is probably the salt. You generate a random salt and use this salt in the calculation. It is safe to store this salt together with the hash, because its purpose is to prevent the attacker from building one single rainbow table and finding matches for all passwords at once. Instead, (s)he must build a rainbow table for every salt separately, which makes those tables impractical.
The problem with the speed you can only overcome with iterations of the hash function. A cost factor defines how many times the hash is calculated. Recommended algorithms are BCrypt, PBKDF2 and SCrypt.

Now, obviously every system must use a different hashing function to hash its keys
No, they don't.
If your password is s3cr3t, then it will have the same hash value in the database of a lot of servers, sadly likely A4D80EAC9AB26A4A2DA04125BC2C096A
The way to make this suck less is to generate a random code per password, called a salt, so that the hash of s3cr3t on server 1 is likely to be different than the hash of s3cr3t on server2: hashFunction('s3cr3t' + 'perUserSalt')
Use bcrypt, scrypt, or PBKDF2 only for password storage.

Importance of salt when using Rfc2898DeriveBytes to create secure passwords from clear text passwords

I'd like to incorporate the encryption and decryption of files in one of my C# .NET apps. The scenario is simple: User A sends an AES256-encrypted file to user B. The clear text password is exchanged on a different channel (e.g. phone call or whatever).
From what I understand I should use Rfc2898DeriveBytes for converting the user's clear text password into a more secure password using maybe 10,000 rounds. (see this article).
What I don't understand is the role of salt in my scenario. Usually salt is used in hashing passwords to prevent dictionary attacks. But in my scenario the PBKDF2 algo is used to compensate weaknesses of short or easy to guess clear text passwords by adding extra calculations required by the PBKDF2-rounds.
If I choose a random salt then the receiver will need to know that salt also in order to decrypt correctly. If I use a constant salt, then hackers can easily reverse engineer my code and run brute force attacks using my constant salt (although they'll be really slow thanks to the PBKDF2 iterations).
From what I understand I have no choice but to use a constant salt in my scenario and enforce a good clear text password rule to make up for the weakness of constant salt. Is my assumption correct?

Salts, in the context of password hashing (and key derivation), are used to prevent precomputation attacks like rainbow tables.
Note that the salt must be different and unpredictable (preferably random) for every password. Also note that salts need not be secret – that's what the password is for. You gain no security by keeping the salt secret.
The recommended approach in your case is to generate a random salt every time a file is encrypted, and transmit the salt along with the ciphertext.
Is there a specific reason you're using AES-256 by the way? It's around 40% slower than AES-128 due to the extra rounds, and it offers no practical security benefit (particularly not in the case of password-based encryption).
It's also worth considering using a well-established standard like PGP rather than building your own protocol from cryptographic primitives, because building secure protocols is so hard that even experts don't always get it right.
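As a rough illustration of "random salt per file, sent along with the ciphertext", here is a Python sketch of the key-derivation and packaging steps only. The question is about C# and Rfc2898DeriveBytes; this sketch uses Python's `hashlib.pbkdf2_hmac` instead, the container layout (salt followed by ciphertext) and all names are assumptions, and the actual AES encryption step is deliberately left out.

```python
import hashlib
import os

SALT_LEN = 16
ITERATIONS = 10_000   # round count mentioned in the question
KEY_LEN = 32          # 32 bytes for AES-256; 16 would give an AES-128 key

def derive_key(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LEN
    )

def sender_prepare(password: str) -> tuple[bytes, bytes]:
    """Sender: fresh random salt for every file, plus the derived key."""
    salt = os.urandom(SALT_LEN)
    return salt, derive_key(password, salt)

def pack(salt: bytes, ciphertext: bytes) -> bytes:
    """The salt is not secret, so it travels in the clear before the ciphertext."""
    return salt + ciphertext

def unpack(blob: bytes) -> tuple[bytes, bytes]:
    """Receiver: split the salt back off, then derive the same key from the password."""
    return blob[:SALT_LEN], blob[SALT_LEN:]
```

The receiver only needs the password (exchanged over the other channel) and the file itself; the salt comes with the file.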

Your assumption is correct. If they have access to the password, they will also have access to the salt. The BCrypt implementations I've seen put the number of iterations, the hash, and the salt all in the same result string!
The idea is: your hash should be secure even if the salt and number of iterations are known. (If we could always know that the salt and number of iterations and even the algorithm would be unknown to attackers, security would get a whole heck of a lot easier! Until attackers politely decline to read our salts, we must assume they will have access to them in the event of a breach.) So you're right, they can brute force it - if they have a few supercomputers and a couple million years of computing time at their disposal.
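To illustrate storing the iteration count, salt, and hash together in one string, here is a small Python sketch. The `$`-separated layout is a made-up format for illustration, not BCrypt's actual encoding, and it uses PBKDF2 from the standard library as a stand-in.

```python
import base64
import hashlib
import hmac
import os

SCHEME = "pbkdf2-sha256"  # hypothetical scheme label for this sketch

def encode_hash(password: str, iterations: int = 100_000) -> str:
    """Return 'scheme$iterations$salt$hash', with salt and hash base64-encoded."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "$".join([
        SCHEME,
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])

def verify(password: str, encoded: str) -> bool:
    """Everything needed to recompute the hash is read back from the string."""
    scheme, iterations, salt_b64, digest_b64 = encoded.split("$")
    if scheme != SCHEME:
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, int(iterations)
    )
    return hmac.compare_digest(candidate, expected)
```

Nothing in the stored string is secret; the only unknown for an attacker is the password, and each guess costs the full number of iterations.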

Multiple Salts to protect passwords

If you use a salt before hashing a password - it will make the hash more secure. It makes sense, because rainbow table attacks become much more difficult (impossible?).
What if you use multiple salts? For example - you check if the day is Monday, or the Month, the hour, etc (or some combination). Then you have a database which stores the fields: (userid, hash1, hash2, hash3...).
Would this make the information any more (or less) secure?
Example:
1) User registers with password 'PASS'.
2) System (php in this example) stores values (md5($password.$this_day)) for each day (7 hashes) into table password, columns hash_monday, hash_tuesday, etc.
3) user logs in, and script checks password where 'hash_'.$this_day matches what is entered.

Your system will be no more secure - you end up with several single-salt databases instead of one. In principle it may be even less secure, since you helpfully provide the attacker with 7 hashes of the same string to choose from and he only needs to guess one. These multiple hashes of the same plaintext may also have implications for the cryptographic strength of the hashing used for passwords (not sure on that one, and it will depend on the algorithm used).

Maybe you should have a look at this small article. There are several things wrong with your approach.
A salt does not protect against a dictionary attack. It protects against rainbow-tables if correctly used.
Use a unique salt for each password. The salt should be a random value, not derived from known information. It has to be stored with the password.
Do not use MD5 for hashing passwords. MD5 is considered broken, and it is way too fast for hashing passwords. With an off-the-shelf GPU, you are able to calculate 8 Giga MD5 hashes per second (in 2012). That makes it possible to brute-force a whole English dictionary with about 500000 words in less than 0.1 milliseconds!
Use Bcrypt for hashing passwords. It is recommended to use a well established library like phpass, and if you want to understand how it can be implemented, you can read the article above.
If you want to add a secret to your hash function (like a hidden key, or a hidden function), you can add a pepper to the password. The pepper should not be stored in the database, and should remain secret. The pepper can protect against dictionary attacks, as long as the attacker has only access to your password-hashes (SQL-Injection), but not to the server with the secret.
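One possible way to combine a pepper with a per-password salt is sketched below in Python. This is an assumption-laden sketch, not the method of any particular library: it applies HMAC-SHA-256 with the pepper as the key and then stretches with PBKDF2 (used here as a standard-library stand-in for the Bcrypt recommended above). The function name and iteration count are made up.

```python
import hashlib
import hmac
import os

def hash_with_pepper(password: str, salt: bytes, pepper: bytes,
                     iterations: int = 100_000) -> bytes:
    """Mix in the server-side secret first, then stretch with the per-user salt."""
    # The pepper lives in server configuration, never in the database.
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # The salt is random per password and stored next to the hash as usual.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)
```

An attacker who only has the database (salt and hash) cannot even start guessing passwords without the pepper; once the server itself is compromised, the pepper gives no further protection.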

I do not believe multiple hashes are going to help you in this scenario, primarily because when someone compromises your database they will notice that you have 7 different salts to go against and may make an educated guess that they are based on days of the week. There is nothing fundamentally wrong with MD5, as so many people like to jump on that bandwagon. The types of people that say MD5 is a broken hash have a fundamental misunderstanding between a hash function and a cryptographic hash function, I would recommend ignoring them. In the event you need a cryptographic hash function, use SHA-2 (or something from that family or greater).
You will need to salt the user input, as you know. A random value is generally recommended, but it can also be a value you store in a separate application space (outside of the database); you just have to protect that information as well. I highly recommend making the password hashing function take several thousand iterations for any input, as this will slow down the automated process of matching hashes against the database.
If your users use easy-to-guess passwords, dictionary attacks will beat you every day; you can't protect against stupidity.

Salt Generation and open source software

As I understand it, the best practice for generating salts is to use some cryptic formula (or even magic constant) stored in your source code.
I'm working on a project that we plan on releasing as open source, but the problem is that with the source comes the secret formula for generating salts, and therefore the ability to run rainbow table attacks on our site.
I figure that lots of people have contemplated this problem before me, and I'm wondering what the best practice is. It seems to me that there is no point having a salt at all if the code is open source, because salts can be easily reverse-engineered.
Thoughts?

Since questions about salting hashes come along on a quite regular basis and there seems to be quite some confusion about the subject, I extended this answer.
What is a salt?
A salt is a random set of bytes of a fixed length that is added to the input of a hash algorithm.
Why is salting (or seeding) a hash useful?
Adding a random salt to a hash ensures that the same password will produce many different hashes. The salt is usually stored in the database, together with the result of the hash function.
Salting a hash is good for a number of reasons:
Salting greatly increases the difficulty/cost of precomputed attacks (including rainbow tables)
Salting makes sure that the same password does not result in the same hash.
This makes sure you cannot determine if two users have the same password. And, even more important, you cannot determine if the same person uses the same password across different systems.
Salting increases the complexity of passwords, thereby greatly decreasing the effectiveness of both Dictionary- and Birthday attacks. (This is only true if the salt is stored separate from the hash).
Proper salting greatly increases the storage need for precomputation attacks, up to the point where they are no longer practical. (8 character case-sensitive alpha-numeric passwords with 16 bit salt, hashed to a 128 bit value, would take up just under 200 exabytes without rainbow reduction).
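The storage figure in the last point can be checked with a few lines of Python. The only assumptions are that one 128-bit hash is stored per (password, salt) pair and that "exabytes" is read as binary exabytes (2^60 bytes).

```python
ALPHABET_SIZE = 26 + 26 + 10      # case-sensitive letters plus digits
PASSWORD_LENGTH = 8
SALT_BITS = 16
HASH_BYTES = 128 // 8             # 128-bit hash value

passwords = ALPHABET_SIZE ** PASSWORD_LENGTH   # all 8-character passwords
salts = 2 ** SALT_BITS                         # all 16-bit salt values
total_bytes = passwords * salts * HASH_BYTES   # one stored hash per (password, salt)

exbibytes = total_bytes / 2 ** 60
print(f"{passwords:,} passwords x {salts:,} salts x {HASH_BYTES} bytes")
print(f"= {exbibytes:.1f} binary exabytes")
```

The result lands a little under 200, which matches the figure quoted above; with a larger salt the table grows by a factor of two for every extra salt bit.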
There is no need for the salt to be secret.
A salt is not a secret key; instead, a salt 'works' by making the hash function specific to each instance. With a salted hash, there is not one hash function, but one for every possible salt value. This prevents the attacker from attacking N hashed passwords for less than N times the cost of attacking one password. This is the point of the salt.
A "secret salt" is not a salt, it is called a "key", and it means that you are no longer computing a hash, but a Message Authentication Code (MAC). Computing MAC is tricky business (much trickier than simply slapping together a key and a value into a hash function) and it is a very different subject altogether.
The salt must be random for every instance in which it is used. This ensures that an attacker has to attack every salted hash separately.
If you rely on your salt (or salting algorithm) being secret, you enter the realms of Security Through Obscurity (won't work). Most probably, you do not get additional security from the salt secrecy; you just get the warm fuzzy feeling of security. So instead of making your system more secure, it just distracts you from reality.
So, why does the salt have to be random?
Technically, the salt should be unique. The point of the salt is to be distinct for each hashed password. This is meant worldwide. Since there is no central organization which distributes unique salts on demand, we have to rely on the next best thing, which is random selection with an unpredictable random generator, preferably within a salt space large enough to make collisions improbable (two instances using the same salt value).
It is tempting to try to derive a salt from some data which is "presumably unique", such as the user ID, but such schemes often fail due to some nasty details:
If you use for example the user ID, some bad guys, attacking distinct systems, may just pool their resources and create precomputed tables for user IDs 1 to 50. A user ID is unique system-wide but not worldwide.
The same applies to the username: there is one "root" per Unix system, but there are many roots in the world. A rainbow table for "root" would be worth the effort, since it could be applied to millions of systems. Worse yet, there are also many "bob" out there, and many do not have sysadmin training: their passwords could be quite weak.
Uniqueness is also temporal. Sometimes, users change their password. For each new password, a new salt must be selected. Otherwise, an attacker who obtained the hash of the old password and the hash of the new one could try to attack both simultaneously.
Using a random salt obtained from a cryptographically secure, unpredictable PRNG may be some kind of overkill, but at least it provably protects you against all those hazards. It's not about preventing the attacker from knowing what an individual salt is, it's about not giving them the big, fat target that will be used on a substantial number of potential targets. Random selection makes the targets as thin as is practical.
In conclusion:
Use a random, evenly distributed, high entropy salt. Use a new salt whenever you create a new password or change a password. Store the salt along with the hashed password. Favor big salts (at least 10 bytes, preferably 16 or more).
A salt does not turn a bad password into a good password. It just makes sure that the attacker will at least pay the dictionary attack price for each bad password he breaks.
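In Python, the conclusion above might look roughly like the following sketch. It assumes the `secrets` module as the cryptographically secure random source and PBKDF2 as the hash; the record layout and function names are made up for illustration.

```python
import hashlib
import secrets

SALT_BYTES = 16  # "preferably 16 or more"

def new_salt() -> bytes:
    """Random, evenly distributed, high-entropy salt from a CSPRNG."""
    return secrets.token_bytes(SALT_BYTES)

def set_password(record: dict, password: str, iterations: int = 100_000) -> dict:
    """Used both for creating and for changing a password: always pick a new salt."""
    salt = new_salt()
    record["salt"] = salt  # stored along with the hash, not kept secret
    record["hash"] = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return record
```

Because `set_password` never reuses the old salt, the old and new hashes of an account cannot be attacked together with a single guess.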
Useful sources:
stackoverflow.com: Non-random salt for password hashes
Bruce Schneier: Practical Cryptography (book)
Matasano Security: Enough with the Rainbow Tables
usenix.org: Unix crypt used salt since 1976
owasp.org: Why add salt
openwall.com: Salts
Disclaimer:
I'm not a security expert. (Although this answer was reviewed by Thomas Pornin)
If any of the security professionals out there find something wrong, please do comment or edit this wiki answer.

Really salts just need to be unique for each entry. Even if the attacker can calculate what the salt is, it makes the rainbow table extremely difficult to create. This is because the salt is added to the password before it is hashed, so it effectively adds to the total number of entries the rainbow table must contain to have a list of all possible values for a password field.

Since Unix became popular, the right way to store a password has been to append a random value (the salt) and hash it. Save the salt away where you can get to it later, but where you hope the bad guys won't get it.
This has some good effects. First, the bad guys can't just make a list of expected passwords like "Password1", hash them into a rainbow table, and go through your password file looking for matches. If you've got a good two-byte salt, they have to generate 65,536 values for each expected password, and that makes the rainbow table a lot less practical. Second, if you can keep the salt from the bad guys who are looking at your password file, you've made it much harder to calculate possible values. Third, you've made it impossible for the bad guys to determine if a given person uses the same password on different sites.
In order to do this, you generate a random salt. This should generate every number in the desired range with uniform probability. This isn't difficult; a simple linear congruential random number generator will do nicely.
If you've got complicated calculations to make the salt, you're doing it wrong. If you calculate it based on the password, you're doing it WAY wrong. In that case, all you're doing is complicating the hash, and not functionally adding any salt.
Nobody good at security would rely on concealing an algorithm. Modern cryptography is based on algorithms that have been extensively tested, and in order to be extensively tested they have to be well known. Generally, it's been found to be safer to use standard algorithms rather than rolling one's own and hoping it's good. It doesn't matter if the code is open source or not, it's still often possible for the bad guys to analyze what a program does.

You can just generate a random salt for each record at runtime. For example, say you're storing hashed user passwords in a database. You can generate an 8-character random string of lower- and uppercase alphanumeric characters at runtime, prepend that to the password, hash that string, and store it in the database. Since there are 62^8 (about 2.2 x 10^14) possible salts, generating rainbow tables (for every possible salt) will be prohibitively expensive; and since you're using a unique salt for each password record, even if an attacker has generated a couple matching rainbow tables, he still won't be able to crack every password.
You can change the parameters of your salt generation based on your security needs; for example, you could use a longer salt, or you could generate a random string that also contains punctuation marks, to increase the number of possible salts.
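A short Python sketch of this kind of salt generation: 62 alphanumeric characters and a length of 8 give 62^8 possible salts. The function names are made up, and the `secrets` module is used as the random source, which is an assumption rather than something the answer specifies.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters

def random_salt(length: int = 8) -> str:
    """Random alphanumeric salt, generated at runtime for each record."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def salt_space(length: int = 8, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of distinct salts for a given length and alphabet size."""
    return alphabet_size ** length

print(random_salt())
print(f"{salt_space():,} possible 8-character salts")
```

Increasing `length`, or adding punctuation to `ALPHABET`, raises the number of possible salts as described above.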

Use a random number generator to generate the salt, one per row, and store it in the database alongside the hash.
I like how salt is generated in django-registration. Reference: http://bitbucket.org/ubernostrum/django-registration/src/tip/registration/models.py#cl-85
    salt = sha_constructor(str(random.random())).hexdigest()[:5]
    activation_key = sha_constructor(salt+user.username).hexdigest()
    return self.create(user=user,
                       activation_key=activation_key)
He uses a combination of sha generated by a random number and the username to generate a hash.
Sha itself is well known for being strong and unbreakable. Add multiple dimensions to generate the salt itself, with random number, sha and the user specific component, you have unbreakable security!

In the case of a desktop application that encrypts data and sends it to a remote server, how would you go about using a different salt each time?
Using PKCS#5 with the user's password, it needs a salt to generate an encryption key to encrypt the data. I know that keeping the salt hardcoded (obfuscated) in the desktop application is not a good idea.
If the remote server must NEVER know the user's password, is it possible to use a different salt each time? If the user uses the desktop application on another computer, how will it be able to decrypt the data on the remote server if it does not have the key (it is not hardcoded in the software)?
