Enforcement of password policy - security

Many operating systems enforce certain constraints on passwords such as changing the password every n days etc.
Some also enforce a policy such as "the new password must differ in at least n characters from your previous password(s)".
My question is: how can you enforce such a policy without actually storing the passwords in clear text? Specifically: if I do not want to store the passwords in clear text but rather as (salted) hashes, how would I enforce this kind of policy?
Thanks in advance!

You can't. You can check that the password doesn't match the last N passwords by comparing it to old hashes, but anything that goes down to character level cannot be easily applied.
In theory, if you really wanted to do it, you could probably brute-force a difference of one or two characters (just hash all possible 2-character changes to the new password and compare each result against the old hash). But given that newer algorithms rely on hashing being slow, this is not realistic with modern password hashing functions.
Just to be clear, I'm assuming that clear-text is the same as locally-encrypted for all practical purposes. Some systems will encrypt and save your original password, so they can verify it or allow recovery. Of course that only provides few benefits of hashing.
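To put a rough number on the brute-force idea above, here is a small back-of-the-envelope sketch. The password length, the alphabet size, and the 0.1 s per hash are all assumptions for illustration, not measurements of any particular system.

```python
from math import comb

def two_char_variants(length: int, alphabet: int) -> int:
    """Count strings that differ from a given string in exactly 1 or 2 positions."""
    one_changed = length * (alphabet - 1)
    two_changed = comb(length, 2) * (alphabet - 1) ** 2
    return one_changed + two_changed

# Assumed: 10-character password, 95 printable ASCII characters,
# and a deliberately slow hash costing about 0.1 s per evaluation.
candidates = two_char_variants(length=10, alphabet=95)
seconds_per_hash = 0.1
print(candidates, "candidate passwords to hash")
print(round(candidates * seconds_per_hash / 3600, 1), "hours for one password change")
```

Substitutions alone already give hundreds of thousands of candidates per password change, which is why this only looks feasible with a fast hash.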

Related

Store passwords safely but determine same passwords

I have a legacy browser game which historically uses a simple hashing function for password storage. I know that it's far from ideal. However, time has proven that most of the cheaters (multi-accounts) use the same password for all of their fake accounts.
In an update of my game I want to store passwords more safely. I already know that passwords should be randomly salted, hashed by safe algorithms, etc. That's all nice.
But is there any way to store passwords properly and still determine that two (or more) users use the same password? I don't want to know the password. I don't want to be able to search by password. I only need to tell that suspect users A, B and C use the same one.
Thanks.
If you store them correctly - no. This is one of the points of a proper password storage.
You could have very long passwords, beyond what is available on rainbow tables (not sure about the current state of the art, but it used to be 10 or 12 characters) and not salt them. In this case two passwords would have the same hash. This is a very bad idea (but a solution nevertheless) - if your passwords leak someone may be able to guess them indirectly (xkcd reference).
You may also look at homomorphic encryption, but this is in the realm of science fiction for now.
Well, if you use salt + hashing, you have all the salts as plain text. When a user enters a password, before storing/verifying it, you can hash it with all the salts available and see if you get the corresponding existing hash. :)
The obvious problem with this is that if you are doing it properly with bcrypt or pbkdf2 for hashing, this would be very slow - that's kind of the point in these functions.
I don't think there is any other way you can tell whether two passwords are the same - you need at least one of them plain text, which is only when the user enters it. And then you want to remove it from memory asap, which contradicts doing all these calculations with the plain text password in memory.
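A minimal sketch of the "hash it with all the salts" idea, assuming PBKDF2-HMAC-SHA256 from Python's standard library as the hash. The record layout (username mapped to a salt and stored hash) and the iteration count are made up for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def users_sharing_password(password: str, records: dict) -> list:
    """records maps username -> (salt, stored_hash); both are hypothetical fields."""
    matches = []
    for user, (salt, stored) in records.items():
        # One full slow hash per stored record: this is the expensive part.
        if hmac.compare_digest(hash_password(password, salt), stored):
            matches.append(user)
    return matches

def make_record(password: str) -> tuple:
    salt = os.urandom(16)
    return salt, hash_password(password, salt)
```

The cost is one slow hash per existing account on every login or password change, so this only scales to small user tables or to a short list of suspect accounts.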
This will reduce the security of all passwords somewhat, since it leaks information about when two users have the same password. Even so, it is a workable trade-off and is straightforward to secure within that restriction.
The short answer is: use the same salt for all the passwords, but make that salt unique to your site.
Now the long answer:
First, to describe a standard and appropriate way to handle passwords. I'll get to the differences for you afterwards. (You may know all of this already, but it's worth restating.)
Start with a decent key-stretching algorithm, such as PBKDF2 (there are others, some even better, but PBKDF2 is ubiquitous and sufficient for most uses). Select a number of iterations depending on what client-side environment is involved. For JavaScript, you'll want something like 1k-4k iterations. For languages with faster math, you can use 10k-100k.
The key stretcher will need a salt. I'll talk about the salt in a moment.
The client sends the PBKDF2 output to the server. The server applies a fast hash (SHA-256 is nice) to it and compares the result to the stored hash. (For setting the password, the server does the same thing: it accepts a PBKDF2 hash, applies SHA-256, and then stores it.)
All that is standard stuff. The question is the salt. The best salt is random, but no good for this. The second-best salt is built from service_id+user_id (i.e. use a unique identifier for the service and concatenate the username). Both of these make sure that every user's password hash is unique, even if their passwords are identical. But you don't want that.
So now finally to the core of your question. You want to use a per-service, but not per-user, static salt. So something like "com.example.mygreatapp" (obviously don't use that actual string; use a string based on your app). With a constant salt, all passwords on your service that are the same will stretch (PBKDF2) and hash (SHA256) to the same value and you can compare them without having any idea what the actual password is. But if your password database is stolen, attackers cannot compare the hashes in it to hashes in other sites' databases, even if they use the same algorithm (because they'll have a different salt).
The disadvantage of this scheme is exactly its goal: if two people on your site have the same password and an attacker steals your database and knows the password of one user, they know the password of the other user, too. That's the trade-off.
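The scheme described above could be sketched roughly like this. The function names are hypothetical, the salt string is the placeholder from the answer, and the iteration count is just one value from the suggested range.

```python
import hashlib
from collections import defaultdict

SITE_SALT = b"com.example.mygreatapp"  # placeholder; use a string based on your own app
CLIENT_ITERATIONS = 10_000  # assumed; pick what the client environment can afford

def client_stretch(password: str, salt: bytes = SITE_SALT) -> bytes:
    """What the client would compute before sending anything to the server."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, CLIENT_ITERATIONS)

def server_digest(stretched: bytes) -> str:
    """The fast hash the server applies before storing or comparing."""
    return hashlib.sha256(stretched).hexdigest()

def group_by_password(stored: dict) -> list:
    """stored maps username -> digest; returns groups of users with equal digests."""
    groups = defaultdict(list)
    for user, digest in stored.items():
        groups[digest].append(user)
    return [users for users in groups.values() if len(users) > 1]
```

Because the salt is constant, equal passwords give equal stored values, so finding suspects is a plain equality check on the stored column and the password itself is never needed.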

DoD Password Complexity: Users cannot reuse any of their previous X passwords

I have seen a couple of posts on this, but I haven't seen a definitive answer necessarily. Therefore, I thought I would try to restate the question in a new context (Department of Defense).
According to DISA's "Application Security and Development STIG, V3R2", section 3.1.24.2 Password Complexity and Maintenance, DoD enterprise software has a pretty tough guideline with passwords:
Passwords must be at least 15 characters long.
Passwords must contain a mix of upper case letters, lower case letters, numbers, and special characters.
When a password is changed, users must not be able to use personal information such as names, telephone numbers, account names, or dictionary words.
Passwords must expire after 60 days.
Users must not be able to reuse any of their previous 10 passwords.
Ensure that the application has the capability to require that new account passwords differ from the previous password by at least four characters when a password is changed.
Users must not be able to change passwords more than once a day, except in the case of an administrator or a privileged user. Privileged users may be required to reset a user’s forgotten passwords and the ability to change passwords more than once per day.
As stated in NullUserException's post, for the developer to actually be able to check the last X passwords (and also ensure that new passwords differ from the previous password [bullet 6]), the passwords would have to be encrypted using a reversible method rather than hashed (which is a lot less secure, even if I am using NSA-approved encryption algorithms). The proposed answer seemed to make a good deal of sense, although there seemed to be some discrepancies and arguments, as seen in Dan Vinton's post.
I guess the real question here is, has anyone been able to implement all of these seemingly common password complexity constraints without actually diminishing the security of their systems?
Edit: Vulnerability APP3320.7 (bullet point 6) states "Ensure that the application has the capability to require that new account passwords differ from the previous password by at least four characters when a password is changed." That led me to believe that I would have to run a string similarity algorithm such as Levenshtein to check similarity. I cannot do this on a hash/salt. Please let me know if I am wrong here.
The character distance requirement as stated is only for the (one) previous password, not the 10 previous. Assuming your password tool requires entering the current password as well as a new one, you just check against that; no need to store anything there. (Also noted on this answer to the post you mentioned.)
The requirement of not matching any of the previous 10 passwords, of course, is handled by just checking against the old hashes.
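Both checks can be sketched together. This assumes the change-password form supplies the current password in plain text (already verified against its stored hash by the caller) and that the history is kept as (salt, hash) pairs. The function names, the PBKDF2 parameters, and the use of Levenshtein distance as the meaning of "differ by four characters" are assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import List, Optional, Tuple

ITERATIONS = 100_000  # assumed work factor

def pw_hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def check_new_password(current: str, new: str,
                       history: List[Tuple[bytes, bytes]],
                       min_distance: int = 4) -> Optional[str]:
    """Return a rejection reason, or None if the new password is acceptable."""
    # The distance check only needs the two plaintexts the user just typed.
    if levenshtein(current, new) < min_distance:
        return "too similar to the current password"
    # The history check only needs the old salts and hashes.
    for salt, old_hash in history:
        if hmac.compare_digest(pw_hash(new, salt), old_hash):
            return "matches a previous password"
    return None

def history_entry(password: str) -> Tuple[bytes, bytes]:
    salt = os.urandom(16)
    return salt, pw_hash(password, salt)
```

Nothing here requires storing an old password in recoverable form: the similarity test runs on what the user typed, and the reuse test runs on hashes.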
Using reversible methods to generate password-derived keys is not a secure practice and you must NOT DO IT. You must not store plain-text authentication information either. Since you will be storing keys (and perhaps salts, if you're into that kind of fetish), it is trivial to keep copies of the last 10 keys and check the newly submitted passwords against them.
Requiring that new passwords differ from previous N passwords by M characters is crazy, as it implies that password history is either plaintext or reversibly encrypted, neither of which is safe.
Limiting password history to "last N" and consequently limiting the frequency of password changes to "once per day" made sense when storage space was cost-prohibitive, but makes no sense today, when storage is very cheap. A much more reasonable policy would be "new passwords must not be the same as any known old passwords" and leave it at that. Ditch the "last N" and ditch the "max once daily" rule, which is only there to prevent users from circumventing history.
Some password management systems support this configuration (and have supported it for many years). Example: https://hitachi-id.com/password-manager/features/password-policy-enforcement.html

Does this mean my university is storing passwords insecurely?

My university requires you to change passwords regularly. If I try any variation on my current password I get the message:
The new password you have entered is not acceptable for the following reason: That password is too similar to the old one! Please try again
please go back and try again.
Now I'm no cryptographer, but if they can compute a similarity measure between the new and old passwords, doesn't this mean that passwords are being stored insecurely, or even in plaintext?
EDIT: I may be being an idiot. They do require you to enter the current password as well.
Do you have to enter your current password when changing passwords? Perhaps they're verifying that the current password hashes to the right value, and then comparing the plaintext to the new password.
Not quite. They could take the new password you entered, change a character and check the hash of the altered password against the stored hash. Repeat this for a series of minor alterations, e.g. modifying/inserting/deleting a single character and if any of the hashes generated equal the one stored then give the error you see.
Example: Say your old password is "password" and you try change to "pssword". Insert "a" after the "p" gives you "password", which hashes to the same thing as the old password. Therefore without knowing the old password, but only the hash, we have determined that the passwords are similar.
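For a password of length N, this generates and compares O(3N) = O(N) hashes (with a constant factor for the alphabet size, since insertions and modifications have to try every possible character). Assuming a hash takes O(N) to compute, the overall complexity will be O(N^2), which is very feasible for passwords all the way up to 1,000 characters, at least as long as the hash itself is fast.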
There is a very rare chance of a hash collision, and the more alterations they consider similar the higher this chance. But it's still rare nonetheless.
Note that this doesn't guarantee that the passwords are being stored securely. It just means you can't conclude that they are not being stored securely.
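The alteration check described above might look something like the following sketch. It uses a plain unsalted SHA-256 as a stand-in for whatever the system really stores, and a letters-plus-digits alphabet. Both are assumptions; with a salted hash you would rehash each variant with the stored salt instead.

```python
import hashlib
import string

ALPHABET = string.ascii_letters + string.digits  # assumed character set

def stored_hash(password: str) -> str:
    # Stand-in for the system's real password hash (assumption).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def single_edit_variants(password: str):
    """Yield every string one deletion, substitution, or insertion away."""
    for i in range(len(password)):
        yield password[:i] + password[i + 1:]  # delete character i
        for c in ALPHABET:
            if c != password[i]:
                yield password[:i] + c + password[i + 1:]  # substitute character i
    for i in range(len(password) + 1):
        for c in ALPHABET:
            yield password[:i] + c + password[i:]  # insert before position i

def too_similar(new_password: str, old_hash: str) -> bool:
    """True if the new password is within one edit of the password behind old_hash."""
    if stored_hash(new_password) == old_hash:
        return True
    return any(stored_hash(v) == old_hash for v in single_edit_variants(new_password))
```

With the example from above, too_similar("pssword", stored_hash("password")) comes out true because inserting "a" after the "p" reproduces the old password.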
On Linux (and other Unix-like systems) there are two PAM authentication modules that are responsible for this:
(1) Using the remember= option for the pam_unix PAM authentication module. This stores a number of past passwords in their hashed form so that you cannot reuse an old password with no changes. A usual location for those old hashes is /etc/security/opasswd.
(2) The pam_cracklib PAM module uses the old password as you entered it in order to perform the change and checks if enough characters are different when compared to the new password you entered (see the difok= pam_cracklib option).
In no case are old passwords stored in a recoverable form...
Any semi-competent system administrator would use something similar, rather than reinvent the wheel, which probably (but not certainly) means that you should not worry.
doesn't this mean that passwords are being stored insecurely, or even in plaintext?
Could be. A pure hashing-based storage method would make it impossible to compare for similarity: Only perfectly identical passwords could be found out that way.
They could be using an algorithm like SOUNDEX to check similarity - that wouldn't be as awful a practice as storing plaintext passwords, but still a terrible thing to do.
But of course, it's possible that the passwords are stored as plain text. You'd have to ask.

Do similar passwords have similar hashes?

Our computer system at work requires users to change their password every few weeks, and you cannot have the same password as you had previously. It remembers something like 20 of your last passwords. I discovered most people simply increment a digit at the end of their password, so "thisismypassword1" becomes "thisismypassword2" then 3, 4, 5 etc.
Since all of these passwords are stored somewhere, I wondered if there was any weakness in the hashes themselves, for standard hashing algorithms used to store passwords like MD5. Could a hacker increase their chances of brute-forcing the password if they have a list of hashes of similar passwords?
With a good hash algorithm, similar passwords will get distributed across the hashes. So similar passwords will have very different hashes.
You can try this with MD5 and different strings.
"hello world" - 5eb63bbbe01eeed093cb22bb8f5acdc3
"hello world" - fd27fbb9872ba413320c606fdfb98db1
Do similar passwords have similar hashes?
No.
Any similarity, even a complex correlation, would be considered a weakness in the hash. Once discovered by the crypto community it would be published, and enough discovered weaknesses in the hash eventually add up to advice not to use that hash any more.
Of course there's no way to know whether a hash has undiscovered weaknesses, or weaknesses known to an attacker but not published, in which case most likely the attacker is a well-funded government organization. The NSA certainly is in possession of non-public theoretical attacks on some crypto components, but whether those attacks are usable is another matter. GCHQ probably is. I'd guess that a few other countries have secret crypto programs with enough mathematicians to have done original work: China would be my first guess. All you can do is act on the best available information. And if the best available information says that a hash is "good for crypto", then one of the things that means is no usable similarities of this kind.
Finally, some systems use weak hashes for passwords -- either due to ignorance by the implementer or legacy. All bets are off for the properties of a hashing scheme that either hasn't had public review, or else has been reviewed and found wanting, or else is old enough that significant weaknesses have eventually been found. MD5 is broken for some purposes (since there exist practical means to generate collisions) but not for all purposes. AFAIK it's OK for this, in the sense that there is no practical pre-image attack, and having a handful of hashes of related plaintexts is no better than having a handful of hashes of unrelated plaintexts. But for unrelated reasons you shouldn't really use a single application of any hash for password storage anyway, you should use multiple rounds.
Could a hacker increase their chances of brute-forcing the password if they have a list of hashes of similar passwords?
Indirectly, yes, knowing that those are your old passwords. Not because of any property of the hash, but suppose the attacker manages to (very slowly) brute-force one or more of your old passwords using those old hashes, and sees that in the past it has been "thisismypassword3" and "thisismypassword4".
Your password has since changed, to "thisismypassword5". Well done, by changing it before the attacker cracked it, you have successfully ensured that the attacker did not recover a valuable password! Victory! Except it does you no good, since the attacker has the means to guess the new one quickly anyway using the old password(s).
Even if the attacker only has one old password, and therefore cannot easily spot a trend, password crackers work by trying passwords which are similar to dictionary words and other values. To over-simplify a bit, it will try the dictionary words first, then strings consisting of a word with one extra character added, removed or changed, then strings with two changes, and so on.
By including your old password in the "other values", the attacker can ensure that strings very similar to it are checked early in the cracking process. So if your new password is similar to old ones, then having the old hashes does have some value to the attacker - reversing any one of them gives him a good seed to crack your current password.
So, incrementing your password regularly doesn't add much. Changing your password to something that's guessable from the old password puts your attacker in the same position as they'd be in if they knew nothing at all, but your password was guessable from nothing at all.
The main practical attacks on password systems these days are eavesdropping (via keyloggers and other malware) and phishing. Trying to reverse password hashes isn't a good percentage attack, although if an attacker has somehow got hold of an /etc/passwd file or equivalent, they will break some weak passwords that way on the average system.
It depends on the hashing algorithm. If it is any good, similar passwords should not have similar hashes.
The whole point of a cryptographic hash is that similar passwords would absolutely not create similar hashes.
More importantly, you would most likely salt the password so that even the same passwords do not produce the same hash.
It depends on the hash algorithm used. A good one will distribute similar inputs to disparate outputs.
Different inputs may result in the same hash; this is what is called a hash collision.
Check here:
http://en.wikipedia.org/wiki/Collision_%28computer_science%29
Hash collisions may be used to increase the chances of a successful brute-force attack, see:
http://en.wikipedia.org/wiki/Birthday_attack
To expand on what others have said, a quick test shows that you get vastly different hashes with small changes made to the input.
I used the following code to run a quick test:
<?php
for ($i = 0; $i < 5; $i++) {
    // Print each candidate password next to its MD5 hash.
    echo 'password' . $i . ' - ' . md5('password' . $i) . "<br />\n";
}
?>
and I got the following results:
password0 - 305e4f55ce823e111a46a9d500bcb86c
password1 - 7c6a180b36896a0a8c02787eeafb0e4c
password2 - 6cb75f652a9b52798eb6cf2201057c73
password3 - 819b0643d6b89dc9b579fdfc9094f28e
password4 - 34cc93ece0ba9e3f6f235d4af979b16c
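To quantify "vastly different", one could count how many of the 128 output bits differ between two such hashes. A small Python sketch (the helper name is made up):

```python
import hashlib

def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which the MD5 digests of a and b differ."""
    da = int.from_bytes(hashlib.md5(a.encode("utf-8")).digest(), "big")
    db = int.from_bytes(hashlib.md5(b.encode("utf-8")).digest(), "big")
    return bin(da ^ db).count("1")

for i in range(4):
    first, second = f"password{i}", f"password{i + 1}"
    print(first, second, differing_bits(first, second), "of 128 bits differ")
```

For a hash that behaves well, each pair should differ in roughly half of the bits, which is what you would also expect from two unrelated inputs.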
Short answer, no!
The output of a hash function varies greatly even if only one character is incremented.
But that only matters if you want to break the hash function itself.
Of course, incrementing a digit is still bad practice, since it makes brute-forcing easier.
No, if you change the password even slightly it produces a completely new hash.
As a general rule, a "good hash" will not hash two similar (but unequal) strings to similar hashes. MD5 is good enough that this isn't a problem. However, there are "rainbow tables" (essentially password:hash pairs) for quite a few common passwords (and for some password hashes, the traditional DES-based unix passwords, for example) full rainbow tables exist.

Non-random salt for password hashes

UPDATE: I recently learned from this question that in the entire discussion below, I (and I am sure others too) was a bit confused: what I keep calling a rainbow table is in fact called a hash table. Rainbow tables are more complex creatures, and are actually a variant of Hellman Hash Chains. Though I believe the answer is still the same (since it doesn't come down to cryptanalysis), some of the discussion might be a bit skewed.
The question: "What are rainbow tables and how are they used?"
Typically, I always recommend using a cryptographically-strong random value as salt, to be used with hash functions (e.g. for passwords), such as to protect against Rainbow Table attacks.
But is it actually cryptographically necessary for the salt to be random? Would any unique value (unique per user, e.g. userId) suffice in this regard? It would in fact prevent using a single Rainbow Table to crack all (or most) passwords in the system...
But does lack of entropy really weaken the cryptographic strength of the hash functions?
Note, I am not asking about why to use salt, how to protect it (it doesn't need to be), using a single constant hash (don't), or what kind of hash function to use.
Just whether salt needs entropy or not.
Thanks all for the answers so far, but I'd like to focus on the areas I'm (a little) less familiar with. Mainly implications for cryptanalysis - I'd appreciate most if anyone has some input from the crypto-mathematical PoV.
Also, if there are additional vectors that hadn't been considered, that's great input too (see #Dave Sherohman point on multiple systems).
Beyond that, if you have any theory, idea or best practice - please back this up either with proof, attack scenario, or empirical evidence. Or even valid considerations for acceptable trade-offs... I'm familiar with Best Practice (capital B capital P) on the subject, I'd like to prove what value this actually provides.
EDIT: Some really good answers here, but I think as #Dave says, it comes down to Rainbow Tables for common user names... and possibly less common names too. However, what if my usernames are globally unique? Not necessarily unique for my system, but per each user - e.g. email address.
There would be no incentive to build a RT for a single user (as #Dave emphasized, the salt is not kept secret), and this would still prevent clustering. The only issue would be that I might have the same email and password on a different site - but salt wouldn't prevent that anyway.
So, it comes back down to cryptanalysis - IS the entropy necessary, or not? (My current thinking is it's not necessary from a cryptanalysis point of view, but it is from other practical reasons.)
Salt is traditionally stored as a prefix to the hashed password. This already makes it known to any attacker with access to the password hash. Using the username as salt or not does not affect that knowledge and, therefore, it would have no effect on single-system security.
However, using the username or any other user-controlled value as salt would reduce cross-system security, as a user who had the same username and password on multiple systems which use the same password hashing algorithm would end up with the same password hash on each of those systems. I do not consider this a significant liability because I, as an attacker, would try passwords that a target account is known to have used on other systems first before attempting any other means of compromising the account. Identical hashes would only tell me in advance that the known password would work, they would not make the actual attack any easier. (Note, though, that a quick comparison of the account databases would provide a list of higher-priority targets, since it would tell me who is and who isn't reusing passwords.)
The greater danger from this idea is that usernames are commonly reused - just about any site you care to visit will have a user account named "Dave", for example, and "admin" or "root" are even more common - which would make construction of rainbow tables targeting users with those common names much easier and more effective.
Both of these flaws could be effectively addressed by adding a second salt value (either fixed and hidden or exposed like standard salt) to the password before hashing it, but, at that point, you may as well just be using standard entropic salt anyhow instead of working the username into it.
Edited to Add: A lot of people are talking about entropy and whether entropy in salt is important. It is, but not for the reason most of the comments on it seem to think.
The general thought seems to be that entropy is important so that the salt will be difficult for an attacker to guess. This is incorrect and, in fact, completely irrelevant. As has been pointed out a few times by various people, attacks which will be affected by salt can only be made by someone with the password database and someone with the password database can just look to see what each account's salt is. Whether it's guessable or not doesn't matter when you can trivially look it up.
The reason that entropy is important is to avoid clustering of salt values. If the salt is based on username and you know that most systems will have an account named either "root" or "admin", then you can make a rainbow table for those two salts and it will crack most systems. If, on the other hand, a random 16-bit salt is used and the random values have roughly even distribution, then you need a rainbow table for all 2^16 possible salts.
It's not about preventing the attacker from knowing what an individual account's salt is, it's about not giving them the big, fat target of a single salt that will be used on a substantial proportion of potential targets.
Using a high-entropy salt is absolutely necessary to store passwords securely.
Take my username 'gs' and add it to my password 'MyPassword' gives gsMyPassword. This is easily broken using a rainbow-table because if the username hasn't got enough entropy it could be that this value is already stored in the rainbow-table, especially if the username is short.
Another problem is attacks where you know that a user participates in two or more services. There are lots of common usernames; probably the most important ones are admin and root. If somebody created a rainbow table that has salts with the most common usernames, they could use it to compromise accounts.
They used to have a 12-bit salt. 12 bits give 4096 different combinations. That was not secure enough because that much information can be easily stored nowadays. The same applies to the 4096 most used usernames. It's likely that a few of your users will choose a username that is among the most common ones.
I've found this password checker which works out the entropy of your password. Having smaller entropy in passwords (like by using usernames) makes it much easier for rainbowtables as they try to cover at least all passwords with low entropy, because they are more likely to occur.
It is true that the username alone may be problematic, since people may share usernames among different websites. But it should be rather unproblematic if the users had a different name on each website. So why not just make it unique on each website? Hash the password somewhat like this:
hashfunction("www.yourpage.com/"+username+"/"+password)
This should solve the problem. I'm not a master of cryptanalysis, but I sure doubt that the fact that we don't use high entropy would make the hash any weaker.
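In Python the idea might be sketched as below. Note that this swaps the single hashfunction(...) call for PBKDF2 with the site-plus-username string as the salt, which is my substitution rather than what the line above literally does. The site name is the placeholder from above and the iteration count is an assumption.

```python
import hashlib

SITE = "www.yourpage.com"  # placeholder from the example above
ITERATIONS = 100_000  # assumed work factor

def derived_salt(username: str) -> bytes:
    """Deterministic salt: unique per site and per user, but not random."""
    return f"{SITE}/{username}".encode("utf-8")

def hash_password(username: str, password: str) -> str:
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), derived_salt(username), ITERATIONS
    )
    return digest.hex()
```

Two users with the same password get different hashes, and the same user gets a different hash on another site, without anything random having to be stored.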
I like to use both: a high-entropy random per-record salt, plus the unique ID of the record itself.
Though this doesn't add much to security against dictionary attacks, etc., it does remove the fringe case where someone copies their salt and hash to another record with the intention of replacing the password with their own.
(Admittedly it's hard to think of a circumstance where this applies, but I can see no harm in belts and braces when it comes to security.)
If the salt is known or easily guessable, you have not increased the difficulty of a dictionary attack. It even may be possible to create a modified rainbow table that takes a "constant" salt into account.
Using unique salts increases the difficulty of BULK dictionary attacks.
Having unique, cryptographically strong salt value would be ideal.
I would say that as long as the salt is different for each password, you will probably be ok. The point of the salt, is so that you can't use standard rainbow table to solve every password in the database. So if you apply a different salt to every password (even if it isn't random), the attacker would basically have to compute a new rainbow table for each password, since each password uses a different salt.
Using a salt with more entropy doesn't help a whole lot, because the attacker in this case is assumed to already have the database. Since you need to be able to recreate the hash, you have to already know what the salt is. So you have to store the salt, or the values that make up the salt in your file anyway. In systems like Linux, the method for getting the salt is known, so there is no use in having a secret salt. You have to assume that the attacker who has your hash values, probably knows your salt values as well.
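For comparison, a minimal sketch of the usual arrangement, where a random salt is generated per password and stored in the clear right next to the hash. The "salt$hash" record format and the parameters are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor

def make_record(password: str) -> str:
    """Return 'salt$hash' in hex; the salt is not secret, only unique."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify(password: str, record: str) -> bool:
    salt_hex, digest_hex = record.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Anyone holding the record can read the salt, so its job is only to make every record need its own cracking effort.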
The strength of a hash function is not determined by its input!
Using a salt that is known to the attacker obviously makes constructing a rainbow table (particularly for hard-coded usernames like root) more attractive, but it doesn't weaken the hash. Using a salt which is unknown to the attacker will make the system harder to attack.
The concatenation of a username and password might still provide an entry for an intelligent rainbow table, so using a salt of a series of pseudo-random characters, stored with the hashed password, is probably a better idea. As an illustration, if I had username "potato" and password "beer", the concatenated input for your hash is "potatobeer", which is a reasonable entry for a rainbow table.
Changing the salt each time the user changes their password might help to defeat prolonged attacks, as would the enforcement of a reasonable password policy, e.g. mixed case, punctuation, min length, change after n weeks.
However, I would say your choice of digest algorithm is more important. Use of SHA-512 is going to prove to be more of a pain for someone generating a rainbow table than MD5, for example.
Salt should have as much entropy as possible to ensure that should a given input value be hashed multiple times, the resulting hash value will be, as close as can be achieved, always different.
Using ever-changing salt values with as much entropy as possible in the salt will ensure that the likelihood of hashing (say, password + salt) will produce entirely different hash values.
The less entropy in the salt, the more chance you have of generating the same salt value, and thus the more chance you have of generating the same hash value.
It is the nature of the hash value being "constant" when the input is known and "constant" that allow dictionary attacks or rainbow tables to be so effective. By varying the resulting hash value as much as possible (by using high entropy salt values) ensures that hashing the same input+random-salt will produce many different hash value results, thereby defeating (or at least greatly reducing the effectiveness of) rainbow table attacks.
Entropy is the point of Salt value.
If there is some simple and reproducible "math" behind the salt, then it's the same as if the salt were not there. Just adding a time value should be fine.

Resources