TL;DR:
I'm looking for a hashing function that generates a different hash every time for the same piece of data, but can still determine whether a particular hash was generated from that piece of data.
Eg:
hash_func(xyz): abc123
hash_func(xyz): jhg342 // different hash, even if the data was same.
decode_hash(jhg342) == xyz
This gives true, because the hash function determined that jhg342 is indeed the hash of xyz
The Question
For an Open Source website, I want to store the email in hashed form (because all the users will be public), but the site needs to know if an email was used to register for another account so that it can ensure one account per email.
However, all the emails are from one organization only, so they all look exactly like uid@org_name.com. This means anyone can run through all the UIDs and find out which hash belongs to which email, and thus, which person.
Therefore, is there a way to hash the email such that the hash knows which email it belongs to, but hashing the same email does not generate the same hash?
P.S. Please note that I cannot use salting, as the site will be Open Source and the salt will be publicly available.
This doesn't make sense - you're conflating hashing and encryption in a very strange way. What you're describing wouldn't really be a cryptographically secure hash function. By definition, cryptographically secure hash functions are one way. In fact, if you could reverse it, there would be little point to using it at all because it would no longer be secure. This would make it possible to brute-force passwords and would "break" passwords that were used in multiple places.
Also, why would you want it to hash to different values each time? That's what you use a salt for.
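To illustrate the point about salts: storing a fresh random salt next to each hash gives exactly the "different hash every time, yet still verifiable" behavior from the question. The sketch below uses PBKDF2 from Python's standard library; `hash_email`, `matches` and the iteration count are illustrative assumptions, not a recommendation. Note that it does not fix the enumeration problem: the salt sits next to the hash, so someone holding the database can still try every uid@org_name.com against each record (just one record at a time), and a uniqueness check on registration has to recompute against every stored record.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware

def hash_email(email: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt makes every call differ."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", email.encode(), salt, ITERATIONS)
    return salt, digest

def matches(email: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", email.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt1, h1 = hash_email("xyz")
salt2, h2 = hash_email("xyz")
print(h1 != h2)                   # True: same input, different stored hashes
print(matches("xyz", salt2, h2))  # True: the stored salt makes it verifiable
```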
If you want to be able to reverse it later, just use an encryption algorithm like AES. Even better, many databases offer features for securely storing sensitive information; see, for example, SQL Server's Always Encrypted feature.
Related
I have the following problem:
In a Java application I want to store some configuration data in an encrypted local file. This file might be used for confidential data, like user credentials.
This file should be accessible by using a password (and only a password).
Now, most trustworthy people and reference implementations use random salts. I completely understand that this is a reasonable choice. But if my application terminates and is started again later, the random salt is not available anymore. This application is stand-alone, so no additional database can be used as a salt store.
For my software the user shall only type in the password (i.e. no user name, no salt, no favorite animal or colour).
Now my idea was deriving the salt from the password (e.g. by using the first 16 bytes of SHA-256).
My questions are:
How (in)secure would this implementation be?
What is a common way of encrypting data with only a password that would be a better alternative?
What is not the aim of this question:
Where to store salts
Secure algorithms and crypto implementation (of course, I did not implement crypto by myself)
Architectural improvements (nope, I do not want a global database for storing stuff)
First, I strongly recommend against devising novel encryption formats if you can help it. It is very difficult to do them correctly. If you want an encryption format that does what you're describing, see JNCryptor, which is an implementation of the RNCryptor format. The RNCryptor format is designed precisely for this problem, so the spec is a good source of information on how you could create your own if you don't want to use it directly. (I'm the author of RNCryptor.)
See also libsodium. It's a better encryption format than RNCryptor for various technical reasons, but it's a bit harder to install and use correctly. There are several Java bindings for libsodium.
When you say "of course, I did not implement crypto by myself," that's what you're doing. Crypto schemes are more than just the AES code. Deciding how to generate the salts in a novel way is implementing crypto. There are many ways to put together secure primitives (like salts) in simple ways and make them wildly insecure. That's why you want to use something well established.
The key take-away is that you store the salt with the data. I know you said this isn't about storing the salt, but that's how you do this. The simplest way to do this is to just glue the salt onto the start of the cipher text and store that. Then you just read the salt from the header. Similarly, you could put the whole thing in an envelope if that's more convenient. Something as simple as JSON:
{ "salt": "<base64-salt>",
"data": "<base64-data>" }
It's not the most efficient way to store the data, but it's easy, standard, and secure.
Remember, salts are not secrets. It is fine that everyone can read the salt.
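A minimal sketch of the JSON envelope described above, assuming PBKDF2-HMAC-SHA256 (from Python's standard library) as the key-derivation function. The function names, iteration count and password are hypothetical, and the actual cipher step is deliberately left as a placeholder: in practice you would let a vetted format such as RNCryptor or libsodium handle it. The point it shows is that a random salt survives an application restart simply by being read back from the file.

```python
import base64
import hashlib
import json
import os

ITERATIONS = 200_000  # illustrative PBKDF2 work factor

def derive_key(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 producing a 32-byte key (e.g. for AES-256)
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=32)

def write_envelope(salt: bytes, ciphertext: bytes) -> str:
    # The salt travels in the clear next to the data; it is not a secret.
    return json.dumps({"salt": base64.b64encode(salt).decode("ascii"),
                       "data": base64.b64encode(ciphertext).decode("ascii")})

def read_envelope(text: str) -> tuple[bytes, bytes]:
    obj = json.loads(text)
    return base64.b64decode(obj["salt"]), base64.b64decode(obj["data"])

# First run: fresh random salt, derive the key, encrypt, save the envelope.
salt = os.urandom(16)
key = derive_key("user password", salt)
ciphertext = b"<ciphertext placeholder>"  # placeholder: AEAD encryption under `key`, not shown
stored = write_envelope(salt, ciphertext)

# Later run: the user types only the password; the salt is read back from the file.
salt_again, data = read_envelope(stored)
key_again = derive_key("user password", salt_again)
```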
OK, enough of how to do it right. Let's get to your actual question.
Your salting proposal is not a salt. It's just a slightly different hashing function. The point of a salt is if the same password is used twice (without intending to be the same password), then they will have different hashes. Your scheme fails that. If I implement the same approach as you do, and I pick the same password as yours, then the hash will be the same. Rainbow tables win.
The way you fix that is with a static salt, not a modified hash function. You should pick a salt that represents your system. I usually like reverse DNS for this, because it leads to uniqueness. For example: "com.example.mygreatapp". Someone else would naturally pick "org.example.ourawesomedb". You also can pick a long random string, but the important thing is uniqueness, so I like reverse DNS. (Random strings tend to make people think the salt is a secret, and the salt is not a secret.)
That's the whole system; just pick some constant salt, unique to your system. (If you had a username, you'd add the username to the salt. This is a standard way to construct a deterministic salt.)
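The difference between a password-derived salt and a constant per-system salt can be seen directly. This sketch uses PBKDF2 as the stretching function; the password, the iteration count and the ":" separator before the username are illustrative assumptions. Two unrelated apps that both derive the salt from the password produce identical output, while reverse-DNS salts keep different systems apart.

```python
import hashlib

ITERATIONS = 100_000  # illustrative work factor

def stretch(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def password_derived_salt(password: str) -> bytes:
    # The proposal from the question: salt = first 16 bytes of SHA-256(password)
    return hashlib.sha256(password.encode()).digest()[:16]

pw = "hunter2"

# Two unrelated apps using the same derived-salt idea get the SAME output,
# so it behaves like a plain (slower) hash function: one table covers both.
app_a = stretch(pw, password_derived_salt(pw))
app_b = stretch(pw, password_derived_salt(pw))

# Constant, system-unique salts make the outputs differ between systems.
mine = stretch(pw, b"com.example.mygreatapp")
theirs = stretch(pw, b"org.example.ourawesomedb")

# Appending a username gives a deterministic per-user salt
# (the ":" separator is an assumption to avoid ambiguous concatenations).
alice = stretch(pw, b"com.example.mygreatapp:alice")
```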
But for file storage, I'd never do it that way.
To encrypt data, one needs a key, not a password. There are key-derivation functions to get a key from a user password.
A salt can be used for password hashing, but it cannot be used for encrypting data. There is a similar concept for encryption, though: the random values there are called IVs or nonces, and they are stored together with the encrypted data.
The best thing you can do is
use a key-derivation-function with a salt, to get a key from the password.
With the resulting key you can encrypt the data.
In this case the salt can be stored inside the encrypted data container (the IV is already there), so there is no need for a global database.
To answer your original question: deriving a salt from the password negates the whole purpose of the salt; it just becomes a more complex hash function.
I have a legacy browser game which historically uses a simple hashing function for password storage. I know that it's far from ideal. However, time has shown that most of the cheaters (multiaccounts) use the same password for all of their fake accounts.
In an update of my game I want to store passwords more safely. I already know that passwords should be randomly salted, hashed by safe algorithms, etc. That's all nice.
But is there any way to store passwords properly and still determine that two (or more) users use the same password? I don't want to know the password. I don't want to be able to search by password. I only need to tell that suspect users A, B and C use the same one.
Thanks.
If you store them correctly - no. This is one of the points of a proper password storage.
You could have very long passwords, beyond what is available on rainbow tables (not sure about the current state of the art, but it used to be 10 or 12 characters) and not salt them. In this case two passwords would have the same hash. This is a very bad idea (but a solution nevertheless) - if your passwords leak someone may be able to guess them indirectly (xkcd reference).
You may also look at homomorphic encryption, but this is in the realm of science fiction for now.
Well, if you use salt + hashing, you have all the salts as plain text. When a user enters a password, before storing/verifying it, you can hash it with all the salts available and see if you get the corresponding existing hash. :)
The obvious problem with this is that if you are doing it properly with bcrypt or pbkdf2 for hashing, this would be very slow - that's kind of the point in these functions.
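A sketch of that idea, assuming randomly salted PBKDF2 hashes held in an in-memory dictionary that stands in for the user table (the names, passwords and iteration count are hypothetical). It can only run while the plaintext is available, i.e. at login or registration, and the cost is one full slow hash per stored user.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative; total cost is ITERATIONS * number_of_users

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def users_sharing_password(password: str, records: dict[str, tuple[bytes, bytes]]) -> list[str]:
    """records maps user -> (salt, hash); try the plaintext under every stored salt."""
    found = []
    for user, (salt, stored) in records.items():
        if hmac.compare_digest(hash_password(password, salt), stored):
            found.append(user)
    return found

# Hypothetical database of randomly salted hashes
records = {}
for user, pw in [("A", "letmein"), ("B", "letmein"), ("C", "s3cret")]:
    salt = os.urandom(16)
    records[user] = (salt, hash_password(pw, salt))

print(users_sharing_password("letmein", records))  # ['A', 'B']
```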
I don't think there is any other way you can tell whether two passwords are the same - you need at least one of them in plain text, which is only available when the user enters it. And then you want to remove it from memory ASAP, which contradicts doing all these calculations with the plain-text password in memory.
This will reduce the security of all passwords somewhat, since it leaks information about when two users have the same password. Even so, it is a workable trade-off and is straightforward to secure within that restriction.
The short answer is: use the same salt for all the passwords, but make that salt unique to your site.
Now the long answer:
First, to describe a standard and appropriate way to handle passwords. I'll get to the differences for you afterwards. (You may know all of this already, but it's worth restating.)
Start with a decent key-stretching algorithm, such as PBKDF2 (there are others, some even better, but PBKDF2 is ubiquitous and sufficient for most uses). Select a number of iterations depending on what client-side environment is involved. For JavaScript, you'll want something like 1k-4k iterations. For languages with faster math, you can use 10k-100k.
The key stretcher will need a salt. I'll talk about the salt in a moment.
The client sends the PBKDF2 output (not the raw password) to the server. The server applies a fast hash (SHA-256 is nice) and compares that to the stored hash. (For setting the password, the server does the same thing; it accepts a PBKDF2 hash, applies SHA-256, and then stores it.)
All that is standard stuff. The question is the salt. The best salt is random, but no good for this. The second-best salt is built from service_id+user_id (i.e. use a unique identifier for the service and concatenate the username). Both of these make sure that every user's password hash is unique, even if their passwords are identical. But you don't want that.
So now finally to the core of your question. You want to use a per-service, but not per-user, static salt. So something like "com.example.mygreatapp" (obviously don't use that actual string; use a string based on your app). With a constant salt, all passwords on your service that are the same will stretch (PBKDF2) and hash (SHA256) to the same value and you can compare them without having any idea what the actual password is. But if your password database is stolen, attackers cannot compare the hashes in it to hashes in other sites' databases, even if they use the same algorithm (because they'll have a different salt).
The disadvantage of this scheme is exactly its goal: if two people on your site have the same password and an attacker steals your database and knows the password of one user, they know the password of the other user, too. That's the trade-off.
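The scheme above can be sketched as follows: client-side PBKDF2 with one constant per-service salt, a server-side SHA-256 before storage, and a grouping step that finds accounts sharing a password. The salt string is the placeholder from the answer, and the user names, passwords and iteration count are hypothetical.

```python
import hashlib
from collections import defaultdict

SERVICE_SALT = b"com.example.mygreatapp"  # placeholder; use a string based on your own app
ITERATIONS = 100_000  # illustrative

def client_stretch(password: str) -> bytes:
    # Runs on the client; only this PBKDF2 output is sent to the server.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SERVICE_SALT, ITERATIONS)

def server_store(stretched: bytes) -> str:
    # Fast hash applied by the server before storing.
    return hashlib.sha256(stretched).hexdigest()

def shared_password_groups(stored: dict[str, str]) -> list[list[str]]:
    # Identical stored values mean identical passwords (same salt for everyone).
    by_hash = defaultdict(list)
    for user, h in stored.items():
        by_hash[h].append(user)
    return [users for users in by_hash.values() if len(users) > 1]

stored = {user: server_store(client_stretch(pw))
          for user, pw in [("A", "dragon"), ("B", "dragon"), ("C", "dragon"), ("D", "other")]}
print(shared_password_groups(stored))  # [['A', 'B', 'C']]
```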
I would like to protect my users' usernames in an online service, as they may be personally identifying (e.g., an email address), but I am wondering if it's even possible...
My first inclination was to hash it (unsalted), but I am worried about possible hash collisions. I'm not so much worried about the probability of a collision in a 256-bit SHA-256 hash, but more about the possibility that the class of usernames used could just be prone to collisions.
I also looked into perfect hashes, but as the users can be added dynamically, that's going to be too hard to manage.
Another option I've thought of was that (when adding the user) if there were a hash collision, I would reply to the client with a request for another hash, and repeat until there was no collision. I'd repeat this process during log-in. However, I am also wondering if this actually makes it easier for an attacker, as they'd have more feedback about what hashes were successful, and if the database were compromised, all the additional hashes would make recovering the original value easier.
I was also considering encrypting the username using the username as a password, but I'm guessing this also suffers from collisions (because each entry has a unique password--two different plain-texts with two different passwords could result in the same cipher-text), so I'm thinking it's not worth exploring this further.
I don't really want to go with a custom username (where the user has to come up with something that hasn't been taken when they sign up), as I expect users to use the service very rarely, and they are likely to forget their username.
I'm currently thinking I will just go with the first idea of hashing once, and if there is a collision, have the password decide (and hope there's no collision there too--I could put a warning when the user signs up saying that their username/password is not allowed because it would log them in as another user, perhaps /S).
Is there any non-colliding way of creating a secure form of username?
Thank you.
I'm assuming we are talking about email addresses, as there aren't many other options usable as login names.
I was also considering encrypting the username using the username as a password, but I'm guessing this also suffers from collisions (because each entry has a unique password--two different plain-texts with two different passwords could result in the same cipher-text), so I'm thinking it's not worth exploring this further.
Collisions are the wrong thing to worry about here ...
Mandatory disclaimer: Encryption keys are not the same things as passwords. And encrypting the plainText with itself as the key is even worse.
The problem with encryption is that cipherTexts aren't searchable; i.e. you cannot check for uniqueness unless you decrypt all user records each time, so this just isn't sustainable - the work for every lookup grows linearly with the number of user records.
That's because encryption makes use of IVs (Initialization Vectors; i.e. the equivalent of salts in password hashing), which result in different cipherTexts even if you encrypt the same plainText twice using the same key.
However, it is very likely that you will need to encrypt those emails anyway: if you need to send out password reset links, notifications, etc., you'll need a two-way mechanism. You can't do these things with hashes, because they are one-way only.
There's a reason why every website couples its user accounts with email addresses, even if they are not the login names. :)
What you can do, for login checks only, is store an HMAC (Hash-based Message Authentication Code) of the email.
HMACs look just like regular hashes, but are actually "keyed hashes" (i.e. you would use a key while hashing, similarly to encryption). And in addition to that, nobody has managed to find collisions with the HMAC construct so far, even with the now famously insecure MD5 (still, please use a modern algorithm; at least SHA-2).
I should note that HMACs aren't nearly as strong as password hashing algorithms, so your users' emails certainly won't be as strongly protected as their passwords, but it's not like there's anything else you can do about it, and it should be OK.
In summary, you'll need to have two separate cryptographic keys configured in your application - one for encryption, and one for the HMACs - and the following data stored:
userLoginLookup - HMAC of the email, using one of the two keys
userLoginMailer - cipherText of the email, using the second configured key
userPassword - a standard password hash; using bcrypt, PBKDF2 or scrypt
Note: Cryptography is always case-sensitive, so to accommodate lookups, you need to always normalize the email addresses first; i.e. make them all-lowercase or all-uppercase.
When a user attempts to log in, you compute HMAC(emailInput, hmacKey) and search for a match in the userLoginLookup field of your database.
When you need to send a notification or password reset email, you decrypt the userLoginMailer.
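The lookup half of that design might look like this, using Python's `hmac` module with SHA-256. The key literal is a hypothetical placeholder (in practice it comes from configuration and is separate from the encryption key), the dictionary stands in for an index on the userLoginLookup column, and stripping surrounding whitespace in addition to lowercasing is an extra assumption.

```python
import hashlib
import hmac

# Hypothetical placeholder; load a random key from configuration in practice.
HMAC_KEY = b"replace-with-32-random-bytes-from-config"

def normalize(email: str) -> str:
    # Lowercase (and trim) so that lookups are not defeated by case differences.
    return email.strip().lower()

def login_lookup(email: str) -> str:
    """Value for the userLoginLookup column: HMAC-SHA-256 of the normalized email."""
    return hmac.new(HMAC_KEY, normalize(email).encode(), hashlib.sha256).hexdigest()

# Registration stores the lookup value; login recomputes it and searches for a match.
users = {login_lookup("Alice@Example.com"): "user-1"}  # stand-in for a DB index
print(users.get(login_lookup("alice@example.com ")))  # user-1
```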
Imagine you want to create a "secure" messaging app which must comply with:
If someone has access to the server databases, he/she cannot identify the user from the field you're using to substitute for the normal username / email.
This solution seems interesting.
But I wonder:
If there are any better (more secure) alternatives
What hashing mechanism one should use
Not really. Hashes are good for hiding secret information, like passwords. For information like email addresses, which are usually quite easily guessed/googled, an attacker could easily pre-generate a huge list of hashes for a database of email addresses and quickly use a reverse lookup to find out if a given hash (on your system) matches up with one of the addresses in the database. That's putting aside the fact that hashes are not unique, which probably isn't a problem with a big enough hash address space.
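To make the pre-generation attack concrete: when addresses follow a predictable pattern (here a numeric UID at a single organization's domain, with `org_name.com` as a placeholder), an attacker can hash every candidate once and invert any unsalted hash with a dictionary lookup. The UID range and the "leaked" value are hypothetical.

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# Attacker precomputes hashes for every plausible address.
candidates = [f"{uid}@org_name.com" for uid in range(100_000)]
reverse = {sha256_hex(addr): addr for addr in candidates}

leaked_hash = sha256_hex("4242@org_name.com")  # a value taken from the stolen database
print(reverse.get(leaked_hash))  # 4242@org_name.com
```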
Generally, if you want anonymous IDs, you should use randomly generated ones.
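A minimal sketch of such a random identifier using Python's `secrets` module; the function name and the 16-byte (128-bit) size are assumptions. Because nothing about the user goes into it, there is nothing to pre-compute or reverse.

```python
import secrets

def new_anonymous_id() -> str:
    # 16 random bytes (128 bits), URL-safe base64 without padding: 22 characters
    return secrets.token_urlsafe(16)

a, b = new_anonymous_id(), new_anonymous_id()
```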
If you salt a password before hashing it, the hash becomes more secure. That makes sense, because rainbow table attacks become much more difficult (impossible?).
What if you use multiple salts? For example - you check if the day is Monday, or the Month, the hour, etc (or some combination). Then you have a database which stores the fields: (userid, hash1, hash2, hash3...).
Would this make the information any more (or less) secure?
Example:
1) User registers with password 'PASS'.
2) The system (PHP in this example) stores md5($password.$this_day) for each day (7 hashes) in the password table, in columns hash_monday, hash_tuesday, etc.
3) The user logs in, and the script checks whether the 'hash_'.$this_day column matches the hash of what was entered.
Your system will be no more secure - you end up with several single-salt databases instead of one. In principle it may be even less secure, since you helpfully provide the attacker with 7 hashes of the same string to choose from, and he only needs to crack one. These multiple hashes of the same plaintext may also have implications for the cryptographic strength of the hash used for passwords (I'm not sure on that one, and it will depend on the algorithm used).
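The "only needs to crack one" point can be shown in a few lines. This sketch mirrors the proposed row for the example password 'PASS', assuming $this_day is a lowercase day name (the question does not say which format is used); the word list is hypothetical. Once the attacker guesses the salts are day names, a single column is enough and the other six add nothing.

```python
import hashlib
from typing import Optional

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

# One user's row under the proposed scheme: md5($password.$this_day) for every day.
row = {"hash_" + day: md5_hex("PASS" + day) for day in DAYS}

def crack(row: dict, wordlist: list) -> Optional[str]:
    # Attack the Monday column only; a hit there reveals the password for all seven.
    for word in wordlist:
        if md5_hex(word + "monday") == row["hash_monday"]:
            return word
    return None
```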
Maybe you should have a look at this small article. There are several things wrong with your approach.
A salt does not protect against a dictionary attack. It protects against rainbow-tables if correctly used.
Use a unique salt for each password. The salt should be a random value, not derived from known information. It has to be stored with the password.
Do not use MD5 for hashing passwords. MD5 is considered broken, and it is way too fast for hashing passwords. With an off-the-shelf GPU, you are able to calculate 8 billion MD5 hashes per second (in 2012). That makes it possible to brute-force a whole English dictionary of about 500,000 words in less than 0.1 milliseconds!
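A quick check of that figure, using the 2012 throughput quoted:

```latex
t = \frac{5 \times 10^{5}\ \text{hashes}}{8 \times 10^{9}\ \text{hashes/s}} = 6.25 \times 10^{-5}\ \text{s} \approx 0.06\ \text{ms} < 0.1\ \text{ms}
```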
Use bcrypt for hashing passwords. It is recommended to use a well-established library like phpass, and if you want to understand how it can be implemented, you can read the article above.
If you want to add a secret to your hash function (like a hidden key, or a hidden function), you can add a pepper to the password. The pepper should not be stored in the database, and should remain secret. The pepper can protect against dictionary attacks, as long as the attacker has only access to your password-hashes (SQL-Injection), but not to the server with the secret.
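One common way to combine a pepper with a per-password random salt is to HMAC the password with the pepper first and then run the slow salted hash. The sketch below uses PBKDF2 from Python's standard library in place of the bcrypt recommended above (bcrypt is not in the stdlib); the `PASSWORD_PEPPER` environment variable, its development fallback and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical: the pepper lives in server config / environment, never in the database.
PEPPER = os.environ.get("PASSWORD_PEPPER", "dev-only-pepper").encode()
ITERATIONS = 100_000  # illustrative

def _peppered(password: str) -> bytes:
    # Keyed pre-hash: without PEPPER, stolen hashes cannot be dictionary-attacked.
    return hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # per-password random salt, stored in the database
    return salt, hashlib.pbkdf2_hmac("sha256", _peppered(password), salt, ITERATIONS)

def verify(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", _peppered(password), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)
```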
I do not believe multiple hashes are going to help you in this scenario, primarily because when someone compromises your database they will notice that you have 7 different salts to go against and may make an educated guess that they are based on days of the week. There is nothing fundamentally wrong with MD5, despite how many people like to jump on that bandwagon. The people who say MD5 is a broken hash fundamentally misunderstand the difference between a hash function and a cryptographic hash function; I would recommend ignoring them. In the event you need a cryptographic hash function, use SHA-2 (or something from that family or greater).
You will need to salt the user input. As you know, a random value is generally recommended, but it can also be a value you store in a separate application space (outside of the database); you just have to protect that information as well. I highly recommend making the password hashing function take several thousand iterations for any input, as this will slow down the automated process of matching hashes against the database.
If your users use easy-to-guess passwords, dictionary attacks will beat you every day; you can't protect against stupidity.