Derive salt from password - How (in)secure is it?


I have the following problem:
In a Java application I want to store some configuration data in an encrypted local file. This file might be used for confidential data, like user credentials.
This file should be accessible by using a password (and only a password).
Now most trustworthy people and reference implementations use random salts. I completely understand that this is a reasonable choice. But if my application terminates and is started again later, the random salt is no longer available. This application is stand-alone, so no additional database can be used as a salt store.
For my software the user shall only type in the password (that is, no user name, no salt, no favorite animal or colour).
Now my idea was deriving the salt from the password (e.g. by using the first 16 bytes of SHA-256).
My questions are:
How (in)secure would this implementation be?
What is a common way of encrypting data with only a password, and what would be a better alternative?
What is not the aim of this question:
Where to store salts
Secure algorithms and crypto implementation (of course, I did not implement crypto by myself)
Architectural improvements (nop, I do not want a global database for storing stuff)

First, I strongly recommend against devising novel encryption formats if you can help it. It is very difficult to do them correctly. If you want an encryption format that does what you're describing, see JNCryptor, which is an implementation of the RNCryptor format. The RNCryptor format is designed precisely for this problem, so the spec is a good source of information on how you could create your own if you don't want to use it directly. (I'm the author of RNCryptor.)
See also libsodium. It's a better encryption format than RNCryptor for various technical reasons, but it's a bit harder to install and use correctly. There are several Java bindings for libsodium.
When you say "of course, I did not implement crypto by myself," that's what you're doing. Crypto schemes are more than just the AES code. Deciding how to generate the salts in a novel way is implementing crypto. There are many ways to put together secure primitives (like salts) in simple ways and make them wildly insecure. That's why you want to use something well established.
The key take-away is that you store the salt with the data. I know you said this isn't about storing the salt, but that's how you do this. The simplest way to do this is to just glue the salt onto the start of the cipher text and store that. Then you just read the salt from the header. Similarly, you could put the whole thing in an envelope if that's more convenient. Something as simple as JSON:
{ "salt": "<base64-salt>",
"data": "<base64-data>" }
It's not the most efficient way to store the data, but it's easy, standard, and secure.
Remember, salts are not secrets. It is fine that everyone can read the salt.
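A minimal sketch of the envelope idea, written in Python for brevity (Java's standard library offers the same pieces). The function names, the iteration count, and the treatment of the ciphertext as opaque bytes are assumptions made for illustration; the actual encryption step is left out.

```python
import base64
import hashlib
import json
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def pack(salt: bytes, ciphertext: bytes) -> str:
    """Store the (non-secret) salt next to the ciphertext in a JSON envelope."""
    return json.dumps({
        "salt": base64.b64encode(salt).decode("ascii"),
        "data": base64.b64encode(ciphertext).decode("ascii"),
    })


def unpack(envelope: str) -> Tuple[bytes, bytes]:
    """Read the salt and the ciphertext back out of the envelope."""
    fields = json.loads(envelope)
    return base64.b64decode(fields["salt"]), base64.b64decode(fields["data"])


# Writing: a fresh random salt per file, saved together with the data.
salt = os.urandom(16)
key_for_writing = derive_key("correct horse", salt)
envelope = pack(salt, b"<ciphertext produced with key_for_writing>")

# Reading, possibly after a restart: only the password has to be typed in.
stored_salt, stored_data = unpack(envelope)
key_for_reading = derive_key("correct horse", stored_salt)
```

Because the salt travels inside the file, nothing but the password has to survive between runs.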
OK, enough of how to do it right. Let's get to your actual question.
Your salting proposal is not a salt. It's just a slightly different hashing function. The point of a salt is if the same password is used twice (without intending to be the same password), then they will have different hashes. Your scheme fails that. If I implement the same approach as you do, and I pick the same password as yours, then the hash will be the same. Rainbow tables win.
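To make that concrete, here is a small Python sketch of the scheme from the question. The helper names and the choice of PBKDF2 with 10,000 iterations are assumptions for illustration only.

```python
import hashlib


def salt_from_password(password: str) -> bytes:
    """The scheme from the question: first 16 bytes of SHA-256(password)."""
    return hashlib.sha256(password.encode("utf-8")).digest()[:16]


def hash_with_derived_salt(password: str) -> bytes:
    """PBKDF2 with a 'salt' that depends only on the password itself."""
    salt = salt_from_password(password)
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)


# Two unrelated installations, same password: the outputs are identical,
# so one precomputed table covers everyone who uses this scheme.
alice = hash_with_derived_salt("hunter2")
bob = hash_with_derived_salt("hunter2")
```

The output is a pure function of the password, which is exactly what a salt is supposed to prevent.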
The way you fix that is with a static salt, not a modified hash function. You should pick a salt that represents your system. I usually like reverse DNS for this, because it leads to uniqueness. For example: "com.example.mygreatapp". Someone else would naturally pick "org.example.ourawesomedb". You also can pick a long random string, but the important thing is uniqueness, so I like reverse DNS. (Random strings tend to make people think the salt is a secret, and the salt is not a secret.)
That's the whole system; just pick some constant salt, unique to your system. (If you had a username, you'd add the username to the salt. This is a standard way to construct a deterministic salt.)
But for file storage, I'd never do it that way.

To encrypt data, one needs a key not a password. There are key-derivation-functions to get a key from a user password.
A salt can be used for password hashing, but it cannot be used for encrypting data. There is a similar concept for encryption though, the random values there are called IV or Nonce and are stored together with the encrypted data.
The best thing you can do is
use a key-derivation-function with a salt, to get a key from the password.
With the resulting key you can encrypt the data.
In this case the salt can be stored inside the encrypted data container (the IV is already there), so there is no need for a global database.
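One possible container layout, sketched in Python. The field sizes, the salt-nonce-ciphertext ordering, and the function names are assumptions; the cipher itself is out of scope here, so the ciphertext is treated as opaque bytes.

```python
import hashlib
import os

SALT_LEN = 16        # assumed sizes for this sketch
NONCE_LEN = 12
ITERATIONS = 100_000


def derive_key(password: str, salt: bytes) -> bytes:
    """Key-derivation step: password + salt -> 32-byte encryption key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def build_container(salt: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Layout: salt || nonce || ciphertext. Neither header field is secret."""
    if len(salt) != SALT_LEN or len(nonce) != NONCE_LEN:
        raise ValueError("unexpected header field length")
    return salt + nonce + ciphertext


def parse_container(blob: bytes):
    """Split a container back into (salt, nonce, ciphertext)."""
    if len(blob) < SALT_LEN + NONCE_LEN:
        raise ValueError("container too short")
    salt = blob[:SALT_LEN]
    nonce = blob[SALT_LEN:SALT_LEN + NONCE_LEN]
    return salt, nonce, blob[SALT_LEN + NONCE_LEN:]


salt, nonce = os.urandom(SALT_LEN), os.urandom(NONCE_LEN)
blob = build_container(salt, nonce, b"opaque ciphertext bytes")
```

On decryption, `parse_container` yields the salt for `derive_key` and the nonce for the cipher, so the file is self-contained.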
To answer your original question: Deriving a salt from the password negates the whole purpose of the salt; it just becomes a more complex hash function.

Related

Store passwords safely but determine same passwords

I have a legacy browser game which historically uses a simple hashing function for password storage. I know that it's far from ideal. However, time has proven that most of the cheaters (multi-accounts) use the same password for all of their fake accounts.
In an update of my game I want to store passwords more safely. I already know that passwords should be randomly salted, hashed by safe algorithms, etc. That's all nice.
But is there any way to store passwords properly and still determine that two (or more) users use the same password? I don't want to know the password. I don't want to be able to search by password. I only need to tell that suspect users A, B and C use the same one.
Thanks.

If you store them correctly - no. This is one of the points of a proper password storage.
You could have very long passwords, beyond what is available on rainbow tables (not sure about the current state of the art, but it used to be 10 or 12 characters) and not salt them. In this case two passwords would have the same hash. This is a very bad idea (but a solution nevertheless) - if your passwords leak someone may be able to guess them indirectly (xkcd reference).
You may also look at homomorphic encryption, but this is in the realm of science fiction for now.

Well, if you use salt + hashing, you have all the salts as plain text. When a user enters a password, before storing/verifying it, you can hash it with all the salts available and see if you get the corresponding existing hash. :)
The obvious problem with this is that if you are doing it properly with bcrypt or pbkdf2 for hashing, this would be very slow - that's kind of the point in these functions.
I don't think there is any other way you can tell whether two passwords are the same - you need at least one of them plain text, which is only when the user enters it. And then you want to remove it from memory asap, which contradicts doing all these calculations with the plain text password in memory.
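A rough Python sketch of that idea. The record layout, the function names, and the deliberately low iteration count are assumptions chosen so the example runs quickly.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # kept low for the sketch; real deployments use more


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_record(password: str):
    """What the database stores per user: (random salt, salted hash)."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def users_sharing_password(password: str, records: dict) -> list:
    """Return the users whose stored (salt, hash) matches the given password.

    Costs one full PBKDF2 run per stored user, which is the slowness
    described above.
    """
    matches = []
    for user, (salt, stored_hash) in records.items():
        if hmac.compare_digest(hash_password(password, salt), stored_hash):
            matches.append(user)
    return matches


records = {
    "alice": make_record("letmein"),
    "bob": make_record("s3cret"),
    "carol": make_record("letmein"),
}
# While the plaintext is in memory (e.g. at login), test it against everyone.
shared = users_sharing_password("letmein", records)
```

With N users this is N key derivations per login, so it only scales to small user bases or to a short list of suspects.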

This will reduce the security of all passwords somewhat, since it leaks information about when two users have the same password. Even so, it is a workable trade-off and is straightforward to secure within that restriction.
The short answer is: use the same salt for all the passwords, but make that salt unique to your site.
Now the long answer:
First, to describe a standard and appropriate way to handle passwords. I'll get to the differences for you afterwards. (You may know all of this already, but it's worth restating.)
Start with a decent key-stretching algorithm, such as PBKDF2 (there are others, some even better, but PBKDF2 is ubiquitous and sufficient for most uses). Select a number of iterations depending on which client-side environment is involved. For JavaScript, you'll want something like 1k-4k iterations. For languages with faster math, you can use 10k-100k.
The key stretcher will need a salt. I'll talk about the salt in a moment.
The client stretches the password with PBKDF2 and sends the result to the server. The server applies a fast hash (SHA-256 is nice) to it and compares that to the stored hash. (For setting the password, the server does the same thing: it accepts a PBKDF2 hash, applies SHA-256, and then stores it.)
All that is standard stuff. The question is the salt. The best salt is random, but no good for this. The second-best salt is built from service_id+user_id (i.e. use a unique identifier for the service and concatenate the username). Both of these make sure that every user's password hash is unique, even if their passwords are identical. But you don't want that.
So now finally to the core of your question. You want to use a per-service, but not per-user, static salt. So something like "com.example.mygreatapp" (obviously don't use that actual string; use a string based on your app). With a constant salt, all passwords on your service that are the same will stretch (PBKDF2) and hash (SHA256) to the same value and you can compare them without having any idea what the actual password is. But if your password database is stolen, attackers cannot compare the hashes in it to hashes in other sites' databases, even if they use the same algorithm (because they'll have a different salt).
The disadvantage of this scheme is exactly its goal: if two people on your site have the same password and an attacker steals your database and knows the password of one user, they know the password of the other user, too. That's the trade-off.
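Here is a minimal Python sketch of the per-service-salt scheme. The salt string is the placeholder from the text, and the iteration count and function names are assumptions.

```python
import hashlib
from collections import defaultdict

SERVICE_SALT = b"com.example.mygreatapp"  # placeholder; use your own identifier
ITERATIONS = 10_000


def stored_value(password: str) -> str:
    """Stretch with the per-service salt, then apply SHA-256 as the server would."""
    stretched = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), SERVICE_SALT, ITERATIONS
    )
    return hashlib.sha256(stretched).hexdigest()


def groups_with_same_password(table: dict) -> list:
    """Group user names by stored hash and keep groups of two or more."""
    by_hash = defaultdict(list)
    for user, value in table.items():
        by_hash[value].append(user)
    return [sorted(users) for users in by_hash.values() if len(users) > 1]


table = {
    "A": stored_value("tr0ub4dor"),
    "B": stored_value("tr0ub4dor"),
    "C": stored_value("different"),
}
suspects = groups_with_same_password(table)
```

The comparison works on stored hashes only, so the server never needs to see or keep a plaintext password to find the duplicates.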

how salt can be implemented to prevent pre-computation dictionary attack on password

A salt makes every user's password hash unique, and adding a salt to a password before hashing is supposed to protect against a dictionary attack. But how?

The tool you almost certainly want is called PBKDF2 (Password-Based Key Derivation Function 2). It's widely available, either under the name "pbkdf2" or "RFC 2898". PBKDF2 provides both salting (making two otherwise identical passwords different) and stretching (making it expensive to guess passwords).
Whatever system you are developing for probably has a function available that takes a password, a salt, a number of iterations, and an output size. Given those, it will output some string of bytes. There are several ways to actually make use of this depending on your situation (most notably are you dealing with local authentication or remote authentication?)
Most people are looking for remote authentication, so let's walk through a reasonable way to implement that using a mix of deterministic and random salts. (See further discussion below w/ @SilverlightFox.)
First, the high-level approach:
Hash on the client against a deterministic salt. The client should never send a bare password to the server. Users reuse their passwords all the time. You don't want to know their actual password. You'd rather never see it.
Salt randomly and stretch on the server and then compare.
Here's the actual breakdown:
Choose an app-specific component for your salt. For example, "net.robnapier.mygreatapp" might be my prefix.
Choose a user-specific component for your salt. The userid is usually ideal here.
Concatenate them to create your salt. For example, my salt might be "net.robnapier.mygreatapp:suejones@example.org". The actual salt does not matter too much. What matters is that it is at least "mostly" unique across all of your users and across all other sites that might also hash passwords from your users. The scheme I've given achieves that.
Choose a local number of iterations for PBKDF2. That number is almost certainly 1000. This is too few iterations, but is about all JavaScript can handle reasonably. The more iterations, the more secure the system, but the worse the performance. It's a tension.
Choose a length for your hash. 32 bytes is generally a good choice.
Choose a "PRF" if your system allows you to pick one. HMAC-SHA-256 is a good choice.
You now have all the basic pieces in place. Let's compute some hashes.
On the client, take the password and pass it through PBKDF2 with the above settings. That will give you 32 bytes to send to the server.
On the server, if this is the account creation, create 8 or 16 bytes of random data as your salt for this account. Save that in the database along with the username. Use that salt, and another set of iterations (usually 10,000 or 100,000 if you're not in Node) and apply PBKDF2 to the data that the user sent. Store that in the database. If you're testing the password, just read the salt from the database and reapply PBKDF2 to validate.
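The breakdown above can be sketched in Python as follows. Client and server are shown in one file for brevity, and the function names and the user id "suejones" are illustrative assumptions; in a real deployment the client half would run in the browser or app.

```python
import hashlib
import hmac
import os

APP_ID = "net.robnapier.mygreatapp"
CLIENT_ITERATIONS = 1_000    # the figure suggested above for JavaScript clients
SERVER_ITERATIONS = 10_000


def client_hash(user_id: str, password: str) -> bytes:
    """Client side: deterministic salt built from the app id and the user id."""
    salt = f"{APP_ID}:{user_id}".encode("utf-8")
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, CLIENT_ITERATIONS, dklen=32
    )


def server_register(client_value: bytes):
    """Server side, account creation: random salt, stretch again, store both."""
    salt = os.urandom(16)
    stored = hashlib.pbkdf2_hmac("sha256", client_value, salt, SERVER_ITERATIONS)
    return salt, stored


def server_verify(client_value: bytes, salt: bytes, stored: bytes) -> bool:
    """Server side, login: re-derive with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", client_value, salt, SERVER_ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = server_register(client_hash("suejones", "pa55word"))
login_ok = server_verify(client_hash("suejones", "pa55word"), salt, stored)
```

The server only ever handles the 32-byte client output, never the bare password.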
Everywhere I say "PBKDF2" here there are other options, probably the most common of which is scrypt (there is also bcrypt). The other options are technically better than PBKDF2. I don't think anyone would disagree with that. I usually recommend PBKDF2 because it's so ubiquitous and there's nothing really wrong with it. But if you have scrypt available, feel free to use that. The client and server do not have to use the same algorithm (the client can use PBKDF2 and the server can use scrypt if you like).

What's the md5 hash of "superCommonPassword"? That's easy to pre-calculate.
It's b77755edafab848ffcb9580307e97414
If you steal a password database and see that hash value, you know the password is probably "superCommonPassword".
What's the md5 hash ("aStringYouDontKnowUntilYouStealAPasswordDatabase" + "superCommonPassword")? Oh, you can't calculate that until you steal the database.
An unknown salt means pre-calculating hashes of common passwords is useless. An unknown salt per user means you need to calculate hashes of common passwords for each user. This slows down the attacker and increases his costs.
Don't use md5 for password hashing though. Use bcrypt or scrypt or PBKDF2.
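A small Python sketch of the precomputation argument. MD5 appears only to mirror the example above; the password list and the variable names are assumptions.

```python
import hashlib

COMMON_PASSWORDS = ["123456", "password", "superCommonPassword"]


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Built once, before any database is stolen.
precomputed = {md5_hex(p): p for p in COMMON_PASSWORDS}

# Unsalted hash from a stolen database: a single dictionary lookup cracks it.
unsalted = md5_hex("superCommonPassword")
cracked = precomputed.get(unsalted)

# Salted hash: the table built in advance has no matching entry, so the
# attacker has to start hashing again with the salt from the database.
salt = "aStringYouDontKnowUntilYouStealAPasswordDatabase"
salted = md5_hex(salt + "superCommonPassword")
missed = precomputed.get(salted)
```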

Importance of salt when using Rfc2898DeriveBytes to create secure passwords from clear text passwords

I'd like to incorporate the encryption and decryption of files in one of my C# .NET apps. The scenario is simple: User A sends an AES256-encrypted file to user B. The clear text password is exchanged on a different channel (e.g. phone call or whatever).
From what I understand I should use Rfc2898DeriveBytes for converting the user's clear text password into a more secure password using maybe 10,000 rounds. (see this article).
What I don't understand is the role of salt in my scenario. Usually salt is used in hashing passwords to prevent dictionary attacks. But in my scenario the PBKDF2 algo is used to compensate weaknesses of short or easy to guess clear text passwords by adding extra calculations required by the PBKDF2-rounds.
If I choose a random salt then the receiver will need to know that salt also in order to decrypt correctly. If I use a constant salt, then hackers can easily reverse engineer my code and run brute force attacks using my constant salt (although they'll be really slow thanks to the PBKDF2 iterations).
From what I understand I have no choice but to use a constant salt in my scenario and enforce a good clear text password rule to make up for the weakness of constant salt. Is my assumption correct?

Salts, in the context of password hashing (and key derivation), are used to prevent precomputation attacks like rainbow tables.
Note that the salt must be different and unpredictable (preferably random) for every password. Also note that salts need not be secret – that's what the password is for. You gain no security by keeping the salt secret.
The recommended approach in your case is to generate a random salt every time a file is encrypted, and transmit the salt along with the ciphertext.
Is there a specific reason you're using AES-256 by the way? It's around 40% slower than AES-128 due to the extra rounds, and it offers no practical security benefit (particularly not in the case of password-based encryption).
It's also worth considering using a well-established standard like PGP rather than building your own protocol from cryptographic primitives, because building secure protocols is so hard that even experts don't always get it right.

Your assumption is correct. If they have access to the password, they will also have access to the salt. The BCrypt implementations I've seen put the number of iterations, the hash, and the salt all in the same result string!
The idea is: your hash should be secure even if the salt and number of iterations are known. (If we could always know that the salt and number of iterations and even the algorithm would be unknown to attackers, security would get a whole heck of a lot easier! Until attackers politely decline to read our salts, we must assume they will have access to them in the event of a breach.) So you're right, they can brute force it - if they have a few supercomputers and a couple million years of computing time at their disposal.
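To illustrate the "everything in one result string" idea, here is a Python sketch. The `pbkdf2_sha256$...` layout is made up for this example and is not the format of any particular BCrypt library.

```python
import base64
import hashlib
import hmac
import os


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def encode_hash(password: str, iterations: int = 100_000) -> str:
    """Return 'pbkdf2_sha256$<iterations>$<salt>$<hash>' (hypothetical layout)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${_b64(salt)}${_b64(digest)}"


def verify(password: str, encoded: str) -> bool:
    """Everything needed to re-check the password is read from the string itself."""
    scheme, iterations, salt_b64, digest_b64 = encoded.split("$")
    if scheme != "pbkdf2_sha256":
        raise ValueError("unknown scheme")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, int(iterations)
    )
    return hmac.compare_digest(candidate, expected)
```

Salt and iteration count sit in plain view next to the hash; the security rests entirely on the password and the cost of each guess.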

Is It Possible To Reconstruct a Cryptographic Hash's Key

We would like to cryptographically (SHA-256) hash a secret value in our database. Since we want to use this as a way to lookup individual records in our database, we cannot use a different random salt for each encrypted value.
My question is: given unlimited access to our database, and given that the attacker knows at least one secret value and hashed value pair, is it possible for the attacker to reverse engineer the cryptographic key? IE, would the attacker then be able to reverse all hashes and determine all secret values?
It seems like this defeats the entire purpose of a cryptographic hash if it is the case, so perhaps I'm missing something.

There are no published "first pre-image" attacks against SHA-256. Without such an attack to open a shortcut, it is impossible for an attacker to recover a secret value from its SHA-256 hash.
However, the mention of a "secret key" might indicate some confusion about hashes. Hash algorithms don't use a key. So, if an attacker were able to attack one "secret-value–hash-value" pair, he wouldn't learn a "key" that would enable him to easily invert the rest of the hash values.
When a hash is attacked successfully, it is usually because the original message was from a small space. For example, most passwords are chosen from a relatively short list of real words, perhaps with some simple permutations. So, rather than systematically testing every possible password, the attacker starts with an ordered list of the few billion most common passwords. To avoid this, it's important to choose the "secret value" randomly from a large space.
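A toy Python illustration of the small-space problem, assuming (purely for the example) that the secret value is a 4-digit PIN.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def guess_pin(target_hash: str) -> Optional[str]:
    """Try every 4-digit PIN: only 10,000 candidates, so this finishes quickly."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


leaked = sha256_hex("4821")    # what the attacker sees in the database
recovered = guess_pin(leaked)
```

SHA-256 itself is not broken here; the input space is simply small enough to enumerate.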
There are message authentication algorithms that hash a secret key together with some data. These algorithms are used to protect the integrity of the message against tampering. But they don't help thwart pre-image attacks.
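For reference, this is roughly what such a message authentication code looks like with HMAC-SHA-256 in Python. The key and the message are placeholders.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-known-only-to-sender-and-verifier"  # placeholder


def sign(message: bytes) -> str:
    """Compute an HMAC-SHA-256 tag over the message with the shared key."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def is_authentic(message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message), tag)


tag = sign(b"user_id=12345")
```

A verifier holding the same key can detect that `user_id=12345` was altered, but the tag does nothing to hide a guessable message.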

In short, yes.

No, a SHA hash is not reversible (at least not easily). When you hash something and later need to check it, you don't reverse it; you reconstruct the hash. This is usually done with a private value (the salt) and a public key.
For example, say I'm trying to restrict access based on my user id. I would hash my user id together with the salt, using MD5 for this example. My user id is "12345" and the salt is "abcde".
So I will hash the string "12345_abcde", which returns a hash of "7b322f78afeeb81ad92873b776558368".
Now I will pass the validating application the hash and the public key, "12345".
The validating application knows the salt, so it hashes the same value, "12345_abcde", which generates the exact same hash. It then compares the hash it computed with the one passed in, and they match. If I had somehow modified the public key without modifying the hash, a different hash would have been generated, resulting in a mismatch.

Yes it's possible, but not in this lifetime.

Modern brute-force attacks using multiple GPUs could crack this in short order. I recommend you follow the guidelines for password storage for this application. Here are the current password storage guidelines from OWASP. Currently, they recommend a long salt value and PBKDF2 with 64,000 iterations, which iteratively stretches the key and makes it computationally expensive to brute-force the input values. Note that this will also make it computationally expensive for you to generate your key values, but the idea is that you will be generating keys far less frequently than an attacker would have to. That said, your design requires many more key derivations than a typical password storage/challenge application, so your design may be fatally flawed.
Also keep in mind that the iteration count should be doubled every 18 months so that the computational cost keeps pace with Moore's Law. This means that your system needs some way of allowing you to rehash these values (possibly by combining hash techniques). Over time, you will find that old hash functions are broken by cryptanalysts, and you need to be ready to update your algorithms. For example, a single iteration of MD5 or SHA-1 used to be sufficient, but it is not anymore. There are other password-hashing functions that could suit your needs in place of PBKDF2 (such as bcrypt or scrypt), but PBKDF2 is currently the industry standard that has received the most scrutiny. One could argue that bcrypt or scrypt would also be suitable, but this is yet another reason why a pluggable scheme should be used, so that you can upgrade hash functions over time.
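One way to leave room for raising the work factor later is sketched below in Python. It assumes the password-storage setting, where the plaintext is briefly available at login, and the record layout and function names are hypothetical.

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 64_000  # figure quoted above; meant to be raised over time


def make_record(password: str, iterations: int = CURRENT_ITERATIONS) -> dict:
    """Store the parameters next to the hash so they can change later."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return {"iterations": iterations, "salt": salt, "hash": digest}


def needs_upgrade(record: dict) -> bool:
    return record["iterations"] < CURRENT_ITERATIONS


def check_and_upgrade(password: str, record: dict):
    """Verify the password; if it matches and the record is outdated, re-hash it.

    Returns (ok, record_to_store).
    """
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], record["iterations"]
    )
    if not hmac.compare_digest(candidate, record["hash"]):
        return False, record
    if needs_upgrade(record):
        return True, make_record(password)
    return True, record
```

A fuller pluggable scheme would also store an algorithm name in the record and dispatch on it; only the iteration count is handled here.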

Difference between Hashing a Password and Encrypting it

The current top-voted answer to this question states:
Another one that's not so much a security issue, although it is security-related, is complete and abject failure to grok the difference between hashing a password and encrypting it. Most commonly found in code where the programmer is trying to provide unsafe "Remind me of my password" functionality.
What exactly is this difference? I was always under the impression that hashing was a form of encryption. What is the unsafe functionality the poster is referring to?

Hashing is a one-way function (well, a mapping). It's irreversible: you apply the secure hash algorithm and you cannot get the original string back. The most you can do is to generate what's called "a collision", that is, find a different string that produces the same hash. Cryptographically secure hash algorithms are designed to prevent the occurrence of collisions. You can attack a secure hash by the use of a rainbow table, which you can counteract by adding a salt to the password before hashing it.
Encrypting is a proper (two way) function. It's reversible, you can decrypt the mangled string to get original string if you have the key.
The unsafe functionality it's referring to is that if you encrypt the passwords, your application has the key stored somewhere and an attacker who gets access to your database (and/or code) can get the original passwords by getting both the key and the encrypted text, whereas with a hash it's impossible.
People usually say that if a cracker owns your database or your code he doesn't need a password, thus the difference is moot. This is naïve, because you still have the duty to protect your users' passwords, mainly because most of them do use the same password over and over again, exposing them to a greater risk by leaking their passwords.

Hashing is a one-way function, meaning that once you hash a password it is very difficult to get the original password back from the hash. Encryption is a two-way function, where it's much easier to get the original text back from the encrypted text.
Plain hashing is easily defeated using a dictionary attack, where an attacker just pre-hashes every word in a dictionary (or every combination of characters up to a certain length), then uses this new dictionary to look up hashed passwords. Using a unique random salt for each hashed password stored makes it much more difficult for an attacker to use this method. They would basically need to create a new unique dictionary for every salt value that you use, slowing down their attack terribly.
It's unsafe to store passwords using an encryption algorithm because if it's easier for the user or the administrator to get the original password back from the encrypted text, it's also easier for an attacker to do the same.

If the password is encrypted, it remains a hidden secret from which someone can extract the plain-text password. When the password is hashed, however, you can relax, as there is hardly any method of recovering the password from the hash value.
Extracted from Encrypted vs Hashed Passwords - Which is better?
Is encryption good?
Plain-text passwords can be encrypted using symmetric encryption algorithms like DES or AES, or with any other algorithm, and be stored inside the database. At authentication (confirming the identity with user name and password), the application decrypts the encrypted password stored in the database and compares it with the user-provided password for equality. With this type of password handling, even if someone gets access to the database tables, the passwords are not simply reusable. However, there is bad news in this approach as well. If someone somehow obtains the cryptographic algorithm along with the key used by your application, they will be able to view all the user passwords stored in your database by decrypting them. "This is the best option I've got", a software developer may scream, but is there a better way?
Cryptographic hash function (one-way-only)
Yes, there is; maybe you have missed the point here. Did you notice that there is no requirement to decrypt and compare? Suppose there is a one-way-only conversion in which the password can be converted into some converted-word, but the reverse operation (generating the password from the converted-word) is impossible. Now even if someone gets access to the database, there is no way the passwords can be reproduced or extracted from the converted-words. With this approach, there is hardly any way that someone could learn your users' top-secret passwords, and this protects users who use the same password across multiple applications. What algorithms can be used for this approach?

I've always thought that Encryption can be converted both ways, in a way that the end value can bring you to original value and with Hashing you'll not be able to revert from the end result to the original value.

Hashing algorithms are usually cryptographic in nature, but the principal difference is that encryption is reversible through decryption, and hashing is not.
An encryption function typically takes input and produces encrypted output that is the same, or slightly larger size.
A hashing function takes input and produces a typically smaller output, typically of a fixed size as well.
While it isn't possible to take a hashed result and "dehash" it to get back the original input, you can typically brute-force your way to something that produces the same hash.
In other words, if an authentication scheme takes a password, hashes it, and compares it to a hashed version of the required password, you might not need to know the original password, only its hash; you can brute-force your way to something that will match, even if it's a different password.
Hashing functions are typically created to minimize the chance of collisions and make it hard to just calculate something that will produce the same hash as something else.

Hashing:
It is a one-way algorithm: once hashed, the value cannot be rolled back, and this is its strong point compared to encryption.
Encryption:
If we perform encryption, there will be a key to do this. If this key is leaked, all of your passwords can be decrypted easily.
On the other hand, if you used hashed passwords, then even if your database is hacked or your server admin takes data from the DB, the attacker will not be able to break these hashed passwords. This is practically impossible if we use hashing with a proper salt and the additional security of PBKDF2.
If you want to take a look at how you should write your hash functions, you can visit here.
There are many algorithms to perform hashing.
MD5 - Uses the Message Digest Algorithm 5 (MD5) hash function. The output hash is 128 bits in length. The MD5 algorithm was designed by Ron Rivest in the early 1990s and is not a preferred option today.
SHA1 - Uses the Secure Hash Algorithm 1 (SHA-1) hash function, published in 1995. The output hash is 160 bits in length. Although still widely used, this is not a preferred option today.
HMACSHA256, HMACSHA384, HMACSHA512 - Use the functions SHA-256, SHA-384, and SHA-512 of the SHA-2 family. SHA-2 was published in 2001. The output hash lengths are 256, 384, and 512 bits, respectively, as the hash functions' names indicate.
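The output lengths listed above can be checked with a few lines of Python. The algorithm names are the ones Python's hashlib uses, not the .NET class names from the list.

```python
import hashlib
import hmac


def digest_bits(name: str) -> int:
    """Output length in bits of the named hash function."""
    return hashlib.new(name).digest_size * 8


sizes = {
    name: digest_bits(name)
    for name in ("md5", "sha1", "sha256", "sha384", "sha512")
}

# An HMAC tag has the same length as the output of the underlying hash.
hmac_bits = len(hmac.new(b"key", b"message", hashlib.sha256).digest()) * 8
```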

Ideally you should do both.
First hash the password for the one-way security. Use a salt for extra security.
Then encrypt the hash to defend against dictionary attacks if your database of password hashes is compromised.

As correct as the other answers may be, in the context that the quote was in, hashing is a tool that may be used in securing information, encryption is a process that takes information and makes it very difficult for unauthorized people to read/use.

Here's one reason you may want to use one over the other - password retrieval.
If you only store a hash of a user's password, you can't offer a 'forgotten password' feature.
