I just looked at the implementation of password hashing in Django and noticed that it prepends the salt, so the hash is created like sha1(salt + password), for example.
In my opinion, salts are good for two purposes
Preventing rainbow table lookups
Alright, prepending/appending the salt doesn't really make a difference for rainbow tables.
Hardening against brute-force/dictionary attacks
This is what my question is about. If someone wants to attack a single password from a stolen password database, he needs to try a lot of passwords (e.g. dictionary words or [A-Za-z0-9] permutations).
Let's assume my password is "abcdef", the salt is "salt" and the attacker tries all [a-z]{6} passwords.
With a prepended salt, one must calculate hash("salt"), store the hash algorithm's state and then go on from that point for each permutation. That is, going through all permutations would take 26^6 copy-hash-algorithm's-state-struct operations and 26^6 hash(permutation of [a-z]{6}) operations. As copying the hash algorithm's state is freakin fast, the salt hardly adds any complexity here, no matter how long it is.
But with an appended salt, the attacker must calculate hash(permutation of [a-z]{6} + salt) in full for each permutation, leading to 26^6 hash operations over 10-character inputs instead of 6-character ones. So appending the salt adds work in proportion to the salt length.
I don't believe this is for historical reasons because Django is rather new. So what's the sense in prepending salts?
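The state-copying shortcut described above can be sketched in Python with hashlib, whose hash objects expose a copy() method. This is only an illustration under assumed values: the three-entry candidate list stands in for the full [a-z]{6} search space, and prefixed_digest is a hypothetical helper name. Note that SHA-1 works on 64-byte blocks, so with a 4-byte salt nothing has been compressed yet when the state is copied, and the saving is negligible.

```python
import hashlib

salt = b"salt"
# Stand-in for the full [a-z]{6} search space (assumed, for brevity).
candidates = [b"abcdeg", b"abcdef", b"zzzzzz"]

# Feed the salt to the hash once and keep that state around.
base_state = hashlib.sha1(salt)

def prefixed_digest(candidate: bytes) -> str:
    """Return sha1(salt + candidate) by cloning the pre-salted state."""
    state = base_state.copy()   # cheap copy of the internal state
    state.update(candidate)     # only the candidate is fed in now
    return state.hexdigest()

# The stored hash the attacker is trying to match.
target = hashlib.sha1(salt + b"abcdef").hexdigest()
found = [c for c in candidates if prefixed_digest(c) == target]
```

The cloned state gives exactly the same digest as hashing salt + candidate from scratch, which is why a prepended salt does not force the attacker to re-read the salt for every guess.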
Do neither, use a standard Key derivation function like PBKDF2. Never roll your own crypto. It's much too easy to get it wrong. PBKDF2 uses many iterations to protect against bruteforce which is a much bigger improvement than the simple ordering.
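As a rough sketch of what using PBKDF2 looks like, Python's standard library offers hashlib.pbkdf2_hmac. The iteration count, the 16-byte salt and the helper name derive are assumptions made for this example, not recommendations taken from the discussion.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed work factor; tune it to your hardware

def derive(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = os.urandom(16)          # random per-password salt
key = derive("abcdef", salt)   # every guess costs the attacker ITERATIONS HMAC rounds
```

The salt is passed as a separate argument, so the question of prepending or appending it never comes up.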
And your trick of pre-calculating the internal state of the hash function after processing the salt probably isn't that easy to pull off unless the length of the salt corresponds to the block size of the underlying hash function.
If the salt is prepended, an attacker can build a database of hash states for the salts (assuming the salt is long enough to fill a hashing step) and then run a dictionary attack.
But if the salt is appended, the attacker can build such a database for the password dictionary and then only has to hash the salt on top of each entry. Given that the salt is usually shorter than the password (say a 4-character salt and an 8-character password), that attack will be faster.
You are making a valid point, of course; but really, if you want to increase the time it takes to calculate a hash, just use a longer hash: SHA256 instead of SHA1, for example.
I have created hashes of some fields and am storing them in a database using the 'crypto' npm module.
var crypto = require('crypto');
var hashFirtName = crypto.createHash('md5').update(orgFirtName).digest("hex");
QUESTION: How can I get the original value from the hash value when needed?
The basic definition of a "hash" is that it's one-way. You cannot get the originating value from the hash. Mostly because a single value will always produce the same hash, but a hash isn't always related to a single value, since most hash functions return a string of finite/fixed length.
Additional Information
I wanted to provide some additional information, as I felt I may have left this too short.
As @xShirase pointed out in his answer, you can use a table to reverse a hash. These are known as Rainbow Tables. You can generate them or download them from the internet, usually from nefarious sources [ahem].
To expand on my other statement about a hash value possibly relating to multiple original values, let's take a look at MD5.
MD5 is a 128-bit hash. This means it can take 2^128 distinct values, or (unsigned) 0 through 340,282,366,920,938,463,463,374,607,431,768,211,455. That's a REALLY big number. So, for any given input you have a 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456 chance that another given input will collide with the same hash result.
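The figure above can be checked with a couple of lines of Python. This is just the arithmetic, using hashlib only to read off MD5's digest size.

```python
import hashlib

digest_bits = hashlib.md5().digest_size * 8   # 16 bytes -> 128 bits
distinct_values = 2 ** digest_bits            # number of possible MD5 outputs
largest_value = distinct_values - 1           # outputs range over 0..largest_value

print(f"{distinct_values:,}")
```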
Now, for simple data like passwords, the chances are astronomical. And for those purposes, who cares? Most of the time you are simply taking an input, hashing it, then comparing the hashes. For reasons I will not get into, when using hashes for passwords you should ALWAYS store the data already hashed. You don't want to leave plain-text passwords just lying about. Keep in mind that a hash is NOT the same as encryption.
Hashes can also be used for other reasons. For instance, they can be used to create a fast-lookup data structure known as a Hash Table. A Hash Table uses a hash as a sort of "primary key", allowing it to search a huge set of data in relatively few instructions, approaching O(1) (on the order of 1). Depending on the implementation of the Hash Table and the hashing algorithm, you have to deal with collisions, usually by chaining the colliding entries in a list. This is why the Hash Table isn't "exactly" O(1), but close. If your hash algorithm is bad, the performance of your Hash Table can begin to approach O(n).
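To make the collision handling concrete, here is a minimal hash table sketch in Python that keeps colliding entries in a per-bucket list. The class name ChainedHashTable and the tiny bucket count are assumptions chosen so that collisions actually happen. Real implementations (including Python's own dict) are considerably more elaborate.

```python
class ChainedHashTable:
    """Toy hash table: each bucket is a list of (key, value) pairs."""

    def __init__(self, bucket_count: int = 8) -> None:
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash picks the bucket; different keys may land in the same one.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))        # new key, possibly a collision

    def get(self, key, default=None):
        # Lookup cost grows with the length of the bucket's chain.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

table = ChainedHashTable(bucket_count=2)   # only 2 buckets, so keys must collide
for n in range(6):
    table.put(n, n * n)
```

With a hash function that spreads keys evenly the chains stay short and lookups stay close to O(1). If every key fell into the same bucket, get would degrade to a linear scan.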
Another use for a hash is to tell whether a file's contents have been altered, or match an original. You will see many OSS projects provide binary downloads that also come with MD5 and/or SHA-2 hash values. This is so you can download the files, do a hash locally, and compare the results against theirs to make sure the file you are getting is the file they posted. Again, since the odds of two given files matching on a 128-bit hash such as MD5 are 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456, the odds of a hacker successfully generating a file of the same size with a bad payload that hashes to the exact same MD5/SHA-2 hash are pretty low.
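A download check of that kind might look like the following Python sketch. The function names file_sha256 and matches_published are made up for this example, and SHA-256 is assumed as the published hash type.

```python
import hashlib
import hmac

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local hash against the value posted by the project."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())
```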
Hope this discussion can help either you or someone in the future.
If you could get the original value from the hash, it wouldn't be that secure.
If you need to compare a value to what you have previously stored as a hash, you can create a hash for this value and compare the hashes.
In practice there is only one way to 'decrypt' a hash: take a massive database of precomputed hashes with their original values and look yours up in it. An example here
The hash of this password ( qwqwqw123456 ) is $2a$07$sijdbfYKmgWdcGhPPn$$$.C98C0wmy6jsqA3fUKODD0OFBKJkHdn.
What is the password for this hash? $2a$07$sijdbfYKmgWdcGhPPn$$$.9PTdICzon3EUNHZvOOXgTY4z.UTQTqG
And can I tell which hash algorithm it is?
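The lookup idea can be sketched in Python as a plain dictionary from hash to original value. This is a simple precomputed table, not a true rainbow table (which trades lookup time for much less storage), and the four-word password list is an assumption standing in for a real database.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Assumed tiny word list; real tables hold many millions of entries.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Precompute once: hash -> original value.
lookup = {md5_hex(word): word for word in common_passwords}

leaked_hash = md5_hex("qwerty")        # pretend this came from a stolen database
recovered = lookup.get(leaked_hash)    # None if the value was never precomputed
```

Nothing is being reversed here. The table only works for values somebody hashed in advance, which is also why an unsalted hash of a common password is so easy to look up.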
You could try to guess which algorithm was used,
depending on the format and length of the hash,
your known value etc. but there is no definitive way to know it.
And the purpose of any "hash" function is
that it is NOT reversible/decryptable/whatever.
Depending on some factors you could try to guess the original value too
(Brute force attack: Try to hash all possible values and check which hash
is equal to yours) but, depending on the count of possibilities,
the algorithm used etc., that could take millions of years. (You could also be lucky
and get the correct value within a short time, but that's unlikely.)
There are other things than brute-forcing, but in the end,
it's pretty much impossible to reverse a good hash function.
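A brute force attack as described can be sketched in Python as follows. The helper name brute_force, the choice of SHA-256 and the three-letter lowercase target are assumptions that keep the search space small enough to finish instantly.

```python
import hashlib
import itertools
import string

def brute_force(target_hex: str, length: int, alphabet: str = string.ascii_lowercase):
    """Hash every string of the given length and return the one that matches."""
    for combo in itertools.product(alphabet, repeat=length):
        candidate = "".join(combo)
        if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == target_hex:
            return candidate
    return None   # nothing of this length and alphabet matched

target = hashlib.sha256(b"cab").hexdigest()
result = brute_force(target, length=3)
search_space = len(string.ascii_lowercase) ** 3   # 26^3 candidates at most
```

Each extra character multiplies the search space by the alphabet size, which is where the "millions of years" come from for long values and slow algorithms.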
I've read a lot of posts here about Rfc2898DeriveBytes(), and in all of them the salt seems to be pre-calculated and passed to the constructor. However, there is a constructor that accepts a salt length input, and the salt will be calculated for you. It is available afterwards in the Salt property.
Is there any disadvantage to letting the method calculate the salt? In my case, the usage is for password hashing.
Specifying the salt length instead of the salt itself may reduce the chance of choosing the salt insecurely when deriving a new key (or obscuring a password for storage). The salt should be chosen by a cryptographic random bit generator, and should be changed each time the password is updated. Presumably, this constructor will use a high-quality RNG that was properly seeded. Leaving that up to the application allows for mistakes at worst, and at best creates unnecessary complexity.
Of course, if you are recovering a key, for example to check user input against the stored password, you'd need to specify the salt that was used initially.
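The two cases (a fresh salt when storing, the original salt when checking) can be sketched in Python. This is an analogue built on hashlib.pbkdf2_hmac, not the .NET API. The names hash_password and verify_password, the 16-byte salt and the iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

SALT_BYTES = 16        # assumed salt length
ITERATIONS = 100_000   # assumed work factor

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Storing: let a cryptographic RNG pick the salt, return (salt, digest)."""
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Checking: reuse the salt that was stored with the digest."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)

stored_salt, stored_digest = hash_password("correct horse")
```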
I am salting newly created passwords before hashing them with an encryption algorithm. I generate my salts using a random number function.
Are you compromising security if your salts are only comprised of numbers (with no letters) or does this make no difference at all?
A salt should be unique (ideally for every password in the world) and unpredictable. The best you can do with a deterministic computer is to get a random number and hope that the returned value is nearly unique. So the more possible combinations you have, the bigger the chance that the salt is unique.
Some hash algorithms define a number and an alphabet of accepted characters. PHP's BCrypt for example, expects a salt containing 22 characters from this alphabet:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
You get the most possible combinations by using all characters of the alphabet, and not only the characters 0-9. Of course a longer salt with a small alphabet (0-9) can have as many combinations as a shorter salt with a big alphabet (0-9,a-z,...).
To make it short, use all possible characters, and as many characters as your hash algorithm expects.
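The difference in the number of combinations can be worked out in Python. The helper random_salt is a hypothetical name, and the bit counts below measure character combinations only.

```python
import math
import secrets

BCRYPT_ALPHABET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
SALT_LENGTH = 22

def random_salt(alphabet: str = BCRYPT_ALPHABET, length: int = SALT_LENGTH) -> str:
    """Pick each character with a cryptographically secure RNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Combinations expressed in bits: length * log2(alphabet size).
bits_full_alphabet = SALT_LENGTH * math.log2(len(BCRYPT_ALPHABET))   # 64 symbols
bits_digits_only = SALT_LENGTH * math.log2(10)                       # 0-9 only

# How long a digits-only salt must be to offer as many combinations.
digits_needed = math.ceil(bits_full_alphabet / math.log2(10))
```

With the same 22 characters, restricting the salt to digits leaves roughly 73 bits of combinations instead of 132, and a digits-only salt would need 40 characters to catch up.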
P.S.: If you use a key-derivation function like BCrypt (and you really should), then you cannot salt the password before hashing; instead you have to pass the salt to the hash function.
During a discussion with a couple of other people, I read the argument that
sha512(salt + username + password) is bad,
sha512(username + password) is worse and
sha512(password) is plain idiotic.
While I partly agree, what's really the best security? Is there anything safer than using a per-user unique salt along with a slow hashing method such as SHA512? What's the real way to go? Argue on!
Generate random salt for each password.
Avoid MD5, and even SHA-1.
Use a slow hashing algorithm. A single pass of SHA-256 is fast, so iterate it many times (as PBKDF2 does) or use a function designed to be slow, such as bcrypt.
Password storage is one of those rare occasions where there is some benefit to having your own (overall) algorithm. Consider an attacker with a rainbow table; if your password storage algorithm varies from the one used to generate their rainbow table enough to change the generated values, that rainbow table is of no use. The attacker would need to know your algorithm, then generate a new table. If you choose a slow hashing algorithm, generating a new table is very expensive.
By "overall" algorithm, I mean the complete definition of how you transform the plaintext password into the stored value. E.g. SHA-256("mypassword" + "[[" + 40-char-random-alphanum-salt + "]]"). If you change that to use angle brackets instead of square brackets, you've changed the rainbow table necessary to exploit your stored passwords. Note that I'm not advocating writing your own hash algorithm; you should still choose a cryptographically secure hash algorithm.
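The bracket example can be spelled out in Python. The function name stored_value and its parameters are hypothetical; the point is only that changing the delimiters changes every stored value.

```python
import hashlib
import secrets
import string

def stored_value(password: str, salt: str, opener: str = "[[", closer: str = "]]") -> str:
    """One possible 'overall' algorithm: SHA-256 over password + delimited salt."""
    material = password + opener + salt + closer
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

# 40-character random alphanumeric salt, as in the example above.
alphanumeric = string.ascii_letters + string.digits
salt = "".join(secrets.choice(alphanumeric) for _ in range(40))

with_square = stored_value("mypassword", salt)
with_angle = stored_value("mypassword", salt, opener="<<", closer=">>")
```

A table precomputed for the square-bracket format is useless against values stored with angle brackets, and vice versa.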
See this article by the author of MD5. He makes the two main points I repeated above: 1) if you use a fast hashing algorithm, you're missing the point, and 2) reuse of overall algorithms allows re-use of rainbow tables.
When discussing the recent LinkedIn leak, somebody brought up this link about bcrypt. I think I agree... we should be using functions that increase the calculation time exponentially according to a factor. That's the only way we can beat people trying to use clusters or GPUs to do their hashing calculations.
My understanding is that repeated hashing (for computational cost) and a good random salt should defeat all but seriously determined cryptographic attackers.
Hashing passwords in the database, and over the network, avoids plaintext being recoverable (and usable elsewhere) by a snooper or attacker who does get in.
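Basically this is more or less the scheme used by WordPress authentication:
var crypto = require('crypto');
// secureHash is a placeholder; SHA-256 is assumed here, the scheme itself names no algorithm.
function secureHash(data) { return crypto.createHash('sha256').update(data).digest('hex'); }
var SALT = crypto.randomBytes(32).toString('hex'); // 64 random hex characters
var NUM_HASHES = 1000; // can be randomized, but must then be stored with the result
var hashedResult = inputPassword; // inputPassword: the plaintext supplied by the user
for (var i = 0; i < NUM_HASHES; i++) {
    var dataToHash = SALT + hashedResult;
    hashedResult = secureHash(dataToHash);
}
//... can now store or send.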
This use of a random salt, and looping hash, defeats any rainbow tables or single-level 'hash collision', 'hash weakness' attack. Only brute-forcing the complete keyspace, each key through 1000 iterations of the hash function, is believed to defeat it :)