Data type that mimics BINARY(64) in YugabyteDB - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
I was wondering if anyone knows the proper data type that mimics BINARY(64) in SQL for storing hashed passwords in YugabyteDB?
I am looking to hash a password, store that and then generate a salt randomly on insert.

You can use CHAR(60) type. This is suggested alongside BINARY(64). Reference: What data type to use for hashed password field and what length?
If your hash is not encoded as Base64 or hex or whatever, use BYTEA.
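As a rough illustration of why the column type depends on the encoding, the sketch below derives a 64-byte hash with PBKDF2-HMAC-SHA512 from Python's standard library. The iteration count, the sample password, and the variable names are assumptions made for the example, not YugabyteDB requirements. The raw digest is what would go into a BYTEA column, while its hex form needs 128 characters.

```python
import hashlib
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 100_000

salt = os.urandom(16)  # random per-password salt, generated at insert time
raw_hash = hashlib.pbkdf2_hmac("sha512", b"correct horse", salt, ITERATIONS)

hex_hash = raw_hash.hex()  # text encoding of the same value

print(len(raw_hash))  # 64 bytes: raw binary, suits BYTEA
print(len(hex_hash))  # 128 characters: needs a text column instead
```

The salt has to be stored too, either in its own column or alongside the hash.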

Related

Is knowing the salt used in Rfc2898DeriveBytes a security risk?

Sorry if this is a duplicate but I could not find anything that was not in some way related to DB encryption. My problem is not with a DB. I have a set of files encrypted using RijndaelManaged. In the encrypting code I am using Rfc2898DeriveBytes to generate the key given a password and a salt and a certain number of iterations. The salt, as it happens, is not stored securely (just a string).
I was wondering: people with access to my code could easily get the salt (disassembling the dll for example) and of course the number of iterations.
What is the security risk of this, given for granted that the password in itself is not so easily retrievable (yes let's give it for granted now)?
I am assuming that without the password decrypting would be impossible, or at least it would require some time to brute force... or is it some analysis of the decrypted files possible?
An obvious concern is that stolen code is less easily detectable than a stolen DB...
In short, the salt is fine to be stored in clear text. However, you should store a unique salt for each password in your file(s) (see this). That way no one could create a rainbow table for all of the passwords stored in the file(s) (note that they could still create a rainbow table for one password in the file).
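A minimal sketch of that point, using Python's hashlib.pbkdf2_hmac as a stand-in for Rfc2898DeriveBytes (both implement PBKDF2). The hash function, iteration count, and the derive_key helper are assumptions chosen for the example.

```python
import hashlib
import os

def derive_key(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 (assumed parameters)."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

password = b"same password"
salt_a = os.urandom(16)
salt_b = os.urandom(16)

key_a = derive_key(password, salt_a)
key_b = derive_key(password, salt_b)

# Different salts give unrelated keys, so one precomputed table cannot cover both.
print(key_a != key_b)
# The same salt reproduces the same key, which is why the salt is stored in the clear.
print(derive_key(password, salt_a) == key_a)
```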
For more context on the whole hashing/password storing process see:
Hashing

Keeping Encrypted Strings Safe with Multiple Encrypts

A system I have been working on for a while requires DPA, and I asked a question about keeping the data passcodes safe. I have since then come up with an idea to fix that, which involves having the data decrypt password for the database stored on the database, but have that encrypted with the validated user's password (which is stored as an MD5 key) after a different type of hashing.
The question is: does encrypting the password multiple times with different keys (at least 20 characters long, with possible extension) make it considerably easier to decrypt without prior knowledge or information on the password?
No, in general a good cipher should have the property that you cannot retrieve data even if you know the plaintext. Having the data encrypted should not have much influence, given a good cipher and a big enough key space.
First off, MD5 is no longer considered a secure hash algorithm. See http://www.kb.cert.org/vuls/id/836068 for details.
Secondly, the encryption key for the data should not be stored in the database itself. It should be stored separately. That way there are at least two things that have to be obtained (the database file and the key) to decrypt the data. If the key is stored in the database itself, it probably wouldn't take long to find it once someone has the database file.
Find a separate method for storing the key. It should either be coded into the application or stored in a file that is obfuscated in some way.

How can bcrypt have built-in salts?

Coda Hale's article "How To Safely Store a Password" claims that:
bcrypt has salts built-in to prevent rainbow table attacks.
He cites this paper, which says that in OpenBSD's implementation of bcrypt:
OpenBSD generates the 128-bit bcrypt salt from an arcfour
(arc4random(3)) key stream, seeded with random data the kernel
collects from device timings.
I don't understand how this can work. In my conception of a salt:
It needs to be different for each stored password, so that a separate rainbow table would have to be generated for each
It needs to be stored somewhere so that it's repeatable: when a user tries to log in, we take their password attempt, repeat the same salt-and-hash procedure we did when we originally stored their password, and compare
When I'm using Devise (a Rails login manager) with bcrypt, there is no salt column in the database, so I'm confused. If the salt is random and not stored anywhere, how can we reliably repeat the hashing process?
In short, how can bcrypt have built-in salts?
This is bcrypt:
Generate a random salt. A "cost" factor has been pre-configured. Collect a password.
Derive an encryption key from the password using the salt and cost factor. Use it to encrypt a well-known string. Store the cost, salt, and cipher text. Because these three elements have a known length, it's easy to concatenate them and store them in a single field, yet be able to split them apart later.
When someone tries to authenticate, retrieve the stored cost and salt. Derive a key from the input password, cost and salt. Encrypt the same well-known string. If the generated cipher text matches the stored cipher text, the password is a match.
Bcrypt operates in a very similar manner to more traditional schemes based on algorithms like PBKDF2. The main difference is its use of a derived key to encrypt known plain text; other schemes (reasonably) assume the key derivation function is irreversible, and store the derived key directly.
Stored in the database, a bcrypt "hash" might look something like this:
$2a$10$vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa
This is actually three fields, delimited by "$":
2a identifies the bcrypt algorithm version that was used.
10 is the cost factor; 2^10 iterations of the key derivation function are used (which is not enough, by the way. I'd recommend a cost of 12 or more.)
vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa is the salt and the cipher text, concatenated and encoded in a modified Base-64. The first 22 characters decode to a 16-byte value for the salt. The remaining characters are cipher text to be compared for authentication.
This example is taken from the documentation for Coda Hale's ruby implementation.
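The field layout described above can be checked with plain string handling. This is only a sketch of the splitting step; it does no hashing, and the split_bcrypt helper is a hypothetical name, not part of any bcrypt library.

```python
BCRYPT_HASH = "$2a$10$vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa"

def split_bcrypt(stored: str) -> dict:
    """Split a bcrypt string into its parts (string handling only, no hashing)."""
    _, version, cost, payload = stored.split("$")
    return {
        "version": version,
        "cost": int(cost),
        "iterations": 2 ** int(cost),
        "salt": payload[:22],        # 22 characters encode the 16-byte salt
        "ciphertext": payload[22:],  # remaining characters are the cipher text
    }

parts = split_bcrypt(BCRYPT_HASH)
print(parts["version"], parts["cost"], parts["iterations"])
print(parts["salt"])
print(parts["ciphertext"])
```

Because every part has a known length, the whole value always comes to 60 characters, which is why no separate salt column is needed.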
I believe that phrase should have been worded as follows:
bcrypt has salts built into the generated hashes to prevent rainbow table attacks.
The bcrypt utility itself does not appear to maintain a list of salts. Rather, salts are generated randomly and appended to the output of the function so that they are remembered later on (according to the Java implementation of bcrypt). Put another way, the "hash" generated by bcrypt is not just the hash. Rather, it is the hash and the salt concatenated.
In simple terms:
Bcrypt does not have a database in which it stores the salt.
The salt is added to the hash in Base64 format.
So how does bcrypt verify the password when it has no database?
Bcrypt extracts the salt from the stored password hash, uses that salt to encrypt the plain password, and compares the new hash with the old hash to see if they are the same.
To make things even clearer:
Registration/login direction ->
The password + salt is encrypted with a key generated from the cost, the salt, and the password. We call that encrypted value the cipher text. Then we attach the salt to this value and encode it using Base64. Attaching the cost to it gives the string produced by bcrypt:
$2a$COST$BASE64
This value is stored eventually.
What would the attacker need to do in order to find the password? (other direction <-)
If the attacker gets control over the DB, the attacker can easily decode the Base64 value and will then be able to see the salt. The salt is not secret, though it is random.
Then he will need to decrypt the cipher text.
More importantly: there is no hashing in this process, but rather CPU-expensive encryption and decryption. Thus rainbow tables are less relevant here.
Let's imagine a table that has 1 hashed password. If a hacker gets access, he would know the salt, but he would have to calculate a big list for all the common passwords and compare after each calculation. This will take time, and he would have cracked only 1 password.
Imagine a second hashed password in the same table. The salt is visible but the same above calculation needs to happen again to crack this one too because the salts are different.
If no random salts were used, it would have been much easier, why? If we use simple hashing we can just generate hashes for common passwords 1 single time (rainbow table) and just do a simple table search, or simple file search between the db table hashes and our pre-calculated hashes to find the plain passwords.
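Here is a small sketch of that difference. It uses a plain precomputed dictionary rather than a real rainbow table, and a single round of SHA-256 only to keep the demo short; the password list and the crack_salted helper are made up for illustration.

```python
import hashlib
import os

COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

# Precomputed once: unsalted SHA-256 of every common password.
lookup = {hashlib.sha256(p.encode()).hexdigest(): p for p in COMMON_PASSWORDS}

# Unsalted storage: a stolen hash is cracked with a single dictionary lookup.
stolen_unsalted = hashlib.sha256(b"letmein").hexdigest()
print(lookup.get(stolen_unsalted))  # letmein

# Salted storage: the same precomputed table no longer matches.
salt = os.urandom(16)
stolen_salted = hashlib.sha256(salt + b"letmein").hexdigest()
print(lookup.get(stolen_salted))  # None

def crack_salted(salt: bytes, target: str):
    """The attacker must redo the hashing work for each individual salt."""
    for candidate in COMMON_PASSWORDS:
        if hashlib.sha256(salt + candidate.encode()).hexdigest() == target:
            return candidate
    return None

print(crack_salted(salt, stolen_salted))  # letmein
```

The salted password can still be guessed, but only by repeating the work per stored hash.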

Encrypting/Hashing plain text passwords in database [closed]

I've inherited a web app that I've just discovered stores over 300,000 usernames/passwords in plain text in a SQL Server database. I realize that this is a Very Bad Thing™.
Knowing that I'll have to update the login and password update processes to encrypt/decrypt, and with the smallest impact on the rest of the system, what would you recommend as the best way to remove the plain text passwords from the database?
Any help is appreciated.
Edit: Sorry if I was unclear, I meant to ask what would be your procedure to encrypt/hash the passwords, not specific encryption/hashing methods.
Should I just:
Make a backup of the DB
Update login/update password code
After hours, go through all records in the users table hashing the password and replacing each one
Test to ensure users can still login/update passwords
I guess my concern is more from the sheer number of users so I want to make sure I'm doing this correctly.
EDIT (2016): use Argon2, scrypt, bcrypt, or PBKDF2, in that order of preference. Use as large a slowdown factor as is feasible for your situation. Use a vetted existing implementation. Make sure you use a proper salt (although the libraries you're using should be making sure of this for you).
When you hash the passwords, DO NOT USE PLAIN MD5.
Use PBKDF2, which basically means using a random salt to prevent rainbow table attacks, and iterating (re-hashing) enough times to slow the hashing down - not so much that your application takes too long, but enough that an attacker brute-forcing a large number of different passwords will notice.
From the document:
Iterate at least 1000 times, preferably more - time your implementation to see how many iterations are feasible for you.
8 bytes (64 bits) of salt are sufficient, and the random doesn't need to be secure (the salt is unencrypted, we're not worried someone will guess it).
A good way to apply the salt when hashing is to use HMAC with your favorite hash algorithm, using the password as the HMAC key and the salt as the text to hash (see this section of the document).
Example implementation in Python, using SHA-256 as the secure hash:
EDIT: as mentioned by Eli Collins this is not a PBKDF2 implementation. You should prefer implementations which stick to the standard, such as PassLib.
from base64 import b64decode, b64encode
from hashlib import sha256
from hmac import HMAC
import random
def random_bytes(num_bytes):
    # not a cryptographically secure source; see the note on salts above
    return bytes(random.randrange(256) for _ in range(num_bytes))
def pbkdf_sha256(password, salt, iterations):
    result = password
    for _ in range(iterations):
        result = HMAC(result, salt, sha256).digest()  # use HMAC to apply the salt
    return result
NUM_ITERATIONS = 5000
def hash_password(plain_password):
    salt = random_bytes(8)  # 64 bits
    hashed_password = pbkdf_sha256(plain_password.encode("utf-8"), salt, NUM_ITERATIONS)
    # return the salt and hashed password, encoded in base64 and split with ","
    return b64encode(salt).decode("ascii") + "," + b64encode(hashed_password).decode("ascii")
def check_password(saved_password_entry, plain_password):
    salt, hashed_password = saved_password_entry.split(",")
    salt = b64decode(salt)
    hashed_password = b64decode(hashed_password)
    return hashed_password == pbkdf_sha256(plain_password.encode("utf-8"), salt, NUM_ITERATIONS)
password_entry = hash_password("mysecret")
print(password_entry)  # will print, for example: 8Y1ZO8Y1pi4=,r7Acg5iRiZ/x4QwFLhPMjASESxesoIcdJRSDkqWYfaA=
check_password(password_entry, "mysecret")  # returns True
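Since the example above is not real PBKDF2, here is a sketch of the same two functions built on hashlib.pbkdf2_hmac from the Python 3 standard library, which does follow the standard. The iteration count and the 16-byte salt are assumed values, and the "salt,hash" layout is kept only to match the example above.

```python
import base64
import hashlib
import hmac
import os

NUM_ITERATIONS = 100_000  # assumed value; time it on your own hardware

def hash_password(plain_password: str) -> str:
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", plain_password.encode("utf-8"), salt, NUM_ITERATIONS)
    # Same "salt,hash" layout as the example above, both parts base64-encoded.
    return base64.b64encode(salt).decode("ascii") + "," + base64.b64encode(derived).decode("ascii")

def check_password(saved_password_entry: str, plain_password: str) -> bool:
    salt_b64, hash_b64 = saved_password_entry.split(",")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    derived = hashlib.pbkdf2_hmac("sha256", plain_password.encode("utf-8"), salt, NUM_ITERATIONS)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(derived, expected)

password_entry = hash_password("mysecret")
print(check_password(password_entry, "mysecret"))   # True
print(check_password(password_entry, "notsecret"))  # False
```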
The basic strategy is to use a key derivation function to "hash" the password with some salt. The salt and the hash result are stored in the database. When a user inputs a password, the salt and their input are hashed in the same way and compared to the stored value. If they match, the user is authenticated.
The devil is in the details. First, a lot depends on the hash algorithm that is chosen. A key derivation algorithm like PBKDF2, based on a hash-based message authentication code, makes it "computationally infeasible" to find an input (in this case, a password) that will produce a given output (what an attacker has found in the database).
A pre-computed dictionary attack uses pre-computed index, or dictionary, from hash outputs to passwords. Hashing is slow (or it's supposed to be, anyway), so the attacker hashes all of the likely passwords once, and stores the result indexed in such a way that given a hash, he can lookup a corresponding password. This is a classic tradeoff of space for time. Since password lists can be huge, there are ways to tune the tradeoff (like rainbow tables), so that an attacker can give up a little speed to save a lot of space.
Pre-computation attacks are thwarted by using "cryptographic salt". This is some data that is hashed with the password. It doesn't need to be a secret, it just needs to be unpredictable for a given password. For each value of salt, an attacker would need a new dictionary. If you use one byte of salt, an attacker needs 256 copies of their dictionary, each generated with a different salt. First, he'd use the salt to look up the correct dictionary, then he'd use the hash output to look up a usable password. But what if you add 4 bytes? Now he needs 4 billion copies of the dictionary. By using a large enough salt, a dictionary attack is precluded. In practice, 8 to 16 bytes of data from a cryptographic quality random number generator makes a good salt.
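The arithmetic behind those numbers is just 256 raised to the number of salt bytes. A tiny sketch (the function name is made up for the example):

```python
def dictionaries_needed(salt_bytes: int) -> int:
    """Number of distinct salt values, i.e. precomputed dictionaries an attacker would need."""
    return 256 ** salt_bytes

for n in (1, 4, 8, 16):
    print(n, dictionaries_needed(n))
```

One byte gives 256 and four bytes give 4,294,967,296 (about 4 billion); at 8 to 16 bytes the count is far beyond anything that could be stored.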
With pre-computation off the table, an attacker has to compute the hash on each attempt. How long it takes to find a password now depends entirely on how long it takes to hash a candidate. This time is increased by iteration of the hash function. The number of iterations is generally a parameter of the key derivation function; today, a lot of mobile devices use 10,000 to 20,000 iterations, while a server might use 100,000 or more. (The bcrypt algorithm uses the term "cost factor", which is a logarithmic measure of the time required.)
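One way to pick an iteration count is simply to time it on the machine that will do the hashing. The sketch below does that with PBKDF2-HMAC-SHA256 from the standard library; the candidate password, the fixed salt, and the iteration counts are placeholders, and the timings will differ from machine to machine.

```python
import hashlib
import time

def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken by one PBKDF2-HMAC-SHA256 derivation."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"candidate password", b"0123456789abcdef", iterations)
    return time.perf_counter() - start

for iterations in (1_000, 10_000, 100_000):
    print(f"{iterations:>7} iterations: {time_pbkdf2(iterations) * 1000:.1f} ms")

# A logarithmic cost factor expresses the same idea: cost c means 2**c rounds.
for cost in (10, 12):
    print(f"cost {cost} -> {2 ** cost} rounds")
```

Whatever a single hash costs you at login, an attacker pays the same price for every guess.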
I would imagine you will have to add a column to the database for the encrypted password, then run a batch job over all records which gets the current password, encrypts it (as others have mentioned, a hash like MD5 is pretty standard; edit: but it should not be used on its own, see other answers for good discussions), stores it in the new column, and checks that it all happened smoothly.
Then you will need to update your front-end to hash the user-entered password at login time and verify that vs the stored hash, rather than checking plaintext-vs-plaintext.
It would seem prudent to me to leave both columns in place for a little while to ensure that nothing hinky has gone on, before eventually removing the plaintext passwords altogether.
Don't forget also that anywhere the password is accessed the code will have to change, such as password change / reminder requests. You will of course lose the ability to email out forgotten passwords, but this is no bad thing. You will have to use a password reset system instead.
Edit:
One final point, you might want to consider avoiding the error I made on my first attempt at a test-bed secure login website:
When processing the user password, consider where the hashing takes place. In my case the hash was calculated by the PHP code running on the webserver, but the password was transmitted to the page from the user's machine in plaintext! This was ok(ish) in the environment I was working in, as it was inside an https system anyway (uni network). But, in the real world I imagine you would want to hash the password before it leaves the user system, using javascript etc. and then transmit the hash to your site.
Follow Xan's advice of keeping the current password column around for a while so if things go bad, you can rollback quick-n-easy.
As far as encrypting your passwords:
use a salt
use a hash algorithm that's meant for passwords (ie., - it's slow)
See Thomas Ptacek's Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes for some details.
I think you should do the following:
Create a new column called HASHED_PASSWORD or something similar.
Modify your code so that it checks for both columns.
Gradually migrate passwords from the non-hashed column to the hashed one. For example, when a user logs in, migrate his or her password automatically to the hashed column and remove the unhashed version. All newly registered users will have hashed passwords.
After hours, you can run a script which migrates n users at a time.
When you have no more unhashed passwords left, you can remove your old password column (you may not be able to do so, depends on the database you are using). Also, you can remove the code to handle the old passwords.
You're done!
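The "check both columns and migrate at login" step could look roughly like this. It is a sketch only: the user row is a plain dict standing in for a database record, the column names are hypothetical, and PBKDF2 with an assumed iteration count is used as the password hash.

```python
import base64
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor

def hash_password(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return base64.b64encode(salt).decode() + "," + base64.b64encode(digest).decode()

def check_hash(password: str, stored: str) -> bool:
    salt_b64, digest_b64 = stored.split(",")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), base64.b64decode(salt_b64), ITERATIONS)
    return hmac.compare_digest(digest, base64.b64decode(digest_b64))

def login(user: dict, password: str) -> bool:
    """Check the hashed column first; fall back to plaintext and migrate on success."""
    if user["hashed_password"] is not None:
        return check_hash(password, user["hashed_password"])
    if user["password"] is not None and hmac.compare_digest(user["password"].encode(), password.encode()):
        user["hashed_password"] = hash_password(password)  # migrate on first login
        user["password"] = None                            # drop the plaintext copy
        return True
    return False

# Hypothetical row from the users table, represented as a dict.
row = {"name": "alice", "password": "hunter2", "hashed_password": None}
print(login(row, "hunter2"))  # True, and the row is migrated
print(row["password"])        # None
print(login(row, "hunter2"))  # True, now verified against the hash
print(login(row, "wrong"))    # False
```

The after-hours batch script would call hash_password on the remaining rows in the same way.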
As the others mentioned, you don't want to decrypt if you can help it. Standard best practice is to encrypt using a one-way hash, and then when the user logs in hash their password to compare it.
Otherwise you'll have to use a strong encryption to encrypt and then decrypt. I'd only recommend this if the political reasons are strong (for example, your users are used to being able to call the help desk to retrieve their password, and you have strong pressure from the top not to change that). In that case, I'd start with encryption and then start building a business case to move to hashing.
For authentication purposes you should avoid storing the passwords using reversible encryption, i.e. you should only store the password hash and check the hash of the user-supplied password against the hash you have stored. However, that approach has a drawback: it's vulnerable to rainbow table attacks, should an attacker get hold of your password store database.
What you should do is store the hashes of a pre-chosen (and secret) salt value + the password. I.e., concatenate the salt and the password, hash the result, and store this hash. When authenticating, do the same - concatenate your salt value and the user-supplied password, hash, then check for equality. This makes rainbow table attacks unfeasible.
Of course, if the user sends passwords across the network (for example, if you're working on a web or client-server application), then you should not send the password across in clear text, so instead of storing hash(salt + password) you should store and check against hash(salt + hash(password)), and have your client pre-hash the user-supplied password and send that one across the network. This protects your user's password as well, should the user (as many do) re-use the same password for multiple purposes.
Encrypt using something like MD5, encode it as a hex string
You need a salt; in your case, the username can be used as the salt (it has to be unique, the username should be the most unique value available ;-)
use the old password field to store the MD5, but tag the MD5 (e.g., "MD5:687A878....") so that old (plain text) and new (MD5) passwords can co-exist
change the login procedure to verify against the MD5 if there is an MD5, and against the plain password otherwise
change the "change password" and "new user" functions to create MD5'ed passwords only
now you can run the conversion batch job, which might take as long as needed
after the conversion has been run, remove the legacy-support
Step 1: Add encrypted field to database
Step 2: Change code so that when password is changed, it updates both fields but logging in still uses old field.
Step 3: Run script to populate all the new fields.
Step 4: Change code so that logging in uses new field and changing passwords stops updating old field.
Step 5: Remove unencrypted passwords from database.
This should allow you to accomplish the changeover without interruption to the end user.
Also:
Something I would do is name the new database field something that is completely unrelated to password like "LastSessionID" or something similarly boring. Then instead of removing the password field, just populate with hashes of random data. Then, if your database ever gets compromised, they can spend all the time they want trying to decrypt the "password" field.
This may not actually accomplish anything, but it's fun thinking about someone sitting there trying to figure out worthless information
As with all security decisions, there are tradeoffs. If you hash the password, which is probably your easiest move, you can't offer a password retrieval function that returns the original password, nor can your staff look up a person's password in order to access their account.
You can use symmetric encryption, which has its own security drawbacks. (If your server is compromised, the symmetric encryption key may be compromised also).
You can use public-key encryption, and run password retrieval/customer service on a separate machine which stores the private key in isolation from the web application. This is the most secure, but requires a two-machine architecture, and probably a firewall in between.
MD5 and SHA1 have shown a bit of weakness (two words can result in the same hash) so using SHA256-SHA512 / iterative hashes is recommended to hash the password.
I would write a small program in the language that the application is written in that goes and generates a random salt that is unique for each user and a hash of the password. The reason I tend to use the same language as the verification is that different crypto libraries can do things slightly differently (i.e. padding) so using the same library to generate the hash and verify it eliminates that risk. This application could also then verify the login after the table has been updated if you want as it knows the plain text password still.
Don't use MD5/SHA1
Generate a good random salt (many crypto libraries have a salt generator)
An iterative hash algorithm as orip recommended
Ensure that the passwords are not transmitted in plain text over the wire
I would like to suggest one improvement to the great python example posted by Orip. I would redefine the random_bytes function to be:
def random_bytes(num_bytes):
    return os.urandom(num_bytes)
Of course, you would have to import the os module. The os.urandom function provides a random sequence of bytes that can be safely used in cryptographic applications. See the reference help of this function for further details.
To hash the password you can use the HashBytes function. Returns a varbinary, so you'd have to create a new column and then delete the old varchar one.
Like
ALTER TABLE users ADD hashedPassword varbinary(max);
ALTER TABLE users ADD salt char(10);
--Generate random salts and update the column, after that
UPDATE users SET hashedPassword = HashBytes('SHA1',salt + '|' + password);
Then you modify the code to validate the password, using a query like
SELECT count(*) from users WHERE hashedPassword =
HashBytes('SHA1',salt + '|' + <password>)
where <password> is the value entered by the user.
I'm not a security expert, but I think the current recommendation is to use bcrypt/blowfish or a SHA-2 variant, not MD5 / SHA1.
Probably you need to think in terms of a full security audit, too

Difference between Hashing a Password and Encrypting it

The current top-voted to this question states:
Another one that's not so much a security issue, although it is security-related, is complete and abject failure to grok the difference between hashing a password and encrypting it. Most commonly found in code where the programmer is trying to provide unsafe "Remind me of my password" functionality.
What exactly is this difference? I was always under the impression that hashing was a form of encryption. What is the unsafe functionality the poster is referring to?
Hashing is a one way function (well, a mapping). It's irreversible, you apply the secure hash algorithm and you cannot get the original string back. The most you can do is to generate what's called "a collision", that is, finding a different string that provides the same hash. Cryptographically secure hash algorithms are designed to prevent the occurrence of collisions. You can attack a secure hash by the use of a rainbow table, which you can counteract by applying a salt to the hash before storing it.
Encrypting is a proper (two way) function. It's reversible, you can decrypt the mangled string to get original string if you have the key.
The unsafe functionality it's referring to is that if you encrypt the passwords, your application has the key stored somewhere and an attacker who gets access to your database (and/or code) can get the original passwords by getting both the key and the encrypted text, whereas with a hash it's impossible.
People usually say that if a cracker owns your database or your code he doesn't need a password, thus the difference is moot. This is naïve, because you still have the duty to protect your users' passwords, mainly because most of them do use the same password over and over again, exposing them to a greater risk by leaking their passwords.
Hashing is a one-way function, meaning that once you hash a password it is very difficult to get the original password back from the hash. Encryption is a two-way function, where it's much easier to get the original text back from the encrypted text.
Plain hashing is easily defeated using a dictionary attack, where an attacker just pre-hashes every word in a dictionary (or every combination of characters up to a certain length), then uses this new dictionary to look up hashed passwords. Using a unique random salt for each hashed password stored makes it much more difficult for an attacker to use this method. They would basically need to create a new unique dictionary for every salt value that you use, slowing down their attack terribly.
It's unsafe to store passwords using an encryption algorithm because if it's easier for the user or the administrator to get the original password back from the encrypted text, it's also easier for an attacker to do the same.
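To make the one-way versus two-way distinction concrete, here is a toy sketch. The XOR "cipher" is a stand-in for real encryption and is for illustration only, and plain SHA-256 is used just to keep the demo short (real password storage needs a salted, slow function as discussed above).

```python
import hashlib
import os

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy reversible cipher (one-time-pad style XOR); for illustration only."""
    return bytes(d ^ k for d, k in zip(data, key))

password = b"hunter2"

# Encryption: anyone holding the key can recover the original password.
key = os.urandom(len(password))
ciphertext = xor_bytes(password, key)
print(xor_bytes(ciphertext, key))  # b'hunter2'

# Hashing: there is no key and no inverse; you can only hash a guess and compare.
digest = hashlib.sha256(password).digest()
print(hashlib.sha256(b"hunter2").digest() == digest)  # True
print(hashlib.sha256(b"hunter3").digest() == digest)  # False
```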
If the password is encrypted, it is always a hidden secret from which someone can extract the plain text password. However, when the password is hashed, you can relax, as there is hardly any method of recovering the password from the hash value.
Extracted from Encrypted vs Hashed Passwords - Which is better?
Is encryption good?
Plain text passwords can be encrypted using symmetric encryption algorithms like DES or AES, or with any other algorithm, and be stored inside the database. At authentication (confirming the identity with user name and password), the application will decrypt the encrypted password stored in the database and compare it with the user-provided password for equality. In this type of password handling approach, even if someone gets access to the database tables, the passwords will not be simply reusable. However, there is bad news in this approach as well. If somehow someone obtains the cryptographic algorithm along with the key used by your application, he/she will be able to view all the user passwords stored in your database by decryption. "This is the best option I got", a software developer may scream, but is there a better way?
Cryptographic hash function (one-way-only)
Yes there is; maybe you have missed the point here. Did you notice that there is no requirement to decrypt and compare? Suppose there is a one-way-only conversion approach where the password can be converted into some converted-word, but the reverse operation (generation of the password from the converted-word) is impossible. Now even if someone gets access to the database, there is no way that the passwords can be reproduced or extracted using the converted-words. In this approach, there will be hardly any way that someone could know your users' top secret passwords, and this will protect the users who use the same password across multiple applications. What algorithms can be used for this approach?
I've always thought that Encryption can be converted both ways, in a way that the end value can bring you to original value and with Hashing you'll not be able to revert from the end result to the original value.
Hashing algorithms are usually cryptographic in nature, but the principal difference is that encryption is reversible through decryption, and hashing is not.
An encryption function typically takes input and produces encrypted output that is the same, or slightly larger size.
A hashing function takes input and produces a typically smaller output, typically of a fixed size as well.
While it isn't possible to take a hashed result and "dehash" it to get back the original input, you can typically brute-force your way to something that produces the same hash.
In other words, if an authentication scheme takes a password, hashes it, and compares it to a hashed version of the required password, it might not be required that you actually know the original password, only its hash, and you can brute-force your way to something that will match, even if it's a different password.
Hashing functions are typically created to minimize the chance of collisions and make it hard to just calculate something that will produce the same hash as something else.
Hashing:
It is a one-way algorithm; once hashed, the value cannot be rolled back, and this is its advantage over encryption.
Encryption
If we perform encryption, there will be a key to do this. If this key is leaked, all of your passwords can be decrypted easily.
On the other hand, if you used hashed passwords, then even if your database is hacked or your server admin takes data from the DB, the attacker will not be able to break these hashed passwords. This is practically impossible if we use hashing with a proper salt and the additional security of PBKDF2.
If you want to take a look at how should you write your hash functions, you can visit here.
There are many algorithms to perform hashing.
MD5 - Uses the Message Digest Algorithm 5 (MD5) hash function. The output hash is 128 bits in length. The MD5 algorithm was designed by Ron Rivest in the early 1990s and is not a preferred option today.
SHA1 - Uses the Secure Hash Algorithm (SHA1) hash published in 1995. The output hash is 160 bits in length. Although most widely used, this is not a preferred option today.
HMACSHA256, HMACSHA384, HMACSHA512 - Use the functions SHA-256, SHA-384, and SHA-512 of the SHA-2 family. SHA-2 was published in 2001. The output hash lengths are 256, 384, and 512 bits, respectively, as the hash functions' names indicate.
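The output lengths listed above can be checked with Python's hashlib, assuming all five algorithms are enabled in your build (MD5 can be disabled on some locked-down systems). The key and message passed to HMAC are placeholders.

```python
import hashlib
import hmac

for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits} bits")

# HMAC output has the same length as the underlying hash.
tag = hmac.new(b"key", b"message", "sha256").digest()
print(len(tag) * 8)  # 256
```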
Ideally you should do both.
First, hash the password for one-way security. Use a salt for extra security.
Then encrypt the hash to defend against dictionary attacks if your database of password hashes is compromised.
As correct as the other answers may be, in the context that the quote was in, hashing is a tool that may be used in securing information, encryption is a process that takes information and makes it very difficult for unauthorized people to read/use.
Here's one reason you may want to use one over the other - password retrieval.
If you only store a hash of a user's password, you can't offer a 'forgotten password' feature.

Resources