I'm looking for a way to encrypt the entire DB and keep the ability to search for data although it's encrypted.
I have seen a lot of questions about encrypting data at rest in Mongo, but none of them got an answer that helps one complete a full flow for an application.
I hope to present here my findings and get feedback and more ideas (I still have some questions).
Encryption options:
1. mongoose-encryption.
Complete solution! Can encrypt all of the DB with minimal work for you!
2. Percona MongoDB - I didn't have a chance to test it. I spent hours trying to install it and get it to run, without luck (this is probably just me though).
3. Create get and send methods to encrypt and decrypt your data at the module level.
My requirements for at rest data encryption are:
The application layer does not need to be involved in the encryption/decryption process. It should feel as if the data were not encrypted at all (for the most part).
We can perform searches and lookups on encrypted data.
I don't know how to do that, but hopefully we can also search for partial words and phrases in encrypted text fields.
Of course, all data is encrypted except for Object IDs.
My approach:
I want to try and use mongoose-encryption to use all the benefits of this amazing plugin.
I also want to add to the schema a hash of the real value of each encrypted field, so I can perform find operations on encrypted fields.
The problem:
I can't seem to find the correct mongoose hook to tamper with the non-encrypted data before mongoose-encryption hides it. So I can't generate my hash.
This doesn't work:
Users.pre('save', () => {
  this.hashedName = hash(this.name);
  console.log(":(");
});
Also, as mentioned above, there is the problem of searching for partial words and phrases in encrypted data.
With my approach we could find someone named "Danielle", but we can't search the hash for users with a name that starts with "Dani".
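To make the trade-off concrete, here is a minimal sketch of the hash-next-to-ciphertext idea using Node's built-in crypto module. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, which is a swapped-in technique and not something the approach above specifies; the function name `blindIndex` and the key handling are assumptions for illustration only.

```javascript
const crypto = require('node:crypto');

// Hypothetical key: in practice it would come from a secret store, not be generated here.
const indexKey = crypto.randomBytes(32);

// Keyed hash of the normalized plaintext, stored in its own field next to the ciphertext.
function blindIndex(value, key = indexKey) {
  const normalized = value.trim().toLowerCase();
  return crypto.createHmac('sha256', key).update(normalized, 'utf8').digest('hex');
}

// Equality lookups work because the same input always yields the same index.
const storedIndex = blindIndex('Danielle');
console.log(storedIndex === blindIndex('danielle')); // true

// A prefix hashes to an unrelated value, so "starts with" cannot be answered this way.
console.log(storedIndex === blindIndex('Dani')); // false
```

If this were called from a Mongoose `pre('save')` hook, the hook would need to be a regular `function`, because an arrow function does not bind `this` to the document being saved.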
Please give me your opinion on my approach as well. I know this is a topic without easy-to-find solutions.
If you want to encrypt the data on disk, encrypt the entire disk and encrypt the swap. Note that this only protects the stored files: if someone gets a copy of the database through the running server (e.g. you forgot to put auth on the database and someone connects to it and dumps the data), the plaintext is exposed.
If you want the database to store encrypted data only, use client side encryption. This requires key management on the client side but makes it so that someone dumping your database doesn't get the plaintext.
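As a rough sketch of what client-side encryption of a single field could look like, the following uses AES-256-GCM from Node's built-in crypto module. The helper names `encryptField` and `decryptField`, the storage layout (nonce, tag, ciphertext concatenated and base64-encoded) and the in-memory key are all assumptions for illustration; real key management is the hard part and is not shown.

```javascript
const crypto = require('node:crypto');

// Encrypt one field before it is sent to the database.
function encryptField(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce for every value
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return Buffer.concat([iv, tag, ciphertext]).toString('base64');
}

// Decrypt a field read back from the database; throws if the key or data is wrong.
function decryptField(encoded, key) {
  const raw = Buffer.from(encoded, 'base64');
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const fieldKey = crypto.randomBytes(32); // hypothetical 256-bit key held only by the client
const storedValue = encryptField('danielle@example.com', fieldKey);
console.log(decryptField(storedValue, fieldKey)); // danielle@example.com
```

Someone who dumps the database sees only the base64 blobs; without the client-held key they cannot recover the plaintext.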
I am making a social media type website, and I store user details such as emails, names and other personal details.
I will be encrypting the personal details using an Encrypt-then-MAC concept. When the user registers, a cryptographically secure string will be made to use as the private encryption key. When the user selects a password, the encryption key will be encrypted using the password.
The password will NOT be stored in the database, but will be the private key to decrypt the encryption key used to encrypt the personal details. The only person who knows the password is the user. My question is: how can I store the encryption key once decrypted?
I have thought of having a table with one column for the IP and another column for the encryption key, but some people close the browser window without logging out, so there would be no way to remove the entry from the database when they have finished their session on the website.
Another way would be to store it in a cookie, but that could be intercepted when sent back to the server. I would like to know if there is a secure, nearly foolproof way to store the encryption key, client side or server side.
Thanks in advance.
EDIT:
In reply to TheGreatContini's answer -
The idea of a "zero-knowledge web application" (in your blog) is a good one. However, for zero knowledge even the key cannot be stored in the database, which complicates things a bit, as you would then have to use the user's password as the key. Using the password isn't as secure, because it is harder to verify the password and so prevent data that has been "decrypted with the wrong key" from passing. There is the concept of Encrypt-then-MAC, but that only verifies whether the data is legitimate. When the check fails, it will assume that a hacker has tampered with the data and that the data cannot be trusted, when in fact the password may simply be wrong, because you cannot verify the password itself (its hash would not be stored, as this is "zero-knowledge").
Not sure I have the answer, but a few considerations:
(1) Sessions need to be timed out. Perhaps you can do this by periodically running batch jobs that scan the database looking for sessions that have lacked activity. This requires storing in the db the date of the last action from the user.
(2) Generally keys are higher value than the content they protect, because the keys have a longer lifetime than the individual data elements that they protect (the data may change or additional data may be added). Rather than storing the key in the db, you can store the decrypted contents in the database for the length of the session. Of course, this is provided that you did (1).
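The idle-session scan from (1) can be sketched as below. The session shape (`id`, `lastActivity` in milliseconds since the epoch) and the 30-minute limit are assumptions; in a real batch job the filter would be a database query, and the matching rows (together with any decrypted contents from (2)) would be deleted.

```javascript
// Assumed idle limit; the right value depends on the application.
const IDLE_LIMIT_MS = 30 * 60 * 1000;

// Return the sessions whose last recorded activity is older than the idle limit.
function expiredSessions(sessions, now = Date.now(), idleLimitMs = IDLE_LIMIT_MS) {
  return sessions.filter((s) => now - s.lastActivity > idleLimitMs);
}

const scanTime = Date.parse('2024-01-01T12:00:00Z');
const sampleSessions = [
  { id: 'a', lastActivity: scanTime - 5 * 60 * 1000 },  // active 5 minutes ago
  { id: 'b', lastActivity: scanTime - 45 * 60 * 1000 }, // idle for 45 minutes
];
console.log(expiredSessions(sampleSessions, scanTime).map((s) => s.id)); // [ 'b' ]
```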
Perhaps I am not adding much beyond what you already know, but it may be worth looking at a blog post I wrote about exactly this topic. The low-level details start in the section "A second line of defence for all the sensitive data." Before that it mainly motivates the concept. Glad to see somebody actually doing something like this.
Suppose we have a website that uses an MD5 hash in the URL, like this:
http://somewebsite.com/XXX/
where XXX is MD5 hash.
The content of such a page may include sensitive information, such as transaction details with personal data.
There is no other authorization on this website, so if you have the URL you can access the page.
How safe is it? I mean, if no one shares the URL with anyone, can I assume no one else will access it?
How much time would it take a web crawler to crawl through all combinations of such URLs?
I ask because I use a web shop that stores transaction details with personal data in this manner. I keep telling them that it is not secure and that someone could view their clients' sensitive data, but they are not convinced. Building a web crawler is simple for me, but I don't know how long it would take to crawl through all combinations, so maybe the shop is right? This is not my website. I am an end user of that shop, and I need to convince them they are wrong.
This is an example of "security by obscurity", and it should be considered unsafe.
Just for the fun of it, let's think about it a bit more.
You say it's a hash. A hash of what? Is it possible that a hacker guesses or knows what you are hashing, and starts guessing from there? BTW, how will you ensure that hashes are unique?
Sure, an MD5 hash is 128 bits, but if the thing you're hashing doesn't have 128 bits of information, it doesn't really help. Maybe the thing you're hashing is a sequence number? In that case, guessing the next hash could be trivially easy.
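A small sketch makes both cases concrete. The request rate of one million guesses per second and the idea that the shop hashes a sequential order number are assumptions for illustration, not facts about any particular site.

```javascript
const crypto = require('node:crypto');

const md5 = (text) => crypto.createHash('md5').update(text).digest('hex');

// Case 1: the token really carries 128 random bits. Hitting one specific URL takes
// about 2^127 guesses on average; at the assumed rate that is roughly 5 * 10^24 years.
const guessesPerSecond = 1_000_000n;
const secondsPerYear = 31_536_000n;
const yearsToGuess = 2n ** 127n / guessesPerSecond / secondsPerYear;
console.log(`about ${yearsToGuess} years`);

// Case 2: the token is the MD5 of a sequential order number. The search space is
// only the counter range, so a short loop recovers the input.
function findOrderNumber(targetHash, maxOrder) {
  for (let n = 1; n <= maxOrder; n++) {
    if (md5(String(n)) === targetHash) return n;
  }
  return null;
}

const knownToken = md5('4821'); // a URL the attacker legitimately received
const recovered = findOrderNumber(knownToken, 100000);
console.log(recovered); // 4821
console.log(md5(String(recovered + 1))); // the token for the next customer's order
```

So the crawler question has two very different answers: a truly random 128-bit token is out of reach for brute force, while a hash of something guessable falls in well under a second.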
Even then, MD5 is considered broken now, and not fit for security usage. For more information, take a look at the Wikipedia article on MD5.
More importantly, you create these URLs, but you need to send them somehow to the user. How will you do that? Not in a clear text e-mail I hope. Maybe you publish them on a properly secured website, where they can click on the link after properly logging on. Oh, wait... no probably not.
If you want to keep something safely confidential, secure it properly. Require a logon. And use SSL/TLS, i.e. HTTPS instead of HTTP.
Many users – myself included – would like the security of having everything they do on a web service encrypted. That is, they don't want anyone at the web service to be able to look at their posts, info, tasks, etc.
This is also a major complaint in this discussion of an otherwise cool service: http://news.ycombinator.com/item?id=1549115
Since this data needs to be recoverable, some sort of two-way encryption is required. But unless you're prompting the user for the encryption key on every request, this key will need to be stored on the server, and the point of encrypting the data is basically lost.
What is a way to securely encrypt user data without degrading the user experience (asking for some key on every request)?
-- UPDATE --
From @Borealid's answer, I've focused on two possibilities: challenge-response protocols, where no data (password included) is sent in the "clear", and non-challenge-response protocols, where data (password included) is sent in the "clear" (although over HTTPS).
Challenge-response protocols (specifically SRP: http://srp.stanford.edu/)
It seems that its implementation would need to rely on either a fully AJAX site or using web storage. This is so the browser can persist the challenge-response data during encryption and also the encryption key between different "pages". (I'm assuming after authentication is completed I would send them back the encrypted encryption key, which they would decrypt client-side to obtain the real encryption key.)
The problem is that I'm either:
fully AJAX, which I don't like because I love URLs and don't want a user to live exclusively on a single URL, or
I have to store data encryption keys in web storage, which based on http://dev.w3.org/html5/webstorage/ will persist even after the browser is closed and could be a security vulnerability.
In addition, as SRP takes more than one request ( http://srp.stanford.edu/design.html ), there needs to be some persistence on the server-side. This is just another difficulty.
Traditionally
If I'm ok transmitting passwords and data in the clear (although over HTTPS), then the client-side issues above are not present.
On registration, I'll generate a random unique encryption key for the user, and encrypt it using their password and a random salt.
In the database, I'll store the user's password hash and salt (through bcrypt), encrypted encryption key, encryption key salt, and encryption iv.
After an authentication, I'll also need to use their password to decrypt the encryption key so that they may view and enter new data. I store this encryption key only temporarily and delete it when they explicitly "log out".
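The registration and login steps above could look roughly like this with Node's built-in crypto module. The function names `wrapDataKey` and `unwrapDataKey` are hypothetical, scrypt is assumed as the password-based key derivation, and AES-256-GCM is assumed for wrapping the key; the bcrypt password hash used for authentication is stored separately and is not shown.

```javascript
const crypto = require('node:crypto');

// Registration: create a random data key and encrypt it with a password-derived key.
function wrapDataKey(password) {
  const dataKey = crypto.randomBytes(32);
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const wrappingKey = crypto.scryptSync(password, salt, 32);
  const cipher = crypto.createCipheriv('aes-256-gcm', wrappingKey, iv);
  const wrapped = Buffer.concat([cipher.update(dataKey), cipher.final()]);
  // Only `record` goes to the database; `dataKey` stays in memory for the session.
  return { dataKey, record: { salt, iv, tag: cipher.getAuthTag(), wrapped } };
}

// Login: re-derive the wrapping key and decrypt the data key.
// A wrong password fails the GCM tag check and throws instead of returning garbage.
function unwrapDataKey(password, record) {
  const wrappingKey = crypto.scryptSync(password, record.salt, 32);
  const decipher = crypto.createDecipheriv('aes-256-gcm', wrappingKey, record.iv);
  decipher.setAuthTag(record.tag);
  return Buffer.concat([decipher.update(record.wrapped), decipher.final()]);
}

const registration = wrapDataKey('correct horse battery staple');
const sessionKey = unwrapDataKey('correct horse battery staple', registration.record);
console.log(sessionKey.equals(registration.dataKey)); // true
```

Because the data key is random and only wrapped by the password, a password change means re-wrapping one 32-byte key rather than re-encrypting all of the user's data.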
The problem with this approach is that (as @Borealid points out) evil sysadmins can still look at your data when you are logged in.
I'm also not sure how to store the encryption keys when users are logged in. If they are in the same data store, a stolen database would reveal all data of those who were logged in at the time of theft.
Is there a better in-memory data store for storing these encryption keys (and challenge data during an SRP authentication)? Is this something Redis would be good for?
If the data need to be recoverable in the event of user error, you can't use something like a cookie (which could get deleted). And as you point out, server-side keys don't actually secure the user against malicious sysadmins; they only help with things like databases stolen offline.
However, if you're running a normal web service, you've already gotten pretty lucky - the user, in order to be unique and non-ephemeral, must be logged in. This means they go through some authentication step which proves their identity. In order to prove their identity, most web sites use a passed credential (a password).
So long as you don't use a challenge-response authentication protocol, which most web sites don't, you can use an encryption key derived from a combination of a server-side secret and the user's password. Store the encryption key only while the user is authenticated.
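One possible way to combine the two inputs, sketched with Node's built-in crypto module: derive a key from the password with scrypt, then mix in the server-side secret with HMAC-SHA256. This particular construction and the name `deriveUserKey` are assumptions for illustration; the answer does not prescribe how the combination should be done.

```javascript
const crypto = require('node:crypto');

// Derive a 32-byte key that depends on both the user's password and a server secret.
function deriveUserKey(password, userSalt, serverSecret) {
  // Slow, salted derivation from the password alone.
  const fromPassword = crypto.scryptSync(password, userSalt, 32);
  // Mix in the server-side secret, so a stolen database is not enough to
  // start guessing passwords against the encrypted data.
  return crypto.createHmac('sha256', serverSecret).update(fromPassword).digest();
}

const serverSecret = crypto.randomBytes(32); // hypothetical; kept outside the database
const userSalt = crypto.randomBytes(16);     // stored with the user record
const userKey = deriveUserKey('s3cret-passphrase', userSalt, serverSecret);
console.log(userKey.length); // 32
```

The derived key would be held only in memory while the user is authenticated and discarded at logout or session timeout.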
If you do this, the users are still vulnerable to sysadmins peeking while they're using the service (or stealing their passwords). You might want to go a step further. To go one up, don't send the password to the server at all. Instead, use a challenge-response protocol for authentication to your website, and encrypt the data with a derivative of the user's password via JavaScript before uploading anything.
This is foolproof security: if you try to steal the user's password, the user can see what you're doing because the code for the theft is right there in the page you sent them. Your web service never touches their data unencrypted. This is also no hindrance to the normal user experience. The user just enters their password to log in, as per normal.
This method is what is used by Lacie's storage cloud service. It's very well done.
Note: when I say "use foo to encrypt", I really mean "use foo to encrypt a secure symmetric key which is then used with a random salt to encrypt". Know your cryptography. I'm only talking about the secret, not the methodology.
None of those other solutions are going to maintain the feature set requested -- which specifically wants to preserve the user experience. If you look at the site referenced in the link, they email you a nightly past journal entry. You're not going to get that with JavaScript trickery per above because you don't have the browser to depend on. So basically this is all leading you down a path to a degraded user experience.
What you would want, or more precisely the best solution you're going to find in this space, is not so much what wuala does per above, but rather something like hush.com. The handling of user data needs to be done on the client side at all times -- this is generally accomplished via full client-side Java (like the Facebook photo uploader, etc), but HTML/JavaScript might get you there these days. JavaScript encryption is pretty poor, so you may be better off ignoring it.
OK, so now you've got client-side Java running a Journal entry encryption service. The next feature was to email past journal entries to users every night. Well, you're not going to get that in an unencrypted email obviously. This is where you're going to need to change the user experience one way or the other. The simplest solution is not to email the entry and instead to provide for instance a journal entry browser in the Java app that reminds them of some old entry once they get to the website based on a link in the daily email. A much more complex solution would be to use JavaScript encryption to decrypt the entry as an attachment inline in the email. This isn't rocket science but there is a fairly huge amount of trickery involved. This is the general path used by several web email encryption services such as IronPort. You can get a demo email by going to http://www.ironport.com/securedemo/.
As much as I'd love to see a properly encrypted version of all this, my final comment would be that journal entries are not state secrets. Given a solid privacy policy and good site security semantics, I'm sure 99% of your users will feel just fine about things. Doing all this right with true security will take an enormous amount of effort per above and at least some design/UE changes.
You should look into the MIT project CryptDB, which supports querying an encrypted database using a subset of SQL (see the Forbes article, the MeFi thread, or "Homomorphic encryption" on Wikipedia).
There is the Tahoe-LAFS project for cloud storage too, which conceivably could be leveraged into a fully anonymous social networking application, one day in the distant future.
If you want to perform computations on a server without even the server being able to see the data, you may be interested in knowing about fully homomorphic encryption. A fully homomorphic encryption scheme lets you perform arbitrary computations on encrypted data, even if you can't decrypt it. However, this is still a topic of research.
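To give a feel for what "computing on encrypted data" means, here is a toy sketch using textbook RSA, which happens to be homomorphic for multiplication only. It is not a fully homomorphic scheme, the parameters (p = 61, q = 53) are far too small to be secure, and unpadded RSA should not be used for real data; it only shows a server combining two ciphertexts without seeing the plaintexts.

```javascript
// Square-and-multiply modular exponentiation on BigInt values.
function modPow(base, exponent, modulus) {
  let result = 1n;
  let b = base % modulus;
  let exp = exponent;
  while (exp > 0n) {
    if (exp & 1n) result = (result * b) % modulus;
    b = (b * b) % modulus;
    exp >>= 1n;
  }
  return result;
}

// Toy key pair: n = 61 * 53, public exponent 17, private exponent 2753.
const n = 3233n;
const publicExp = 17n;
const privateExp = 2753n;
const encrypt = (m) => modPow(m, publicExp, n);
const decrypt = (c) => modPow(c, privateExp, n);

// The "server" multiplies two ciphertexts; it never sees 7 or 3.
const encryptedProduct = (encrypt(7n) * encrypt(3n)) % n;

// The key holder decrypts the result and gets the product of the plaintexts.
console.log(decrypt(encryptedProduct)); // 21n
```

A fully homomorphic scheme would support both addition and multiplication on ciphertexts, and therefore arbitrary computations, which is what makes it so much harder to build efficiently.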
For now, I guess your best bet would be to encrypt all posts and assign meaningless (e.g. sequential) IDs to each one. For a more in-depth discussion of how to encrypt server-side data with today's technology, look up.
As far as I understand, it is a good idea to keep passwords secret even from the site administrator, because he could take a user's email address and try to log into that user's mailbox using the same password (since many users use the same password everywhere).
Beyond that I do not see the point. I know it makes a dictionary attack more difficult, but... if someone unauthorized got into the database, isn't it too late to worry about passwords? The guy now has access to all tables in the database and is in a position to take all the data and do whatever he wants.
Or am I missing something?
The bigger problem is that people tend to use the same password everywhere. So if you obtain a database of usernames and unsalted passwords, chances are good they might work elsewhere, like hotmail, gmail etc.
The guy might be in a position to do everything he/she wants to your system, but you shouldn't allow him/her to do anything with other systems (by using your users' passwords).
A password is the property of your users. You should keep it safe.
Many of your users use the same credentials (usernames/passwords) at your site as they do at their bank. If someone can get the credentials table, they can get instant access to a bunch of bank accounts. Fail.
If you don't actually store passwords, then attackers can't steal your users' bank accounts just by grabbing the credentials table.
It relies on the fact that a hash is a one-way function. In other words, it's very easy to convert a password into a hash, but very difficult to do the opposite.
So when a user registers, you convert their chosen password into a hash and store it. Later, when they log in with their password, you convert that password to its hash and compare the two. This works because, to a high level of probability, if passwordhashA == passwordhashB then passwordA == passwordB.
Salting is a solution to a related problem. If you know that someone's password hash is, say, ABCDEF, then you can try calculating hashes for all possible passwords. Sooner or later you may find that hash('dog') = ABCDEF, so you know their password. This takes a very long time, but the process can be sped up by using pre-created 'dictionaries' where, for a given hash, you can look up the corresponding password. Salting, however, means that the text that is hashed isn't a simple English word or a simple combination of words. In the case I gave above, the text that would be hashed is not 'dog' but 'somecrazymadeuptextdog'. This means that any readily available dictionary is useless, since the likelihood of it containing the hash for that text is a lot less than the likelihood of it containing the hash for 'dog'. This likelihood becomes even lower if the salt is a random alphanumeric string.
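The register-then-compare flow with a random salt can be sketched as follows, using scrypt from Node's built-in crypto module as the hash. The choice of scrypt, the function names and the hex storage format are assumptions; a dedicated password-hashing library would serve the same purpose.

```javascript
const crypto = require('node:crypto');

// Registration: hash the password with a fresh random salt and store both.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return { salt: salt.toString('hex'), hash: hash.toString('hex') };
}

// Login: hash the submitted password with the stored salt and compare the results.
function verifyPassword(password, stored) {
  const candidate = crypto.scryptSync(password, Buffer.from(stored.salt, 'hex'), 64);
  // Constant-time comparison, so timing does not reveal how many bytes matched.
  return crypto.timingSafeEqual(candidate, Buffer.from(stored.hash, 'hex'));
}

const storedCredential = hashPassword('dog');
console.log(verifyPassword('dog', storedCredential)); // true
console.log(verifyPassword('cat', storedCredential)); // false
```

The password itself is never stored: only the salt and the hash go into the credentials table.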
The site admin may not be the only person who gets access to your password. There is always the possibility of a dump of the whole database ending up on a public share by accident. In that case, everybody in the world who has internet access could download it and read the password which was so conveniently stored in cleartext.
Yes, this has happened. With credit card data, too.
Yes, it is highly probable that it will happen again.
"if someone unauthorized got into the database, isn't it too late to worry about passwords?"
You're assuming a poor database design in which the authorization data is commingled with application data.
The "Separation of Concerns" principle and the "Least Access" principle suggest that user credentials should be kept separate from everything else.
For example, keep your user credentials in an LDAP server.
Also, your question assumes that database credentials are the only credentials. Again, the least access principle suggests that you have application credentials which are separate from database credentials.
Your web application username and password is NOT the database username and password. Similarly for a desktop application. The application authentication may not necessarily be the database authentication.
Further, good security suggests that access to usernames and passwords be kept separate from application data. In a large organization with lots of database users, one admin should be "security officer" and handle authentication and authorization. No other users can modify authorization and the security officer is not authorized to access application data.
It's a quick audit to be sure that the security officer never accesses data. It's a little more complex, but another audit can be sure that the folks with data authorization are real people, not aliases for the security officer.
Hashing passwords is one part of a working security policy.
Of course, storing hashes of passwords instead of plain text does not by itself make your application secure. But it is one measure that increases security. As you mentioned, if your server is compromised this measure won't save you, but it limits the damage.
A chain is only as strong as its weakest link
Hashing passwords is only strengthening one link of the chain. So you will have to do more than that.
In addition to what has already been said regarding salting, there's another problem salting solves :
If you use the same salt everywhere (or no salt at all), it's possible to say just by looking at the database that user foo and user bar both have the same password (even if you don't know what the password is).
Then, if someone manages to obtain foo's password (using social engineering, for example), bar's password is known as well.
Also, if the salt is everywhere the same, one can build up a dictionary dedicated to this specific salt, and then run a brute-force attack using this 'salted' dictionary.
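The difference between a shared salt and a per-user salt is easy to see in a small sketch. The user names foo and bar come from the text above; the password value and the use of scrypt as the hash are assumptions for illustration.

```javascript
const crypto = require('node:crypto');

const hashWith = (password, salt) => crypto.scryptSync(password, salt, 32).toString('hex');

// Same salt for everyone: identical passwords give identical hashes,
// so the table itself reveals that foo and bar share a password.
const sharedSalt = Buffer.from('same-salt-for-all');
const fooShared = hashWith('hunter2', sharedSalt);
const barShared = hashWith('hunter2', sharedSalt);
console.log(fooShared === barShared); // true

// Per-user random salt: the same password now produces unrelated hashes,
// and a dictionary has to be rebuilt for every single user.
const fooUnique = hashWith('hunter2', crypto.randomBytes(16));
const barUnique = hashWith('hunter2', crypto.randomBytes(16));
console.log(fooUnique === barUnique); // false
```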
This may be a bit off topic, but once in a while I notice that some websites are not using hashing (for example, when I click the forgot password button, they send me my password in cleartext instead of allowing me to choose another one).
I usually just unsubscribe, because I don't think I can trust a website designed by people not taking the elementary precaution of hashing passwords.
That's one more reason for salting:)
People seem far too complacent about this! The threat isn't just some guy with shell access to your system or to the backup media. It could be any script kiddie who can see the unprotected (but dynamic) part of your site(*) and finds a single overlooked SQL injection hole. One query and suddenly he can log in as any user, or even as an admin. Hashing the passwords makes it far less likely that the attacker can log in as any particular user using their password -or- update a record with their own password.
(*) "unprotected" includes any part of the site that can be accessed as a self-registered user. Contrast this to a bank site, for instance, where you must have an existing bank account to gain access to much of the site. An attacker could still open a bank account to gain access to the site, but it would be far easier to send big guys with bigger guns after him when he tries to crack the system.