What’s the best way to build a link containing an activation or password reset token? The URL would be delivered via email and the user would be expected to click on it to either activate their account or reset their password. There are a bunch of threads on this but there doesn’t appear to be a consistent approach, and each approach appears to have advantages and disadvantages from both a security and development perspective:
Link includes token as a parameter and email address in the query string
http://www.example.com/activations/:token?email=joe@gmail.com
With this approach, the token is hashed in the database and the email is used to look up the user. The token is then compared to the hashed version in the database using a library like bcrypt. It seems, though, that including sensitive data such as an email address in the query string exposes some security risks.
Link includes only the token, either as a path parameter or in the query string.
http://www.example.com/activations/:token
This would appear to be the ideal solution, but I don’t know how to compare the tokens unless the token stored in the database is unhashed. From a security standpoint, some have argued that the token in the database doesn't need to be hashed, while others argue the token should be hashed. Assuming we keep a hashed token in the database, iterating through each token in the database and comparing with the token in the link seems very time consuming, particularly in apps with lots of users. So perhaps I'm wrong about this, but it seems that using this approach would require that I store the token in the database unhashed.
Anybody have any thoughts on the best approach?
There are no arguments against using method 2 containing only the token.
It is correct that only a hash of the token should be stored in the database, so that a leaked database (e.g. through SQL injection) does not reveal usable tokens. The problem you see comes from using BCrypt:
BCrypt includes a salt, which prevents searching for the token in the database.
Because BCrypt applies key stretching, verification is very slow, so it is impractical to check against every stored hash.
Those two techniques, salting and key stretching, are mandatory for storing weak passwords because they make brute-forcing impractical. Because our token is a very strong "password" (minimum 20 characters 0-9 a-z A-Z), there is no need for these techniques: you can store an unsalted SHA-256 of the token in the database. Such a hash can be searched for directly with SQL.
To show that this is really safe, let's do a simple calculation. A 20-character token allows for about 7 × 10^35 combinations. If we can calculate 3 billion SHA-256 hashes per second, we would still expect to need about 3,700,000,000,000,000,000 (3.7 × 10^18) years to find a match.
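The arithmetic, assuming a 62-symbol alphabet, a uniformly random token, that on average half of the space must be searched before a hit, and 1 year ≈ 3.156 × 10^7 s:

```latex
\begin{aligned}
N &= 62^{20} \approx 7.04 \times 10^{35} \\
\mathbb{E}[\text{guesses}] &= N/2 \approx 3.52 \times 10^{35} \\
t &\approx \frac{3.52 \times 10^{35}}{3 \times 10^{9}\ \text{s}^{-1}} \approx 1.17 \times 10^{26}\ \text{s} \approx 3.7 \times 10^{18}\ \text{years}
\end{aligned}
```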
Just make sure the token is created from a cryptographically random source.
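A minimal Python sketch of this scheme, assuming an in-memory dict as a stand-in for a table with an indexed token_hash column (the names `create_activation` and `redeem_activation` are hypothetical). The token comes from the standard `secrets` module, which is a cryptographically secure source, and only its unsalted SHA-256 digest is stored, so the lookup is a single keyed access rather than a scan:

```python
import hashlib
import secrets
import string
from typing import Dict, Optional

# Alphabet from the answer above: 0-9, a-z, A-Z (62 symbols).
ALPHABET = string.ascii_letters + string.digits
TOKEN_LENGTH = 20

def generate_token() -> str:
    """Return a 20-character token drawn from a CSPRNG (the secrets module)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(TOKEN_LENGTH))

def token_digest(token: str) -> str:
    """Unsalted SHA-256 of the token, hex-encoded; this is what gets stored."""
    return hashlib.sha256(token.encode("ascii")).hexdigest()

# Hypothetical stand-in for a table with an indexed token_hash column:
# maps token_hash -> user id.
activations: Dict[str, int] = {}

def create_activation(user_id: int) -> str:
    token = generate_token()
    activations[token_digest(token)] = user_id
    return token  # goes into the emailed link, e.g. /activations/<token>

def redeem_activation(token: str) -> Optional[int]:
    """Look the token up by its digest (one keyed lookup, no iteration)
    and remove it so the link works only once."""
    return activations.pop(token_digest(token), None)
```

In SQL, the redeem step corresponds to something like `SELECT user_id FROM activations WHERE token_hash = ?` followed by deleting that row.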
Related
I'm trying to authenticate a user after registration. What's the correct or standard way to go about it?
Using this method as the way to implement it, in step 3, how can I generate the random hash to send to the user's email? I see two different options:
crypto
JWT token
I'm currently using JWT for login, so would it make sense to use the same token for user verification? Why or why not, and if not, what's the correct way?
The answer to your question of whether you should use a crypto hash or a token is neither.
The hash you generate as a verification method does not need to be cryptographically secure; it only needs to be a unique verification hash that is not easy to guess.
In the past I have used a v4 UUID from a UUID library, and it works just fine. You could also base64-encode some known piece of information about the user, like their ID or email, concatenated with something random, like the time in microseconds and a random hex string of substantial length. Honestly, though, the time it takes to build something like that is wasted when UUID v4 works just fine.
Your hash also doesn't need to be unique (different for each user, yes, but guaranteed free of all potential collisions? No). Hitting an endpoint with only the hash is not a great idea. The endpoint should also take an identifier for your user along with the verification hash. This way, you don't need to worry about the hash being unique in your datastore: find the user by ID, check that the verification hashes match, and verify. I would only suggest that you obfuscate the user's known information in a way that you can decode on your end (e.g. base64-encode their user ID + email + some constant string you use).
[EDIT]
Verifying or validating a user is really just asking them to prove that the email address (or phone number) they entered does in fact exist and that it belongs to the user. This is an attempt to make sure the user didn't enter the information incorrectly or that the registration is spam. For this we don't need cryptographic authentication; a simple shared secret is more than enough.
When you store your user's registration data, you generate the shared secret you will use to verify the account. This can be anything that is (relatively) unique and contains enough length and entropy that it is not easy to be guessed. We aren't encoding or encrypting information that will be unpacked later, we are doing a literal string comparison to make sure the secret we provided to the user was echoed back to us intact. This is why a simple one-way hash is OK to use. I suggested a UUID v4 because the components of this hash are generated from random information (other versions of UUID make use of the machine's MAC or the time or other known pieces of information). You can use any method you like as long as it can't be easily decoded or guessed.
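As a sketch, assuming a hypothetical `User` dataclass, generating the shared secret at registration time with Python's standard `uuid` module might look like this:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class User:  # hypothetical user model
    id: str
    email: str
    active: bool = False
    # Shared secret generated when the registration is stored.
    # UUID v4 is the random-based variant; .hex gives 32 hex characters.
    verification_hash: str = field(default_factory=lambda: uuid.uuid4().hex)
```

In CPython, `uuid.uuid4()` is built from `os.urandom`, so its 122 non-fixed bits come from the operating system's random source.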
After you generate the verification hash you send it to the user (in a nicely formatted URL that they only need to click) in order for them to complete their account registration. URL guidelines are totally up to you, but here are some suggestions:
BAD
/verify/<verification hash>
or
/verify?hash=<verification hash>
With only the verification hash in the URL, you are relying on this value to be globally unique in your datastore. If you can reliably generate unique values that never contain collisions, then it would be OK, but why would you want to worry about that? Don't rely on the verification hash by itself.
GOOD
/users/<id>/verify/<verification hash>
or
/users/<id>?action=verify&hash=<verification hash>
In both examples the point is to provide two pieces of data: (1) a way to identify the user and (2) the verification hash you are checking.
In this process you start by finding the user in your datastore by ID, and then literally compare the secret you generated and stored against the value given in the URL. If the user is found and the verification hashes match, set their account to Active and you're good to go. If the user is found but the hashes don't match... either you provided a malformed URL or someone is trying to brute force your verification. What you do here is up to you, but to be safe you might regenerate the hash and send out a new email and try the process again. This leads very quickly into a black hole about how to prevent spam and misuse of your system, which is a different conversation.
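The lookup-then-compare flow can be sketched in Python as follows. Everything here is hypothetical: `USERS` stands in for the datastore, `public_id` for a public-safe identifier, and the return strings for whatever your endpoint does. One small deviation from a literal `==`: `hmac.compare_digest` is used so the comparison time doesn't depend on how many characters match.

```python
import hmac
import uuid
from typing import Dict, Optional

class User:  # hypothetical user model
    def __init__(self, public_id: str, email: str) -> None:
        self.public_id = public_id
        self.email = email
        self.active = False
        self.verification_hash = uuid.uuid4().hex

# Hypothetical datastore keyed by a public-safe user ID.
USERS: Dict[str, User] = {}

def verification_path(user: User) -> str:
    # Mirrors the GOOD pattern /users/<id>/verify/<verification hash>.
    return f"/users/{user.public_id}/verify/{user.verification_hash}"

def verify(public_id: str, supplied_hash: str) -> str:
    user: Optional[User] = USERS.get(public_id)
    if user is None:
        return "not_found"
    # Constant-time string comparison of the stored and supplied secrets.
    if not hmac.compare_digest(user.verification_hash, supplied_hash):
        # Malformed link or guessing attempt: issue a fresh secret
        # (a real system would also send a new email here).
        user.verification_hash = uuid.uuid4().hex
        return "mismatch"
    user.active = True
    return "verified"
```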
The above URL schemes only really work if your user IDs are safe for public display. As a general rule you should never use your datastore IDs in a URL, especially if they are sequential integers. There are many options for IDs that you could use in a URL, such as UUID v1, HashIDs, or any short-ID implementation.
ALSO
A good way to see how this is done in the wild is to look at the emails you have received from other systems asking you to verify your own email address. Many may use the format:
/account/verify/<very long hash>
In this instance, the "very long hash" is usually generated by a library that either creates a datastore table just for the purpose of account verification (and the hash is stored in that table) or is decoded to reveal a user identifier as well as some sort of verification hash. This string is encoded in a way that is not easily reversible, so it cannot be guessed or brute-forced. This is typically done by encoding the components with a unique salt value for each string.
NOTE - while this method may be the most "secure", I mention it only because it is based on the typical methods used by third-party libraries, which do not make assumptions about your user data model. You can implement this style if you want, but it would be more work. My answer is focused on your intent to do basic verification based on data in your user model.
BONUS
Many verification systems are also time-limited, so that the verification URL expires after some period of time. This is easy to set up: also store a future timestamp with your user data, and check it when the verification endpoint is hit and the user is found. What to do when an expired link is clicked is up to you, but the main benefit is that it helps you clean up dead registrations that you know can no longer be verified.
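A sketch of the expiry check, assuming a 48-hour lifetime (the answer leaves the period open) and timezone-aware UTC timestamps; `pending` in `dead_registrations` is a hypothetical map of unverified user IDs to their stored expiry:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumed lifetime; the answer leaves the period up to you.
VERIFICATION_TTL = timedelta(hours=48)

def verification_expiry(now: Optional[datetime] = None) -> datetime:
    """Timestamp to store alongside the verification hash at registration."""
    now = now or datetime.now(timezone.utc)
    return now + VERIFICATION_TTL

def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Checked when the verify endpoint is hit and the user is found."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at

def dead_registrations(pending: Dict[str, datetime], now: datetime) -> List[str]:
    """IDs of unverified registrations whose link can no longer be used."""
    return [uid for uid, exp in pending.items() if is_expired(exp, now)]
```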
If I want to persist the users' login such that they do not have to re-login even after, say 1 year of inactivity, then is storing a permanent access token as good as storing the password directly (perhaps hashed), since the (permanent) access token would essentially be the "alternative password"?
Storing an access token is surely safer than storing the password directly, but let's see why:
An attacker can only get this token, not the original password. This is better, because passwords are often reused on other sites and/or can reveal password schemes. ➽ Make sure the token is random and not derived from the password.
The token is not just another password. While passwords chosen by users are often weak, a token is very strong, so strong that brute-forcing is impractical. ➽ Generate random, sufficiently long tokens; they should be at least 20 characters from a-z, A-Z, 0-9.
Generally speaking, yes. But, with a lot of caveats.
A long, random token generated by a CSPRNG (this is very important: there are different ways to generate "random" strings, and not all of them are really random) is stronger than a password, yes. However, the way you intend to use this token means that it is effectively a password by itself, and that means the same criteria apply:
It can't be permanent.
A key property of passwords is that they are not constants and users can change their passwords when stolen, or otherwise over time. Any kind of token should be no different, except that it should be automatically changed (rotated) by your application, on regular intervals.
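Automatic rotation can be sketched like this, assuming a hypothetical server-side store and a 7-day interval. Each time a long-lived token is presented after the interval has elapsed, it is deleted and a fresh token is returned for the client to keep. For brevity the store is keyed by a SHA-256 digest, as in the first answer on this page; the next point in this answer argues for a slow KDF instead, which would also require a separate lookup key.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

ROTATION_INTERVAL = timedelta(days=7)  # assumed interval

# Hypothetical server-side store: sha256(token) -> (user_id, issued_at)
TOKENS: Dict[str, Tuple[int, datetime]] = {}

def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def issue(user_id: int, now: datetime) -> str:
    token = secrets.token_urlsafe(32)  # 32 random bytes from a CSPRNG
    TOKENS[_digest(token)] = (user_id, now)
    return token

def authenticate(token: str, now: datetime) -> Optional[Tuple[int, str]]:
    """Return (user_id, token_to_use_from_now_on), or None if unknown.
    A token older than ROTATION_INTERVAL is invalidated and replaced."""
    entry = TOKENS.get(_digest(token))
    if entry is None:
        return None
    user_id, issued_at = entry
    if now - issued_at >= ROTATION_INTERVAL:
        del TOKENS[_digest(token)]
        return user_id, issue(user_id, now)
    return user_id, token
```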
It MUST be hashed!
(with a strong algorithm: bcrypt, scrypt, Argon2i, PBKDF2; anything else is plain wrong)
Don't ever store user passwords in plain-text format, anywhere. Even if it is guaranteed that the user doesn't use this password on any other site, a plain-text password means that anybody who gets their hands on the database (even for a brief time), can hijack user accounts.
You have a responsibility to protect your users not only from "hackers", but from yourself as well.
Don't store it in a cookie, even if hashed or encrypted.
The way you've worded the question implies that you would do something like this. Cookies are not a secure location to store passwords of any kind. Temporary, short-lived tokens - sure, but not passwords.
It looks like you're trying to design your own authentication protocol, which is not an easy thing to do. It may be easy to make it work, but that's about 5% of the job; there are just too many details to consider. And all of this for the tiny benefit of sparing users the minor inconvenience of typing in their password once in a while. People are used to this; it's not worth the security risks.
In case you are hell-bent on providing long-lived logins, I would recommend using an existing authentication protocol. Every such protocol relies on cryptographic signatures, avoiding reliance on user passwords altogether and thus eliminating almost all of the above problems.
Personally, I would just allow the so-called "social logins" - via Facebook, Google, Twitter. You wouldn't have to handle passwords at all, and anybody can login with a single click of a button.
It is better to store an access token than to store the password or the hashed password (a hashed password can always be brute-forced to find the password), and I think you should give the token a lifetime.
The answer is yes, but it also depends on where you store the token. You might also want to authenticate the user with an XSRF/CSRF token in addition to the access token.
But storing token is better than storing password.
What security measures should I put in place to ensure that, were my database to be compromised, long-life access tokens could not be stolen?
A long-life access token is as good as a username and password for a particular service, but from talking to others it seems most people (myself included) store access tokens in plain text. This seems to me to be just as bad as storing a password in plain text. Obviously one cannot salt and hash the token.
Ideally I'd want to encrypt them, but I'm unsure of the best way to do this, especially on an open source project.
I imagine the answer to this question is similar to one on storing payment info and PCI compliance, but I'd also ask why there isn't more discussion of this? Perhaps I'm missing something.
Do you just want to verify a token provided by others? If so, treat it as you would a password. Use a key derivation function like Password-Based Key Derivation Function 2 (PBKDF2, described in RFC 2898) with 10,000 iterations and store the first 20 bytes or so. When the token is received, derive the bytes again in the same way and compare them with the stored value. The derivation is not practically reversible.
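A minimal sketch with Python's `hashlib.pbkdf2_hmac`, using the answer's 10,000 iterations and 20 derived bytes. The answer doesn't mention a salt; PBKDF2 takes one, and here a random 16-byte salt per token is assumed and stored next to the derived bytes. Because each token has its own salt, you need some other identifier (e.g. a client or user ID) to find the row before calling `check`:

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 10_000   # figure from the answer above
DERIVED_LEN = 20      # "store the first 20 bytes or so"

def protect(token: str) -> Tuple[bytes, bytes]:
    """Derive a non-reversible verifier from a received token.
    Returns (salt, derived); both are stored, the token itself is not."""
    salt = os.urandom(16)  # assumption: a random per-token salt
    derived = hashlib.pbkdf2_hmac("sha256", token.encode(), salt, ITERATIONS, dklen=DERIVED_LEN)
    return salt, derived

def check(token: str, salt: bytes, derived: bytes) -> bool:
    """Re-derive from the presented token and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", token.encode(), salt, ITERATIONS, dklen=DERIVED_LEN)
    return hmac.compare_digest(candidate, derived)
```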
Do you want to present the token to others for authentication? If so, this is a challenge because, if your application can decrypt or otherwise access the token, so can an attacker. Think of Shannon's Maxim: the attacker knows the system, especially for an open source project.
In this case, the best approach is to encrypt the tokens with a strong algorithm (e.g. AES-256), generate the keys with a cryptographically secure random number generator, and store the key(s) securely in a different location from the data, such as a permission-protected file outside the database. The latter means that SQL injection attacks will not reveal the keys.
When it comes to remember me cookies, there are 2 distinct approaches:
Hashes
The remember me cookie stores a string that can identify the user (i.e. user ID) and a string that can prove that the identified user is the one it pretends to be - usually a hash based on the user password.
Tokens
The remember me cookie stores a random (meaningless) yet unique string that corresponds to a record in a tokens table, which stores a user ID.
Which approach is more secure and what are its disadvantages?
You should use randomly generated tokens if possible. Of course, the downside is that you have to write some extra code to store and use them on the server side, so this might not be warranted for all web applications. But from a security standpoint, this has distinct advantages:
An attacker cannot generate tokens from user IDs, but he can definitely generate hashes. This is a big problem, even if you use salt when generating hashes (and you should), your users are screwed if the salt ever gets into the wrong hands.
Giving out these tokens enables your users (or your admin if need be) to "log out" certain sessions that they might want to get rid of. This is actually a cool feature to have, Google and Facebook use it for example.
So, if you have time and budget: tokens, absolutely.
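Roughly, the token approach with per-session revocation could look like this, assuming a hypothetical in-memory `REMEMBER_TOKENS` table and a free-text label per session. The cookie carries only the random token; the server keeps a SHA-256 digest of it (an assumption, following the first answer on this page) mapped to the user:

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple

# Hypothetical tokens table: sha256(token) -> (user_id, label)
REMEMBER_TOKENS: Dict[str, Tuple[int, str]] = {}

def _h(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def remember(user_id: int, label: str) -> str:
    """Create a random, meaningless token; the cookie holds only this value."""
    token = secrets.token_urlsafe(32)
    REMEMBER_TOKENS[_h(token)] = (user_id, label)
    return token

def user_for(cookie_value: str) -> Optional[int]:
    row = REMEMBER_TOKENS.get(_h(cookie_value))
    return row[0] if row else None

def sessions(user_id: int) -> List[Tuple[str, str]]:
    """List (token_hash, label) rows so the user can pick one to log out."""
    return [(k, lbl) for k, (uid, lbl) in REMEMBER_TOKENS.items() if uid == user_id]

def revoke(token_hash: str) -> None:
    REMEMBER_TOKENS.pop(token_hash, None)
```

Unlike a hash derived from the user ID, nothing here can be recomputed by an attacker, and deleting a row logs out exactly one remembered session.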
Typically you keep the token -> user mapping secure on the server side. So ultimately your security is all based around keeping the token safe and ensuring that its lifetime is controlled (e.g. it expires and/or is only valid when given to you from the same IP as that used by the original provider of the credentials - again, just an example)
Security of token based authentication
Hope this helps.
Yes, tokens are more secure, provided a new random string is produced each time.
On the other hand, the whole point of remember me is that the user doesn't have to log in again, so unless they click log out, you're rarely going to need to produce a new token unless it expires.
I guess you should stick with tokens and not sacrifice security for laziness :-p
I realize that the OAuth spec doesn't specify anything about the origin of the ConsumerKey, ConsumerSecret, AccessToken, RequestToken, TokenSecret, or Verifier code, but I'm curious if there are any best practices for creating significantly secure tokens (especially Token/Secret combinations).
As I see it, there are a few approaches to creating the tokens:
(1) Just use random bytes, store in DB associated with consumer/user
(2) Hash some user/consumer-specific data, store in DB associated with consumer/user
(3) Encrypt user/consumer-specific data
The advantage of (1) is that the database is the only source of the information, which seems the most secure. It would be harder to attack than (2) or (3).
Hashing real data (2) would allow re-generating the token from presumably already-known data. It might not provide any advantage over (1), since the token would need to be stored and looked up anyway, and it is more CPU-intensive than (1).
Encrypting real data (3) would allow the information to be recovered by decrypting the token. This would require less storage and potentially fewer lookups than (1) and (2), but would potentially be less secure as well.
Are there any other approaches/advantages/disadvantages that should be considered?
EDIT: another consideration is that there MUST be some sort of random value in the tokens: since it must be possible to expire and reissue tokens, they cannot consist only of real data.
Follow On Questions:
Is there a minimum token length that makes a token cryptographically secure? As I understand it, longer token secrets create more secure signatures. Is this understanding correct?
Are there advantages to using one encoding over another from a hashing perspective? For instance, I see a lot of APIs using hex encodings (e.g. GUID strings). In the OAuth signing algorithm, the token is used as a string. With a hex string, the available character set is much smaller (more predictable) than with, say, Base64 encoding. It seems to me that, for two strings of equal length, the one with the larger character set would have a better (wider) hash distribution, which would improve security. Is this assumption correct?
The OAuth spec raises this very issue in 11.10 Entropy of Secrets.
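On the encoding question: strength comes from the number of random bits behind the string, not from its character set. A quick Python check, assuming tokens made from `secrets.token_bytes`: the same 16 random bytes (128 bits) become 32 hex characters or 22 URL-safe Base64 characters. At equal length, Base64 does carry more bits per character (6 versus 4), which is the intuition in the question, but at equal entropy the two encodings are equivalent:

```python
import base64
import math
import secrets

def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string of this length and alphabet."""
    return length * math.log2(alphabet_size)

raw = secrets.token_bytes(16)  # 128 random bits from a CSPRNG
as_hex = raw.hex()             # 32 characters from a 16-symbol alphabet
as_b64 = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")  # 22 characters

# Both strings encode exactly the same 16 bytes, hence the same 128 bits.
print(len(as_hex), len(as_b64), entropy_bits(32, 16))
```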
OAuth says nothing about the token except that it has a secret associated with it, so all the schemes you mentioned would work. Our token format evolved as our sites got bigger. Here are the versions we have used:
Our first token was an encrypted BLOB containing the username, token secret, expiration, etc. The problem was that we couldn't revoke tokens, because there was no record of them on the host.
So we changed it to store everything in the database, and the token became simply a random number used as the database key. The table has a username index, so it's easy to list all the tokens for a user and revoke them.
We saw quite a few hacking attempts, and with a random number we have to go to the database to know whether a token is valid. So we went back to an encrypted BLOB. This time, the token contains only the encrypted value of the key and the expiration, so we can detect invalid or expired tokens without going to the database.
Some implementation details that may help you:
Add a version to the token so you can change the token format without breaking existing tokens. The first byte of all our tokens is the version.
Use the URL-safe variant of Base64 to encode the BLOB so you don't have to deal with URL-encoding issues. Those make debugging OAuth signatures more difficult, because you may see a triple-encoded base string.
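A sketch of such a versioned, URL-safe token in Python. One deliberate substitution: the answer describes an encrypted BLOB, but Python's standard library has no AES, so this version authenticates the body with HMAC-SHA256 instead. It still lets you reject forged or expired tokens without a database query, but the key and expiry are readable by anyone; if they must stay secret, use an authenticated cipher such as AES-GCM from a cryptography library. The field layout, `SERVER_KEY`, and function names are assumptions:

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional

VERSION = 1
SERVER_KEY = secrets.token_bytes(32)  # assumption: loaded from secure config in practice
# Layout: version (1 byte), database key (8 bytes), expiry as Unix time (4 bytes).
_BODY = struct.Struct(">BQI")
_MAC_LEN = 32  # HMAC-SHA256 output size

def make_token(db_key: int, ttl_seconds: int, now: Optional[int] = None) -> str:
    now = int(time.time()) if now is None else now
    body = _BODY.pack(VERSION, db_key, now + ttl_seconds)
    mac = hmac.new(SERVER_KEY, body, hashlib.sha256).digest()
    # URL-safe Base64 without padding: no '+', '/' or '=' to escape.
    return base64.urlsafe_b64encode(body + mac).rstrip(b"=").decode("ascii")

def read_token(token: str, now: Optional[int] = None) -> Optional[int]:
    """Return the database key, or None for malformed, forged or expired
    tokens, all decided without touching the database."""
    now = int(time.time()) if now is None else now
    try:
        blob = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(blob) != _BODY.size + _MAC_LEN or blob[0] != VERSION:
        return None
    body, mac = blob[:_BODY.size], blob[_BODY.size:]
    expected = hmac.new(SERVER_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None
    _, db_key, expires = _BODY.unpack(body)
    return db_key if now < expires else None
```

Checking the version byte first means a future format can be introduced by bumping `VERSION` and branching on it, as the answer recommends.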