I realize that the OAuth spec doesn't specify anything about the origin of the ConsumerKey, ConsumerSecret, AccessToken, RequestToken, TokenSecret, or Verifier code, but I'm curious whether there are any best practices for creating sufficiently secure tokens (especially Token/Secret combinations).
As I see it, there are a few approaches to creating the tokens:
1. Just use random bytes, store in DB associated with consumer/user
2. Hash some user/consumer-specific data, store in DB associated with consumer/user
3. Encrypt user/consumer-specific data
The advantage of (1) is that the database is the only source of the information, which seems the most secure. It would be harder to run an attack against than (2) or (3).
Hashing real data (2) would allow regenerating the token from presumably already known data. It might not really provide any advantage over (1), since the token would need to be stored and looked up anyway. It is also more CPU-intensive than (1).
Encrypting real data (3) would allow decrypting the token to recover information. This would require less storage and potentially fewer lookups than (1) and (2), but would potentially be less secure as well.
Are there any other approaches/advantages/disadvantages that should be considered?
EDIT: another consideration is that there MUST be some sort of random value in the tokens, because it must be possible to expire and reissue new tokens; a token therefore cannot be composed only of real data.
Follow On Questions:
Is there a minimum token length that makes a token sufficiently cryptographically secure? As I understand it, longer token secrets would create more secure signatures. Is this understanding correct?
Are there advantages to using a particular encoding over another from a hashing perspective? For instance, I see a lot of APIs using hex encodings (e.g. GUID strings). In the OAuth signing algorithm, the Token is used as a string. With a hex string, the available character set would be much smaller (more predictable) than say with a Base64 encoding. It seems to me that for two strings of equal length, the one with the larger character set would have a better/wider hash distribution. This seems to me that it would improve the security. Is this assumption correct?
The OAuth spec raises this very issue in 11.10 Entropy of Secrets.
OAuth says nothing about the token except that it has a secret associated with it, so all the schemes you mentioned would work. Our token evolved as our sites got bigger. Here are the versions we used:
Our first token was an encrypted BLOB containing the username, token secret, expiration, etc. The problem was that we couldn't revoke tokens because there was no record of them on the host.
So we changed it to store everything in the database, and the token became simply a random number used as the database key. It has a username index, so it's easy to list all the tokens for a user and revoke them.
We get quite a few hacking attempts. With a random number, we have to go to the database to know whether the token is valid. So we went back to an encrypted BLOB again. This time, the token contains only the encrypted value of the key and the expiration, so we can detect invalid or expired tokens without going to the database.
Some implementation details that may help you:
Add a version to the token so you can change the token format without breaking existing ones. All our tokens use the first byte as the version.
Use the URL-safe version of Base64 to encode the BLOB so you don't have to deal with URL-encoding issues, which make debugging OAuth signatures more difficult because you may see a triple-encoded base string.
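A minimal sketch of the version-byte and URL-safe Base64 framing described above. It assumes the encrypted BLOB has already been produced elsewhere (the encryption itself is not shown), and the helper names `encode_token`/`decode_token` and the version number are hypothetical. Stripping the `=` padding is also an assumption on top of the answer, since `=` can still need escaping in query strings.

```python
import base64
from typing import Tuple

CURRENT_VERSION = 2  # hypothetical token format version


def encode_token(blob: bytes, version: int = CURRENT_VERSION) -> str:
    """Prefix the opaque (already encrypted) blob with a one-byte version
    and encode it with URL-safe Base64, dropping '=' padding."""
    if not 0 <= version <= 255:
        raise ValueError("version must fit in one byte")
    raw = bytes([version]) + blob
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_token(token: str) -> Tuple[int, bytes]:
    """Restore the padding, decode, and split off the version byte so the
    caller can dispatch to the matching (possibly older) token format."""
    padded = token + "=" * (-len(token) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if not raw:
        raise ValueError("empty token")
    return raw[0], raw[1:]
```

Because the version sits in the first byte, a server can keep accepting old-format tokens while issuing new ones, and the encoded string contains only `A-Z a-z 0-9 - _`.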
Related
I'm trying to authenticate a user after registration. What's the correct or standard way to go about it?
Using this method as the way to implement it, in step 3, how can I generate the random hash to send to the user's email? I see two different options:
crypto
JWT token
I'm currently using JWT for login, so would it make sense to use the same token for user verification? Why or why not, and if not, what's the correct way?
The answer to your question of whether you should use a crypto hash or a token is neither.
The hash you are generating to use as a verification method does not need to be cryptographically secure, it only needs to be a unique verification hash that is not easy to guess.
In the past I have used a v4 UUID with the UUID lib and it works just fine. You could also base64-encode some known piece of information about the user, like their ID or email, concatenated with something random, like the time in microseconds and a random hex string of substantial length, but honestly the time it takes to build something like that is wasted when UUID v4 works just fine.
Your hash also doesn't need to be unique (different for each user, yes, but guaranteed free of all potential collisions? No), because hitting an endpoint with only the hash is not a great idea. The endpoint should also take an identifier for your user combined with the verification hash. This way, you don't need to worry about the hash being unique in your datastore. Find the user by ID, check that the verification hashes match, verify. I would only suggest that you obfuscate the user's known information in a way that you can decode on your end (e.g. base64-encode their user ID + email + some constant string you use).
[EDIT]
Verifying or validating a user is really just asking them to prove that the email address (or phone number) they entered does in fact exist and that it belongs to the user. This is an attempt to make sure the user didn't enter the information incorrectly or that the registration is spam. For this we don't need cryptographic authentication, a simple shared secret is more than enough.
When you store your user's registration data, you generate the shared secret you will use to verify the account. This can be anything that is (relatively) unique and contains enough length and entropy that it is not easy to be guessed. We aren't encoding or encrypting information that will be unpacked later, we are doing a literal string comparison to make sure the secret we provided to the user was echoed back to us intact. This is why a simple one-way hash is OK to use. I suggested a UUID v4 because the components of this hash are generated from random information (other versions of UUID make use of the machine's MAC or the time or other known pieces of information). You can use any method you like as long as it can't be easily decoded or guessed.
After you generate the verification hash you send it to the user (in a nicely formatted URL that they only need to click) in order for them to complete their account registration. URL guidelines are totally up to you, but here are some suggestions:
BAD
/verify/<verification hash>
or
/verify?hash=<verification hash>
With only the verification hash in the URL, you are relying on this value to be globally unique in your datastore. If you can reliably generate unique values that never contain collisions, then it would be OK, but why would you want to worry about that? Don't rely on the verification hash by itself.
GOOD
/users/<id>/verify/<verification hash>
or
/users/<id>?action=verify&hash=<verification hash>
In both examples, the point is to provide two pieces of data: (1) a way to identify the user, and (2) the verification hash you are checking.
In this process you start by finding the user in your datastore by ID, and then literally compare the secret you generated and stored against the value given in the URL. If the user is found and the verification hashes match, set their account to Active and you're good to go. If the user is found but the hashes don't match... either you provided a malformed URL or someone is trying to brute force your verification. What you do here is up to you, but to be safe you might regenerate the hash and send out a new email and try the process again. This leads very quickly into a black hole about how to prevent spam and misuse of your system, which is a different conversation.
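A minimal sketch of that find-by-ID-then-compare flow, assuming an in-memory dict stands in for your user datastore; the `User` fields and the `register`/`verify` names are hypothetical. It generates the secret with `uuid.uuid4()` as suggested above, and uses `hmac.compare_digest` for the comparison, a small hardening step beyond a plain string comparison so response timing doesn't reveal how much of a guess matched.

```python
import hmac
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class User:  # hypothetical user model
    id: str
    email: str
    active: bool = False
    verification_hash: Optional[str] = None


USERS: Dict[str, User] = {}  # in-memory stand-in for your datastore


def register(email: str) -> Tuple[str, str]:
    """Store an inactive user with a random (v4) UUID as the shared secret.
    Returns (user_id, verification_hash) for a /users/<id>/verify/<hash> link."""
    user = User(id=uuid.uuid4().hex, email=email)  # non-sequential public ID (assumption)
    user.verification_hash = str(uuid.uuid4())
    USERS[user.id] = user
    return user.id, user.verification_hash


def verify(user_id: str, supplied_hash: str) -> bool:
    """Find the user by ID, then compare the stored secret with the URL value."""
    user = USERS.get(user_id)
    if user is None or user.verification_hash is None:
        return False
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(user.verification_hash.encode(), supplied_hash.encode()):
        return False  # malformed link or guessing attempt
    user.active = True
    user.verification_hash = None  # the link only works once
    return True
```

Clearing the stored hash after a successful match is a design choice here, not something the answer requires; it simply makes a reused link fail.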
The above URL schemas really only work if your user IDs are safe for public display. As a general rule you should never use your datastore IDs in a URL, especially if they are sequential INTs. There are many options for IDs that you would use in a URL like UUID v1 or HashIDs or any implementation of a short ID.
ALSO
A good way to see how this is done in the wild is to look at the emails you have received from other systems asking you to verify your own email address. Many may use the format:
/account/verify/<very long hash>
In this instance, the "very long hash" is usually generated by a library that either creates a datastore table just for the purpose of account verification (and the hash is stored in that table) or is decoded to reveal a user identifier as well as some sort of verification hash. This string is encoded in a way that is not easily reversible so it can not be guessed or brute forced. This is typically done by encoding the components with some sort of unique salt value for each string.
NOTE: while this method may be the most "secure", I only mention it because it is based on the typical methods used by third-party libs, which do not make assumptions about your user data model. You can implement this style if you want, but it would be more work. My answer is focused on your intent to do basic verification based on data in your user model.
BONUS
Many verification systems are also time-constrained, so the verification URL expires after some period of time. This is easy to set up by also storing a future timestamp with your user data, which is checked when the verification endpoint is hit and the user is found. What to do when an expired link is clicked is up to you, but the main benefit is to help you more easily clean up dead registrations that you know cannot be verified.
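A minimal sketch of the stored-expiry idea, assuming a 48-hour lifetime and a plain dict of pending records; the TTL value, record shape, and function names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

VERIFY_TTL = timedelta(hours=48)  # hypothetical lifetime; choose your own


def issue_expiry(now: Optional[datetime] = None) -> datetime:
    """Future timestamp to store next to the verification hash."""
    now = now or datetime.now(timezone.utc)
    return now + VERIFY_TTL


def link_is_live(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Check this after finding the user and before comparing hashes."""
    now = now or datetime.now(timezone.utc)
    return now < expires_at


def expired_registrations(pending: Dict[str, dict],
                          now: Optional[datetime] = None) -> List[str]:
    """IDs of never-verified users whose link has expired (cleanup candidates)."""
    now = now or datetime.now(timezone.utc)
    return [uid for uid, rec in pending.items()
            if not rec["active"] and rec["expires_at"] <= now]
```

A periodic job could call `expired_registrations` and delete or re-invite the returned users; which of the two to do is the policy decision the answer leaves to you.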
I'm building an application with Express.js and MongoDB (Mongoose). The application contains routes where the user has to be authenticated before accessing them.
Currently I have written an Express middleware to do this. Using the JWT, I make a MongoDB query to check whether the user is authenticated or not, but I feel this might put unnecessary request load on my database.
Should I integrate Redis for this specific task?
Will it improve API performance, or should I go ahead with the existing MongoDB approach?
More insight on this would be helpful.
TLDR: If you want the capability to revoke a JWT at some point, you'll need to look it up. So yes, something fast like Redis can be useful for that.
One of the well documented drawbacks of using JWTs is that there's no simple way to revoke a token if for example a user needs to be logged out or the token has been compromised. Revoking a token would mean to look it up in some storage and then deciding what to do next. Since one of the points of JWTs is to avoid round trips to the db, a good compromise would be to store them in something less taxing than an rdbms. That's a perfect job for Redis.
Note however that having to look up tokens in storage for validity still reintroduces statefulness and negates some of the main benefits of JWTs. To mitigate this drawback make the list a blacklist (or blocklist, i.e. a list of invalid tokens). To validate a token, you look it up on the list and verify that it is not present. You can further improve on space and performance by staggering the lookup steps. For instance, you could have a tiny in-app storage that only tracks the first 2 or 3 bytes of your blacklisted tokens. Then the redis cache would track a slightly larger version of the same tokens (e.g. the first 4 or 5 bytes). You can then store a full version of the blacklisted tokens using a more persistent solution (filesystem, rdbms, etc). This is an optimistic lookup strategy that will quickly confirm that a token is valid (which would be the more common case). If a token happens to match an item in the in-app blacklist (because its first few bytes match), then move on to do an extra lookup on the redis store, then the persistent store if need be. Some (or all) of the stores may be implemented as tries or hash tables. Another efficient and relatively simple to implement data structure to consider is something called a Bloom filter.
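A minimal sketch of the staggered lookup, assuming plain Python sets stand in for the in-app tier, the Redis tier, and the persistent store. The class name and prefix lengths (3 and 5 bytes) are hypothetical. One adjustment: prefixes are taken from a SHA-256 digest of the token rather than the raw token, because encoded JWTs typically begin with the same header bytes (`eyJ...`), which would make raw prefixes useless as a filter.

```python
import hashlib
from typing import Dict, Set


class TieredBlocklist:
    """Optimistic revocation check: most valid tokens are cleared by the
    tiny in-app tier without touching Redis or the persistent store."""

    def __init__(self, app_prefix_len: int = 3, cache_prefix_len: int = 5) -> None:
        self.app_prefix_len = app_prefix_len
        self.cache_prefix_len = cache_prefix_len
        self.app_tier: Set[bytes] = set()    # in-process, a few bytes per entry
        self.cache_tier: Set[bytes] = set()  # stand-in for a Redis set
        self.full_store: Set[bytes] = set()  # stand-in for the persistent store
        self.lookups: Dict[str, int] = {"cache": 0, "full": 0}  # deeper-lookup counters

    @staticmethod
    def _digest(token: str) -> bytes:
        return hashlib.sha256(token.encode()).digest()

    def revoke(self, token: str) -> None:
        d = self._digest(token)
        self.app_tier.add(d[: self.app_prefix_len])
        self.cache_tier.add(d[: self.cache_prefix_len])
        self.full_store.add(d)

    def is_revoked(self, token: str) -> bool:
        d = self._digest(token)
        if d[: self.app_prefix_len] not in self.app_tier:
            return False  # common case: definitely not blocklisted
        self.lookups["cache"] += 1
        if d[: self.cache_prefix_len] not in self.cache_tier:
            return False  # prefix collision at the first tier only
        self.lookups["full"] += 1
        return d in self.full_store  # authoritative answer
```

Since several revoked tokens can share a prefix, expired entries are easiest to drop from the prefix tiers by rebuilding them from the full store during the periodic cleanup.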
As your revoked tokens expire (of old age), a periodic routine can remove them from the stores. Keep your blacklist short and manageable by also shortening the lifespan of your tokens.
Remember that JWTs shine in scenarios where revoking them is the exception. If you routinely blacklist millions of long-lasting tokens, it may indicate that you have a different problem.
You can use Redis to store JWT labels. Redis is much faster and more convenient for storing such data, and a request to Redis should not greatly affect performance. You can try the library jwt-redis.
A JWT contains claims. You can store a claim such as
session : guid
and maintain a set in Redis of all blacklisted keys. A key should stay in the set for as long as the JWT is valid.
When your API is hit:
verify the JWT signature; if it has been tampered with, stop
extract the claims as a list of key/value pairs
get the session key and check for it in the Redis blacklist set
if found, stop; otherwise continue
What’s the best way to build a link containing an activation or password reset token? The URL would be delivered via email and the user would be expected to click on it to either activate their account or reset their password. There are a bunch of threads on this but there doesn’t appear to be a consistent approach, and each approach appears to have advantages and disadvantages from both a security and development perspective:
1. Link includes the token as a parameter and the email address in the query string
http://www.example.com/activations/:token?email=joe#gmail.com
With this approach, the token is hashed in the database and the email is used to look up the user. The token is then compared to the hashed version in the database using a library like bcrypt. It seems, though, that including sensitive data such as an email address in the query string exposes some security risks.
2. Link includes only the token, either as a parameter or in the query string.
http://www.example.com/activations/:token
This would appear to be the ideal solution, but I don’t know how to compare the tokens unless the token stored in the database is unhashed. From a security standpoint, some have argued that the token in the database doesn't need to be hashed, while others argue the token should be hashed. Assuming we keep a hashed token in the database, iterating through each token in the database and comparing with the token in the link seems very time consuming, particularly in apps with lots of users. So perhaps I'm wrong about this, but it seems that using this approach would require that I store the token in the database unhashed.
Anybody have any thoughts on the best approach?
There are no arguments against using method 2 containing only the token.
It is correct that only a hash of the token should be stored in the database, so that a leaked database (e.g. through SQL injection) does not reveal usable tokens. The problem you see comes from using BCrypt:
Bcrypt includes a salt, which prevents searching for the token in the database.
Because BCrypt applies key-stretching the verification is very slow, so it is impossible to check against every stored hash.
Those two techniques, salting and key-stretching, are mandatory for storing weak passwords; they make brute-forcing impractical. Because our token is a very strong "password" (minimum 20 characters 0-9 a-z A-Z), there is no need for these techniques, and you can store an unsalted SHA-256 hash of the token in the database. Such a hash can be searched for directly with SQL.
To show that it is really safe, let's do a simple calculation. A 20-character token allows for about 7E35 combinations. Even if we can calculate 3 billion SHA-256 hashes per second, we would still expect to need about 3'700'000'000'000'000'000 years to find a match.
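The same estimate written out, with 62 = 10 digits + 26 lowercase + 26 uppercase characters, and the expected search effort taken as half the space:

```latex
\begin{aligned}
N &= 62^{20} \approx 7.04 \times 10^{35}
  \qquad (\log_2 N = 20 \log_2 62 \approx 119 \text{ bits}) \\
t &\approx \frac{N/2}{3 \times 10^{9}\ \text{s}^{-1}}
  \approx 1.17 \times 10^{26}\ \text{s} \\
  &\approx \frac{1.17 \times 10^{26}\ \text{s}}{3.16 \times 10^{7}\ \text{s/year}}
  \approx 3.7 \times 10^{18}\ \text{years}
\end{aligned}
```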
Just make sure the token is created from a cryptographically random source.
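A minimal sketch of method 2 with an unsalted SHA-256 lookup, assuming an in-memory SQLite table; the table layout and the `create_activation`/`redeem` names are hypothetical. `secrets.token_urlsafe(20)` draws 20 bytes (160 bits) from the OS CSPRNG, which is more than the roughly 119 bits of a 20-character alphanumeric token.

```python
import hashlib
import secrets
import sqlite3
from typing import Optional

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE activation (token_hash TEXT PRIMARY KEY, user_id INTEGER NOT NULL)")


def create_activation(user_id: int) -> str:
    """Return the raw token for the emailed /activations/<token> link;
    only its SHA-256 hex digest is stored."""
    token = secrets.token_urlsafe(20)  # 20 random bytes from the OS CSPRNG
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    db.execute("INSERT INTO activation VALUES (?, ?)", (token_hash, user_id))
    return token


def redeem(token: str) -> Optional[int]:
    """Hash the presented token and look it up directly (indexed, no scan)."""
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    row = db.execute(
        "SELECT user_id FROM activation WHERE token_hash = ?", (token_hash,)
    ).fetchone()
    if row is None:
        return None
    db.execute("DELETE FROM activation WHERE token_hash = ?", (token_hash,))  # single use
    return row[0]
```

Because the hash is deterministic, the primary-key index on `token_hash` does the search that bcrypt's per-hash salt made impossible.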
What security measures should I put in place to ensure that, were my database to be compromised, long-life access tokens could not be stolen?
A long-life access token is as good as a username and password for a particular service, but from talking to others it seems most (myself included) store access tokens in plain text. This seems to me to be just as bad as storing a password in plain text. Obviously one cannot salt & hash the token.
Ideally I'd want to encrypt them, but I'm unsure of the best way to do this, especially on an open source project.
I imagine the answer to this question is similar to one on storing payment info and PCI compliance, but I'd also ask why there isn't more discussion of this? Perhaps I'm missing something.
Do you just want to verify a token provided by others? If so, treat it as you would a password. Use a key derivation function like Password-Based Key Derivation Function 2 (PBKDF2, also described in RFC 2898) with 10,000 iterations and store the first 20 bytes or so. When the token is received, run it through the same derivation and compare the result with the stored bytes. The stored value is not practically reversible.
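A minimal sketch of that verify-only case with the standard library's `hashlib.pbkdf2_hmac`, using the answer's 10,000 iterations and 20 stored bytes. HMAC-SHA256 as the PRF and a 16-byte random salt are assumptions (the answer names neither), and the `protect`/`check` names are hypothetical. Note that with a per-token salt you must locate the stored record by some other identifier before checking.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 10_000  # the answer's figure; tune for your hardware
STORED_LEN = 20      # keep the first 20 bytes of derived output


def protect(token: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived) to store instead of the token itself."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", token.encode(), salt, ITERATIONS, dklen=STORED_LEN)
    return salt, derived


def check(token: str, salt: bytes, derived: bytes) -> bool:
    """Re-derive from the presented token and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", token.encode(), salt, ITERATIONS, dklen=STORED_LEN)
    return hmac.compare_digest(candidate, derived)
```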
Do you want to present the token to others for authentication? If so, this is a challenge because, if your application can decrypt or otherwise get access to the token, so can an attacker. Think Shannon's Maxim, the attacker knows the system, especially for an open source project.
In this case, the best approach is to encrypt the tokens with a strong algorithm (e.g. AES256), generate keys using a strong cryptographic standard random number generator and store the key(s) securely in a different location to the data, such as in a permission protected file outside the database in the example above. The latter means that SQL injection attacks will not reveal the keys.
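A minimal sketch of encrypting stored tokens with AES-256, assuming the third-party `cryptography` package and choosing GCM mode (an authenticated mode; the answer only says "AES256"). The key lives in a separate owner-only file; `KEY_PATH`, the function names, and binding the row ID as associated data are hypothetical choices for illustration.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY_PATH = "token.key"  # hypothetical path: outside the DB, readable only by the app


def load_or_create_key(path: str = KEY_PATH) -> bytes:
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    key = AESGCM.generate_key(bit_length=256)  # 32 random bytes
    # O_EXCL refuses to overwrite; mode 0o600 = owner read/write only (POSIX)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return key


def encrypt_token(key: bytes, token: str, row_id: str) -> bytes:
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat under the same key
    # row_id is authenticated data, so a ciphertext can't be moved to another row
    return nonce + AESGCM(key).encrypt(nonce, token.encode(), row_id.encode())


def decrypt_token(key: bytes, blob: bytes, row_id: str) -> str:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, row_id.encode()).decode()
```

The database then holds only `nonce + ciphertext`; an SQL injection that dumps the table yields nothing usable without the key file.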
In an effort to increase performance, I was thinking of trying to eliminate a plain 'session cookie', but encrypt all the information in the cookie itself.
A very simple example:
userid= 12345
time=now()
signature = hmac('SHA1',userid + ":" + time, secret);
cookie = userid + ':' + time + ':' + signature;
The time would be used for a maximum expirytime, so cookies won't live on forever.
Now for the big question: is this a bad idea?
Am I better off using AES256 instead? In my case the data is not confidential, but it must not be changed under any circumstances.
EDIT
After some good critique and comments, I'd like to add this:
The 'secret' would be unique per-user and unpredictable (random string + user id ?)
The cookie will expire automatically (this is done based on the time value + a certain amount of seconds).
If a user changes their password, (or perhaps even logs out?) the secret should change.
A last note: I'm trying to come up with solutions to decrease database load. This is only one of the solutions I'm investigating, but it's kind of my favourite. The main reason is that I don't have to look into other storage mechanisms better suited for this kind of data (memcache, nosql), and it makes the web application a bit more 'stateless'.
10 years later edit
JWT is now a thing.
A signed token is a good method for anything where you want to issue a token and then, when it is returned, be able to verify that you issued the token, without having to store any data on the server side. This is good for features like:
time-limited-account-login;
password-resetting;
anti-XSRF forms;
time-limited-form-submission (anti-spam).
It's not in itself a replacement for a session cookie, but if it can eliminate the need for any session storage at all that's probably a good thing, even if the performance difference isn't going to be huge.
HMAC is one reasonable way of generating a signed token. It's not going to be the fastest; you may be able to get away with a simple hash if you know about and can avoid length-extension attacks. I'll leave you to decide whether that's worth the risk for you.
I'm assuming that hmac() in whatever language it is you're using has been set up to use a suitable server-side secret key, without which you can't have a secure signed token. This secret must be strong and well-protected if you are to base your whole authentication system around it. If you have to change it, everyone gets logged out.
For login and password-resetting purposes you may want to add an extra factor to the token, a password generation number. You can re-use the salt of the hashed password in the database for this if you like. The idea is that when the user changes passwords it should invalidate any issued tokens (except for the cookie on the browser doing the password change, which gets replaced with a re-issued one). Otherwise, a user discovering their account has been compromised cannot lock other parties out.
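A minimal sketch of such a signed token with a password generation number, assuming HMAC-SHA256 (instead of the question's SHA-1) and a colon-separated layout like the question's cookie. `SERVER_KEY`, `issue`, `verify`, and the `get_generation` callback (which reads the user's current generation, e.g. the password salt, from your store) are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Callable, Optional

SERVER_KEY = b"replace-with-a-long-random-key"  # hypothetical; changing it logs everyone out


def _mac(payload: str) -> str:
    return hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue(user_id: int, pw_generation: int, ttl: int = 3600,
          now: Optional[float] = None) -> str:
    """Token = user_id:expires:password_generation:HMAC(payload)."""
    now = time.time() if now is None else now
    payload = f"{user_id}:{int(now + ttl)}:{pw_generation}"
    return f"{payload}:{_mac(payload)}"


def verify(token: str, get_generation: Callable[[int], Optional[int]],
           now: Optional[float] = None) -> Optional[int]:
    """Return the user ID if the token is authentic, unexpired, and was issued
    for the user's current password generation; otherwise None."""
    now = time.time() if now is None else now
    try:
        user_id, expires, generation, sig = token.split(":")
    except ValueError:
        return None
    payload = f"{user_id}:{expires}:{generation}"
    if not hmac.compare_digest(_mac(payload).encode(), sig.encode()):
        return None  # forged or altered
    if int(expires) < now:
        return None  # past its maximum lifetime
    if get_generation(int(user_id)) != int(generation):
        return None  # password changed since issue
    return int(user_id)
```

After a password change, the browser doing the change simply receives a fresh `issue(...)` token with the new generation, while every older token fails the last check.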
I know this question is very old now, but I thought it might be a good idea to update the answers with a more current response for anyone like myself who may stumble across it.
In an effort to increase performance, I was thinking of trying to eliminate a plain 'session cookie', but encrypt all the information in the cookie itself.
Now for the big question: is this a bad idea?
The short answer is: No it's not a bad idea, in fact this is a really good idea and has become an industry standard.
The long answer is: it depends on your implementation. Sessions are great: they are fast, simple, and easily secured. A stateless system works well, but it is a bit more involved to deploy and may be outside the scope of smaller projects.
Implementing an authentication system based on tokens (cookies) is very common now and works exceedingly well for stateless systems/APIs. It also makes it possible to authenticate with many different applications using a single account, i.e. logging in to {unaffiliated site} with Facebook / Google.
Implementing an OAuth system like this is a BIG subject in and of itself, so I'll leave you with some documentation: OAuth2 Docs. I also recommend looking into JSON Web Tokens (JWT).
extra
A last note: I'm trying to come up with solutions to decrease database load. This is only one of the solutions I'm investigating
Redis would work well for offloading database queries. Redis is a simple in-memory data store: very fast, mostly temporary storage that can help reduce DB hits.
Update: This answer pertains to the question that was actually asked, not to an imagined history where this question was really about JWT.
The most important deviations from today's signed tokens are:
The question as originally posed didn't evince any understanding of the need for a secret in token generation. Key management is vital for JWT.
The questioner stated that they could not use HTTPS, and so they lacked confidentiality for the token and binding between the token and the request. In the same way, even full-fledged JWT can't secure a plain HTTP request.
When the question was revised to explain how a secret could be incorporated, the secret chosen required server-side state, and so fell short of the statelessness provided by something like JWT.
Even today, this homebrew approach would be a bad idea. Follow a standard like JWT, where both the scheme and its implementations have been carefully scrutinized and refined.
Yes, this is a bad idea.
For starters, it's not secure. With this scheme, an attacker can generate their own cookie and impersonate any user.
Session identifiers should be chosen from a large (128-bit) space by a cryptographic random number generator.
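A minimal sketch of such a session identifier, assuming a dict stands in for server-side session storage; `SESSIONS`, `login`, and `current_user` are hypothetical names. `secrets.token_urlsafe(16)` draws 16 bytes (128 bits) from the OS CSPRNG.

```python
import secrets
from typing import Dict, Optional

SESSIONS: Dict[str, int] = {}  # stand-in for server-side session storage


def new_session_id() -> str:
    """128 bits from the OS CSPRNG, encoded as 22 URL-safe characters."""
    return secrets.token_urlsafe(16)


def login(user_id: int) -> str:
    sid = new_session_id()
    SESSIONS[sid] = user_id  # the cookie carries only this opaque ID
    return sid


def current_user(sid: str) -> Optional[int]:
    return SESSIONS.get(sid)
```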
They should be kept private, so that attackers cannot steal them and impersonate an authenticated user. Any request that performs an action that requires authorization should be tamper-proof. That is, the entire request must have some kind of integrity protection such as an HMAC so that its contents can't be altered. For web applications, these requirements lead inexorably to HTTPS.
What performance concerns do you have? I've never seen a web application where proper security created any sort of hotspot.
If the channel doesn't have privacy and integrity, you open yourself up to man-in-the-middle attacks. For example, without privacy, Alice sends her password to Bob. Eve snoops it and can log in later as Alice. Or, with partial integrity, Alice attaches her signed cookie to a purchase request and sends them to Bob. Eve intercepts the request and modifies the shipping address. Bob validates the MAC on the cookie, but can't detect that the address has been altered.
I don't have any numbers, but it seems to me that the opportunities for man-in-the-middle attacks are constantly growing. I notice restaurants using the wi-fi network they make available to customers for their credit-card processing. People at libraries and in work-places are often susceptible to sniffing if their traffic isn't over HTTPS.
You should not reinvent the wheel. The session handler that comes with your development platform is far more secure and certainly easier to implement. Cookies should always be very large random numbers that link to server-side data. A cookie that contains a user ID and timestamp doesn't help harden the session against attack.
This proposed session handler is more vulnerable to attack than using a cryptographic nonce for each session. An attack scenario is as follows.
It is likely that you are using the same secret for your HMAC calculation for all sessions. Thus this secret could be brute-forced by an attacker logging in with his own account. By looking at his session ID he can obtain everything except the secret. The attacker could then brute-force the secret until the HMAC value can be reproduced. Using this secret he can build an administrative cookie with user_id=1, which will probably grant him administrative access.
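A minimal sketch of that offline attack against the question's `userid:time:signature` cookie, using SHA-1 HMAC as in the question's pseudocode. The candidate wordlist is hypothetical; the search only succeeds because the secret is guessable, and a long key from a CSPRNG (e.g. 32 random bytes) would make it infeasible.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def sign(user_id: str, ts: str, secret: str) -> str:
    # the question's scheme: hmac('SHA1', userid + ":" + time, secret)
    return hmac.new(secret.encode(), f"{user_id}:{ts}".encode(), hashlib.sha1).hexdigest()


def recover_secret(own_cookie: str, candidates: Iterable[str]) -> Optional[str]:
    """The attacker's own cookie reveals user_id and time, so the only
    unknown is the secret; try candidates until the signature matches."""
    user_id, ts, sig = own_cookie.split(":")
    for guess in candidates:
        if hmac.compare_digest(sign(user_id, ts, guess), sig):
            return guess
    return None
```

Once `recover_secret` returns a value, `"1:" + ts + ":" + sign("1", ts, found)` is a valid cookie for user 1.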
What makes you think this will improve performance vs. secure session IDs and retrieving the userid and time information from the server-side component of the session?
If something must be tamper-proof, don't put it in the toddlers' hands. As in, don't give it to the client at all, even with the tamper-proof locking.
Ignoring the ideological issues, this looks pretty decent. You don't have a nonce. You should add that. Just some random garbage that you store along with the userid and time, to prevent replay or prediction.