Unique hash as authorization for endpoint - security

I've noticed that companies sometimes send customized links that give access to a resource without logging in.
For example, a company sent me an email with a link to my invoices:
www.financial.service.com/<SOME_HASHED_VALUE>
There is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value. I have a very similar case, but I have concerns:
Firstly, is this a good approach?
Secondly, how should I make this hash? SHA-512 on some random data?

This can be a completely valid approach, and is its own type of authentication. If constructed correctly, it proves that you have access to that email (it doesn't prove anything else, but it does prove that much).
These values often aren't hashes. They're often random, and that's their power. If they are hashes, they need to be constructed such that their output is "effectively random," so usually you might as well just make them random in the first place. For this discussion, I'll call it a "token."
The point of a token is that it's unpredictable, and extremely sparse within its search space. By unpredictable, I mean that even if I know exactly who the token is for, it should be effectively impossible (i.e. within practical time constraints) to construct a legitimate token for that user. So, for instance, if this were the hash of the username and a timestamp (even a millisecond timestamp), that would be a terrible token. I could guess those very quickly. So random is best.
By "sparse" I mean that out of all the possible tokens (i.e. strings of the correct length and format), a vanishingly small number of them should be valid tokens, and those valid tokens should be scattered across the search space randomly. For example, if the tokens were sequential, that would be terrible. If I had a token, I could find other tokens by simply increasing or decreasing the value by one.
So a good token looks like this:
Select a random, long string
Store it in your database, along with metadata about what it means, and a timestamp
When a user shows up with it, read the data from the database
After some period of time, expire the token by deleting it from the database (optional, but preferred)
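The four steps above could be sketched roughly like this in Node.js. This is a minimal sketch, not a definitive implementation: the in-memory `tokenStore`, the function names `issueToken` and `redeemToken`, and the 24-hour lifetime are all assumptions made for illustration, and a real system would use a database table instead of a `Map`.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store standing in for a database table.
const tokenStore = new Map();
const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // assumed lifetime: 24 hours

// Steps 1 and 2: pick a long random string and store it with its metadata.
function issueToken(metadata, now = Date.now()) {
  const token = crypto.randomBytes(32).toString('hex'); // 256 bits from the OS CSPRNG
  tokenStore.set(token, { metadata, createdAt: now });
  return token;
}

// Steps 3 and 4: look the token up, and treat expired entries as missing.
function redeemToken(token, now = Date.now()) {
  const entry = tokenStore.get(token);
  if (!entry) return null;
  if (now - entry.createdAt > TOKEN_TTL_MS) {
    tokenStore.delete(token); // expire by deleting
    return null;
  }
  return entry.metadata;
}
```

The `now` parameter exists only so that expiry can be exercised without waiting; callers would normally omit it.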
Another way to implement this kind of scheme is to encrypt the metadata (i.e. the userid, what page this goes to, a timestamp, etc) and encode it in the URL. Then you don't need to store anything in a database, because it's right there in the URL. I don't usually like this approach because it requires a very high-value crypto key that you then have to protect on production servers, and that key can be used to connect as anyone. Even if I have good ways to protect such a key (generally an attached HSM), I don't like such a key even existing. So generally I prefer a database. But for some applications, encrypting the data is better. Storing the metadata in the URL also significantly restricts how much metadata you can store, so again, tokens are nicer.
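For comparison, the encrypted-metadata variant might look something like the sketch below. It assumes AES-256-GCM (my choice, the approach above doesn't name a cipher) so that tampering is detected as well as the contents being hidden. The names `sealToken` and `openToken` and the `issuedAt` field are hypothetical, and the key is generated in-process only so the example runs on its own; in practice it is exactly the high-value key discussed above.

```javascript
const crypto = require('crypto');

// Assumption: in production this key lives in an HSM or secret manager.
// It is generated here only so the sketch is self-contained.
const key = crypto.randomBytes(32);

function sealToken(payload) {
  const iv = crypto.randomBytes(12); // fresh nonce for every token
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(payload), 'utf8'),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag();
  // Layout: iv (12 bytes) | tag (16 bytes) | ciphertext
  return Buffer.concat([iv, tag, ciphertext]).toString('hex');
}

function openToken(token, maxAgeMs, now = Date.now()) {
  try {
    const raw = Buffer.from(token, 'hex');
    const iv = raw.subarray(0, 12);
    const tag = raw.subarray(12, 28);
    const ciphertext = raw.subarray(28);
    const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
    decipher.setAuthTag(tag);
    const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
    const payload = JSON.parse(plaintext.toString('utf8'));
    if (now - payload.issuedAt > maxAgeMs) return null; // too old
    return payload;
  } catch (err) {
    return null; // tampered, truncated, or not one of our tokens
  }
}
```

Note that nothing here can be revoked before it expires, which is one more reason to prefer the database-backed token when that matters.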

There is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value.
Usually there is authorization before accessing the endpoint (you authenticated before receiving the invoices). I see it as a common way to share resources with external parties. We use a similar approach with expiring AWS S3 URLs.
Firstly, is this a good approach?
It depends on your use case. It works well for sharing some internal resources when you want the option to control access (revoking access, time-based access, one-time access, ...).
Secondly, how should I make this hash? SHA-512 on some random data?
As long as SOME_HASHED_VALUE is not guessable and has a negligible collision probability (salted hash, long random unique value, ...), it should be OK.

Related

Generating unique tokens in a NodeJS, Crypto Token authentication environment

Using nodejs and crypto, right now, when a user logs in, I generate a random auth token:
var token = crypto.randomBytes(16).toString('hex');
I know it's unlikely, but there is a tiny chance for two tokens to be of the same value.
This means a user could, in theory, authenticate on another account.
Now, I see two obvious methods to get past this:
When I generate the token, query the users database and see if a token with the same value already exists. If it does, just generate another one. As you can see, this is not perfect since I am adding queries to the database.
Since every user has a unique username in my database, I could generate a random token using the username as a secret generator key. This way, there is no way of two tokens having the same value. Can crypto do that? Is it secure?
How would you do it?
It's too unlikely to worry about it happening by chance. I would not sacrifice performance to lock and check the database for it.
Consider this excerpt from Pro Git about the chance of random collisions between 20-byte SHA-1 sums:
Here’s an example to give you an idea of what it would take to get a
SHA-1 collision [by chance]. If all 6.5 billion humans on Earth were programming,
and every second, each one was producing code that was the equivalent
of the entire Linux kernel history (1 million Git objects) and pushing
it into one enormous Git repository, it would take 5 years until that
repository contained enough objects to have a 50% probability of a
single SHA-1 object collision. A higher probability exists [for average projects] that every
member of your programming team will be attacked and killed by wolves
in unrelated incidents on the same night.
(SHA-1 collisions can be directly constructed now, so the quote is now less applicable to SHA-1, but it's still valid when considering collisions of random values.)
If you are still worried about that probability, then you can easily use more random bytes instead of 16.
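To put a number on "too unlikely", here is a small sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). The function name `collisionProbability` is made up for this example, and the result is an approximation that assumes the tokens are uniformly random.

```javascript
// Approximate probability that at least two of `tokenCount` uniformly random
// tokens of `bits` bits are equal (birthday-bound approximation).
function collisionProbability(tokenCount, bits) {
  const space = Math.pow(2, bits);
  const exponent = (tokenCount * (tokenCount - 1)) / (2 * space);
  return -Math.expm1(-exponent); // expm1 keeps precision for tiny values
}

// One billion 16-byte (128-bit) tokens, as produced by crypto.randomBytes(16)
console.log(collisionProbability(1e9, 128));
// The same number of 32-byte (256-bit) tokens
console.log(collisionProbability(1e9, 256));
```

Even with a billion 128-bit tokens issued, the first value comes out on the order of 1e-21, which is why checking the database first buys essentially nothing.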
But regarding your second idea: if you hashed the random ID with the username, then that hash could collide, just like the random ID could. You haven't solved anything.
You should always add a UNIQUE constraint to your database column. This will create an implicit index to improve searches on this column, and it will make sure that no two records will ever have the same value. So, in the worst-case scenario, you will get a database exception and not a security violation.
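On the application side, relying on the constraint could look roughly like this: insert first, and only retry if the database rejects the value. This is a sketch under assumptions: `TokenTable` is a hypothetical in-memory stand-in for a table with a UNIQUE column, and the `UNIQUE_VIOLATION` code is invented for the example (real database drivers report constraint violations with their own error codes).

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for a table whose token column has a UNIQUE constraint.
class TokenTable {
  constructor() {
    this.rows = new Map();
  }
  insert(token, userId) {
    if (this.rows.has(token)) {
      const err = new Error('duplicate key value violates unique constraint');
      err.code = 'UNIQUE_VIOLATION'; // made-up code; real drivers use their own
      throw err;
    }
    this.rows.set(token, userId);
  }
}

const randomToken = () => crypto.randomBytes(16).toString('hex');

// Insert without checking first; regenerate only if the constraint fires.
function createToken(table, userId, generate = randomToken, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const token = generate();
    try {
      table.insert(token, userId);
      return token;
    } catch (err) {
      if (err.code !== 'UNIQUE_VIOLATION') throw err; // unrelated failure
    }
  }
  throw new Error('could not generate a unique token');
}
```

The common path costs a single insert, and the retry branch should essentially never run with 16 random bytes.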
Also, depending on how frequently unique tokens need to be created, I think it's perfectly fine in most cases to use database lookups during generation. If your column, again, is properly indexed, it will be a pretty fast query. Most databases scale horizontally very well, so even if you are building the next Facebook it is still an option. Furthermore, you will probably need to do a query to check for e-mail uniqueness anyway.
Finally, if you are really concerned about performance, you could always pre-generate a million unique tokens and store them in a separate database table for quick use. Just set up a routine to periodically check its usage and insert more records as needed. However, as #MacroMan stated in the comments, this could have security implications if someone gets access to the list of pre-generated tokens, so this practice should be avoided.
PostgreSQL UNIQUE CONSTRAINT
MySQL: Unique Constraints

Remembering authenticated users with cookies

In the past I have written a CMS where authenticated users are remembered across HTTP requests with two cookies:
User Token - A random, multi-character (say 10 characters long) alphanumeric string that relates back to an actual User ID in the database.
Authentication Token - A random, multi-character (say 100 characters long) alphanumeric string that, once hashed, must match the stored value for said User ID in the database.
My question (for a new CMS) is as follows:
What is the point of using two cookies? Wouldn't it be just as secure if I instead used a single 110-character token that, once hashed, must match the stored value for some User ID in the database? When a match of this token is found in the database, the related User ID would be considered the authenticated user.
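To make the single-token variant concrete, it could be sketched like this. The names `rememberUser`, `userForCookie`, and `sessionsByHash` are hypothetical, the `Map` stands in for a database table, and SHA-256 is an assumption since no hash function is named here.

```javascript
const crypto = require('crypto');

const sha256 = (value) => crypto.createHash('sha256').update(value).digest('hex');

// Hypothetical stand-in for a table mapping token hashes to user IDs.
const sessionsByHash = new Map();

function rememberUser(userId) {
  const token = crypto.randomBytes(32).toString('hex'); // value placed in the cookie
  sessionsByHash.set(sha256(token), userId);            // only the hash is stored
  return token;
}

function userForCookie(token) {
  const userId = sessionsByHash.get(sha256(token));
  return userId === undefined ? null : userId;
}
```

Because the lookup is by hash, a leaked copy of the table does not directly reveal usable cookie values.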
User and Auth Tokens vs. Combined Token
The best reason to keep them separate, is to keep your code more manageable, portable, and playing nice with others.
Security
If security is your only concern, then it would be more secure to combine the user and auth tokens into a single encrypted token, if and only if both sequences were generated via a method which does not result in any particular character being more heavily weighted. The reason is that the act of combining the two values essentially acts as an additional simple encryption step, as well as producing a larger encryption key, which is more difficult to spoof or crack.
The weight of a character refers to how likely it is to occur. Many methods of hashing or encrypting data produce output that is very easily cracked (Excel's horribly insecure password protection, for one). The main reason is that when certain characters are more likely to appear, many can also be substituted for others, so the final encryption result has hundreds of thousands of unintended and unknown encryption keys. (Try out that Excel cracker for an example.)
Maintainability and Performance
However, there are some drawbacks to combining the two values, primarily performance and maintainability related.
Creating or collecting the combined key will require at least 2 extra steps every time.
Any need to get or set either value will require extracting all of the information.
You may no longer update the auth-token, without also updating the user-token.
This can cause severe issues later on if you ever expect to tie the user to other sessions. i.e.: google login paired with your auth system.
Anyone else looking at your code will have to reverse engineer how you create the user-auth combo, if they intend to add any functionality, such as group-level permissions.
Conform already
I'm not much of a conformist myself. However, in many things the crowd will flow to the path of least resistance. More often than not, common practices become common only after experience showed that the other ways had bigger problems. This is one of those cases.
Minimal security impact plus needing one cookie instead of two, traded for a less portable and less performing platform. At the end of the day, it's your call.
Finally
It may be best to stop bothering with keeping both keys, and instead just use a unique session hash. Then, just pair old sessions to users, IFF they re-authenticate after expiration.
NEVER use cookies (even encrypted) to auto-login without having several other checks and balances in place. Even with extra checks, if you're storing confidential information (names, addresses, phone, email), the security is between you and your user, so be extra cautious.
At the end of the day, you're the architect, go whichever route best fits your platform and environment.

Does anyone see any downsides of doing the following to prevent CSRF?

I'm wondering if the following method will completely prevent CSRF, and be compatible with all users.
Here it is:
In the form just include an extra parameter that is: encrypted(user's userID + request time). Server-side just decrypt and make sure it's the right userID and the request time was reasonably recent.
Aside from someone sniffing the user's traffic or breaking the encryption, is this completely secure? Are there any downsides?
While your approach is safe, it is not standard. The standard way to prevent CSRF attacks is to generate a pseudo-random number that you include in a hidden field and also in a cookie, and then on the server side you verify that both values match. Take a look at this post.
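A minimal sketch of that cookie-plus-hidden-field check, leaving out the web framework entirely: `newCsrfToken` and `csrfTokensMatch` are hypothetical helper names, and reading the two values from the request is assumed to happen elsewhere. The constant-time comparison is my addition and not part of the description above.

```javascript
const crypto = require('crypto');

// Generate the value that is set both as a cookie and as a hidden form field.
function newCsrfToken() {
  return crypto.randomBytes(32).toString('hex');
}

// On submit, compare the cookie value with the value posted in the form.
function csrfTokensMatch(cookieValue, formValue) {
  if (typeof cookieValue !== 'string' || typeof formValue !== 'string') return false;
  const a = Buffer.from(cookieValue, 'utf8');
  const b = Buffer.from(formValue, 'utf8');
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return a.length > 0 && a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

An attacking site can make the browser send the cookie, but it cannot read it, so it cannot put the matching value in the form.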
One major downside is that your page will 'timeout' if the user leaves their browser open longer than the time frame you decide is reasonable before they post the form. I prefer sites not to rush their user into committing their action unless the action is inherently time-sensitive.
It's not completely secure, as it may be possible for an attacking site to guess the User ID.
If you use a per-session encryption key, it is secure. (But then all you need to do is send the raw key, and it's already secure)
Also, remember about timezones and inaccurate clocks.
It should work, yes. Though I would suggest you use a random number for UserID when you create new users, rather than a simple incrementing number (obviously, make sure it's unique when you create the user). That way, it's hard for an attacker to "guess".
Having the UserID and a DateTime is a start, but you also want a pseudo-random value, preferably with high entropy, in the canary token as well. Basically, you need to reduce the predictability of the token in a page in a given context. A token built from only the UserID and a DateTime could in theory be broken after some time, as it is not "random enough". Having said that, CSRF attacks are generally scripted and not directly monitored, so depending upon the exposure of your application, it may be enough.
Also, be sure to use a more secure encryption algorithm such as Rijndael/AES with a key length sufficient for the security of your application and a pseudo-random initialization vector.
The security system you have proposed is vulnerable to attack.
Block ciphers like AES are commonly used to build very secure random number generators (CSPRNGs). However, as with any random number generator, you have to worry about what you are seeding the algorithm with. In this case you are using the user's userID + request time, both of which the attacker can know. Your implementation doesn't mention a key or IV, so I assume they are NULL. The attacker is building the request, so he will always know the request time. The userID is likely a primary key; if you have 100 users, then the attacker could forge 100 requests and one of them will work. But the attacker might just want to force the administrator to change his password, and admins usually have a primary key of 1.
Do not reinvent the wheel: very good random number generators have already been built, and there are also anti-CSRF libraries.

I have a simple database of content. Should I hash the "id" so that people don't look over it in the URL?

Is it recommended to create a column (unique key) that is a hash.
When people view my URL, it is currently like this:
url.com/?id=2134
But, people can look over this and data-mine all the content, right?
Is it RECOMMENDED to go 1 extra step to make this through hash?
url.com?id=3fjsdFNHDNSL
Thanks!
The first and most important step is to use some form of role-based security to ensure that no user can see data they aren't supposed to see. So, for example, if a user should only see their own information, then you should check that the id belongs to the logged-in user before you display it.
As a second level of protection, it's not a bad idea to have a unique key that doesn't let you predict other keys (a hash, as you suggest, or a UUID). However, that still means that, for example, a malicious user who obtained someone else's URL (e.g. by sniffing a network, by reading a log file, by viewing the history in someone's browser) could see that user's information. You need authentication and authorization, not simply obfuscating the IDs.
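Both layers together might look like the sketch below: an unpredictable public ID, plus an ownership check that still runs on every read. The record store and the names `createRecord` and `readRecord` are hypothetical, and `loggedInUserId` is assumed to come from whatever session mechanism the site already has.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory records keyed by an unpredictable public ID.
const records = new Map();

function createRecord(ownerId, content) {
  const publicId = crypto.randomUUID(); // random (version 4) UUID
  records.set(publicId, { ownerId, content });
  return publicId;
}

// Authorization check: knowing the ID is not enough, the caller must own it.
function readRecord(publicId, loggedInUserId) {
  const record = records.get(publicId);
  if (!record || record.ownerId !== loggedInUserId) return null;
  return record.content;
}
```

Returning the same result for "not found" and "not yours" avoids confirming to a stranger that a given ID exists.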
It sort of depends on your situation, but off hand I think if you think you need to hash you need to hash. If someone could data mine by, say, iterating through:
...
url.com?id=2134
url.com?id=2135
url.com?id=2136
...
Then using a hash for the id is necessary to avoid this, since it will be much harder to figure out the next one. Keep in mind, though, that you don't want to make the hash too obvious, so that a determined attacker would easily figure it out, e.g. just taking the MD5 of 2134 or whatever number you had.
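To see why a plain MD5 of the number is "too obvious", here is a sketch of what an attacker could do after seeing a single hashed ID. The function name `recoverSequentialId` and the search limit are made up for the example.

```javascript
const crypto = require('crypto');

const md5 = (value) => crypto.createHash('md5').update(String(value)).digest('hex');

// Try every small integer until one hashes to the value seen in the URL.
function recoverSequentialId(observedHash, maxId = 100000) {
  for (let id = 0; id <= maxId; id++) {
    if (md5(id) === observedHash) return id;
  }
  return null;
}

// The "hidden" ID from url.com?id=<md5 of 2134> is recovered almost instantly.
console.log(recoverSequentialId(md5(2134)));
```

Once the underlying number is known, the attacker can hash 2135, 2136, and so on, and is back to iterating. A value with no relationship to the row number (random bytes or a random UUID) does not have this problem.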
Well, the problem here is that an actual Hash is technically one way. So if you hash the data you won't be able to recover it on the receiving side. Without knowing what technology you are using to create your web page it's hard to make any concrete suggestions, but if you must have sensitive information in your query string then I would recommend that you at least use a symmetric encryption algorithm on it to keep people from simply reading off the values and reverse engineering things.
Of course if you have the option - it's probably better to not have that information in the query string at all.

Are two security keys better than one?

I just implemented a "remember me" feature for a user login on a website. Most advice was to have the userid stored in a cookie, and then have some long, unguessable random key. If both of these match up, the user is considered authenticated.
Does having two strings actually help? Wouldn't a longer key do exactly the same thing?
In other words, aren't two keys equally susceptible to attacks as one longer key? (I imagine it would be the total length of the keys, regardless of how many you have)
Note: There might be some DB query efficiency issues here too, e.g., looking up a big UUID in the DB is not as easy as looking up a small number. (On a tangential note, Gmail uses a six digit number as their one-time login token along with the username.)
Robust discussion of that in this SO thread.
... the user is considered authenticated.
Should probably read: authenticated, but with limited authorization.
Per comment: Somewhat more secure, since it's one-time use and it's hard to guess. So if the cookie is compromised, the attacker has to act quickly or the token will be invalidated by the legitimate user logging in, whereas the userid may not change for a long time.
I'm no crypto expert, but as long as you check for brute-forcing attempts, you should be able to use a short key (like Gmail's 6 digits). The real vulnerability is people listening when a user logs in (eg. SideJacking).
In sites I have previously created I made use of a user_id and a salted hash of the user's password. The primary reason I used two fields to authenticate a user is because it saved me the trouble of adding another table (and thus complicating the database design.) With the user_id also being stored in the cookie I could do an indexed look-up in users table and efficiently match the salted hash to the user. Of course you could concatenate both the user_id and the hash into one value and just store that in the cookie.
If you just have a random unguessable string then you would have to have a separate table to associate the random string with a user-id and do another look-up for that particular user.

Resources