Create and authenticate access tokens - security

I am planning to create a small application that users can log in to. To provide API access, I plan to give them access tokens, as many services do.
When users log in with email/password, I look up the email address and check the password hash against that database entry. How can I create an access token, though, something like GitHub's:
ghp_36-characters
GitHub's tokens always seem to start with ghp_ followed by 36 characters. The length may sometimes be a little more or less, but that's their format. How do they get a deterministic length and a specific format?
Possibilities I considered:
1. Encrypt the user ID with a strong encryption algorithm and a strong encryption password
2. Generate a random string
3. Use JWTs
Problems with each of those approaches:
1. The length of the string would depend on the length of the user ID, the encryption padding, and other pieces of the equation. Also, I don't think it's possible to make it follow a specific format.
2. I can generate a random string and store its hash like a password, but then how can I find the entry in the database to authenticate it against? What I mean is, I won't have a known value to compare the unknown value against in the database.
3. JWTs depend heavily on the length of the string to encode and are extremely long even for really small data. Again, they have their own structure, which can't be escaped.
Following a specific format is not a requirement for my application, so I can use any of these options successfully (possibly not the second one), but I'm curious to know how all services achieve this.
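One way to read the "random string" option is sketched below. It is only an illustration under stated assumptions, not a description of how GitHub does it: the prefix "app_", the 62-character alphabet, and the in-memory dictionary standing in for a database table are all hypothetical. Because the token body is long and random (36 characters from 62 symbols is over 200 bits), a plain unsalted SHA-256 digest of it can serve as the known value to look up, which is not true of a human-chosen password.

```python
import hashlib
import secrets
import string
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits  # 62 symbols
PREFIX = "app_"      # hypothetical prefix, analogous to "ghp_"
BODY_LENGTH = 36     # fixed length, so every token has the same shape

# Stand-in for a database table with an index on the digest column.
tokens_by_digest: Dict[str, int] = {}


def digest(token: str) -> str:
    """Deterministic digest used as the lookup key; the raw token is never stored."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(user_id: int) -> str:
    """Create a token, store only its digest, and return the token once."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(BODY_LENGTH))
    token = PREFIX + body
    tokens_by_digest[digest(token)] = user_id
    return token


def authenticate(token: str) -> Optional[int]:
    """Return the user ID for a valid token, or None."""
    if not token.startswith(PREFIX) or len(token) != len(PREFIX) + BODY_LENGTH:
        return None  # cheap format check before touching the store
    return tokens_by_digest.get(digest(token))
```

The fixed format comes purely from how the random body is generated, so no encryption or padding is involved.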

Related

Is there a hashing technique that works both ways?

TL;DR
The hashing function generates a different hash every time for the same piece of data, but it can determine if a particular hash was generated with the piece of data or not.
Eg:
hash_func(xyz): abc123
hash_func(xyz): jhg342 // different hash, even if the data was same.
decode_hash(jhg342) == xyz
This gives true, because the hash function determined that jhg342 is indeed the hash of xyz
The Question
For an Open Source website, I want to store the email in hashed form (because all the users will be public), but the site needs to know if an email was used to register for another account so that it can ensure one account per email.
However, all the emails are from one organization only. This means, they all look exactly like uid#org_name.com. This means anyone can run through all the UIDs and find out which hash belongs to which email, and thus, which person.
Therefore, is there a way to hash the email such that the hash knows which email it belongs to, but hashing the same email does not generate the same hash?
P.S. Please note that I cannot use Salting as the site will be Open Source and the salt will be publicly available.
This doesn't make sense - you're conflating hashing and encryption in a very strange way. What you're describing wouldn't really be a cryptographically secure hash function. By definition, cryptographically secure hash functions are one way. In fact, if you could reverse it, there would be little point to using it at all because it would no longer be secure. This would make it possible to brute-force passwords and would "break" passwords that were used in multiple places.
Also, why would you want it to hash to different values each time? That's what you use a salt for.
If you want to be able to reverse it later, just use an encryption algorithm like AES. Even better, many databases even offer features for securely storing sensitive information; see, for example, SQL Server's Always Encrypted feature.
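The remark about salts can be made concrete with a minimal sketch. It assumes PBKDF2 from Python's standard library, and the function names and iteration count are illustrative choices. Hashing the same input twice gives different stored strings, yet each can still be verified. Note that the salt is stored next to the hash and is not secret, so this does not by itself stop someone from enumerating a small set of possible inputs, and a uniqueness check would have to verify against every stored row.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_value(value: str) -> str:
    """Return 'salt$hash' with a fresh random salt on every call."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + derived.hex()


def verify(value: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, derived_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", value.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(derived.hex(), derived_hex)
```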

Unique hash as authorization for endpoint

I've seen that companies sometimes send customized links that give access to a resource without logging in.
For example, a company sent me an email with a link to my invoices:
www.financial.service.com/<SOME_HASHED_VALUE>
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value. I have a very similar case, but I have concerns:
Firstly, is it a good approach?
Secondly, how should I make this hash? sha512 on some random data?
This can be a completely valid approach, and is its own type of authentication. If constructed correctly, it proves that you have access to that email (it doesn't prove anything else, but it does prove that much).
These values often aren't hashes. They're often random, and that's their power. If they are hashes, they need to be constructed such that their output is "effectively random," so usually you might as well just make them random in the first place. For this discussion, I'll call it a "token."
The point of a token is that it's unpredictable, and extremely sparse within its search space. By unpredictable, I mean that even if I know exactly who the token is for, it should be effectively impossible (i.e. within practical time constraints) to construct a legitimate token for that user. So, for instance, if this were the hash of the username and a timestamp (even a millisecond timestamp), that would be a terrible token. I could guess those very quickly. So random is best.
By "sparse" I mean that out of all the possible tokens (i.e. strings of the correct length and format), a vanishingly small number of them should be valid tokens, and those valid tokens should be scattered across the search space randomly. For example, if the tokens were sequential, that would be terrible. If I had a token, I could find other tokens by simply increasing or decreasing the value by one.
So a good token looks like this:
Select a random, long string
Store it in your database, along with metadata about what it means, and a timestamp
When a user shows up with it, read the data from the database
After some period of time, expire the token by deleting it from the database (optional, but preferred)
Another way to implement this kind of scheme is to encrypt the metadata (i.e. the userid, what page this goes to, a timestamp, etc.) and encode it in the URL. Then you don't need to store anything in a database, because it's right there in the URL. I don't usually like this approach because it requires a very high-value crypto key that you then have to protect on production servers, and that can be used to connect as anyone. Even if I have good ways to protect such a key (generally an attached HSM), I don't like such a key even existing. So generally I prefer a database. But for some applications, encrypting the data is better. Storing the metadata in the URL also significantly restricts how much metadata you can store, so again, tokens are nicer.
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value.
Usually there is authorization before you reach the endpoint (you authenticated before receiving the invoices). I see it as a common way to share a resource with external parties. We use a similar approach with expiring AWS S3 URLs.
Firstly, is it a good approach?
It depends on your use case. It suits sharing internal resources when you want the option to control access (revoking access, time-based access, one-time access, ...).
Secondly, how should I make this hash? sha512 on some random data?
As long as SOME_HASHED_VALUE is not guessable and has negligible collision probability (salted hash, long random unique value, ...), it should be OK.

How to encrypt a string in Node.js that should be used as an index

I am designing a microservice architecture using Node.js and MongoDB. I have a use case where I save a driver's license number, which will also be used to validate the user. Since the DL number is PII, I don't want to save the string as is; I want to encrypt it before saving. I can use encryption logic that generates the same encrypted string every time, so I can encrypt the DL number and do a lookup in the DB. But I am worried that hackers could decrypt and get all the DL numbers if they learn the encryption logic for one. Can someone suggest the best approach for this kind of use case?
Ignoring the index for a moment, it sounds like the best approach is to hash the license number using a keyed hash, and store the hash. This is similar to symmetric encryption in that you need to keep a secret, the key. However, it's one-way so attackers that obtain the secret will still need to brute-force each entry to obtain the number.
If the key is compromised, depending on the license number scheme, brute-forcing each number will vary in difficulty from easy to trivial. But, it's better than plaintext.
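A keyed hash of this kind can be sketched with HMAC-SHA-256. The example is in Python for brevity (Node.js offers the same construction through `crypto.createHmac`), and the function names and the normalization step are assumptions made for illustration. The output is deterministic for a given key, so the stored value supports exact-match lookups.

```python
import hashlib
import hmac


def normalize(license_number: str) -> str:
    """Assumed normalization: uppercase and drop separators, so equal numbers hash equally."""
    return "".join(ch for ch in license_number.upper() if ch.isalnum())


def license_index(license_number: str, key: bytes) -> str:
    """Keyed one-way hash of the license number, suitable for storing and looking up."""
    message = normalize(license_number).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

The key must live outside the database (for example in a secrets manager); without it, the stored values cannot be brute-forced offline.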
However, if you really need it as an index you have what appears to be conflicting priorities. I'll defer to someone else, I don't know much about DB indexing.
If it were me and I had time to spare, I'd set up one table with the hashes and one with the plaintext license number as an index. Add 10 million rows (or some ceiling that's relevant to you) of test data and profile a few thousand random lookups against each one.

Remembering authenticated users with cookies

In the past I have written a CMS where authenticated users are remembered across HTTP requests with two cookies:
User Token - A random, multi-character (say 10-digit long) alphanumeric string that relates back to an actual User ID in the database.
Authentication Token - A random, multi-character (say 100-digit long) alphanumeric string that, once hashed, must match the stored value for said User ID in the database.
My question (for a new CMS) is as follows:
What is the point of using two cookies? Wouldn't it be just as secure if I instead used a single 110-digit long token that, once hashed, must match the stored value for some User ID in the database? When a match of this token is found in the database, the related User ID would be considered the authenticated user.
User and Auth Tokens vs. Combined Token
The best reason to keep them separate, is to keep your code more manageable, portable, and playing nice with others.
Security
If security is your only concern, then it would be more secure to combine the user and auth tokens into a single encrypted token, if and only if both sequences were generated via a method which does not result in any particular character being more heavily weighted. The reason being that the act of combining the two values essentially acts as an additional simple encryption step, as well as being a larger encryption key, and thus more difficult to spoof or crack.
The weight of a character refers to how likely it is to occur. Many methods of hashing or encrypting data produce output that is very easily cracked (Excel's horribly insecure password protection, for one). The main reason is that when certain characters are more likely to appear, many can be substituted for others, so the final result has hundreds of thousands of unintended and unknown encryption keys. (Try out that Excel cracker for an example.)
Maintainability and Performance
However, there are some drawbacks to combining the two values, primarily performance and maintainability related.
Creating or collecting the combined key will require at least 2 extra steps every time.
Any need to get or set either value will require extracting all of the information.
You may no longer update the auth-token, without also updating the user-token.
This can cause severe issues later on if you ever expect to tie the user to other sessions. i.e.: google login paired with your auth system.
Anyone else looking at your code will have to reverse engineer how you create the user-auth combo, if they intend to add any functionality, such as group-level permissions.
Conform already
I'm not much of a conformist myself. However, in many things the crowd will flow to the path of least resistance. More often than not, common practices become common only after experience showed that the other ways had bigger problems. This is one of those cases.
Minimal security impact plus needing one cookie instead of two, traded for a less portable and less performing platform. At the end of the day, it's your call.
Finally
It may be best to stop bothering with keeping both keys, and instead just use a unique session hash. Then, just pair old sessions to users, IFF they re-authenticate after expiration.
NEVER use cookies (even encrypted) to auto-login without having several other checks and balances in place. Even with extra checks, if you're storing confidential information (names, addresses, phone, email), the security is between you and your user, so be extra cautious.
At the end of the day, you're the architect, go whichever route best fits your platform and environment.

Are two security keys better than one?

I just implemented a "remember me" feature for a user login on a website. Most advice was to have the userid stored in a cookie, and then have some long, unguessable random key. If both of these match up, the user is considered authenticated.
Does having two strings actually help? Wouldn't a longer key do exactly the same thing?
In other words, aren't two keys equally susceptible to attacks as one longer key? (I imagine it would be the total length of the keys, regardless of how many you have)
Note: There might be some DB query efficiency issues here too, e.g., looking up a big UUID in the DB is not as easy as looking up a small number. (On a tangential note, Gmail uses a six digit number as their one-time login token along with the username.)
Robust discussion of that in this SO thread.
... the user is considered authenticated.
Should probably read authenticated but with limited authorization.
Per comment: Somewhat more secure since it's one-time use and hard to guess. So if the cookie is compromised, the attacker has to act quickly or the token will be invalidated by the legitimate user logging in, whereas the userid may not change for a long time.
I'm no crypto expert, but as long as you check for brute-forcing attempts, you should be able to use a short key (like Gmail's 6 digits). The real vulnerability is people listening when a user logs in (eg. SideJacking).
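The point about checking for brute-forcing attempts can be illustrated with a small sketch. The class name and the limit of five failures are assumptions. A six-digit code has only a million possibilities, so it is the attempt limit, not the code length, that keeps guessing impractical. One-time use and expiry are left out for brevity.

```python
import hmac
import secrets

MAX_ATTEMPTS = 5  # assumed limit before the code is locked


class ShortCode:
    """A six-digit code that locks after too many wrong guesses."""

    def __init__(self) -> None:
        self.code = f"{secrets.randbelow(10**6):06d}"
        self.failures = 0

    def verify(self, guess: str) -> bool:
        if self.failures >= MAX_ATTEMPTS:
            return False  # locked: even the right code is refused
        if hmac.compare_digest(guess.encode("utf-8"), self.code.encode("utf-8")):
            return True
        self.failures += 1
        return False
```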
In sites I have previously created I made use of a user_id and a salted hash of the user's password. The primary reason I used two fields to authenticate a user is because it saved me the trouble of adding another table (and thus complicating the database design.) With the user_id also being stored in the cookie I could do an indexed look-up in users table and efficiently match the salted hash to the user. Of course you could concatenate both the user_id and the hash into one value and just store that in the cookie.
If you just have a random unguessable string then you would have to have a separate table to associate the random string with a user-id and do another look-up for that particular user.

Resources