How are session identifiers generated? - security

Most web applications depend on some kind of session with the user (for instance, to retain login status). The session id is kept as a cookie in the user's browser and sent with every request.
To make it hard to guess the next session ID, these session IDs need to be sparse and somewhat random. They also have to be unique.
The question is - how to efficiently generate session ids that are sparse and unique?
This question has a good answer for unique random numbers, but it doesn't seem to scale to a large range of numbers, simply because the array would end up taking a lot of memory.
EDIT:
GUIDs are considered unsafe as far as security (and randomness) go.
The core problem is making sure the numbers are unique (i.e. they don't repeat) while keeping generation efficient.

If you want them to be unique and not easily guessable, why not combine these?
Take a counter (which generates a unique value for each new session) and append random bits generated by a CSPRNG. Make sure you use at least the minimum number of random bits required.
This should work on a farm as well without hitches: just prefix the counter that is local to a server with an id that is unique to that server.
SSSSCCCCCRRRRRR
Where S is server id that created the session, C is the server local counter and R is a crypto random.
(Disclaimer: the number of letters does not correspond to the number of digits/bits you should use in any way. :)
Unique, secure.
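If it helps to see it concretely, here is a minimal Node.js sketch of that layout, assuming hex encoding; the SERVER_ID value, the field widths and the makeSessionId name are illustrative choices, not requirements.

// Sketch of the server-id + counter + CSPRNG scheme described above.
const crypto = require('crypto');

const SERVER_ID = 7;   // unique per server in the farm (assumed to be configured elsewhere)
let localCounter = 0;  // server-local counter; guarantees uniqueness on this server

function makeSessionId() {
  localCounter += 1;
  const random = crypto.randomBytes(16).toString('hex'); // 128 random bits from a CSPRNG
  // Fixed-width fields keep the id parseable: SSSS | CCCCCCCCCC | RRRR...
  return SERVER_ID.toString(16).padStart(4, '0')
       + localCounter.toString(16).padStart(10, '0')
       + random;
}

console.log(makeSessionId()); // "0007" + "0000000001" + 32 hex characters of randomness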

You could take a look at the RNGCryptoServiceProvider if you are using .NET.
http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=775
This is a cryptographically secure way of generating random numbers.

Related

Unique hash as authorization for endpoint

I've already seen that companies sometimes send customized links to access some resource without logging in.
For example, a company sent me an email with a link to my invoices:
www.financial.service.com/<SOME_HASHED_VALUE>
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value. I have a very similar case, but I have concerns:
firstly, is it a good approach?
secondly, how should I make this hash? SHA-512 on some random data?
This can be a completely valid approach, and is its own type of authentication. If constructed correctly, it proves that you have access to that email (it doesn't prove anything else, but it does prove that much).
These values often aren't hashes. They're often random, and that's their power. If they are hashes, they need to be constructed such that their output is "effectively random," so usually you might as well just make them random in the first place. For this discussion, I'll call it a "token."
The point of a token is that it's unpredictable, and extremely sparse within its search space. By unpredictable, I mean that even if I know exactly who the token is for, it should be effectively impossible (i.e. within practical time constraints) to construct a legitimate token for that user. So, for instance, if this were the hash of the username and a timestamp (even a millisecond timestamp), that would be a terrible token. I could guess those very quickly. So random is best.
By "sparse" I mean that out of all the possible tokens (i.e. strings of the correct length and format), a vanishingly small number of them should be valid tokens, and those valid tokens should be scattered across the search space randomly. For example, if the tokens were sequential, that would be terrible. If I had a token, I could find other tokens by simply increasing or decreasing the value by one.
So a good token looks like this:
Select a random, long string
Store it in your database, along with metadata about what it means, and a timestamp
When a user shows up with it, read the data from the database
After some period of time, expire the token by deleting it from the database (optional, but preferred)
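Here is a minimal sketch of that flow in Node.js, assuming a Postgres-style client with an async query method; the download_tokens table, its columns and the 7-day lifetime are invented for illustration.

// Generate, store with metadata, look up, and expire tokens as described above.
const crypto = require('crypto');

async function issueToken(db, userId, resource) {
  const token = crypto.randomBytes(32).toString('hex'); // long, random, and sparse (256 bits)
  await db.query(
    'INSERT INTO download_tokens (token, user_id, resource, created_at) VALUES ($1, $2, $3, NOW())',
    [token, userId, resource]
  );
  return token;
}

async function redeemToken(db, token) {
  // Only tokens younger than 7 days are accepted; older ones count as expired.
  const { rows } = await db.query(
    "SELECT user_id, resource FROM download_tokens WHERE token = $1 AND created_at > NOW() - INTERVAL '7 days'",
    [token]
  );
  return rows[0] || null; // null means unknown or expired token
}

async function purgeExpiredTokens(db) {
  await db.query("DELETE FROM download_tokens WHERE created_at <= NOW() - INTERVAL '7 days'");
}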
Another way to implement this kind of scheme is to encode the encrypted metadata into the token itself (i.e. the user ID, what page this goes to, a timestamp, etc.). Then you don't need to store anything in a database, because it's right there in the URL. I don't usually like this approach because it requires a very high-value crypto key that you then have to protect on production servers, and which can be used to connect as anyone. Even if I have good ways to protect such a key (generally an attached HSM), I don't like such a key even existing. So generally I prefer a database. But for some applications, encrypting the data is better. Storing the metadata in the URL also significantly restricts how much metadata you can store, so again, tokens are nicer.
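For completeness, here is a hedged sketch of that encrypted-metadata variant in Node.js using AES-256-GCM; in reality the key would come from an HSM or secret store rather than being generated inline, and the iv + tag + ciphertext layout is just one reasonable choice.

// Encode the metadata itself into the token using authenticated encryption.
const crypto = require('crypto');

const KEY = crypto.randomBytes(32); // 256-bit key; must be persisted and protected in practice

function encodeToken(metadata) {
  const iv = crypto.randomBytes(12); // GCM nonce
  const cipher = crypto.createCipheriv('aes-256-gcm', KEY, iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(metadata), 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString('base64url'); // URL-safe token
}

function decodeToken(token) {
  const buf = Buffer.from(token, 'base64url');
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', KEY, iv);
  decipher.setAuthTag(tag);
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return JSON.parse(plaintext.toString('utf8')); // throws if the token was tampered with
}

// e.g. 'https://example.com/invoices/' + encodeToken({ userId: 42, page: 'invoices', ts: Date.now() })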
and there is no authorization behind this endpoint; they rely only on the fact that I am the only person who knows this hash value.
Usually there is authorization before accessing the endpoint (you authenticated before receiving the invoices). I see it as a common way to share resources with external parties. We use a similar approach with expirable AWS S3 URLs.
firstly, is it a good approach?
It depends on your use case. It works well for sharing internal resources when you need options to control access (revoking access, time-based access, one-time access, ...).
secondly, how should I make this hash? SHA-512 on some random data?
As long as SOME_HASHED_VALUE is not guessable and has negligible collision probability (salted hash, long random unique value, ...), it should be OK.

How to encrypt a string in Node.js that should be used as an index

I am designing a microservice architecture using Node.js and MongoDB. I have a use case to save a driver's license number which will also be used to validate the user. Now, as the DL number is PII, I don't want to save the string as is; I want to encrypt it before saving. I could use deterministic encryption logic that generates the same encrypted string every time, so I can encrypt the DL number and do a lookup in the DB. But I am worried that hackers could decrypt and get all DL numbers if they figure out the encryption logic for one. Can someone suggest the best approach for this kind of use case?
Ignoring the index for a moment, it sounds like the best approach is to hash the license number using a keyed hash, and store the hash. This is similar to symmetric encryption in that you need to keep a secret, the key. However, it's one-way so attackers that obtain the secret will still need to brute-force each entry to obtain the number.
If the key is compromised, depending on the license number scheme, brute-forcing each number will vary in difficulty from easy to trivial. But, it's better than plaintext.
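A rough Node.js sketch of that keyed-hash idea using HMAC-SHA-256; the DL_HMAC_KEY environment variable, the dlHash field and the users collection are assumptions for illustration only.

// Keyed (HMAC) hash of the driver's license number; the key lives outside the database.
const crypto = require('crypto');

const HMAC_KEY = Buffer.from(process.env.DL_HMAC_KEY, 'hex'); // assumed 32-byte secret from a secret store

function hashLicenseNumber(dlNumber) {
  // Normalise first so formatting differences don't produce different hashes.
  const normalised = dlNumber.trim().toUpperCase();
  return crypto.createHmac('sha256', HMAC_KEY).update(normalised).digest('hex');
}

// Store hashLicenseNumber(dl) in an indexed field and look the user up by the same value, e.g.:
// db.collection('users').findOne({ dlHash: hashLicenseNumber(inputDl) });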
However, if you really need it as an index you have what appears to be conflicting priorities. I'll defer to someone else, I don't know much about DB indexing.
If it were me and I had time to spare, I'd set up one table with the hashes and one with the plaintext license number as an index. Add 10 million rows (or some ceiling that's relevant to you) of test data and profile a few thousand random lookups against each one.

Generating unique tokens in a NodeJS, Crypto Token authentication environment

Using nodejs and crypto, right now, when a user logs in, I generate a random auth token:
var token = crypto.randomBytes(16).toString('hex');
I know it's unlikely, but there is a tiny chance for two tokens to be of the same value.
This means a user could, in theory, authenticate on another account.
Now, I see two obvious methods to get past this:
When I generate the token, query the user database and see if a token with the same value already exists. If it does, just generate another one. As you can see, this is not perfect, since I am adding queries to the database.
Since every user has a unique username in my database, I could generate a random token using the username as a secret generator key. This way, there is no way two tokens can have the same value. Can crypto do that? Is it secure?
How would you do it?
It's too unlikely to worry about it happening by chance. I would not sacrifice performance to lock and check the database for it.
Consider this excerpt from Pro Git about the chance of random collisions between 20-byte SHA-1 sums:
Here's an example to give you an idea of what it would take to get a SHA-1 collision [by chance]. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists [for average projects] that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
(SHA-1 collisions can be directly constructed now, so the quote is now less applicable to SHA-1, but it's still valid when considering collisions of random values.)
If you are still worried about that probability, then you can easily use more random bytes instead of 16.
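For a rough sense of scale, here is the standard birthday-bound approximation for random 128-bit tokens; treat the output as an order of magnitude, not an exact figure.

// Probability of any collision among n random tokens of `bits` bits is roughly n^2 / 2^(bits+1).
function collisionProbability(n, bits = 128) {
  return (n ** 2) / (2 ** (bits + 1));
}

console.log(collisionProbability(1e9));  // ~1.5e-21 for a billion 16-byte tokens
console.log(collisionProbability(1e12)); // ~1.5e-15 even for a trillion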
But regarding your second idea: if you hashed the random ID with the username, then that hash could collide, just like the random ID could. You haven't solved anything.
You should always add a UNIQUE constraint to your database column. This will create an implicit index to improve searches for this column, and it will make sure that no two records ever have the same value. So, in the worst-case scenario you will get a database exception and not a security violation.
Also, depending on how frequently unique tokens need to be created, I think it's perfectly fine in most cases to use database lookups during generation. If your column, again, is properly indexed, it will be a pretty fast query. Most databases scale horizontally very well, so even if you are building the next Facebook it is still an option. Furthermore, you will probably need to do a query to check for e-mail uniqueness anyway.
Finally, if you are really concerned about performance, you could always pre-generate a million unique tokens and store them in a separate database table for quick use. Just set up a routine to periodically check its usage and insert more records as needed. However, as #MacroMan stated in the comments, this could have security implications if someone gets access to the list of pre-generated tokens, so this practice should be avoided.
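As a sketch of the "let the UNIQUE constraint catch it" idea (Node.js against PostgreSQL; the auth_tokens table and the Postgres-specific unique-violation code '23505' are assumptions for illustration):

// Insert a fresh token and retry only in the astronomically unlikely case that the
// UNIQUE constraint on auth_tokens.token rejects it.
const crypto = require('crypto');

async function createAuthToken(db, userId, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const token = crypto.randomBytes(16).toString('hex');
    try {
      await db.query('INSERT INTO auth_tokens (token, user_id) VALUES ($1, $2)', [token, userId]);
      return token;
    } catch (err) {
      if (err.code !== '23505') throw err; // not a unique violation: a real error, re-throw it
      // otherwise: collision, loop and generate a new token
    }
  }
  throw new Error('Could not generate a unique token');
}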
PostgreSQL UNIQUE CONSTRAINT
MySQL: Unique Constraints

Are MongoDB ids guessable?

If you bind an API call to the object's id, could one simply brute force this API to get all objects? If you think of MySQL, this would be totally possible with incremental integer ids. But what about MongoDB? Are the ids guessable? For example, if you know one id, is it easy to guess other (next, previous) ids?
Thanks!
Update Jan 2019: As mentioned in the comments, the information below is true up until version 3.2. Version 3.4+ changed the spec so that machine ID and process ID were merged into a single random 5 byte value instead. That might make it harder to figure out where a document came from, but it also simplifies the generation and reduces the likelihood of collisions.
Original Answer:
+1 for Sergio's answer, in terms of answering whether they could be guessed or not, they are not hashes, they are predictable, so they can be "brute forced" given enough time. The likelihood depends on how the ObjectIDs were generated and how you go about guessing. To explain, first, read the spec here:
Object ID Spec
Let us then break it down piece by piece:
TimeStamp - completely predictable as long as you have a general idea of when the data was generated
Machine - this is an MD5 hash of one of several options, some of which are more easily determined than others, but highly dependent on the environment
PID - again, not a huge number of values here, and could be sleuthed for data generated from a known source
Increment - if this is a random number rather than an increment (both are allowed), then it is less predictable
To expand a bit on the sources. ObjectIDs can be generated by:
MongoDB itself (but can be migrated, moved, updated)
The driver (on any machine that inserts or updates data)
Your Application (you can manually insert your own ObjectID if you wish)
So, there are things you can do to make them harder to guess individually, but without a lot of forethought and safeguards, for a normal data set, the ranges of valid ObjectIDs should be fairly easy to work out since they are all prefixed with a timestamp (unless you are manipulating this in some way).
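To see how much an ObjectId gives away on its own, here is a small sketch that reads the creation time out of the leading four bytes (that part of the layout is the same in old and new ObjectId versions); the example id is just an arbitrary illustrative value.

// The first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds,
// so the creation time of a document leaks directly from its id.
function objectIdTimestamp(objectIdHex) {
  const seconds = parseInt(objectIdHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp('507f1f77bcf86cd799439011')); // 2012-10-17T21:13:27.000Z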
Mongo's ObjectIds were never meant to be a protection from brute force attacks (or any attack, for that matter). They simply offer global uniqueness. You should not assume that an object can't be accessed by a user just because that user isn't supposed to know its id.
For an actual protection of your resources, employ other techniques.
If you want to defend against unauthorized access, place some authorization logic in your app (allow access to legitimate users, deny everyone else).
If you want to hinder dumping all objects, use some kind of rate limiting. Combine with authorization if applicable.
Optional reading: Eric Lippert on GUIDs.

Is it a security risk to use parts of GUID as a random passwords?

When users create an account in my web application, I generate a GUID and use the first 8 characters as their password which is then sent via email.
Is there a security risk I am overlooking in using GUIDs as passwords? I've taken a look at the question Are GUIDs good passwords?, but that question pertains to personal passwords, not random/generated passwords. Ideally, users will log in and change their password if they want to.
Using GUIDs as passwords is a very bad idea. GUIDs are generated in a very predictable and well-defined manner. In other words, given enough information, an attacker could predict the passwords of other users.
Predictable and well-defined is the exact opposite of what you want in a password generator.
Yes, unless you know exactly how the GUID is built. For example, some GUIDs bundle the MAC address of the host into the GUID. If you happen to use those bits, then that compromises a large amount of the bit space for the "random" password.
Simply put, GUIDs may be unique, but they are not necessarily random.
"Cryptanalysis of the WinAPI GUID generator shows that, since the sequence of V4 GUIDs is pseudo-random; given full knowledge of the internal state, it is possible to predict previous and subsequent values." http://en.wikipedia.org/wiki/Globally_unique_identifier
I wouldn't use it. It's not that hard to use a random number generator, after all, which is designed to be as random as possible, rather than attempting to guarantee global uniqueness.
This article says don't use it.
GUIDs come in a number of flavors; some have parts that are predictable.
On the other hand, it is very, very easy to generate random numbers.
Why use a questionable technique when a secure alternative is readily available?
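For instance, here is a hedged Node.js sketch of that readily available alternative: draw the temporary password straight from a CSPRNG instead of slicing a GUID (the alphabet and length are just illustrative choices).

// Temporary password from a CSPRNG; crypto.randomInt avoids modulo bias when indexing the alphabet.
const crypto = require('crypto');

// 0, O, 1, l and I are omitted to avoid look-alike characters.
const ALPHABET = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789';

function generateTempPassword(length = 12) {
  let password = '';
  for (let i = 0; i < length; i++) {
    password += ALPHABET[crypto.randomInt(ALPHABET.length)];
  }
  return password;
}

console.log(generateTempPassword()); // e.g. "fK3ZwqP7nTdB"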
Using part of the GUID, or even the whole thing, is a very bad idea. Even if most of it happens to be random, there's no guarantee that any particular portion will be.
I'm not sure there'd be much trouble using a hash of a GUID, or better yet a hash that combined a GUID with some other source of randomness (e.g. one might hash the time when the program starts, and then generate a passcode by returning part of a hash of the previous hash and a new GUID). If there's any randomness at all in GUID generation, the entropy of the hash should increase with each iteration. Note that the passcode should not reveal the entire hash value; some of that should be kept as secret internal state.
