If you bind an API call to the object's id, could one simply brute-force this API to retrieve all objects? With MySQL this would be entirely possible with incremental integer ids. But what about MongoDB? Are the ids guessable? For example, if you know one id, is it easy to guess other (next, previous) ids?
Thanks!
Update Jan 2019: As mentioned in the comments, the information below is true up until version 3.2. Version 3.4+ changed the spec so that machine ID and process ID were merged into a single random 5 byte value instead. That might make it harder to figure out where a document came from, but it also simplifies the generation and reduces the likelihood of collisions.
Original Answer:
+1 for Sergio's answer. In terms of whether they could be guessed or not: they are not hashes, they are predictable, so they can be "brute forced" given enough time. The likelihood depends on how the ObjectIDs were generated and how you go about guessing. To explain, first read the spec here:
Object ID Spec
Let us then break it down piece by piece:
TimeStamp - completely predictable as long as you have a general idea of when the data was generated
Machine - this is an MD5 hash of one of several options, some of which are more easily determined than others, but highly dependent on the environment
PID - again, not a huge number of values here, and could be sleuthed for data generated from a known source
Increment - if this is a random number rather than an increment (both are allowed), then it is less predictable
To expand a bit on the sources, ObjectIDs can be generated by:
MongoDB itself (but can be migrated, moved, updated)
The driver (on any machine that inserts or updates data)
Your Application (you can manually insert your own ObjectID if you wish)
So, there are things you can do to make them harder to guess individually, but without a lot of forethought and safeguards, the ranges of valid ObjectIDs for a normal data set should be fairly easy to work out, since they are all prefixed with a timestamp (unless you are manipulating this in some way).
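To illustrate how much the timestamp prefix gives away, here is a minimal sketch (assuming a Node.js/TypeScript environment; the sample value is made up for illustration) that recovers the creation time from the first 4 bytes of an ObjectID:

// Decode the 4-byte big-endian timestamp prefix of an ObjectID hex string.
// The sample value below is made up, not real data.
function objectIdTimestamp(objectIdHex: string): Date {
  const seconds = parseInt(objectIdHex.slice(0, 8), 16); // first 8 hex chars = 4 bytes
  return new Date(seconds * 1000);                       // seconds since the Unix epoch
}

console.log(objectIdTimestamp("5612afeedd2387aeeebbdd21")); // a date in October 2015 for this sample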
Mongo's ObjectIds were never meant to be protection against a brute force attack (or any attack, for that matter). They simply offer global uniqueness. You should not assume that some object can't be accessed by a user just because that user isn't supposed to know its id.
For an actual protection of your resources, employ other techniques.
If you are defending against unauthorized access, put authorization logic in your app (allow access to legitimate users, deny it to everyone else).
If you want to hinder dumping all objects, use some kind of rate limiting. Combine with authorization if applicable.
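As a minimal sketch of both ideas, assuming an Express app in TypeScript (loadDocument and getAuthenticatedUserId are hypothetical placeholders for your own data-access and authentication layers, and the route path is made up):

import express from "express";

// Hypothetical helpers standing in for your own data and auth layers.
declare function loadDocument(id: string): Promise<{ ownerId: string } | null>;
declare function getAuthenticatedUserId(req: express.Request): string | null;

const app = express();

app.get("/documents/:id", async (req, res) => {
  const userId = getAuthenticatedUserId(req);
  if (!userId) {
    res.status(401).end(); // not logged in
    return;
  }
  const doc = await loadDocument(req.params.id);
  // Deny unless the caller owns the document: knowing the id is never enough.
  if (!doc || doc.ownerId !== userId) {
    res.status(404).end(); // also hides whether the id exists at all
    return;
  }
  res.json(doc);
});

Rate limiting (for example a middleware such as express-rate-limit) can then be layered on top of the same route so that bulk guessing of ids becomes impractical even for authenticated users.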
Optional reading: Eric Lippert on GUIDs.
Related
Assuming we build a web page that features some sort of profile page for users, every user is identified by a unique id. In MongoDB this is most likely the _id field of a document. When building the link to a user's profile, I don't feel comfortable exposing an actual database ID in the URL, such as
https://www.foopage.com/userprofile/5612afeedd2387aeeebbdd211100ccdd
Instead, I'd like to introduce a shorter identifier, as can be seen on YouTube, where every video has a short unique alphanumeric code.
My question now is:
1) From a security point of view: is it worth hiding the actual ID of a document from the public, or does it in fact not enhance system security?
2) I am expecting performance drawbacks from using my own identifier, as the native _id of MongoDB is most likely indexed and "optimized", whereas my identifier always requires some sort of data mapping or additional indexing. What is your experience? Is there any documented difference between using the native index and a manually created one in MongoDB?
1) From a security point of view: is it worth hiding the actual ID of a document from the public, or does it in fact not enhance system security?
Usually an attacker gains no benefit from knowing ObjectIDs, unless you build some security vulnerability into your application that can be exploited by knowing an ID.
Another thing you need to be aware of is that ObjectIDs are not entirely random. They also leak a machine identifier, the process id of the process which generated them, and the date and time the ID was generated. See the documentation for more information. None of that information is really security-critical (although the creation time is something you might not want to reveal in some cases), but it might be valuable to an attacker trying to exploit some other vulnerability of your system.
That means hiding ObjectIDs as part of a defense-in-depth strategy can be useful, but it is only security through obscurity. Do not expect ObjectIDs to be secret.
2) I am expecting performance drawbacks by using my own identifier, as the native _ID of MongoDB is most likely indexed and "optimized", whereas my identifier always requires some sort of data mapping or additional indexing.
Did you know that your _id values don't need to be ObjectIDs? You can use any value of any type you want, as long as it is unique at the collection level. So if your data already has a unique identifier, you can use that as _id and save some bytes and an index. To do this, simply set the _id field of a document to the desired value before you insert it into the database.
If you want to retain the _id:ObjectID pattern, you can just create a unique index on your application-specific ID field. You will then have at least two indexes which are updated on every insert, but the performance penalty for this should not be very severe unless your application is very write-heavy (which would be unusual for a web application).
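A minimal sketch of both options using the official Node.js MongoDB driver (the connection string, database, collection, and field names are made up for illustration):

import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017"); // example connection string
const users = client.db("app").collection<{ _id: string; name: string }>("users");

// Option 1: use your own application-level identifier directly as _id.
await users.insertOne({ _id: "IgorP", name: "Igor P." });

// Option 2: keep ObjectID as _id and put a unique index on a separate short id field.
// await users.createIndex({ shortId: 1 }, { unique: true });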
By the way: The URL example.com/userprofile/5612afeedd2387aeeebbdd211100ccdd is not very search-engine friendly. It would be better to have URLs like example.com/userprofile/IgorP. They are more readable for both search engines and human users.
Using Node.js and crypto, right now, when a user logs in, I generate a random auth token:
var token = crypto.randomBytes(16).toString('hex');
I know it's unlikely, but there is a tiny chance for two tokens to be of the same value.
This means a user could, in theory, authenticate on another account.
Now, I see two obvious methods to get past this:
When I generate the token, query the user database and see if a token with the same value already exists. If it does, just generate another one. As you can see, this is not perfect, since I am adding queries to the database.
Since every user has a unique username in my database, I could generate a random token using the username as a secret generator key. This way, there is no way for two tokens to have the same value. Can crypto do that? Is it secure?
How would you do it?
It's too unlikely to worry about it happening by chance. I would not sacrifice performance to lock and check the database for it.
Consider this excerpt from Pro Git about the chance of random collisions between 20-byte SHA-1 sums:
Here’s an example to give you an idea of what it would take to get a SHA-1 collision [by chance]. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists [for average projects] that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
(SHA-1 collisions can be directly constructed now, so the quote is now less applicable to SHA-1, but it's still valid when considering collisions of random values.)
If you are still worried about that probability, then you can easily use more random bytes instead of 16.
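For a rough sense of the numbers, here is a back-of-the-envelope birthday bound for 16 random bytes (128 bits); the token count used is an arbitrary example:

// Approximate birthday-bound collision probability: p ≈ n^2 / 2^(bits + 1).
function collisionProbability(tokenCount: number, bits: number): number {
  return (tokenCount * tokenCount) / Math.pow(2, bits + 1);
}

// Even after issuing a billion tokens, 128 bits of randomness keeps the
// chance of any two colliding at roughly 1.5e-21.
console.log(collisionProbability(1e9, 128));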
But regarding your second idea: if you hashed the random ID with the username, then that hash could collide, just like the random ID could. You haven't solved anything.
You should always add a UNIQUE constraint to your database column. This will create an implicit index to speed up searches on this column, and it will make sure that no two records ever have the same value. So in the worst-case scenario you get a database exception, not a security violation.
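Here is a minimal sketch of that pattern, assuming PostgreSQL accessed through the node-postgres (pg) client; the table and column names are made up:

import { randomBytes } from "node:crypto";
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* environment variables

// Schema (run once beforehand); the UNIQUE constraint turns a freak collision
// into a handled error instead of a security violation:
//   CREATE TABLE sessions (user_id integer NOT NULL, token text NOT NULL UNIQUE);

async function issueToken(userId: number): Promise<string> {
  for (;;) {
    const token = randomBytes(16).toString("hex");
    try {
      await pool.query("INSERT INTO sessions (user_id, token) VALUES ($1, $2)", [userId, token]);
      return token;
    } catch (err: any) {
      if (err.code !== "23505") throw err; // 23505 = unique_violation in Postgres
      // Astronomically unlikely collision: loop and generate a new token.
    }
  }
}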
Also, depending on how frequently unique tokens need to be created, I think it's perfectly fine in most cases to use database lookups during generation. If your column, again, is properly indexed, it will be a pretty fast query. Most databases scale horizontally very well, so even if you are building the next Facebook it is still an option. Furthermore, you will probably need to do a query to check for e-mail uniqueness anyway.
Finally, if you are really concerned about performance, you could pre-generate a million unique tokens and store them in a separate database table for quick use. Just set up a routine that periodically checks its usage and inserts more records as needed. However, as #MacroMan stated in the comments, this has security implications if someone gets access to the list of pre-generated tokens, so this practice should be avoided.
PostgreSQL UNIQUE CONSTRAINT
MySQL: Unique Constraints
Currently, to pass a user id to the server on certain views I use the raw user id.
http://example.com/page/12345 // 12345 being the user's id
Although there is no real security risk in my specific application by exposing this data, I can't help feeling a little dirty about it. What is the proper solution? Should I somehow be disguising the data?
Maybe a better way to pose my question is to ask what the standard approach is. Is it common for applications to use user ids in plain view if it's not a security risk? If it is a security risk, how is it handled? I'm just looking for a point in the right direction here.
There's nothing inherently wrong with that. Lots of sites do it. For instance, Stack Overflow users can be enumerated using URLs of the form:
http://stackoverflow.com/users/123456
Using a normalized form of the user's name in the URL, either in conjunction with the ID or as an alternative to it, may be a nicer solution, though, e.g.:
http://example.com/user/yourusername
http://example.com/user/12345/yourusername
If you go with the former, you'll need to ensure that the normalized username is set up as a unique key in your user database.
If you go with the latter, you've got a choice: if the normalized username in the database doesn't match the one in the URL, you can either redirect to the correct URL (like Stack Overflow does), or return a 404 error.
In addition to duskwuff's great suggestion to use the username instead of the ID itself, you could use UUIDs instead of integers. They are 128 bits long, so infeasible to enumerate, and they also avoid disclosing exactly how many users you have. As an added benefit, your site is future-proofed against user id limits if it becomes massively popular.
For example, with integer ids, an attacker could find out the largest user_id on day one, then come back in a week or a month's time and see what the largest user_id is now. They can keep doing this to monitor the rate of growth of your site - perhaps not a biggie for your example - but many organisations consider this sort of information commercially sensitive. It also helps avoid social engineering; for instance, it makes it significantly harder for an attacker to email you asking to reset their password "because I've changed email providers and I've forgotten my old password but I remember my user id!". Give an attacker an inch and they'll take a mile.
I prefer to use Version/Type 4 (random) UUIDs, but you could also use Version/Type 5 (SHA-1 based), so you could do something like UUID.fromName(12345) and get a UUID derived from the integer value, which is useful if you want to migrate existing data and need to update a bunch of foreign key values. Most major languages either support UUIDs natively or have popular libraries for them (C & C++), although some database software might require some tweaking - I've used them with Postgres myself and the transition was easy.
The downside is that UUIDs are significantly longer and not memorable, but it doesn't sound like you need users to be able to type the URLs in manually. You do also need to check whether the UUID already exists when creating a user, and if it does, just keep generating until an unused UUID is found - in practice, given the size of the numbers, with Version 4 random UUIDs you have a better chance of winning the lottery than of hitting a collision, so it's not something that will impact performance.
Example URL: http://example.com/page/4586A0F1-2BAD-445F-BFC6-D5667B5A93A9
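A minimal sketch of the Version 4 (random) variant in Node.js/TypeScript; the user shape is made up for illustration:

import { randomUUID } from "node:crypto";

// Give each user an opaque, non-enumerable public identifier alongside
// whatever internal id the database uses.
interface User {
  publicId: string;   // exposed in URLs, e.g. /page/<publicId>
  internalId: number; // internal key, never exposed
  name: string;
}

function createUser(internalId: number, name: string): User {
  return { publicId: randomUUID(), internalId, name };
}

console.log(createUser(12345, "alice").publicId); // e.g. "4586a0f1-2bad-445f-bfc6-d5667b5a93a9"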
Should I be generating the id of the documents in a CouchDB or should I depend on CouchDB to generate it? What are the advantages or disadvantages in these approaches? Is there any performance implications on any of these options?
There is no difference as far as CouchDB is concerned. Frederick is right that sequential ids are slightly faster. If you query /_uuids?count=10 you will notice that the UUIDs are sequential (by default).
However, even with random IDs, once you run compaction, they will all be in the "right" order internally in the .couch file and at that point there is no difference. So in the long run, I don't usually worry about it.
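A quick way to see the default (sequential) algorithm in action, assuming a local CouchDB on its standard port:

// Ask CouchDB for a batch of server-generated ids and print them.
// Assumes CouchDB is listening on localhost:5984 (the default).
async function showUuids(): Promise<void> {
  const res = await fetch("http://localhost:5984/_uuids?count=10");
  const body = (await res.json()) as { uuids: string[] };
  // With the default "sequential" algorithm the ids share a common prefix
  // and only the tail changes between consecutive values.
  for (const id of body.uuids) console.log(id);
}

showUuids();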
The main thing is that you should use mostly sequential ids. As this article and this bit of the CouchDB book explain, using random ids results in a much less efficient structure internally, both speed-wise and in terms of space used on disk.
Self-generated ids are almost impossible to deal with if you have two or more separate instances of your app, because synchronisation between the different instances is not instantaneous. One solution is to have one server dedicated to generating (or checking the availability of) the ids, for example using a SQL database, acting as a gate for document creation.
On the other hand, if you have only one server and will never need more, there is one advantage I find interesting in self-generated ids: since they have to be unique, you can use them in URLs. For instance, take the slug of a blog post's title as its _id.
Performance-wise, CouchDB's generated ids are pretty long, so if your own ids are shorter you will save significant disk space (assuming you have a lot of documents).
Both answers above describe the pros of sequential IDs.
Here is a major problem with sequential IDs: given a single ID, the other IDs become predictable. Because of this you can't use sequential IDs as identifiers in application URLs, and you can't treat the URL itself as a form of authentication (as file-sharing services do).
My coworker is insisting that the use of a global modification number to mask query string IDs is a good idea.
public static readonly int ModificationNumber = 9081234;
And elsewhere:
_addressID = Convert.ToInt32(Request.QueryString["AddressId"]) - ModificationNumber;
I can't seem to get my head around this. If someone wanted to try some URL hacking, then a modification number makes no difference at all.
Are there other reasons this would make a site more secure?
Furthermore, are there explicit reasons this is bad? In my mind the less globals the better.
IMVHO your colleague is kind of on the right track, but not quite.
A good rule to follow is that you should never expose actual IDs in the query string, as that gives a clue as to the structure of your database, and makes it just that little bit easier for someone to carry out a SQL injection type attack (they can target specific records because they know the ID).
So your colleague is attempting to achieve this, albeit in a very round-about way. Personally I wouldn't do it this way because it will simply be a matter of time before a smart attacker works out what you are doing and then works out what the magic number is. It also doesn't really do anything to prevent a SQL injection attack against specific records, as the generated number may match an existing key anyway. If you are relying on this methodology to avoid SQL attacks then you have deeper issues that need to be addressed.
Edit
Mentioning an alternative is probably a fair thing to do.
As you are using C# and pulling parameters out of the query string, I will assume you are using ASP.NET. In that case, important IDs can be kept in Session or the Cache. You can store a bunch of items in a custom data object, which you then store in Session (this saves having to keep track of lots of IDs; you just need to know one). ASP.NET manages the web app's Session for you, it is unique to each user, and you can use it to store things as you move from page to page.
If you are manually tracking session state or using a database to keep your session-related info, then you can still serialize the aforementioned data object into the database using a generated GUID as its key, and append that GUID to the query string (there is only an incredibly low chance of success if a user messes with a GUID to try to assume someone else's session; you can lower that chance even further by concatenating two GUIDs as a key, etc.).