API design and security: Why hide internal ids?

I've heard a few people say that you should never expose your internal ids to the outside world (for instance an auto-incrementing primary key).
Some suggest having some sort of uuid column that you use instead for lookups.
I'm wondering really why this would be suggested and if it's truly important.
Using a uuid instead is basically just obfuscating the id. What's the point? The only thing I can think of is that auto-incrementing integers obviously reveal the ordering of my db objects. Does it matter if an outside user knows that one thing was created before/after another?
Or is it purely that obfuscating the ids would prevent "guessing" at different operations on specific objects?
Is this even an issue I should be thinking about when designing an external-facing API?

Great answers. I'll add another reason why you don't want to expose your internal auto-incremented ID.
As a competitor, I can easily measure how many new users/orders/etc. you get every week/day/hour. I just need to create a user and/or an order and subtract the new ID from the one I got last time.
So it's not only a security concern; it's a business concern as well.
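To make the arithmetic concrete, here is a minimal Java sketch of that estimate; the ids are made-up numbers, not taken from any real system:

    public class GrowthEstimate {
        public static void main(String[] args) {
            // Hypothetical ids observed by placing one test order a week apart.
            long orderIdLastWeek = 1_204_337;
            long orderIdThisWeek = 1_219_502;

            // With a plain auto-increment key, the difference is your order volume.
            long newOrders = orderIdThisWeek - orderIdLastWeek;
            System.out.println("Orders placed this week: " + newOrders);
        }
    }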

Any information that you provide a malicious user about your application and its layout can and will be used against your application. One of the problems we face in (web) application security is that seemingly innocuous design decisions taken in the infancy of a project become Achilles' heels when the project scales larger. Letting an attacker make informed guesses about the ordering of entities can come back to haunt you in the following, somewhat unrelated ways:
The ID of the entity will inevitably be passed as a parameter at some point in your application. This will result in hackers eventually being able to feed your application arguments they ordinarily should not have access to. I've personally been able to view order details (on a very popular retailer's site) that I had no business viewing, as a URL argument no less. I simply fed the app sequential numbers from my own legitimate order.
Knowing the limits, or at least the progression, of primary key field values is invaluable fodder for SQL injection attacks, the scope of which I can't cover here.
Key values are used not only in RDBMS systems, but in other key-value mapping systems as well. Imagine if the JSESSIONID cookie value could be predetermined or guessed: everybody with opposable thumbs would be replaying sessions in web apps.
And many more that I'm sure other people here will come up with.
SEAL Team 6 doesn't necessarily mean there are six SEAL teams; it just keeps the enemy guessing. And the time a potential attacker spends guessing is more money in your pocket, any way you slice it.

As with many security-related issues, it's a subtle answer - kolossus gives a good overview.
It helps to understand how an attacker might go about compromising your API, and how many security breaches occur.
Most security breaches are caused by bugs or oversights, and attackers look for those. An attacker who is trying to compromise your API will firstly try to collect information about it - as it's an API, presumably you publish detailed usage documentation. An attacker will use this document, and try lots of different ways to make your site crash (and thereby expose more information, if he's lucky), or react in ways you didn't anticipate.
You have to assume the attacker has lots of time, and will script their attack to try every single avenue - like a burglar with infinite time, who goes around your house trying every door and window, with a lock pick that learns from every attempt.
So, if your API exposes a method like getUserInfo(userid), and userID is an integer, the attacker will write a script to iterate from 0 upwards to find out how many users you have. They'll try negative numbers, and max(INT) + 1. Your application could leak information in all those cases, and - if the developer forgot to handle certain errors - may expose more data than you intended.
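As a rough illustration of what such a probing script looks like, here is a minimal Java sketch; the endpoint URL and parameter name are assumptions, not a real API:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class IdProbe {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            // Walk sequential ids; every status code and error body is information.
            for (int userId = 0; userId <= 1000; userId++) {
                HttpRequest req = HttpRequest.newBuilder(
                        URI.create("https://api.example.com/getUserInfo?userid=" + userId)).build();
                HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() == 200) {
                    System.out.println("Valid user id: " + userId);
                }
            }
        }
    }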
If your API includes logic to restrict access to certain data - e.g. you're allowed to execute getUserInfo only for users in your friend list - the attacker may get lucky with some numbers because of a bug or an oversight, and they'll know that the info they're getting relates to a valid user, so they can build up a model of the way your application is designed. It's the equivalent of a burglar knowing that all your locks come from a single manufacturer, so they only need to bring that one lock pick.
By itself, this may be of no advantage to the attacker - but it makes their life a tiny bit easier.
Given how little effort it takes to use UUIDs or another meaningless identifier, it's probably worth making things harder for the attacker. It's not the most important consideration, of course - it probably doesn't make the top 5 things you should do to protect your API from attackers - but it helps.

Related

Visible User ID in Address Bar

Currently, to pass a user id to the server on certain views I use the raw user id.
http://example.com/page/12345 //12345 Being the users id
Although there is no real security risk in my specific application from exposing this data, I can't help feeling a little dirty about it. What is the proper solution? Should I somehow be disguising the data?
Maybe a better way to pose my question is to ask what the standard approach is. Is it common for applications to use user ids in plain view if it's not a security risk? If it is a security risk, how is it handled? I'm just looking for a point in the right direction here.
There's nothing inherently wrong with that. Lots of sites do it. For instance, Stack Overflow users can be enumerated using URLs of the form:
http://stackoverflow.com/users/123456
Using a normalized form of the user's name in the URL, either in conjunction with the ID or as an alternative to it, may be a nicer solution, though, e.g.:
http://example.com/user/yourusername
http://example.com/user/12345/yourusername
If you go with the former, you'll need to ensure that the normalized username is set up as a unique key in your user database.
If you go with the latter, you've got a choice: if the normalized username in the database doesn't match the one in the URL, you can either redirect to the correct URL (like Stack Overflow does), or return a 404 error.
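A minimal sketch of the second option (id plus slug), assuming a hypothetical UserStore lookup; it redirects when the slug is stale, as Stack Overflow does, and returns 404 for unknown ids:

    import java.util.Optional;

    public class CanonicalUserUrl {
        // Hypothetical lookup: user id -> canonical normalized username slug.
        interface UserStore {
            Optional<String> slugForId(long id);
        }

        // Decides what to do with a request for /user/{id}/{slug}.
        static String resolve(UserStore store, long id, String requestedSlug) {
            Optional<String> canonical = store.slugForId(id);
            if (canonical.isEmpty()) {
                return "404 Not Found";                                   // unknown id
            }
            if (!canonical.get().equals(requestedSlug)) {
                return "301 -> /user/" + id + "/" + canonical.get();      // stale or wrong slug
            }
            return "200 OK";                                              // serve the profile
        }
    }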
In addition to duskwuff's great suggestion to use the username instead of the ID itself, you could use UUIDs instead of integers. They are 128 bits long, so they are infeasible to enumerate, and they also avoid disclosing exactly how many users you have. As an added benefit, your site is future-proofed against user id limits if it becomes massively popular.
For example, with integer ids, an attacker could find out the largest user_id on day one, and come back in a week's or a month's time to find out what the largest user_id is now. They can do this continually to monitor the rate of growth of your site - perhaps not a biggie for your example - but many organisations consider this sort of information commercially sensitive. It also helps avoid social engineering, e.g. it makes it significantly harder for an attacker to email you asking to reset their password "because I've changed email providers and I've forgotten my old password but I remember my user id!". Give an attacker an inch and they'll take a mile.
I prefer to use Version/Type 4 (random) UUIDs, but you could also use Version/Type 5 (SHA-1-based, name-derived) so you could go UUID.fromName(12345) and get a UUID derived from the integer value, which is useful if you want to migrate existing data and need to update a bunch of foreign key values. Most major languages support UUIDs natively or via popular libraries (C & C++), although some database software might require some tweaking - I've used them with Postgres myself and the transition was easy.
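As a minimal sketch of both flavours in Java: note that java.util.UUID's built-in name-based method, nameUUIDFromBytes, actually produces a version 3 (MD5) UUID, so a true version 5 (SHA-1) generator needs a small helper or a third-party library; the "user:12345" key below is just an illustrative input.

    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    public class UuidExamples {
        public static void main(String[] args) {
            // Version 4: random, suitable for brand-new records.
            UUID random = UUID.randomUUID();

            // Name-based: deterministically derived from an existing integer key,
            // handy when migrating rows that already have auto-increment ids.
            UUID derived = UUID.nameUUIDFromBytes("user:12345".getBytes(StandardCharsets.UTF_8));

            System.out.println(random);   // e.g. 4586a0f1-2bad-445f-bfc6-d5667b5a93a9
            System.out.println(derived);  // stable for the same input key
        }
    }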
The downside is that UUIDs are significantly longer and not memorable, but it doesn't sound like you need users to be able to type the URLs in manually. You do also need to check whether the UUID already exists when creating a user, and if it does, just keep generating until an unused UUID is found - in practice, given the size of the space, with Version 4 random UUIDs you have a better chance of winning the lottery than of hitting a collision, so it's not something that will impact performance.
Example URL: http://example.com/page/4586A0F1-2BAD-445F-BFC6-D5667B5A93A9

How unpredictable is the auto generated id of couchdb?

Can I rely on the assumption that each generated id is random and that there's no way a user can guess it? If so, I don't need to do more double-checking of ownership and the like in my application.
Generally you will only get random uuids if you configure your instance to use the random algorithm. Looking at the source, you can see that it then uses Erlang's crypto:rand_bytes/1. I'm not sure how predictable its results are, but note that there is also crypto:strong_rand_bytes/1 - I'm not sure why that hasn't been used.
I think the decision mostly depends on other circumstances.
What information are you trying to secure?
Why do you want to avoid ownership-checks as an additional security measure?
How do you ensure that the _ids are not leaked somehow (eg through CouchDB's /_log API)?
I wouldn't rely on that, no. A malicious user could make a lot of random guesses and could eventually hit something. Best to let identifiers identify things and security restrictions secure them.

Are MongoDB ids guessable?

If you bind an api call to the object's id, could one simply brute force this api to get all objects? If you think of MySQL, this would be totally possible with incremental integer ids. But what about MongoDB? Are the ids guessable? For example, if you know one id, is it easy to guess other (next, previous) ids?
Thanks!
Update Jan 2019: As mentioned in the comments, the information below is true up until version 3.2. Version 3.4+ changed the spec so that machine ID and process ID were merged into a single random 5 byte value instead. That might make it harder to figure out where a document came from, but it also simplifies the generation and reduces the likelihood of collisions.
Original Answer:
+1 for Sergio's answer. In terms of whether they can be guessed or not: they are not hashes, they are predictable, so they can be "brute forced" given enough time. The likelihood depends on how the ObjectIDs were generated and how you go about guessing. To explain, first read the spec here:
Object ID Spec
Let us then break it down piece by piece:
TimeStamp - completely predictable as long as you have a general idea of when the data was generated
Machine - this is an MD5 hash of one of several options, some of which are more easily determined than others, but highly dependent on the environment
PID - again, not a huge number of values here, and could be sleuthed for data generated from a known source
Increment - if this is a random number rather than an increment (both are allowed), then it is less predictable
To expand a bit on the sources, ObjectIDs can be generated by:
MongoDB itself (but can be migrated, moved, updated)
The driver (on any machine that inserts or updates data)
Your Application (you can manually insert your own ObjectID if you wish)
So, there are things you can do to make them harder to guess individually, but without a lot of forethought and safeguards, for a normal data set, the ranges of valid ObjectIDs should be fairly easy to work out since they are all prefixed with a timestamp (unless you are manipulating this in some way).
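To see the timestamp leak for yourself, here is a small sketch assuming the MongoDB Java driver's bson library (org.bson.types.ObjectId) is on the classpath; the hex id below is just an example value:

    import org.bson.types.ObjectId;
    import java.util.Date;

    public class ObjectIdLeak {
        public static void main(String[] args) {
            // An id as it might appear in a URL or API response.
            ObjectId id = new ObjectId("507f1f77bcf86cd799439011");

            // The first 4 bytes are a Unix timestamp, so anyone holding the id
            // can read off roughly when the document was created.
            Date created = id.getDate();
            System.out.println("Document created around: " + created);
        }
    }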
Mongo's ObjectIds were never meant to be a protection from brute force attacks (or any attack, for that matter). They simply offer global uniqueness. You should not assume that some object can't be accessed by a user just because that user shouldn't know its id.
For an actual protection of your resources, employ other techniques.
If you defend against unauthorized access, place some authorization logic in your app (allow access to legitimate users, deny it for everyone else).
If you want to hinder dumping all objects, use some kind of rate limiting. Combine with authorization if applicable.
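As a sketch of the authorization point, scope every lookup by the authenticated caller instead of trusting the id alone; this uses the MongoDB Java driver, and the collection and "ownerId" field names are hypothetical:

    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import org.bson.types.ObjectId;

    import static com.mongodb.client.model.Filters.and;
    import static com.mongodb.client.model.Filters.eq;

    public class OrderAccess {
        // Returns the order only if it belongs to the authenticated user.
        static Document findOrderForUser(MongoCollection<Document> orders,
                                         ObjectId orderId, String currentUserId) {
            Document doc = orders.find(and(eq("_id", orderId), eq("ownerId", currentUserId))).first();
            if (doc == null) {
                // Same response whether the order is missing or belongs to someone else.
                throw new SecurityException("Order not found");
            }
            return doc;
        }
    }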
Optional reading: Eric Lippert on GUIDs.

Is a Modification Number to mask an Id in a query string necessary?

My coworker is insisting that the use of a global modification number to mask query string IDs is a good idea.
public static readonly int ModificationNumber = 9081234;
And Elsewhere:
_addressID = Convert.ToInt32(Request.QueryString["AddressId"]) - ModificationNumber;
I can't seem to get my head around this. If someone wanted to try some url hacking then a modification number makes no difference at all.
Are there other reasons this would make a site more secure?
Furthermore, are there explicit reasons this is bad? In my mind, the fewer globals the better.
IMVHO your colleague is kind of on the right track, but not quite.
A good rule to follow is that you should never expose actual IDs in the query string, as that gives a clue as to the structure of your database, and makes it just that little bit easier for someone to carry out a SQL injection type attack (they can target specific records because they know the ID).
So your colleague is attempting to achieve this, albeit in a very round-about way. Personally I wouldn't do it this way because it will simply be a matter of time before a smart attacker works out what you are doing and then works out what the magic number is. It also doesn't really do anything to prevent a SQL injection attack against specific records, as the generated number may match an existing key anyway. If you are relying on this methodology to avoid SQL attacks then you have deeper issues that need to be addressed.
Edit
Mentioning an alternative is probably a fair thing to do.
As you are using C# and pulling parameters out of the query string, I will assume you are using ASP.NET. In that case, important IDs can be kept in Session or the Cache. You can store a bunch of items in a custom data object, which you then store in Session (this saves having to keep track of lots of IDs; you just need to know one). ASP.NET manages the web app's Session for you, it is unique to each user, and you can use it to store stuff when you transition from page to page.
If you are manually tracking session state or using a database to keep your session-related info, you can still serialize the aforementioned data object into the database using a generated GUID as its key, and append that GUID to the query string (there is only an incredibly low chance of success if a user messes with a GUID to try to assume someone else's session; you can lower that chance even further by concatenating two GUIDs as a key, etc.).

How to implement/use a secure 'read-once' local file access system?

Does anybody know of a secure 'read-once' local file access system? Or how one might create one? I realise that if data is to be used on a system, then it must be capable of being read, but I think it may be possible to severely limit how the data is made available and reduce the possibility of it being copied and used elsewhere.
These are my requirements:
I want to store a 'secure/encrypted' data file on a USB stick (it could be a read-only CD/DVD, but better if it's a read/write USB stick or even a floppy) and have this file capable of being read once (and mainly only once), on a decoded block-by-block basis, once a password has been entered. The file content is probably basic text/xml (or text-encoded data) and is to be read mainly as a sequential stream. Ideally the data can be read by normal Windows file-access methods, i.e. a standard file, FSO objects (stream and text file), all BASIC PC (VB6/VB.NET) file-handling methods, even Excel text import. Yes, I know this probably defeats the object (as such a file can then be opened/saved), but I would still want this possibility. Finally, once the 'access' criteria had been met, the device would prevent further access.
Access to the data would be on a local PC system only. No LAN, no device sharing supported. Data on the device should not be copyable by normal means. Data would be written to the device using normal methods if possible or a special application if necessary.
To keep things simple, just one password, one file, one use, and one user would be great, but other possible enhancements (as icing on the cake) include:
allowing 'n' opens
having multiple passwords for 2 or more users, acting individually
silo passwords: having 2 or more users sign together to get access (or even having at least n of m users sign together to get access)
password prompt given on first block access, independent of the application calling for the first block
the password could be embedded/automatic
tying the access to a nominated machine/MAC/IP/disk serial number (or other machine code)
tying the access to a nominated program/application
if possible, deleting and securely overwriting the data file
My first guess at doing this suggests that it would need a 'pseudo-device' driver that would appear as an extension to (or replacement of) the standard removable-device driver. The driver would handle each file block, sector by sector, and refuse to serve further decoded blocks if not authorised. The device should not give normal directory listings, but some form of content summary could be given to a user (optional).
Unlike a DRM system, I don't want any form of on-line access/authentication (though I would consider it); I would prefer a self-contained system.
I have looked long and hard for such a device/system and haven't found one yet. Most devices and system tools (e.g. Iomega/IronKey) appear to unlock access to files, but without limit, i.e. read-many once unlocked.
Performance is not an issue; a slow floppy read rate would be okay. The encryption method is agnostic: anything reasonably strong, 40-bit+ (128-bit), would be fine. I can't tell you what the data is or what it's for; I just need a way to give data to somebody and limit, as far as possible, its use and what they can do with it. It's a real requirement to protect confidential data and is not meant for DRM or MP3s/videos or similar.
I am an 'office' developer and not really familiar with device drivers or DRM - now where would I start with such a project? Is there anything out there available to Joe Public already?
Thanks - Tim.
PS: Update
I should point out that I just wish to pass data between ourselves and a single, specifically nominated service provider. I don't want them to copy the data we provide. It will be used once to support a 'singular' one-off process and then be done with. As the data is 'streamed/read', it should be 'consumed'. If the process fails, we will re-issue the data to the service provider. The data remains our property; it is not being sold/licensed.
I do realise that no solution will be foolproof, but the risk/reward ratio should dissuade casual attempts to break the system. The data has no explicit commercial value.
PPS: It's a real requirement... What would you do?
Judging by the upvotes on #erikson's thoughtful answer, you guys are saying 'not possible / don't bother' - but apart from personally supervising that the data is used according to our wishes, what would you do?
Executive summary: this isn't a realistic solution. Re-think the process so that "read-once" isn't necessary.
A few companies (Disappearing Inc. comes to mind, and they had at least one competitor) tried to make "self-destructing" email on general-purpose hardware in the late 90s. They spent millions of dot.com dollars to develop systems that didn't really work.
The only potential solution I know of is the use of a Trusted Platform Module. These are fairly common, as they are required in all computers bought by the US government. However, their capabilities vary. You'd need one that supported something called remote attestation, which allows software to perform integrity checks on itself. With this capability, you could write software that would enforce your data destruction policy. However, I don't think this feature is widely used. My laptop has a TPM, but it doesn't support this.
You should also be aware that there is a lot of hostility against "trusted computing," because it can be used to limit the functionality of a machine. This violates the right to do as you please with your property. TPMs might make sense for corporate or government machines, but not for personal computers.
Other aspects of your problem, such as granting multiple users access to the data, or requiring multiple users to act together to gain access to the data, are easier.
Encrypting data for multiple users is typically achieved by generating a key, encrypting the data with that "content encryption key", then encrypting the key (which is relatively small) with a "key encryption key" (which could be a password) belonging to each intended recipient.
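A minimal sketch of that content-key / key-encryption-key pattern in Java (JCE only); the passphrase, salt handling, and storage of the wrapped key are all simplified assumptions, not a complete protocol:

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.PBEKeySpec;
    import javax.crypto.spec.SecretKeySpec;

    public class EnvelopeSketch {
        public static void main(String[] args) throws Exception {
            SecureRandom rng = new SecureRandom();

            // 1. One random content-encryption key; the (large) payload is encrypted once with it.
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            SecretKey contentKey = gen.generateKey();

            // 2. Derive a key-encryption key from one recipient's password.
            char[] password = "recipient-passphrase".toCharArray();  // hypothetical
            byte[] salt = new byte[16];
            rng.nextBytes(salt);
            SecretKeyFactory pbkdf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] kekBytes = pbkdf.generateSecret(new PBEKeySpec(password, salt, 200_000, 256)).getEncoded();
            SecretKey kek = new SecretKeySpec(kekBytes, "AES");

            // 3. Encrypt ("wrap") only the small content key with that recipient's KEK.
            //    Repeat steps 2-3 per recipient; the payload itself is never re-encrypted.
            byte[] iv = new byte[12];
            rng.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, kek, new GCMParameterSpec(128, iv));
            byte[] wrappedContentKey = cipher.doFinal(contentKey.getEncoded());

            System.out.println("Wrapped content key bytes: " + wrappedContentKey.length);
        }
    }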
Requiring some number of users to enter a password can be done securely with Shamir Secret Sharing, as I learned here on SO.
Based on the comments on the question, especially the "mailing label printing service" analogy, I'm afraid my initial answer isn't really relevant.
In a case like that, I can only see a legal solution. Disallow storage of your data in the contract. If it's worth suing them for violating the contract, do so.
Cryptographically speaking, the best thing I could think of would be to "watermark" such a "mailing list" with information that would help me prove that a copy of the list was disclosed by a particular vendor. Knowing that a watermark exists might deter any deliberate disclosures, and could help leverage a fast settlement in the case of accidental disclosure. This could use steganographic techniques within records as well as fake records in the collection.
Algorithms for doing this might already exist, but I'm not familiar with the field. Researching "digital watermarks" might be useful. Even if it only turns up algorithms for protected video and audio, perhaps these could be adapted to work with other media.
There are several problems with your approach.
If you can read the data from any application, you can save the data anywhere. I would think this would defeat the purpose of any 'only-one-access' policy.
To get a device driver to handle your scenario, you would need deep knowledge of file-system programming, which at least under Windows is no easy undertaking. Even then, it would be hard to enforce the one-time-access prerequisite.
Programs have different file-access strategies, which might break your assumptions. E.g. an application may open a file once to get its size, then close and reopen it to load its data. How should this be enforced? Do you want to limit 'OpenFile' calls? Do you want to limit 'read byte' calls? Do you want to limit ... jumping around in the file?
When your medium gets copied, by whatever means, you have no way of knowing it. The games industry has tried for years to bind games to the original CD, and has failed miserably.
I think what would be feasible would be a container format with an encoder/decoder, or something like that (see BitLocker in Windows 7). That would guarantee that you can only decode the data once to a local disc and would then delete the container on your medium (beware: check first that the medium is writable, and bind the container to a serial number or name of the medium so that the container cannot be copied).
Another possibility would be a separate USB device, which you can only use once to extract the data from it. Then you would only need to write a driver once in user mode with WinUSB. Encrypted USB-Sticks use this approach.
But I really think this is a bad idea, because you can very easily get around any countermeasure when the receiving person can read all the data from the medium and save it somewhere else.
