Should we stress which specific HTTP response code to return when there's an error? - node.js

I am trying to determine which HTTP status code to return to the REST client under various error conditions. I find this task very stressful because reading the HTTP status code definitions is like reading the constitution: everyone can interpret the same thing differently.
For example, some people say to return 404 Not Found if the requested resource cannot be found, whereas others say you shouldn't, because it implies that the endpoint itself is not available.
Another example is in this post:
What HTTP response code to use for failed POST request?, it is recommended by the answer to return 422 Unprocessable Entity instead of a generic error 400 Bad Request.
My question is, why not just start simple and return 400 Bad Request for all errors, provide context within the response body, and only include more HTTP status codes when there is obvious value?
For example, we previously returned 200 OK when an app's access token had expired. To help the app resolve this issue, we provided an internal error ID in the response so the client could request a new access token with its refresh token. But we realized that by returning 401 Unauthorized instead, the client's implementation can be much simpler because of the library that it uses. Now we think there is obvious value in adding a new HTTP status code here.
So, to summarize my question again: is there a need to stress over which specific HTTP status code to return? What's wrong with returning 400 in my second example if context is provided in the response body?

I find this task to be very stressful as reading HTTP status code definition is like reading the constitution, everyone can interpret the same thing differently.
The most important thing is to recognize that HTTP status codes belong to the domain of transferring documents over a network, not to your business domain. Remember, the basic idea is that every resource on every web server understands the status codes the same way, and general purpose components (like web browsers) don't need any special knowledge of a specific resource in order to interpret the status codes correctly.
The body of the response is how you communicate resource specific information to the client.
My question is, why not just start simple and return 400 Bad Request for all errors, provide context within the response body, and only include more HTTP status codes when there is obvious value?
"Obvious value" is the whole trick, isn't it? Which is to say, yes, you can use 400 Bad Request for all client errors, in much the same way that you can use POST for all requests. But doing that conceals meaning that general purpose components can take advantage of.
Back in the day, 401 Unauthorized was the go to example for why you would want a specific status code -- a browser which had been anonymously submitting requests would know that this particular request needs authorization credentials, and by looking at other meta data in the response could work out how to compose a new request (for instance, by asking the human operator for a username and password, then encoding that information into the appropriate header).
Note the target audience here; we weren't expecting the human to understand what 401 means; we were expecting the general purpose tool to understand what 401 means, and to act appropriately. Your correct use of the metadata defined for transferring documents over a network improves my experience by giving my general purpose client the information it needs to be smart.
Please note in the above the emphasis on information about the transfer of documents. When you are trying to communicate information about problems in your domain, those details do belong in the response body. 403 Forbidden (I understand what you asked, and I'm not willing to do it) shows up quite often when a particular request would violate your domain's protocol.
We don't, after all, expect a general purpose component to have customization specific to our domain.
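As a rough illustration of that split, here is a minimal Python sketch. The function names and the "TOKEN_EXPIRED" error ID are made up for the example, not taken from any real API. The status code tells a general purpose client what to do; the body tells your own client why.

```python
import json
from http import HTTPStatus
from typing import Tuple


def error_response(status: HTTPStatus, error_id: str, detail: str) -> Tuple[int, str]:
    """Build a (status code, JSON body) pair for an error.

    The status code carries the generic, transfer-level meaning;
    the body carries the domain-specific explanation.
    """
    body = json.dumps({"error_id": error_id, "detail": detail})
    return int(status), body


def generic_client_action(status_code: int) -> str:
    """Decide what a general purpose client does, looking only at the status code."""
    if status_code == HTTPStatus.UNAUTHORIZED:
        return "obtain fresh credentials and retry"
    if status_code == HTTPStatus.FORBIDDEN:
        return "do not retry"
    if 400 <= status_code < 500:
        return "report client error"
    return "no special handling"


# Hypothetical expired-token case from the question.
status, body = error_response(
    HTTPStatus.UNAUTHORIZED, "TOKEN_EXPIRED", "Access token has expired"
)
print(status, generic_client_action(status), body)
```

Had the server answered 400 for everything, generic_client_action could only ever say "report client error", and the retry logic would have to live in code that understands your error IDs.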

Related

IBM AppScan identified a password parameter that was received in the query string meaning

I am trying to fix the issues in IBM AppScan results and I'm getting the flag:
AppScan identified a password parameter that was received in the query string
with this command showing in the screen
GET /myapp.com/?username=user&password=**CONFIDENTIAL 1** HTTP/1.1
and I'm 100% sure that I'm not sending critical information in query params, or even in GET requests. I was thinking that the app is sending the request itself and wants me to block it.
Am I right, or am I missing something here?
It's quite common for application vulnerability scanners to misinterpret login forms that use JavaScript to make login requests. I am guessing the HTML form does not explicitly declare the request method as POST. Assuming when a user actually makes a request with a browser, a POST request is made, it's safe to assume that AppScan is generating this request itself.
One more issue to consider, if you make the request to https://myapp.com/?username=user&password=password#123, does that return a session token? This is often considered a vulnerability as well if the server does not reject all GET requests even if a user crafts it manually.
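If you want the server to refuse such hand-crafted requests, the check could look something like this Python sketch. The parameter names in SENSITIVE_PARAMS and the function name are assumptions for illustration; adapt them to whatever your login form actually sends.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed names of parameters that must never travel in a query string.
SENSITIVE_PARAMS = {"password", "passwd", "pwd"}


def should_reject(method: str, url: str) -> bool:
    """Return True for a GET request whose query string carries a credential."""
    if method.upper() != "GET":
        return False
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name.lower() in SENSITIVE_PARAMS for name in query)


print(should_reject("GET", "https://myapp.com/?username=user&password=secret"))
print(should_reject("POST", "https://myapp.com/login"))
```

A request handler would call should_reject before doing any authentication work and answer with an error instead of a session token.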

REST with complex permissions over resources

Background
I'm having trouble with the design and implementation of a REST service that publishes content which some users must not see (medical information, you know, the country's laws). I'm using an ABAC-like/RBAC system to protect it, but what concerns me is that I may be violating the REST pattern. My service goes through the following process for each query:
The security middleware reads a token from a session that an app/webpage sends using authorization header or cookies.
ABAC/RBAC Rules are applied to know if user can access the resource.
After authorizing the token, my service executes the query and filters the results, hiding content that the requesting user cannot see (if needed; POST, PUT and DELETE operations are almost exempt from this step). The filtering is done using ABAC/RBAC rules.
An operation report is stored in logs.
I already know that sessions violate the REST pattern, but I can replace them with BASIC/DIGEST authorization. My real question is the following:
Question
Does hiding resources from list/retrieve operations violate the REST pattern? As far as I know, REST is stateless, so what happens if I use some context variables (such as the user ID) to filter my results? Am I violating REST, or not at all?
If I am, what are your recommendations? How can I implement this without breaking REST conventions?
First of all, client-side sessions don't violate REST at all. REST says the communication between client and server must be stateless, or in other words, the server should not require any information not available in the request itself to respond to it properly. If the client keeps a session and sends all the information needed on every request, it's fine.
As to your question, there's nothing wrong with changing the response based on the authenticated user. REST is an architectural style that attempts to apply the successful design decisions behind the web itself to software development. When you log in to Stack Overflow, what you see as your profile is different from what I see, even though we are both using the same URI, right? That's how REST is supposed to work.
I'd recommend returning status codes 401 (Unauthorized) if the user is not authorized to access a resource. And 404 (Not found) if you cannot confirm that the resource even exists.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4
A GET is meant to return a representation of the resource. Nowhere does it say that you must return everything you know about that resource.
Exactly what representation is returned will depend on the request headers. For example, you might return either JSON or XML depending on what the client requested. Extending this line of thinking, it is OK to return different representations of a resource based on the client's authentication without violating REST principles.
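To make that concrete, here is a small Python sketch of a per-role representation. The roles, field names, and the VISIBLE_FIELDS table are invented for the example; a real ABAC/RBAC engine would supply the rules. The important part is that everything needed to pick the representation (the role) arrives with the request, so nothing about this is stateful.

```python
from typing import Any, Dict, Set

# Hypothetical mapping from role to the fields that role may see.
VISIBLE_FIELDS: Dict[str, Set[str]] = {
    "doctor": {"id", "name", "diagnosis"},
    "receptionist": {"id", "name"},
}


def representation_for(record: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return the representation of one resource that the given role may see."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


patient = {"id": 7, "name": "Angela", "diagnosis": "flu"}
print(representation_for(patient, "doctor"))
print(representation_for(patient, "receptionist"))
```

Both callers request the same URI; they simply get different representations of the same resource.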

Is it OK to return a HTTP 401 for a non existent resource instead of 404 to prevent information disclosure?

Inspired by a thought while looking at the question "Correct HTTP status code when resource is available but not accessible because of permissions", I will use the same scenario to illustrate my hypothetical question.
Imagine I am building a carpooling web service.
Suppose the following
GET /api/persons/angela/location
retrieves the current position of user "angela". Only angela herself and a possible driver who is going to pick her up should be able to know her location, so if the request is not authenticated as an appropriate user, a 401 Unauthorized response is returned.
Also consider the request
GET /api/persons/john/location
when no user called john has registered with the system. There is no john resource let alone a resource for john's location, so this obviously returns a 404 Not Found. Or does it?
What if I don't want to reveal whether or not john is registered with the system?
(Perhaps the usernames are drawn from a small pool of university logins, and there is a very militant cycling group on campus that takes a very dim view of car usage, even if you are pooling? They could make requests to the URL for every user, and if they receive a 401 instead of 404, infer that the user is a car pooler)
Does it make sense to return a 401 Unauthorized for this request, even though the resource does not exist and there is no possible set of credentials that could be supplied in a request to have the server return a 200?
Actually, RFC 2616 (§10.4.4, 403 Forbidden) suggests doing the opposite. If someone attempts to access a resource but is not properly authenticated, return 404 rather than 403 (Forbidden). This still solves the information disclosure issue.
If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
Thus, you would never return 403 (or 401). However, I think your solution is also reasonable.
EDIT: I think Gabe's on the right track. You would have to reconsider part of the design, but why not:
Not found - 404
User-specific insufficient permission - 404
General insufficient permission (no one can access) - 403
Not logged in - 401
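Written out as code, the mapping above might look like the following Python sketch. The four boolean inputs are assumptions about what your permission layer can tell you; the order of the checks matters, since the login check has to come first so that anonymous callers learn nothing.

```python
from http import HTTPStatus


def status_for(
    logged_in: bool,
    exists: bool,
    closed_to_everyone: bool,
    user_may_access: bool,
) -> HTTPStatus:
    """Pick a status code following the four-row mapping above."""
    if not logged_in:
        return HTTPStatus.UNAUTHORIZED   # Not logged in - 401
    if not exists:
        return HTTPStatus.NOT_FOUND      # Not found - 404
    if closed_to_everyone:
        return HTTPStatus.FORBIDDEN      # No one can access - 403
    if not user_may_access:
        return HTTPStatus.NOT_FOUND      # User-specific denial looks like "not found"
    return HTTPStatus.OK


# A logged-in stranger asking for angela's location and for a non-existent john
# gets the same answer, so the two cases cannot be told apart.
print(status_for(True, True, False, False), status_for(True, False, False, False))
```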
If usernames are sensitive information, then don't put them directly in the URI. If you use hypermedia within your representations then you can make it just as easy for an authorized client applications to navigate your api without leaking information in your URLs.
Hackable urls are great for information that you want everyone to be able to access easily. However, for a RESTful client, there is no problem using URIs that are completely opaque.
Once you have removed the direct correlation between the user and the URI, it becomes difficult to infer any information from a 401 response code.
Return 401 Unauthorized in any case in which the user is not allowed to see a particular page, whether it exists or not.
From RFC 2616: "If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials."
Consider HTTP servers that use separate lists of credentials for authentication to different URLs. Obviously, a server should not check every list when a URL is requested, so if the credentials are not in that one applicable list, because HTTP requests are completely independent of each other, it makes sense to return 401 Unauthorized if the credentials are not valid for that particular URL.
Furthermore, the description of 403 Forbidden includes: "Authorization will not help and the request SHOULD NOT be repeated." In contrast, if the user chooses to log in using the correct credentials, Authorization will help.
I think the best solution would be to return 403 (forbidden) for every (potential) page in a class, if the user is not authenticated to see any of them. If the user is, return 404 for stuff that's not there and 200 for stuff that is.
I think it's fine if you want to return a 401 Unauthorized if the request is made by a client that is not a user. However, if a user makes the request and is authenticated, then I don't think that a 401 is the best solution. If you feel that returning a 404 would compromise the security of some users, then you may want to consider returning a 403 Forbidden or perhaps a 200 OK, but just don't specify a location. If I query for user bob and get a response and query for user sam and get an error response, be it 401, 403, 404, etc, then I can probably come to the conclusion that it means that user sam doesn't exist.
200 OK with no location specified may be the most disguised solution.
Edit: Just to illustrate what I am proposing. Return a 401 if the client isn't authorized. Otherwise, always return a 200 OK.
<user-location for="bob">
<location>geo-coordinates here</location>
</user-location>
<user-location for="sam">
<location/>
</user-location>
This doesn't really indicate if sam exists or not, or perhaps there just isn't any location data for him currently.
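A minimal Python sketch of how a server could produce those two bodies from the same code path. The LOCATIONS table and the coordinate string are placeholders; the element names follow the XML above.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical store: only users with known location data appear here.
LOCATIONS: Dict[str, str] = {"bob": "51.5,-0.12"}


def user_location_xml(username: str) -> str:
    """Build the 200 OK body; unknown users get an empty <location/> element."""
    root = ET.Element("user-location", {"for": username})
    location = ET.SubElement(root, "location")
    coordinates = LOCATIONS.get(username)
    if coordinates is not None:
        location.text = coordinates
    return ET.tostring(root, encoding="unicode")


print(user_location_xml("bob"))
print(user_location_xml("sam"))
```

Because a missing user and a user without location data take the same branch, the response body is identical in both cases.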

What can POST do, GET can't do? [duplicate]

From what I can gather, there are three categories:
Never use GET and use POST
Never use POST and use GET
It doesn't matter which one you use.
Am I correct in assuming those three cases? If so, what are some examples from each case?
Use POST for destructive actions such as creation (I'm aware of the irony), editing, and deletion, because you can't hit a POST action in the address bar of your browser. Use GET when it's safe to allow a person to call an action. So a URL like:
http://myblog.org/admin/posts/delete/357
Should bring you to a confirmation page, rather than simply deleting the item. It's far easier to avoid accidents this way.
POST is also more secure than GET, because you aren't sticking information into a URL. And so using GET as the method for an HTML form that collects a password or other sensitive information is not the best idea.
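One final note: POST can transmit a larger amount of information than GET. The HTTP specification sets no size limit on either, but in practice URLs are capped by browsers and servers (Internet Explorer, for example, at 2,048 characters), whereas the size of a POST body is limited only by server configuration.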
In brief
Use GET for safe and idempotent requests
Use POST for neither safe nor idempotent requests
In details
There is a proper place for each. Even if you don't follow RESTful principles, a lot can be gained from learning about REST and how a resource oriented approach works.
A RESTful application will use GETs for operations which are both safe and idempotent.
A safe operation is an operation which does not change the data requested.
An idempotent operation is one in which the result will be the same no matter how many times you request it.
It stands to reason that, as GETs are used for safe operations they are automatically also idempotent. Typically a GET is used for retrieving a resource (a question and its associated answers on stack overflow for example) or collection of resources.
A RESTful app will use PUTs for operations which are not safe but idempotent.
I know the question was about GET and POST, but I'll return to POST in a second.
Typically a PUT is used for editing a resource (editing a question or an answer on stack overflow for example).
A POST would be used for any operation which is neither safe nor idempotent.
Typically a POST would be used to create a new resource for example creating a NEW SO question (though in some designs a PUT would be used for this also).
If you run the POST twice you would end up creating TWO new questions.
There's also a DELETE operation, but I'm guessing I can leave that there :)
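To see the difference between safe, idempotent, and neither, here is a toy in-memory sketch in Python. The QuestionStore class is invented for illustration and only mimics the semantics of the three methods; it is not a web framework.

```python
from typing import Dict, Optional


class QuestionStore:
    """Toy resource collection mimicking GET, PUT and POST semantics."""

    def __init__(self) -> None:
        self._questions: Dict[int, str] = {}
        self._next_id = 1

    def get(self, question_id: int) -> Optional[str]:
        # Safe: reading never changes the stored data.
        return self._questions.get(question_id)

    def put(self, question_id: int, text: str) -> None:
        # Not safe, but idempotent: repeating the call leaves the same state.
        self._questions[question_id] = text

    def post(self, text: str) -> int:
        # Neither safe nor idempotent: every call creates a new question.
        question_id = self._next_id
        self._next_id += 1
        self._questions[question_id] = text
        return question_id

    def count(self) -> int:
        return len(self._questions)


store = QuestionStore()
store.post("What can POST do?")
store.post("What can POST do?")   # same request again: a second question appears
store.put(1, "Edited question")
store.put(1, "Edited question")   # same request again: nothing new happens
print(store.count(), store.get(1))
```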
Discussion
In practical terms modern web browsers typically only support GET and POST reliably (you can perform all of these operations via javascript calls, but in terms of entering data in forms and pressing submit you've generally got the two options). In a RESTful application the POST will often be overridden to provide the PUT and DELETE calls also.
But, even if you are not following RESTful principles, it can be useful to think in terms of using GET for retrieving / viewing information and POST for creating / editing information.
You should never use GET for an operation which alters data. If a search engine crawls a link to your evil op, or the client bookmarks it could spell big trouble.
Use GET if you don't mind the request being repeated (That is it doesn't change state).
Use POST if the operation does change the system's state.
Short Version
GET: Usually used for submitted search requests, or any request where you want the user to be able to pull up the exact page again.
Advantages of GET:
URLs can be bookmarked safely.
Pages can be reloaded safely.
Disadvantages of GET:
Variables are passed through url as name-value pairs. (Security risk)
Limited number of variables that can be passed. (Based upon browser. For example, Internet Explorer is limited to 2,048 characters.)
POST: Used for higher security requests where data may be used to alter a database, or a page that you don't want someone to bookmark.
Advantages of POST:
Name-value pairs are not displayed in url. (Security += 1)
Unlimited number of name-value pairs can be passed via POST. Reference.
Disadvantages of POST:
A page that used POST data cannot be bookmarked. (If you so desired.)
Longer Version
Directly from the Hypertext Transfer Protocol -- HTTP/1.1:
9.3 GET
The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
The semantics of the GET method change to a "partial GET" if the request message includes a Range header field. A partial GET requests that only part of the entity be transferred, as described in section 14.35. The partial GET method is intended to reduce unnecessary network usage by allowing partially-retrieved entities to be completed without transferring data already held by the client.
The response to a GET request is cacheable if and only if it meets the requirements for HTTP caching described in section 13.
See section 15.1.3 for security considerations when used for forms.
9.5 POST
The POST method is used to request that the origin server accept the
entity enclosed in the request as a new subordinate of the resource
identified by the Request-URI in the Request-Line. POST is designed
to allow a uniform method to cover the following functions:
Annotation of existing resources;
Posting a message to a bulletin board, newsgroup, mailing list,
or similar group of articles;
Providing a block of data, such as the result of submitting a
form, to a data-handling process;
Extending a database through an append operation.
The actual function performed by the POST method is determined by the
server and is usually dependent on the Request-URI. The posted entity
is subordinate to that URI in the same way that a file is subordinate
to a directory containing it, a news article is subordinate to a
newsgroup to which it is posted, or a record is subordinate to a
database.
The action performed by the POST method might not result in a
resource that can be identified by a URI. In this case, either 200
(OK) or 204 (No Content) is the appropriate response status,
depending on whether or not the response includes an entity that
describes the result.
The first important thing is the meaning of GET versus POST :
GET should be used to... get... some information from the server,
while POST should be used to send some information to the server.
After that, a couple of things that can be noted :
Using GET, your users can use the "back" button in their browser, and they can bookmark pages
There is a limit in the size of the parameters you can pass as GET (2KB for some versions of Internet Explorer, if I'm not mistaken) ; the limit is much more for POST, and generally depends on the server's configuration.
Anyway, I don't think we could "live" without GET : think of how many URLs you are using with parameters in the query string, every day -- without GET, all those wouldn't work ;-)
Apart from the length constraints difference in many web browsers, there is also a semantic difference. GETs are supposed to be "safe" in that they are read-only operations that don't change the server state. POSTs will typically change state and will give warnings on resubmission. Search engines' web crawlers may make GETs but should never make POSTs.
Use GET if you want to read data without changing state, and use POST if you want to update state on the server.
My general rule of thumb is to use Get when you are making requests to the server that aren't going to alter state. Posts are reserved for requests to the server that alter state.
One practical difference is that browsers and webservers have a limit on the number of characters that can exist in a URL. It's different from application to application, but it's certainly possible to hit it if you've got textareas in your forms.
Another gotcha with GETs - they get indexed by search engines and other automatic systems. Google once had a product that would pre-fetch links on the page you were viewing, so they'd be faster to load if you clicked those links. It caused major havoc on sites that had links like delete.php?id=1 - people lost their entire sites.
Use GET when you want the URL to reflect the state of the page. This is useful for viewing dynamically generated pages, such as those seen here. A POST should be used in a form to submit data, like when I click the "Post Your Answer" button. It also produces a cleaner URL since it doesn't generate a parameter string after the path.
Because GETs are purely URLs, they can be cached by the web browser and may be better used for things like consistently generated images. (Set an Expiry time)
One example from the gravatar page: http://www.gravatar.com/avatar/4c3be63a4c2f539b013787725dfce802?d=monsterid
GET may yield marginally better performance, since some webservers write POST contents to a temporary file before invoking the handler.
Another thing to consider is the size limit. GETs are capped by the size of the URL. The HTTP standard itself sets no limit on URL length, but browsers and servers impose their own, so long URLs are unreliable.
Transferring more data than that should use a POST to get better browser compatibility.
Even staying under that limit can be a problem: as another poster wrote, anything in the URL could end up in other parts of the browser's UI, like history.
1.3 Quick Checklist for Choosing HTTP GET or POST
Use GET if:
The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup).
Use POST if:
The interaction is more like an order, or
The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
The user be held accountable for the results of the interaction.
Source.
There is nothing you can't do per-se. The point is that you're not supposed to modify the server state on an HTTP GET. HTTP proxies assume that since HTTP GET does not modify the state then whether a user invokes HTTP GET one time or 1000 times makes no difference. Using this information they assume it is safe to return a cached version of the first HTTP GET. If you break the HTTP specification you risk breaking HTTP client and proxies in the wild. Don't do it :)
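Here is a deliberately naive Python sketch of that assumption. CachingProxy and CounterOrigin are made-up names, and real proxies honour cache headers and expiry, which this ignores. It shows what happens when a GET that changes state sits behind a cache.

```python
from typing import Callable, Dict

Origin = Callable[[str, str], str]


class CachingProxy:
    """Naive proxy: assumes every GET is safe, so it caches GET responses forever."""

    def __init__(self, origin: Origin) -> None:
        self._origin = origin
        self._cache: Dict[str, str] = {}

    def request(self, method: str, url: str) -> str:
        if method != "GET":
            return self._origin(method, url)   # never cached
        if url not in self._cache:
            self._cache[url] = self._origin(method, url)
        return self._cache[url]


class CounterOrigin:
    """A misbehaving origin server: any request to /increment changes state."""

    def __init__(self) -> None:
        self.counter = 0

    def __call__(self, method: str, url: str) -> str:
        if url == "/increment":
            self.counter += 1
        return f"counter={self.counter}"


origin = CounterOrigin()
proxy = CachingProxy(origin)
proxy.request("GET", "/increment")
proxy.request("GET", "/increment")   # served from cache, the origin never sees it
print(origin.counter)                # the second "increment" silently did nothing
```

The client believes it incremented twice; the server only saw one request. Sending the same operation as a POST avoids the problem because the proxy forwards it every time.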
This traverses into the concept of REST and how the web was kinda intended on being used. There is an excellent podcast on Software Engineering radio that gives an in depth talk about the use of Get and Post.
Get is used to pull data from the server, where an update action shouldn't be needed. The idea is that you should be able to use the same GET request over and over and have the same information returned. The URL has the get information in the query string, because it was meant to be easily sent to other systems and people, like an address for where to find something.
Post is supposed to be used (at least by the REST architecture which the web is kinda based on) for pushing information to the server/telling the server to perform an action. Examples like: Update this data, Create this record.
I don't see a problem with using GET, though; I use it for simple things where it makes sense to keep things on the query string.
Using it to update state - like a GET of delete.php?id=5 to delete a page - is very risky. People found that out when Google's web accelerator started prefetching URLs on pages - it hit all the 'delete' links and wiped out peoples' data. Same thing can happen with search engine spiders.
POST can move large data while GET cannot.
But generally it's not about a shortcoming of GET, but rather a convention to follow if you want your website/webapp to behave nicely.
Have a look at http://www.w3.org/2001/tag/doc/whenToUseGet.html
From RFC 2616:
9.3 GET
The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.
9.5 POST
The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:
Annotation of existing resources;
Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
Providing a block of data, such as the result of submitting a form, to a data-handling process;
Extending a database through an append operation.
The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database.
The action performed by the POST method might not result in a resource that can be identified by a URI. In this case, either 200 (OK) or 204 (No Content) is the appropriate response status, depending on whether or not the response includes an entity that describes the result.
I use POST when I don't want people to see the QueryString or when the QueryString gets large. Also, POST is needed for file uploads.
I don't see a problem using GET though, I use it for simple things where it makes sense to keep things on the QueryString.
Using GET also makes it possible to link to a particular page, where POST would not work.
The original intent was that GET was used for getting data back and POST was to be anything. The rule of thumb that I use is that if I'm sending anything back to the server, I use POST. If I'm just calling a URL to get back data, I use GET.
Read the article about HTTP in the Wikipedia. It will explain what the protocol is and what it does:
GET
Requests a representation of the specified resource. Note that GET should not be used for operations that cause side-effects, such as using it for taking actions in web applications. One reason for this is that GET may be used arbitrarily by robots or crawlers, which should not need to consider the side effects that a request should cause.
and
POST
Submits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource or the updates of existing resources or both.
The W3C has a document named URIs, Addressability, and the use of HTTP GET and POST that explains when to use what. Citing
1.3 Quick Checklist for Choosing HTTP GET or POST
Use GET if:
The interaction is more like a question (i.e., it is a
safe operation such as a query, read operation, or lookup).
and
Use POST if:
The interaction is more like an order, or
The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
The user be held accountable for the results of the interaction.
However, before the final decision to use HTTP GET or POST, please also consider considerations for sensitive data and practical considerations.
A practical example would be whenever you submit an HTML form. You specify either post or get in the form's method attribute. PHP will populate $_GET and $_POST accordingly.
In PHP, POST data limit is usually set by your php.ini. GET is limited by server/browser settings I believe - usually around 255 bytes.
From w3schools.com:
What is HTTP?
The Hypertext Transfer Protocol (HTTP) is designed to enable
communications between clients and servers.
HTTP works as a request-response protocol between a client and server.
A web browser may be the client, and an application on a computer that
hosts a web site may be the server.
Example: A client (browser) submits an HTTP request to the server;
then the server returns a response to the client. The response
contains status information about the request and may also contain the
requested content.
Two HTTP Request Methods: GET and POST
Two commonly used methods for a request-response between a client and
server are: GET and POST.
GET – Requests data from a specified resource
POST – Submits data to be processed to a specified resource
Here we distinguish the major differences:
Well one major thing is anything you submit over GET is going to be exposed via the URL. Secondly as Ceejayoz says, there is a limit on characters for a URL.
Another difference is that POST generally requires two HTTP operations, whereas GET only requires one.
Edit: I should clarify--for common programming patterns. Generally responding to a POST with a straight up HTML web page is a questionable design for a variety of reasons, one of which is the annoying "you must resubmit this form, do you wish to do so?" on pressing the back button.
As answered by others, there's a limit on url size with get, and files can be submitted with post only.
I'd like to add that one can add things to a database with a get and perform actions with a post. When a script receives a post or a get, it can do whatever the author wants it to do. I believe the lack of understanding comes from the wording the book chose or how you read it.
A script author should use posts to change the database and use get only for retrieval of information.
Scripting languages provided many means with which to access the request. For example, PHP allows the use of $_REQUEST to retrieve either a post or a get. One should avoid this in favor of the more specific $_GET or $_POST.
In web programming, there's a lot more room for interpretation. There's what one should and what one can do, but which one is better is often up for debate. Luckily, in this case, there is no ambiguity. You should use posts to change data, and you should use get to retrieve information.
HTTP POST data doesn't have a specified limit on the amount of data, whereas different browsers have different limits for GETs. RFC 2068 states:
Servers should be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.
Specifically, you should use the right HTTP constructs for what they're meant for. HTTP GETs shouldn't have side effects and can be safely refreshed and stored by HTTP proxies, etc.
HTTP POSTs are used when you want to submit data against a URL resource.
A typical example for using HTTP GET is on a Search, i.e. Search?Query=my+query
A typical example for using an HTTP POST is submitting feedback to an online form.
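To make the two typical examples above concrete, here is a minimal sketch using Python's standard library. The host name, the `/feedback` path and the `message` field are assumptions made up for illustration; only the `Search?Query=my+query` shape comes from the answer. Nothing is sent over the network, the requests are only built and inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoints, used only for illustration.
SEARCH_URL = "http://www.example.com/Search"
FEEDBACK_URL = "http://www.example.com/feedback"


def build_search_request(query: str) -> Request:
    """GET: the parameters travel in the URL, so they are visible there."""
    return Request(f"{SEARCH_URL}?{urlencode({'Query': query})}", method="GET")


def build_feedback_request(message: str) -> Request:
    """POST: the data travels in the request body, not in the URL."""
    body = urlencode({"message": message}).encode("utf-8")
    return Request(FEEDBACK_URL, data=body, method="POST")


search = build_search_request("my query")
feedback = build_feedback_request("Great site")

print(search.get_method(), search.full_url)
print(feedback.get_method(), feedback.full_url, feedback.data)
```

The search request carries its query in the URL, while the feedback request keeps its URL clean and puts the form data in the body.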
Gorgapor, mod_rewrite still often utilizes GET. It just allows you to translate a friendlier URL into a URL with a GET query string.
Simple version of POST GET PUT DELETE
use GET - when you want to get any resource, like a list of data based on an Id or Name
use POST - when you want to send any data to the server. Keep in mind that POST is a heavyweight operation; for updating we should use PUT instead of POST
internally, POST will create a new resource
use PUT - when you

Return "correct" error code, or protect privacy?

OK, probably best to give an example here of what I mean.
Imagine a web based forum system, where the user authentication is done by some external method, which the system is aware of.
Now, say for example, a user enters the URL for a thread that they do not have access to. For this, should I return a 403 (Forbidden), letting the user know that they should try another authentication method, or a 404, not letting them know that there is something there to access?
Assuming I return a 403, should I also return a 403 when they access a URL for a topic that doesn't exist yet?
Edit: the example above was more of an illustration than something IRL.
Another Example, say I expose something like
/adminnotes/user
if there are Administrator notes about the user. Now, returning a 403 would let the user know that there is something there being said about them. A 404 would say nothing.
But, if I were to return a 403 - I could return it for adminnotes/* - which would resolve that issue.
Edit 2: Another example. Soft deleted Questions here return a 404. Yet, with the right authentication and access, you can still see them (I'd presume).
Above everything else, comply with the HTTP spec. Returning 403 in place of 404 is not a good thing. Returning 404 in place of 403 probably is OK (or not a big blunder), but I would just let the software tell the truth. If the user only knows the ID of a topic, it's not much anyway. And he could try timing attacks to determine whether this topic exists.
I would go for a 307 redirect to NoSuchPageOrNoPermissions.html where you nicely tell the user they either mistyped the url or don't have permissions.
This will not break compliance and not send out the incorrect message.
If you are very paranoid you could put in a random wait before returning the redirect so time analysis would be harder.
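A rough sketch of the "random wait" idea, in Python. The function names and the delay bounds are made up for illustration; only the 307 status and the NoSuchPageOrNoPermissions.html target come from the answer. Random jitter only makes timing analysis harder, as the answer says; it should not be read as a guarantee.

```python
import random
import time


def jittered_delay(min_seconds: float, max_seconds: float, rng: random.Random) -> float:
    """Pick a random delay between min_seconds and max_seconds."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    return rng.uniform(min_seconds, max_seconds)


def redirect_with_jitter(rng: random.Random,
                         min_seconds: float = 0.05,
                         max_seconds: float = 0.25):
    """Sleep for a random interval, then return the same redirect
    whether the page is missing or the user lacks permission."""
    time.sleep(jittered_delay(min_seconds, max_seconds, rng))
    return 307, {"Location": "/NoSuchPageOrNoPermissions.html"}


# Hypothetical usage: SystemRandom draws from the OS entropy source.
status, headers = redirect_with_jitter(random.SystemRandom())
print(status, headers["Location"])
```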
As for all the people here asking why you would protect directories, try these examples:
1. User Name
Imagine we are an ISP. We give each user a webpage at www.isp.example/home/USERNAME and an email address of USERNAME@isp.example. If an attacker runs a dictionary attack, sending requests to www.isp.example/home/[Random], and can tell whether each guess is a valid user name, they can generate a list of valid email addresses to sell to bad people.
2. What Folder
Bob is running for office. He has an account with the poster and uses his site to store personal information, but he has secured it by making it a private folder. His public pages are at:
www.example.com/Bob and his secret folder is www.example.com/Bob/IceCream. He has marked this as private, so anyone requesting it gets a 403. However, www.example.com/Bob/Cake returns a 404, as Bob's secret is ice cream, not cake.
Alice the reporter does a dictionary attack on Bob's site, trying:
www.example.com/Bob/Cake - 404
www.example.com/Bob/Donuts - 404
www.example.com/Bob/Lollies - 404
www.example.com/Bob/IceCream - 403
Now Alice knows Bob's secret and can discredit him as an ice cream eater.
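Alice's probing can be simulated in a few lines of Python. This is only a toy model: the server is a lookup function rather than a real HTTP endpoint, and the names `status_for` and `probe` are invented for the sketch.

```python
NOT_FOUND = 404
FORBIDDEN = 403

# Site layout from the example: one private folder, nothing else.
PRIVATE_FOLDERS = {"/Bob/IceCream"}


def status_for(path: str) -> int:
    """A server that answers truthfully: 403 for private folders, 404 otherwise."""
    return FORBIDDEN if path in PRIVATE_FOLDERS else NOT_FOUND


def probe(wordlist, status_lookup):
    """Return every guessed path whose status differs from 404."""
    return [path for path in wordlist if status_lookup(path) != NOT_FOUND]


guesses = ["/Bob/Cake", "/Bob/Donuts", "/Bob/Lollies", "/Bob/IceCream"]

print(probe(guesses, status_for))              # ['/Bob/IceCream']
print(probe(guesses, lambda path: NOT_FOUND))  # []
```

With truthful status codes the private folder is the only guess that stands out; when every guess gets the same answer, the dictionary attack learns nothing from the status code alone.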
I think you should send 307 (Temporary Redirect) for requests for "/adminnotes/user" to redirect unprivileged clients to "/adminnotes/". So the client makes a request for "/adminnotes/", therefore you can send back 403, because it is forbidden.
This way your application stays HTTP compliant, and unprivileged users won't learn much about protected data.
What "privacy" is protected by hiding from users the existence of a particular thread?
I'd say that returning either 403 or 404 on a thread they cannot access is OK. Returning 403 on a thread that does not exist is a bad idea.
No website in the world does what you are suggesting, so by this example we see that it is probably best to follow the standard and return 404 when the resource does not exist and 403 when it is forbidden.
I don't see why you worry about privacy issue from the URL. In the case of stackoverflow, you can put any text after the QuestionID number. For example, Return "correct" error code, or protect privacy? still comes back to this question.
Don't forget that a 404 can also technically be revealing information. For example, you could tell who didn't have adminnotes. Depending on the circumstances, this could be just as bad as indicating that the resource did exist.
In my opinion, errors should not lie. If you give a 404, it should always be the case that the resource does not exist.
If you're dealing with sensitive information, then you can always say that the user doesn't have permission for the resource. This doesn't necessarily require that the resource exists. A client may not have permission to even know if the resource exists. Therefore you would need to provide a permission denied error for any combination of /adminnotes/.
That said, the official spec seems to disagree. Here's what RFC 2616 says about these errors at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html:
10.4.4 403 Forbidden
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
10.4.5 404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
I'm no expert, but I think it's crappy to give a "not found", when a resource may exist. I'd prefer a "forbidden", without a guarantee that the resource exists, implying that you would need to authenticate somehow in order to find out.
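The "forbidden without a guarantee that the resource exists" approach could look roughly like the Python sketch below. The `EXISTING_NOTES` set, the `is_admin` flag and the `respond` function are assumptions for illustration, not part of any real framework.

```python
PROTECTED_PREFIX = "/adminnotes/"

# Hypothetical data: notes exist for alice only.
EXISTING_NOTES = {"/adminnotes/alice"}


def respond(path: str, is_admin: bool) -> int:
    """Return an HTTP status code for a request to `path`."""
    if path.startswith(PROTECTED_PREFIX):
        if not is_admin:
            # Same answer whether or not the notes exist, so the status
            # code alone does not reveal existence.
            return 403
        return 200 if path in EXISTING_NOTES else 404
    # Everything outside the protected area is simply absent in this sketch.
    return 404


print(respond("/adminnotes/alice", is_admin=False))  # 403
print(respond("/adminnotes/bob", is_admin=False))    # 403
print(respond("/adminnotes/alice", is_admin=True))   # 200
print(respond("/adminnotes/bob", is_admin=True))     # 404
```

The permission check runs before the existence check, which is what keeps the two unprivileged responses identical.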
Let's say you did return a "page not found" error when you detect that the user does not have the correct access rights. A malicious person with the intent of hacking will soon figure out that you return this in place of access denied.
But the real users who mistype a URL or use a wrong login etc. would be confused, and it would take no end of explanations and release notes to explain your position to the customers, TAC etc. In exchange for what?
The intention is good, but I'm afraid this policy you propose might not work out the way you wanted it to.
My suggestion is to:
If Not Exists_Thread then return 404
If Not User_Can_Access_to_this_Thread then return 403
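The two checks above, written out as a small runnable Python sketch. The `THREADS` dictionary and the user names are made-up stand-ins for whatever storage and authentication the forum actually uses.

```python
# Hypothetical in-memory data: thread id -> users allowed to read it.
THREADS = {
    1: {"allowed_users": {"alice"}},
    2: {"allowed_users": {"alice", "bob"}},
}


def status_for_thread(thread_id: int, user: str) -> int:
    """Existence is checked first, then access, as in the suggestion above."""
    thread = THREADS.get(thread_id)
    if thread is None:
        return 404  # the thread does not exist
    if user not in thread["allowed_users"]:
        return 403  # the thread exists, but this user may not read it
    return 200


print(status_for_thread(99, "bob"))  # 404
print(status_for_thread(1, "bob"))   # 403
print(status_for_thread(2, "bob"))   # 200
```

Note that with this ordering a 403 does confirm that the thread exists, which is exactly the trade-off the question is asking about.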

Resources