401 Unauthorized vs 403 Forbidden: Which is the right status code for when the user has not logged in? [duplicate]

This question already has answers here:
403 Forbidden vs 401 Unauthorized HTTP responses
(21 answers)
Closed 2 years ago.
After lots of Googling and Stackoverflowing, it still isn't clear to me because many articles and questions/answers were too general (including 403 Forbidden vs 401 Unauthorized HTTP responses which was not specifically for my use-case).
Question: What's the proper HTTP Status Code when the user has not logged in and requests to see some pages that should be shown only to logged-in users?

The satisfying, once-and-for-all answer I found is:
Short answer:
401 Unauthorized
Description:
We know that authentication comes first (has the user logged in or not?) and authorization comes second (does the user have the needed privilege or not?), but here is the point that trips people up:
But isn’t “401 Unauthorized” about authorization, not authentication?
Back when the HTTP spec (RFC 2616) was written, the two words may not
have been as widely understood to be distinct. It’s clear from the
description and other supporting texts that 401 is about
authentication.
From HTTP Status Codes 401 Unauthorized and 403 Forbidden for Authentication and Authorization (and OAuth).
So if we were to rewrite the standard, paying close attention to each word, we might end up with the following table:
Status Code | Old foggy naming | New clear naming | Use case
+++++++++++ | ++++++++++++++++ | ++++++++++++++++ | ++++++++++++++++++++++++++++++++++
401         | Unauthorized     | Unauthenticated  | User has not logged in
403         | Forbidden        | Unauthorized     | User doesn't have enough privilege

It depends on the mechanism you use to perform the login.
The spec for 403 Forbidden says:
The 403 (Forbidden) status code indicates that the server
understood the request but refuses to authorize it. A server that
wishes to make public why the request has been forbidden can
describe that reason in the response payload (if any).
If authentication credentials were provided in the request, the server considers them insufficient to grant access. The client
SHOULD NOT automatically repeat the request with the same
credentials. The client MAY repeat the request with new or different
credentials. However, a request might be forbidden for reasons
unrelated to the credentials.
401 Unauthorized, on the other hand, is not defined in the main HTTP status codes spec but in the HTTP Authentication spec, which says:
The 401 (Unauthorized) status code indicates that the request has not
been applied because it lacks valid authentication credentials for
the target resource. The server generating a 401 response MUST send
a WWW-Authenticate header field (Section 4.1) containing at least one
challenge applicable to the target resource.
So if you are using WWW-Authenticate and Authorization headers as your authentication mechanism, use 401. If you are using any other method, then use 403.
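
The rule in this answer can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `login_required_response`, its parameters, and the `example` realm are hypothetical, and the Basic scheme is used just because it is the simplest challenge to show.

```python
from typing import Dict, Optional, Tuple


def login_required_response(
    has_valid_credentials: bool,
    uses_http_auth: bool,
    realm: str = "example",  # hypothetical realm name
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a rejected request, or None if it may proceed."""
    if has_valid_credentials:
        return None
    if uses_http_auth:
        # A 401 response must carry at least one WWW-Authenticate challenge.
        return 401, {"WWW-Authenticate": f'Basic realm="{realm}"'}
    # No HTTP-level challenge to offer (for example a form/cookie login),
    # so this answer's rule falls back to 403.
    return 403, {}
```

Note that this encodes the position of this particular answer; other answers on this page argue for 401 whenever the user is not logged in.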

IMO it would depend on the type of resource you are trying to query; that sounds more logical to me. Forbidden applies more to files or folders of a website, or resources in general, while Unauthorized makes more sense when some kind of execution is required, such as page scripts.

Related

Should we stress which specific HTTP response code to return when there's an error?

I am trying to determine which HTTP status code to return to the REST client under various error conditions. I find this task to be very stressful as reading HTTP status code definition is like reading the constitution, everyone can interpret the same thing differently.
For example, some people say to return 404 Not Found if the requested resource cannot be found, whereas others say you shouldn't, because it means the endpoint is not available.
Another example is in this post:
What HTTP response code to use for failed POST request?, it is recommended by the answer to return 422 Unprocessable Entity instead of a generic error 400 Bad Request.
My question is, why not just start simple and return 400 Bad Request for all errors, provide context within response body, and only to include more HTTP status code when there is obvious value?
For example, we previously returned 200 OK when an app's access token had expired. To help the app resolve this, we provided an internal error ID in the response so the client could request a new access token with its refresh token. But we realized that by returning 401 Unauthorized instead, the client's implementation can be much simpler because of the library it uses. Now we think there is obvious value in adding a new HTTP status code here.
So to summarize my question again, is there a need to stress which specific HTTP status code to return? What's wrong with returning 400 in my second example if context is provided in the response body?

I find this task to be very stressful as reading HTTP status code definition is like reading the constitution, everyone can interpret the same thing differently.
The most important thing is to recognize that HTTP status codes belong to the domain of transporting documents over a network, not to your business domain. Remember, the basic idea is that every resource on every web server understands the status codes the same way, and general purpose components (like web browsers) don't need any special knowledge of a specific resource in order to interpret the status codes correctly.
The body of the response is how you communicate resource specific information to the client.
My question is, why not just start simple and return 400 Bad Request for all errors, provide context within response body, and only to include more HTTP status code when there is obvious value?
"Obvious value" is the whole trick, isn't it? Which is to say, yes, you can use 400 Bad Request for all client errors, in much the same way that you can use POST for all requests. But doing that conceals meaning that general purpose components can take advantage of.
Back in the day, 401 Unauthorized was the go to example for why you would want a specific status code -- a browser which had been anonymously submitting requests would know that this particular request needs authorization credentials, and by looking at other meta data in the response could work out how to compose a new request (for instance, by asking the human operator for a username and password, then encoding that information into the appropriate header).
Note the target audience here; we weren't expecting the human to understand what 401 means; we were expecting the general purpose tool to understand what 401 means, and to act appropriately. Your correct use of the metadata in the domain of transporting documents over a network improves my experience by giving my general purpose client the information it needs to be smart.
Please note in the above the emphasis on information about the transfer of documents. When you are trying to communicate information about problems in your domain, those details do belong in the response body. 403 Forbidden (I understand what you asked, and I'm not willing to do it) shows up quite often when a particular request would violate your domain's protocol.
We don't, after all, expect a general purpose component to have customization specific to our domain.
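
To make the point about general purpose components concrete, here is a rough sketch of a client helper that reacts to 401 without knowing anything about the resource or the response body. The names `send_with_refresh`, `send`, and `refresh_token` are made up for illustration; a real HTTP library would supply its own equivalents.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, body)


def send_with_refresh(
    send: Callable[[str], Response],
    refresh_token: Callable[[], str],
    access_token: str,
) -> Response:
    """Send a request; on 401, obtain a fresh token once and retry."""
    status, body = send(access_token)
    if status == 401:
        # The status code alone tells the client what to do;
        # it never has to parse a domain-specific error ID from the body.
        access_token = refresh_token()
        status, body = send(access_token)
    return status, body
```

If the server answered 200 or 400 with an internal error ID instead, this generic helper would have no way to know that a token refresh is the right reaction.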

Allowing CORS to provide error message?

Let's say you have an API that is primarily consumed by browsers from other origins.
Each customer has their own subdomain on the API, so .api.service.com
The service allows the customer to define which origins should be allowed to perform CORS-requests.
When a browser with an allowed origin performs a request, the server responds with the expected Access-Control-Allow-Origin header set to the same value as the Origin request header.
When a browser performs a request from an origin that is NOT allowed, the common way to handle this is to respond to the request with a 403 without specifying the Access-Control-Allow-Origin header, which will cause the browser to trigger an error on the request. The browser does not, however, expose any information that the error was caused by missing CORS-headers (although it logs a helpful error in the console, usually).
This makes it hard to programmatically show a helpful "This origin is not allowed, please configure." message, since there doesn't seem to be a good way to reliably decide whether the error was caused by a wifi glitch, a network error or an invalid/missing CORS configuration.
My question is: when the server detects an origin that should not be allowed, instead of responding with no CORS headers, could it respond with a 403 and include CORS headers to allow the browser to read the error?
Since every request goes through this process on the API, I'm thinking this should be safe, but I might be overlooking something. Thoughts?
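
For clarity, the behaviour being asked about could look roughly like the sketch below. It only illustrates the proposal; it does not settle whether the approach is safe, it ignores preflight requests, and the function name `cors_response` and its parameters are hypothetical.

```python
from typing import Dict, Set, Tuple


def cors_response(origin: str, allowed_origins: Set[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a cross-origin request from `origin`."""
    # In both branches the Origin is echoed back, so browser script can read the response.
    headers = {"Access-Control-Allow-Origin": origin}
    if origin in allowed_origins:
        return 200, headers
    # Proposed behaviour: a readable 403 instead of a 403 without CORS headers.
    return 403, headers
```

With this in place, the calling page would see an ordinary 403 response and could show the "please configure" message, rather than an opaque network error.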

understanding basic authentication with a 401

I'm a little confused about Basic authentication with regard to web browsers. I had thought that the web browser would only send an Authorization header after having received an HTTP 401 status in the previous response. However, it appears that Chrome sends the Authorization header with every request thereafter. It has the data that I entered once upon a time in response to a 401 from my website and sends it with every message (according to the developer tools that ship with Chrome and my webserver). Is that expected behavior? Is there some header I should use with my 401 to indicate that the Authorization stuff should not be cached? I'm using the WWW-Authenticate header currently.

This is the expected behavior of the browser as defined in RFC 2617 (Section 2):
A client SHOULD assume that all paths at or deeper than the depth of
the last symbolic element in the path field of the Request-URI also
are within the protection space specified by the Basic realm value of
the current challenge. A client MAY preemptively send the
corresponding Authorization header with requests for resources in
that space without receipt of another challenge from the server.
Similarly, when a client sends a request to a proxy, it may reuse a
userid and password in the Proxy-Authorization header field without
receiving another challenge from the proxy server. See section 4 for
security considerations associated with Basic authentication.
To my knowledge, Basic HTTP authentication has no way to perform a logout or re-authentication. This, along with the lack of security of HTTP Basic authentication, is why most websites now use forms and cookies for authentication.

From RFC 2617:
If a prior request has been authorized, the
same credentials MAY be reused for all other requests within that
protection space for a period of time determined by the
authentication scheme, parameters, and/or user preference.
From my experience it is quite common to see browsers automatically sending the Basic credentials for subsequent requests. It prevents having to do an extra round trip for additional resources.
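
As a rough illustration of what the browser is doing, the sketch below builds a Basic Authorization header value and checks whether a request path falls inside the protection space implied by an earlier challenge. The helper names are hypothetical, and `in_protection_space` is only one simple reading of the "at or deeper than the depth of the last symbolic element" wording, not how any particular browser implements it.

```python
import base64


def basic_authorization(user_id: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    raw = f"{user_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def in_protection_space(challenged_path: str, request_path: str) -> bool:
    """True if request_path is at or below the directory of challenged_path."""
    # Drop the last path element ("index.html" in "/docs/index.html").
    prefix = challenged_path.rsplit("/", 1)[0] + "/"
    return request_path.startswith(prefix)
```

For the user-id "Aladdin" and password "open sesame" (the example used in RFC 2617), `basic_authorization` returns `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`. A client that was challenged at `/docs/index.html` may then send that header preemptively for `/docs/img/logo.png`.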

Is it OK to return a HTTP 401 for a non existent resource instead of 404 to prevent information disclosure?

Inspired by a thought while looking at the question "Correct HTTP status code when resource is available but not accessible because of permissions", I will use the same scenario to illustrate my hypothetical question.
Imagine I am building a carpooling web service.
Suppose the following
GET /api/persons/angela/location
retrieves the current position of user "angela". Only angela herself and a possible driver who is going to pick her up should be able to know her location, so if the request is not authenticated as an appropriate user, a 401 Unauthorized response is returned.
Also consider the request
GET /api/persons/john/location
when no user called john has registered with the system. There is no john resource let alone a resource for john's location, so this obviously returns a 404 Not Found. Or does it?
What if I don't want to reveal whether or not john is registered with the system?
(Perhaps the usernames are drawn from a small pool of university logins, and there is a very militant cycling group on campus that takes a very dim view of car usage, even if you are pooling? They could make requests to the URL for every user, and if they receive a 401 instead of 404, infer that the user is a car pooler)
Does it make sense to return a 401 Unauthorized for this request, even though the resource does not exist and there is no possible set of credentials that could be supplied in a request to have the server return a 200?

Actually, RFC 2616 (§10.4.4, 403 Forbidden) suggests doing the opposite: if someone attempts to access a resource but is not properly authenticated, return 404 rather than 403 (Forbidden). This still solves the information disclosure issue.
If the server does not wish to make
this information available to the
client, the status code 404 (Not
Found) can be used instead.
Thus, you would never return 403 (or 401). However, I think your solution is also reasonable.
EDIT: I think Gabe's on the right track. You would have to reconsider part of the design, but why not:
Not found - 404
User-specific insufficient permission - 404
General insufficient permission (no one can access) - 403
Not logged in - 401
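
One possible way to write that mapping down is sketched below. The order of the checks is an assumption (the list above does not specify one), and the function name `status_for` and its parameters are invented for the example.

```python
def status_for(
    logged_in: bool,
    exists: bool,
    closed_to_everyone: bool,
    user_allowed: bool,
) -> int:
    """Pick a status code that does not reveal per-user permissions."""
    if not logged_in:
        return 401  # not logged in
    if not exists:
        return 404  # not found
    if closed_to_everyone:
        return 403  # general insufficient permission (no one can access)
    if not user_allowed:
        return 404  # user-specific insufficient permission, disguised as not found
    return 200
```

Checking the login state first means an anonymous caller gets the same 401 whether or not the resource exists.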

If usernames are sensitive information, then don't put them directly in the URI. If you use hypermedia within your representations, then you can make it just as easy for authorized client applications to navigate your API without leaking information in your URLs.
Hackable urls are great for information that you want everyone to be able to access easily. However, for a RESTful client, there is no problem using URIs that are completely opaque.
Once you have removed the direct correlation between the user and the URI, it becomes difficult to infer any information from a 401 response code.

Return 401 Unauthorized in any case in which the user is not allowed to see a particular page, whether it exists or not.
From RFC 2616: "If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials."
Consider HTTP servers that use separate lists of credentials for authentication to different URLs. Obviously, a server should not check every list when a URL is requested, so if the credentials are not in that one applicable list, because HTTP requests are completely independent of each other, it makes sense to return 401 Unauthorized if the credentials are not valid for that particular URL.
Furthermore, the description of 403 Forbidden includes: "Authorization will not help and the request SHOULD NOT be repeated." In contrast, if the user chooses to log in using the correct credentials, Authorization will help.

I think the best solution would be to return 403 (forbidden) for every (potential) page in a class, if the user is not authenticated to see any of them. If the user is, return 404 for stuff that's not there and 200 for stuff that is.

I think it's fine if you want to return a 401 Unauthorized if the request is made by a client that is not a user. However, if a user makes the request and is authenticated, then I don't think that a 401 is the best solution. If you feel that returning a 404 would compromise the security of some users, then you may want to consider returning a 403 Forbidden or perhaps a 200 OK, but just don't specify a location. If I query for user bob and get a response and query for user sam and get an error response, be it 401, 403, 404, etc, then I can probably come to the conclusion that it means that user sam doesn't exist.
200 OK with no location specified may be the most disguised solution.
Edit: Just to illustrate what I am proposing. Return a 401 if the client isn't authorized. Otherwise, always return a 200 OK.
<user-location for="bob">
<location>geo-coordinates here</location>
</user-location>
<user-location for="sam">
<location/>
</user-location>
This doesn't really indicate if sam exists or not, or perhaps there just isn't any location data for him currently.

Return "correct" error code, or protect privacy?

OK, probably best to give an example here of what I mean.
Imagine a web based forum system, where the user authentication is done by some external method, which the system is aware of.
Now, say for example, a user enters the URL for a thread that they do not have access to. Should I return a 403 (Forbidden), letting the user know that they should try another authentication method, or a 404, not letting them know that there is something there to access?
Assuming I return a 403, should I also return a 403 when they access a URL for a topic that doesn't exist yet?
Edit: the example above was more of an example than something IRL.
Another Example, say I expose something like
/adminnotes/user
if there are Administrator notes about the user. Now, returning a 403 would let the user know that there is something there being said about them. A 404 would say nothing.
But, if I were to return a 403 - I could return it for adminnotes/* - which would resolve that issue.
Edit 2: Another example. Soft deleted Questions here return a 404. Yet, with the right authentication and access, you can still see them (I'd presume)

Above everything else, comply with the HTTP spec. Returning 403 in place of 404 is not a good thing. Returning 404 in place of 403 is probably OK (or not a big blunder), but I would just let the software tell the truth. If the user only knows the ID of a topic, that's not much anyway. And they could try timing attacks to determine whether the topic exists.

I would go for a 307 redirect to NoSuchPageOrNoPermissions.html where you nicely tell the user they either mistyped the url or don't have permissions.
This will not break compliance and not send out the incorrect message.
If you are very paranoid you could put in a random wait before returning the redirect so time analysis would be harder.
As for all the people here asking why you would protect directories, try these examples:
1. User Name
Imagine we are an ISP that gives each user a web page at www.isp.example/home/USERNAME and an email address of USERNAME@isp.example. If an attacker runs a dictionary attack, sending requests to www.isp.example/home/[Random], and can tell whether each one is a valid user name, they can generate a list of valid email addresses to sell to bad people.
2. What Folder
Bob is running for office. He has an account with the poster and uses his site to store personal information, but he has secured it by making it a private folder. His public pages are at
www.example.com/Bob and his secret folder is www.example.com/Bob/IceCream. He has marked this as private, so anyone requesting it gets a 403. However, www.example.com/Bob/Cake returns a 404, as Bob's secret is ice cream, not cake.
Alice the reporter does a dictionary attack on Bob's site, trying:
www.example.com/Bob/Cake - 404
www.example.com/Bob/Donuts - 404
www.example.com/Bob/Lollies - 404
www.example.com/Bob/IceCream - 403
Now Alice knows Bob's secret and can discredit him as an ice cream eater.

I think you should send 307 (Temporary Redirect) for requests for "/adminnotes/user" to redirect unprivileged clients to "/adminnotes/". So the client makes a request for "/adminnotes/", therefore you can send back 403, because it is forbidden.
This way your application stays HTTP compliant, and unprivileged users won't learn much about protected data.

What "privacy" is protected by hiding from users the existence of a particular thread?
I'd say that returning either 403 or 404 on a thread they cannot access is OK. Returning 403 on a thread that does not exist is a bad idea.

No website in the world does what you are suggesting, so by this example we see that it is probably best to follow the standard and return 404 when the resource does not exist and 403 when it is forbidden.

I don't see why you worry about privacy issue from the URL. In the case of stackoverflow, you can put any text after the QuestionID number. For example, Return "correct" error code, or protect privacy? still comes back to this question.

Don't forget that a 404 can also technically be revealing information. For example, you could tell who didn't have adminnotes. Depending on the circumstances, this could be just as bad as indicating that the resource did exist.
In my opinion, errors should not lie. If you give a 404, it should always be the case that the resource does not exist.
If you're dealing with sensitive information, then you can always say that the user doesn't have permission for the resource. This doesn't necessarily require that the resource exists. A client may not have permission to even know if the resource exists. Therefore you would need to provide a permission denied error for any combination of /adminnotes/.
That said, the official spec seems to disagree. Here's what the RFC says about these errors at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html:
10.4.4 403 Forbidden
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
10.4.5 404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
I'm no expert, but I think it's crappy to give a "not found", when a resource may exist. I'd prefer a "forbidden", without a guarantee that the resource exists, implying that you would need to authenticate somehow in order to find out.

Let's say you did return a "page not found" error when you detect that the user does not have the correct access rights. A malicious person intent on hacking will soon figure out that you return this in place of access denied.
But the real users who mistype a URL or use a wrong login etc. would be confused, and it would take no end of explanations and release notes to explain your position to the customers, TAC etc. In exchange for what?
The intention is good, but I'm afraid the policy you propose might not work out the way you want it to.

My suggestion is to:
If Not Exists_Thread then return 404
If Not User_Can_Access_to_this_Thread then return 403
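
Spelled out as runnable code, the two checks above might look like the sketch below. The function name `thread_status` and the two sets standing in for the forum's thread store and the user's permissions are assumptions made for the example.

```python
from typing import Set


def thread_status(thread_id: int, existing: Set[int], accessible: Set[int]) -> int:
    """Return the HTTP status for a request to view a thread."""
    if thread_id not in existing:
        return 404  # the thread does not exist
    if thread_id not in accessible:
        return 403  # the thread exists, but this user may not view it
    return 200
```

The existence check comes first, so a 403 is only ever returned for threads that really exist. That is exactly the disclosure the question worries about, which this suggestion accepts in exchange for honest status codes.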
