IRIs: Shall http or https be used?

While reading the JSON-LD specification I noticed that a lot of vocabularies use http rather than https as the protocol (see section 2. Conformance). This seems odd to me. When opening such an IRI in a browser there is often an http -> https redirect (as for http://www.w3.org/ns/prov#).
My question: Is there a best practice for which protocol should be used in IRIs? And am I right that http occurs so often in section 2. Conformance because those vocabularies are relatively old?

HTTPS is HTTP over TLS.
In the context of vocabularies and namespaces, these URIs are just opaque strings: they will never be dereferenced, so whether the scheme is http or https doesn't matter for security. Software may compare the strings to decide what type of document it is dealing with, but nothing will actually visit those URIs.
Just use exactly the URI defined in the standards you follow. If the standard says http://, use that; if it says https://, use that; don't change them. Know that this has no impact on security.
Older standards use the http scheme and newer standards may use the https scheme, but it really doesn't matter: they are opaque strings to match against, not addresses to connect to.
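To make the "opaque string" point concrete, here is a minimal sketch (in Python, with an illustrative document and helper name, not taken from any particular JSON-LD library) of how a namespace IRI is typically handled: it is compared character for character and never fetched, so http://www.w3.org/ns/prov# and https://www.w3.org/ns/prov# would be two different identifiers.

```python
# Namespace IRIs are matched as opaque strings; nothing is ever fetched.
PROV_NS = "http://www.w3.org/ns/prov#"  # exactly as the standard defines it

def uses_prov(json_ld_doc: dict) -> bool:
    """Return True if the document's @context references the PROV namespace."""
    context = json_ld_doc.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    # Plain string comparison: "https://www.w3.org/ns/prov#" would NOT match.
    return any(entry == PROV_NS for entry in context if isinstance(entry, str))

doc = {"@context": ["http://www.w3.org/ns/prov#"], "@type": "prov:Activity"}
print(uses_prov(doc))  # True
```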

Related

What's special about localhost for browsers?

As far as I know, browsers treat localhost and other domains differently. Like, not everything works on localhost. Do you know what specifically is different? What does "localhost" mean in this context? Anything that resolves to 127.0.0.0/8? What about the port/scheme?
As far as I know, browsers treat localhost and other domains differently. Like, not everything works on localhost.
Quite the opposite. http://localhost is considered a secure origin by many browsers, so you can do development with many features that would normally be disabled without HTTPS.
See When is a context considered secure?:
A context is considered secure when it meets certain minimum standards
of authentication and confidentiality defined in the Secure Contexts
specification. A particular document is considered to be in a secure
context when it is the active document of a top-level browsing context
(basically, a containing window or tab) that is a secure context.
For example, even for a document delivered over TLS within an <iframe>, its context is not considered secure if it has an ancestor that was not also delivered over TLS.
However, it’s important to note that if a non-secure context causes a
new window to be created (with or without specifying noopener), then
the fact that the opener was insecure has no effect on whether the new
window is considered secure. That’s because the determination of
whether or not a particular document is in a secure context is based
only on considering it within the top-level browsing context with
which it is associated — and not whether a non-secure context happened
to be used to create it.
Locally-delivered resources such as those with http://127.0.0.1 URLs,
http://localhost and http://*.localhost URLs (e.g.
http://dev.whatever.localhost/), and file:// URLs are also considered
to have been delivered securely.
Note: Firefox 84 and later support http://localhost and
http://*.localhost URLs as trustworthy origins (earlier versions did
not, because localhost was not guaranteed to map to a local/loopback
address).
Resources that are not local, to be considered secure, must meet the
following criteria:
must be served over https:// or wss:// URLs
the security properties of the network channel used to deliver the resource must not be considered deprecated
According to the following article, the special things about localhost are:
Although it's HTTP, it's generally treated as HTTPS.
You can't set a cookie on localhost that is Secure, or SameSite:none, or has the __Host prefix.
You can't reproduce mixed-content issues.
Browsers don't rely on DNS resolvers for localhost and its subdomains (though not all browsers behave the same way).
And "localhost" in this context means localhost and subdomains, with or without a custom port.

Could automatically redirecting all HTTP traffic to HTTPS inadvertently encourage a man-in-the-middle attack vector?

If your web server implements HTTPS, it's common practice to 301-redirect all http://* URLs to their https:// equivalents.
However, it occurs to me that this means the client's original HTTP request (and any data contained in it) remains fully unencrypted, and only the response is encrypted. Does automatically "upgrading" all insecure requests on the server end effectively encourage clients to continue sending data to insecure HTTP endpoints, more or less downgrade-attacking myself?
I realize I can't stop a client from insecurely sending any data to any endpoint, but does the practice of automatically redirecting HTTP to HTTPS "condone" the client doing so? Would it be better practice to instead outright reject all HTTP traffic that could contain sensitive data and make it the browser's responsibility to attempt or recommend the upgrade to HTTPS?
This is indeed a known issue, and HTTP Strict Transport Security (HSTS)—released in 2012—aims to solve it. It is an HTTP header field which takes the form:
Strict-Transport-Security: max-age=<seconds> [; includeSubDomains]
HSTS informs the browser, via this header, that all connections to a given domain must be "upgraded" to https, even if they were specified as non-secure http:
The UA MUST replace the URI scheme with "https"
This applies to all future connections to the domain (including following links), for the duration of the max-age specified in the header.
However this does leave open a potential vulnerability on the user's first visit to a domain with HSTS (if the header were stripped by an attacker). Google Chrome, Mozilla Firefox, Internet Explorer and Microsoft Edge attempt to limit this problem by including a "pre-loaded" list of HSTS sites.
So this preloaded list covers the most popular websites; as you may see at this Chromium link, the list is humongous (roughly 10 MB), thereby solving the aforementioned problem to a certain extent.
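Putting the two pieces together, here is a minimal sketch of the pattern, using Python's standard http.server and a hypothetical hostname purely for illustration (in practice you would configure this in your web server or framework). The plain-HTTP listener only redirects, and the HTTPS responses carry the Strict-Transport-Security header, since browsers ignore that header when it arrives over plain HTTP.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SITE = "example.com"  # hypothetical hostname

# The HTTPS server (not this plain-HTTP listener) should send:
#   Strict-Transport-Security: max-age=31536000; includeSubDomains
# Browsers ignore the header when it is received over plain HTTP.

class RedirectToHttps(BaseHTTPRequestHandler):
    """Answer every plain-HTTP request with a 301 to the HTTPS equivalent."""
    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", f"https://{SITE}{self.path}")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectToHttps).serve_forever()
```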

Possible to allow HTTP requests from HTTPS website?

I have installed a (non wildcard) SSL certificate so my website can use HTTPS. When I try to request resources from HTTP urls I get error-message like:
Mixed Content: The page at 'https://example.com/' was loaded over
HTTPS, but requested an insecure stylesheet
'http://resources.example.com/style.css'. This request has been
blocked; the content must be served over HTTPS.
I get that it is probably bad practice, according to all kinds of opinions people might have when it comes to mixing http and https, but I only request static resources that I don't regard as critical over http.
Tried to google "allow http requests from https with iis" and similar, but can't find a clear answer. Is there a way around this, is it solvable the same way CORS is?
Sorry if the question isn't very smart and the answer is obvious, but I lack quite a bit of knowledge when it comes to networking.
stylesheet ... static resources that I don't regard as critical over http.
CSS can include script and script can alter the page, so it is considered critical.
..."allow http requests from https with iis" ...
The decision to deny mixed content is made within the browser. There is no setting which will allow the browser to include mixed content. The behavior regarding what is considered mixed content differs between browsers and versions; look here for more information from a year ago.
... is it solvable the same way CORS is?
The security model of CORS is about the same-origin policy, and a server may decide that a specific other site is allowed to make a CORS request. But in this case the question is whether the content might be modified in transit by anybody (i.e. a man-in-the-middle attack).
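Since there is no browser-side switch, the practical fix is to reference the resources over https:// in the first place (which assumes they are also reachable over HTTPS, e.g. with a certificate covering the subdomain). A sketch of rewriting such references before the page is served; the regex and attribute handling are simplified and illustrative:

```python
import re

# Assumes every referenced resource is also reachable over HTTPS.
INSECURE_REF = re.compile(r'(\b(?:src|href)=["\'])http://', re.IGNORECASE)

def upgrade_references(html: str) -> str:
    """Rewrite http:// subresource references to https:// before serving the page."""
    return INSECURE_REF.sub(r"\1https://", html)

page = '<link rel="stylesheet" href="http://resources.example.com/style.css">'
print(upgrade_references(page))
# <link rel="stylesheet" href="https://resources.example.com/style.css">
```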

Is there any reason not to serve https content on a page served over http?

I currently have image content being served on a domain that is only accessible over https. What is the downside of serving an image with an https path on a page accessed over http? Are there any caching considerations? I'm using an HttpRuntime.Cache object to store the absolute image path, which is retrieved from a database.
I assume there is no benefit to using protocol-relative URLs if the image is only accessible over https?
Is there a compelling reason why I should set up a separate virtual directory to also serve the image content over http?
If the content served over HTTPS within the HTTP page isn't particularly sensitive and could equally be served over HTTP, there is no downside (apart, perhaps, from minor performance overhead and reduced caching, depending on how your server is configured: some HTTPS content can be cached).
If the content served over HTTPS is sensitive enough to motivate the use of HTTPS, this is really bad practice.
Checking that HTTPS is used and used correctly is solely the responsibility of the client and its user (this is why automatic redirections from HTTP to HTTPS are only partly useful, for example). Although some of it has to do with the technicalities of certificate verification, a lot of the security offered by HTTPS comes from the fact that the user:
expects to be using HTTPS (otherwise they could easily be downgraded),
is able to verify the validity of the certificate: the green/blue bar, with a certificate corresponding to the host name they expect to be on.
The first point can be addressed by HTTP Strict Transport Security, from a technical point of view.
The second needs user interaction. If you go to your bank's website, it must not only be a site with a valid certificate; you should also check that it is indeed the domain name of your bank, for example.
Embedding HTTPS content in an HTTP page defeats this, since the user can't check which site is being used, or indeed that HTTPS is being used at all. To some extent, embedding HTTPS content from a third party in an HTTPS page also presents this problem (this is one of the problems with 3-D Secure, which may well be served using HTTPS, but the use of an iframe means the user cannot see which site is actually being used).
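On the caching point in the first paragraph of this answer: whether an HTTPS response can be cached mostly comes down to the headers the server sends. A minimal WSGI sketch (the file name and lifetime are illustrative):

```python
# A minimal WSGI app serving a static image with explicit cache headers,
# so the response can still be cached even though it is delivered over HTTPS.
def serve_image(environ, start_response):
    with open("logo.png", "rb") as f:  # hypothetical static file
        body = f.read()
    start_response("200 OK", [
        ("Content-Type", "image/png"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", "public, max-age=86400"),  # cacheable for one day
    ])
    return [body]
```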

Is it possible to restrict a requesting domain at the application level?

I wonder how some video streaming sites can restrict videos to be played only on certain domains. More generally, how do some websites respond only to requests from certain domains?
I've looked at http://en.wikipedia.org/wiki/List_of_HTTP_header_fields and saw the referrer field that might be used, but I understand that HTTP headers can be spoofed (can they?)
So my question is, can this be done at the application level? By application, I mean, for example, web applications deployed on a server, not a network router's operating system.
Any programming language would work for an answer. I'm just curious how this is done.
If anything's unclear, let me know. Or you can use it as an opportunity to teach me what I need to know to clearly specify the question.
HTTP headers carrying IP information are helpful (because only a small portion of requests fake them) but are not reliable. Web applications usually run on web frameworks, which give you easy access to these headers.
Some ways to gain source information:
the originating IP address from the TCP/IP network stack itself: the problem is that this server-visible address need not match the real client's address (it could come from a company proxy, an anonymous proxy, a big ISP...).
the HTTP X-Forwarded-For header: proxies are supposed to set this header to solve the problem mentioned above, but it can also be faked, and many anonymous proxies don't set it at all (a sketch of handling both follows this list).
apart from IP-based source information you can also use machine identifiers (some use the User-Agent header). Several sites, for instance, store such machine identifiers inside Flash cookies so they can re-identify a returning client and block it. But same story: this is unreliable and can be faked.
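In practice the first two items are often combined in a small helper like the following sketch (the proxy address is illustrative, and X-Forwarded-For is only as trustworthy as the proxies that append to it):

```python
# Derive a best-guess client IP from a WSGI environ.
# REMOTE_ADDR comes from the TCP connection; X-Forwarded-For is set by proxies (if any).
TRUSTED_PROXIES = {"203.0.113.10"}  # illustrative address of our own reverse proxy

def client_ip(environ: dict) -> str:
    peer = environ.get("REMOTE_ADDR", "")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # Only believe X-Forwarded-For when the direct peer is a proxy we operate;
    # otherwise the client could have set the header itself.
    if peer in TRUSTED_PROXIES and forwarded:
        return forwarded.split(",")[0].strip()  # left-most entry is the original client
    return peer

print(client_ip({"REMOTE_ADDR": "203.0.113.10",
                 "HTTP_X_FORWARDED_FOR": "198.51.100.7, 203.0.113.10"}))  # 198.51.100.7
```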
The root problem is that you need a lot of security machinery to identify a client reliably (e.g. authentication and client certificates). But this is high effort and adds a lot of usability problems, so many sites don't do it. Most often this isn't an issue, because only a small portion of clients put in the effort to fake their identity and access the server.
HTTP Referer is a different thing: it shows you from which page a user was coming. It is included by the browser. It is also unreliable, because its content can be forged and some clients do not include it at all (I remember several IE versions skipping the Referer).
Controls like these are based on the originating IP address. From the IP address, the country can be determined. Finding out the IP address requires access to low-level protocol information (e.g. from the socket).
The referrer header makes sense when you click a link from one site to another, but a typical HTTP request built with a programming library doesn't need to include this.
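As a concrete (and deliberately weak) illustration of such an application-level check, here is a sketch of a WSGI middleware that rejects requests whose Referer/Origin host is not on an allowlist. The domain names are hypothetical, and as noted above both headers can be spoofed or omitted, so this only deters casual hotlinking, not a determined client.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical allowlist

def restrict_referring_domain(app):
    """Wrap a WSGI app; reject requests not referred from an allowed domain."""
    def middleware(environ, start_response):
        referrer = environ.get("HTTP_REFERER") or environ.get("HTTP_ORIGIN") or ""
        host = urlsplit(referrer).hostname or ""
        if host not in ALLOWED_HOSTS:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Embedding from this domain is not allowed.\n"]
        return app(environ, start_response)
    return middleware
```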
