I was reading about the IDN homograph attack and could not find it stated explicitly whether browsers Punycode-encode only the domain or also the rest of the URL (path and query). So my question is: does any of the popular browsers (FF, IE, Chrome, Safari, Opera) encode the rest of the URL (the IRI, to be exact) with Punycode?
Only the domain name part is encoded with punycode. This is due to the restrictions imposed on the allowable characters in a (traditional) domain name. The path part of the URL has no such restrictions, so UTF-8 is often used.
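To make the split concrete, here is a minimal sketch in Python of converting an IRI to an ASCII-only URL. The helper name iri_to_uri and the example.com-style host are made up for illustration, and Python's built-in "idna" codec implements the older IDNA2003 rules, so this is not necessarily what any particular browser does. It drops any port or user info for brevity.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def iri_to_uri(iri):
    """Hypothetical helper: Punycode for the host, percent-encoded UTF-8 elsewhere."""
    parts = urlsplit(iri)
    # Only the host name goes through IDNA/Punycode (port and user info are ignored here).
    host = parts.hostname.encode("idna").decode("ascii")
    # Path and query are UTF-8 encoded and percent-escaped instead.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))

print(iri_to_uri("http://bücher.example/über?q=ü"))
# http://xn--bcher-kva.example/%C3%BCber?q=%C3%BC
```

The host ends up as xn--bcher-kva.example, while the ü in the path and query becomes %C3%BC, its UTF-8 bytes.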
Related
While reading the JSON-LD specification I noticed that a lot of vocabularies use http as the protocol instead of https (see section 2. Conformance). This seems odd to me. When opening them in a browser there is often an http -> https redirect (like for http://www.w3.org/ns/prov# ).
My question: Is there a best practice for which protocol should be used for IRIs? Am I right that http occurs so often in 2. Conformance because those vocabularies are relatively old?
HTTPS is HTTP over TLS.
In the context of vocabularies and namespaces these URIs are just opaque strings. They are not meant to be visited, so whether the scheme is http or https doesn't matter for security. Software may compare those strings to see whether they match, in order to know what type of document it is, but it does not need to visit those URIs.
Just use whatever URI is defined in the standards that you follow, and use it exactly as written. So if the standard says http:// use that, and if it says https:// use that; don't change them. It has no impact on security.
Older standards use the http protocol, and newer standards may use the https protocol, but it really doesn't matter, they're just opaque strings to match against, not to connect to.
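As a small illustration of "opaque strings to match against", here is a sketch in Python. The namespace is the PROV one mentioned in the question; the helper name is_prov_term is made up for this example and no network access is involved.

```python
# Namespace IRI exactly as published; it is compared, never fetched.
PROV_NS = "http://www.w3.org/ns/prov#"

def is_prov_term(iri):
    """Hypothetical check: plain string comparison against the namespace."""
    return iri.startswith(PROV_NS)

print(is_prov_term("http://www.w3.org/ns/prov#Entity"))   # True
print(is_prov_term("https://www.w3.org/ns/prov#Entity"))  # False
```

Changing http to https produces a different string, so software that matches on the published IRI would no longer recognise the term.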
I have a domain that contains an Umlaut, like this:
https://spaß.de
"ß" may be transliterated to "ss".
Now when I copy and paste this text into Chrome or any other browser, I'm taken to
https://spass.de
However, the 2 domains have different owners, and the owner of "https://spaß.de" will never get any traffic to his site.
Is there any authority that redirects Umlaut domains to transliterated domains, or is the browser responsible for this? I have tried all kinds of browsers, they all show the same behaviour.
Thank you for any insights.
In your case it's the browser. Under the IDNA2003 rules, browsers treat ß, ss, Ss, sS and SS as having the same meaning when processing web addresses, so you can't reliably distinguish between them in a domain name for a website. If you want to know the details, this mapping step is called Nameprep.
https://en.wikipedia.org/wiki/Nameprep
On the other hand, u and ü are not considered to have the same meaning. The domain name as sent over the network does not contain an actual ü; it is encoded as, for example, xn--bcher-kva, which browsers can display as bücher. The encoding scheme is called Punycode.
https://en.wikipedia.org/wiki/Punycode
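Both effects can be reproduced with Python's built-in "idna" codec, which implements IDNA2003 with Nameprep. This is only a sketch of the mapping, not of any browser's actual code, and the helper names to_ascii and to_unicode are made up for the example.

```python
def to_ascii(hostname):
    """Hypothetical helper: apply Nameprep, then Punycode where needed."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname):
    """Hypothetical helper: turn xn-- labels back into Unicode for display."""
    return hostname.encode("ascii").decode("idna")

print(to_ascii("spaß.de"))             # spass.de
print(to_ascii("bücher.de"))           # xn--bcher-kva.de
print(to_unicode("xn--bcher-kva.de"))  # bücher.de
```

The ß is folded to ss before any encoding happens, so no xn-- label is produced at all, whereas ü survives Nameprep and is carried in Punycode.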
Our site just rolled out a new version, and now pages have Unicode in the URL. I see that Rails has properly URL-escaped these UTF-8 characters when rendering the anchor tags.
/regions/%E4%B8%AD%E5%BD%B0%E6%8A%95/
However I still see a lot of traffic with incorrectly encoded urls:
/regions/%A4%A4%B9%FC%A7%EB/
Apparently this is the same address, but encoded in something other than UTF-8, and then url escaped.
Question
I am wondering whether there is any old browser that will take a correctly escaped URL, unescape it to get UTF-8, re-encode it in some other encoding, and then URL-escape it again when requesting it from the server.
Otherwise I don't know how to explain this traffic.
I have tested in Internet Explorer 6 and 7, with and without the "Always send URLs as UTF-8" option. None of the combinations caused an incorrectly encoded request.
My guess is that this comes from some web crawler that handles the decoding but not the encoding.
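One way to investigate such requests is to unescape the path to raw bytes and see which encodings can decode them. The following Python sketch does that; the function name decode_candidates and the list of candidate encodings are assumptions chosen for illustration, not something known about the clients involved.

```python
from urllib.parse import unquote_to_bytes

def decode_candidates(escaped_path, encodings=("utf-8", "big5", "gbk", "shift_jis")):
    """Hypothetical helper: map each encoding that decodes cleanly to its text."""
    raw = unquote_to_bytes(escaped_path)  # undo the %XX escaping only
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results

good = decode_candidates("/regions/%E4%B8%AD%E5%BD%B0%E6%8A%95/")
odd = decode_candidates("/regions/%A4%A4%B9%FC%A7%EB/")
print(good.get("utf-8"))
print(sorted(odd))  # encodings that accept the second path
```

The second path is not valid UTF-8, so "utf-8" is missing from its results. If one of the other candidates decodes it to the same text as the UTF-8 form, that would support the theory that some client re-encoded the URL in a legacy character set before escaping it.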
Is there any length limitation on the query string in iOS PhoneGap?
Thanks!!!
Although the specification of the HTTP protocol does not specify any maximum length, practical limits are imposed by web browser and server software.
Microsoft Internet Explorer (Browser)
Microsoft states that the maximum length of a URL in Internet Explorer is 2,083 characters, with no more than 2,048 characters in the path portion of the URL. In my tests, attempts to use URLs longer than this produced a clear error message in Internet Explorer.
Firefox (Browser)
After 65,536 characters, the location bar no longer displays the URL in Windows Firefox 1.5.x. However, longer URLs will work. I stopped testing after 100,000 characters.
Safari (Browser)
At least 80,000 characters will work. I stopped testing after 80,000 characters.
Opera (Browser)
At least 190,000 characters will work. I stopped testing after 190,000 characters. Opera 9 for Windows continued to display a fully editable, copyable and pasteable URL in the location bar even at 190,000 characters.
Apache (Server)
My early attempts to measure the maximum URL length in web browsers bumped into a server URL length limit of approximately 4,000 characters, after which Apache produces a "413 Entity Too Large" error. I used the current up to date Apache build found in Red Hat Enterprise Linux 4. The official Apache documentation only mentions an 8,192-byte limit on an individual field in a request.
Microsoft Internet Information Server (Server)
The default limit is 16,384 characters (yes, Microsoft's web server accepts longer URLs than Microsoft's web browser). This is configurable.
Perl HTTP::Daemon (Server)
Up to 8,000 bytes will work. Those constructing web application servers with Perl's HTTP::Daemon module will encounter a 16,384 byte limit on the combined size of all HTTP request headers. This does not include POST-method form data, file uploads, etc., but it does include the URL. In practice this resulted in a 413 error when a URL was significantly longer than 8,000 characters. This limitation can be easily removed. Look for all occurrences of 16x1024 in Daemon.pm and replace them with a larger value. Of course, this does increase your exposure to denial of service attacks.
Recommendations
Extremely long URLs are usually a mistake. URLs over 2,000 characters will not work in the most popular web browser. Don't use them if you intend your site to work for the majority of Internet users.
References : http://www.boutell.com/newfaq/misc/urllength.html
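The figures above can be turned into a quick sanity check. This Python sketch hard-codes the limits reported in this answer (some of them observed rather than documented, and all of them dated) and lists which ones a given URL would exceed; the names REPORTED_LIMITS and exceeded_limits are made up for the example.

```python
# Limits in characters, copied from the figures reported above (not authoritative).
REPORTED_LIMITS = {
    "Internet Explorer": 2083,
    "Apache (observed)": 4000,
    "Perl HTTP::Daemon (observed)": 8000,
    "IIS (default)": 16384,
}

def exceeded_limits(url):
    """Hypothetical helper: names of the reported limits this URL is longer than."""
    return [name for name, limit in REPORTED_LIMITS.items() if len(url) > limit]

long_url = "http://example.com/?" + "a" * 3000
print(exceeded_limits(long_url))  # ['Internet Explorer']
```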
Is it browser dependent? Also, do different web stacks have different limits on how much data they can get from the request?
RFC 2616 (Hypertext Transfer Protocol, HTTP/1.1) states there is no limit to the length of a query string (section 3.2.1). RFC 3986 (Uniform Resource Identifier, URI) also states there is no limit, but indicates the hostname is limited to 255 characters because of DNS limitations (section 3.2.2).
While the specifications do not specify any maximum length, practical limits are imposed by web browser and server software. The figures below are based on research that is unfortunately no longer available on its original site (the address now leads to a shady-looking loan site) but can still be found in the Internet Archive copy of Boutell.com:
Microsoft Edge (Browser)
The limit appears to be around 81578 characters. See URL Length limitation of Microsoft Edge
Chrome
It stops displaying the URL after 64k characters, but can serve more than 100k characters. No further testing was done beyond that.
Firefox (Browser)
After 65,536 characters, the location bar no longer displays the URL in Windows Firefox 1.5.x. However, longer URLs will work. No further testing was done after 100,000 characters.
Safari (Browser)
At least 80,000 characters will work. Testing was not tried beyond that.
Opera (Browser)
At least 190,000 characters will work. Stopped testing after 190,000 characters. Opera 9 for Windows continued to display a fully editable, copyable and pasteable URL in the location bar even at 190,000 characters.
Microsoft Internet Explorer (Browser)
Microsoft states that the maximum length of a URL in Internet Explorer is 2,083 characters, with no more than 2,048 characters in the path portion of the URL. Attempts to use URLs longer than this produced a clear error message in Internet Explorer.
Apache (Server)
Early attempts to measure the maximum URL length in web browsers bumped into a server URL length limit of approximately 4,000 characters, after which Apache produces a "413 Entity Too Large" error. The current up to date Apache build found in Red Hat Enterprise Linux 4 was used. The official Apache documentation only mentions an 8,192-byte limit on an individual field in a request.
Microsoft Internet Information Server (Server)
The default limit is 16,384 characters (yes, Microsoft's web server accepts longer URLs than Microsoft's web browser). This is configurable.
Perl HTTP::Daemon (Server)
Up to 8,000 bytes will work. Those constructing web application servers with Perl's HTTP::Daemon module will encounter a 16,384 byte limit on the combined size of all HTTP request headers. This does not include POST-method form data, file uploads, etc., but it does include the URL. In practice this resulted in a 413 error when a URL was significantly longer than 8,000 characters. This limitation can be easily removed. Look for all occurrences of 16x1024 in Daemon.pm and replace them with a larger value. Of course, this does increase your exposure to denial of service attacks.
Recommended Security and Performance Max: 2048 CHARACTERS
Although RFC 2616 officially specifies no limit, many security recommendations state that maxQueryString on a server should be set to a maximum of 1024 characters, while the entire URL, including the query string, should be limited to 2048 characters. This is to mitigate the Slow HTTP Request DoS/DDoS vulnerability on a web server, which typically shows up as a finding in the Qualys Web Application Scanner and other security scanners.
Please see the below example code for Windows IIS Servers with Web.config:
<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxQueryString="1024" maxUrl="2048">
        <headerLimits>
          <add header="Content-type" sizeLimit="100" />
        </headerLimits>
      </requestLimits>
    </requestFiltering>
  </security>
</system.webServer>
This would also work at the server level using machine.config.
This applies only to Windows-based servers; I'm not sure whether there is a similar issue on Apache or other servers.
Note: Limiting query string and URL length may not completely prevent a Slow HTTP Request DDoS attack, but it is one step you can take to mitigate it.
Adding a reference as requested in the comments:
https://www.raiseupwa.com/writing-tips/what-is-the-limit-of-query-string-in-asp-net/
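For stacks without such a configuration switch, the same 1024/2048 recommendation could be approximated in application code. This is only a sketch in Python with a made-up helper name, within_limits; it counts characters of the whole URL and of the query string, which may not match exactly how IIS measures maxUrl and maxQueryString.

```python
from urllib.parse import urlsplit

def within_limits(url, max_url=2048, max_query=1024):
    """Hypothetical check mirroring the limits recommended above."""
    query = urlsplit(url).query
    return len(url) <= max_url and len(query) <= max_query

print(within_limits("http://example.com/search?q=" + "a" * 100))   # True
print(within_limits("http://example.com/search?q=" + "a" * 1100))  # False
```

A request failing this check could be rejected early, before any further processing is done.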
Different web stacks do support different lengths of HTTP requests. I know from experience that early versions of Safari only supported 4000 characters and thus had difficulty handling ASP.NET pages because of the USER-STATE. This applies even to POST, so you would have to check the browser and see what its limit is. I think you may reach a limit even on newer browsers. I cannot remember exactly, but one of them (IE6, I think) had a 16-bit limit, 32,768 or something.