Space in URL: did the browser get smarter, or the server? - browser

It looks like today you no longer have to encode spaces as %20 in your HTML links or image links. For example, suppose you have this image at 〔http://example.com/i/my house.jpg〕. Notice the space there. In your HTML code, you can just do this:
<img src="http://example.com/i/my house.jpg" alt="my house">
It works in all current versions of browsers. What I'm not sure about, though, is whether the browser encodes the space before requesting the URL, or whether a particular server (e.g. Apache) does the right thing with paths containing spaces.
Addendum:
Sorry about the confusion. My real question is about the HTTP protocol.
I'll leave this one as-is and mark it answered.
I posted a new question here:
Does the HTTP protocol require spaces to be encoded in the file path?

The browser makes the correction.
You still have to encode the spaces though. Just because it works in the browsers you use doesn't make it valid, and doesn't mean it will work everywhere.
You can see a list of reserved characters and other characters that should be encoded here: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
RFC1738 specifically states:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
RFC 2396 takes precedence over RFC 1738 and expounds on space usage in URLs:
The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.
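You can see what the browser does by applying the same normalization in JavaScript; encodeURI percent-encodes characters that are illegal in a URL, which is essentially the correction browsers make before sending the request (a quick sketch using the image URL from the question):
// The raw URL with a literal space, as typed in the HTML attribute.
var raw = "http://example.com/i/my house.jpg";
// encodeURI percent-encodes illegal characters such as the space.
console.log(encodeURI(raw));
// -> "http://example.com/i/my%20house.jpg"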

Related

Web pages served by local IIS showing black diamonds with question marks

I'm having an issue in a .NET application where pages served by local IIS display random characters (mostly black diamonds with white question marks in them). This happens in Chrome, Firefox, and Edge. IE displays the pages correctly for some reason.
The same pages in production and in lower pre-prod environments work in all my browsers. This is strictly a local issue.
Here's what I've tried:
Deleted code and re-cloned (also tried switching branches)
Disabled all browser extensions
Ran in incognito mode
Rebooted (you never know)
Deleted temporary ASP.NET files
Looked for corrupt fonts on machine but didn't find any
Other Information:
Running IIS 10.0.17134.1
.NET MVC application with Knockout
I realize there are several other posts regarding black diamonds with question marks, but none of them seem to address my issue.
Please let me know if you need more information.
Thanks for your help!
You are in luck. The explicit purpose of � is to indicate that character encodings are being misused. When users see that, they'll know that we've messed up and lost some of their text data, and we'll know that, at one or more points, our processing and/or configuration is wrong.
(Fonts are not at issue [unless there is no font available to render �]. When there is no font available for a character, it's usually rendered as a white-filled rectangle.)
Character encoding fundamentals are simple: use a sufficient character set (say Unicode), pick an appropriate encoding (say UTF-8), encode text with it to obtain bytes, tell every program and person that gets the bytes that they represent text and which encoding is used. The encoding might be understood from a standard, convention, or specification.
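As a small illustration of that round trip, here is a sketch using the standard TextEncoder/TextDecoder APIs available in browsers:
// Encode text to bytes using UTF-8.
const bytes = new TextEncoder().encode("café");
// "é" becomes the two bytes 0xC3 0xA9: [99, 97, 102, 195, 169]
// Decoding with the same encoding recovers the text.
console.log(new TextDecoder("utf-8").decode(bytes));        // "café"
// Decoding with the wrong encoding corrupts it (classic mojibake).
console.log(new TextDecoder("windows-1252").decode(bytes)); // "cafÃ©"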
Your editor does the actual encoding.
If the file is part of a project or similar system, a project file might store the intended encoding for all or each text file in the project. If your editor is an IDE, it should understand how the project does that.
Your compiler needs to know the encoding of each text file you give it. A project system would communicate what it knows.
HTML provides an optional way to communicate the encoding. Example: <meta charset="utf-8">. An HTML-aware editor should not allow this indicator to be different than the encoding it uses when saving the file. An HTML-aware editor might discover this indicator when opening the file and use the specified encoding to read the file.
HTTP uses another optional way: the Content-Type response header (for example, Content-Type: text/html; charset=utf-8). The web server emits this either statically or in conjunction with code that it runs, such as ASP.NET.
Web browsers use the HTTP way if given.
XHR (AJAX, etc.) uses HTTP along with JavaScript processing. If needed, the JavaScript processing should apply the HTTP and HTML rules, as appropriate. Note: if the content is JSON, the current RFC requires the encoding to be UTF-8.
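For example, you can inspect the declared encoding of a response from JavaScript (a sketch; the URL is hypothetical):
// Fetch a page and read the charset the server declared, if any.
fetch("/some/page.html").then(function (resp) {
  console.log(resp.headers.get("Content-Type"));
  // -> e.g. "text/html; charset=utf-8"
});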
No one or thing should have to guess.
Diagnostics
Which character encoding did you intend to use? This century, UTF-8 is so much the norm that if you choose to use a different one, you should have a good reason and document it (for others and your future self).
Compare the bytes in the file with the text you expect it to represent. Does it use the intended encoding? Use an editor or tool that shows bytes in hex.
As suggested by @snakecharmerb, what does the server send? Use a web browser's F12 network tab.
What does the HTTP response header say, if anything?
What does the HTML meta tag say, if anything?
What is the HTML doctype, if any?
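If you suspect the bytes themselves, the same Encoding API can confirm a mismatch (a sketch):
// U+FFFD (�) appears when a decoder meets bytes that are invalid in the
// chosen encoding. 0xE9 is "é" in windows-1252 but an invalid,
// truncated sequence in UTF-8.
const bad = new Uint8Array([99, 97, 102, 0xE9]);
console.log(new TextDecoder("utf-8").decode(bad)); // "caf�"
// With { fatal: true } the decoder throws instead of substituting �,
// which makes encoding mismatches easy to catch.
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bad);
} catch (e) {
  console.log("not valid UTF-8");
}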

URL rewriting in address bar

I have no idea where to find out why, if I paste this URL into the address bar of Chrome, Safari, or Firefox, it gets translated into a different URL with a couple of accented characters.
It leads to a cryptocurrency phishing site, so beware.
I'm trying to find the logic (JavaScript or whatever) that causes this translation.
The base URL is http://www.xn--shapehit-ez9c7y.com
I apologise if this is the wrong site to ask the question on.
This is called Punycode, which is a way to represent Unicode within the ASCII character set. This allows websites to have names with foreign characters, such as in Chinese or Arabic. While this is incredibly useful, it can also be used for deceptive impersonation (often maliciously, as noted in your question).
Different browsers treat Punycode differently. For example, Safari and Edge will not attempt to convert Punycode, and will show the full 'strange' URLs.
However, according to Sophos,
Chrome and Firefox won’t automatically decode punycode URLs if they mix multiple alphabets or languages, on the grounds that such text strings are highly unlikely in real life and therefore suspicious. But both Chrome and Firefox will autoconvert punycode URLs that contain all their characters in the same language.
A security researcher called Xudong Zheng actually registered the domain xn--80ak6aa92e.com, which translates to аррӏе in 'Russian' (Cyrillic lookalike letters). When visited in Chrome or Firefox, it looked identical to apple.com in the URL bar.
Fortunately his site simply warns of this forgery, but it could easily have been used maliciously.
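You can watch the same conversion happen with the standard URL API, which normalizes internationalized hostnames to their Punycode (ASCII) form (a quick sketch using Zheng's demonstration domain):
// The URL parser applies the IDNA rules browsers follow, converting
// Unicode hostnames to their ASCII (Punycode) form.
const u = new URL("https://аррӏе.com/"); // Cyrillic lookalike letters
console.log(u.hostname);
// -> "xn--80ak6aa92e.com"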
Hope this helps :)

Permalinks: Why are hyphens (-) used over underscores (_) to replace spaces (and other unwanted characters)?

In some large web systems I have come across lately, friendly permalinks, i.e. parts of a URL path based on an (often user-specified) string rather than a numeric ID, replace spaces (and other unwanted/disallowed characters that would otherwise need to be URL-escaped) with hyphens (-), not underscores (_).
An example:
in the URL http://example.com/blog/this-is-my-first-post, this-is-my-first-post is a friendly permalink. Using underscores, this would be http://example.com/blog/this_is_my_first_post
Is this only a personal preference, or is there a technical reason to use hyphens over underscores?
Hypothetical possibilities I thought of:
Maybe it matters for Search Engine Optimization?
Maybe it is actually important for how URL paths are interpreted?
Maybe there is a historical reason?
What I do know:
Hyphens are treated as word-breaks in most (if not all?) computer systems/programs, e.g. use ctrl+left/ctrl+right to move in a sentence_that_uses_underscores vs a sentence-that-uses-hyphens.
In normal text that a user enters (e.g. names for objects or blog posts), usage of actual hyphens is higher than that of underscores.
Could someone shine some light on this?
Google has spoken:
Consider using punctuation in your URLs. The URL http://www.example.com/green-dress.html is much more useful to us than http://www.example.com/greendress.html. We recommend that you use hyphens (-) instead of underscores (_) in your URLs.
https://support.google.com/webmasters/answer/76329?hl=en
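A minimal slug generator following that advice might look like this (a sketch; exactly which characters you allow is a design choice):
// Turn a user-supplied title into a hyphen-separated slug.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of unwanted characters into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}
console.log(slugify("This is my first post!"));
// -> "this-is-my-first-post"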

Skip $ character from normal text

This is a very simple question, but unfortunately I haven't found any clue about how to get rid of this problem. I am writing a blog post about jQuery AJAX where I need to write the symbol $. I am using MathJax for writing mathematical notation. As soon as I write $ (for example, $.getJSON), the MathJax library interprets this as LaTeX commands. Does anybody know how to escape that $ character so that MathJax treats it as a normal $?
By default, MathJax does not use the single dollar as a delimiter for in-line math (for exactly the reason you indicate), so your configuration must be explicitly enabling it. See the tex2jax documentation for details of how this is done.
You have several options:
Remove the configuration that turns on dollar signs for in-line math. (This won't be good if you have already used them for in-line math in other posts)
Enable the processEscapes option so that you can use \$ to get a dollar without it being used as a math delimiter
If your blog allows you to enter raw HTML, you could use <span>$</span> to prevent MathJax from using the dollar as a delimiter (math can't contain HTML tags, so this dollar will not match up with another one, so won't be used as a delimiter).
Put your code examples inside <pre> or <code> containers, as MathJax (at least by default) doesn't process math within these. Your configuration may have changed that, however, so check the skipTags setting in your configuration.
Any of these should allow you to do what you need.
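For example, a MathJax v2 configuration that keeps single dollars enabled for in-line math but adds processEscapes (option 2) might look like this (a sketch; merge it with whatever configuration your blog already loads):
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  tex2jax: {
    // Single dollars stay enabled for in-line math...
    inlineMath: [["$", "$"], ["\\(", "\\)"]],
    // ...but \$ now produces a literal dollar sign.
    processEscapes: true
  }
});
</script>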

Are Arabic and other right-to-left slugs OK?

I'm creating a multilingual site, where an item has a slug for each of the site's languages.
For Arabic slugs (and, I assume, any other right-to-left language) it acts strangely when you try to highlight the text. The cursor moves in the opposite direction while in the RTL text, and so on. This isn't a terribly big deal, but it made me think that maybe it's not "normal" or "ok".
I've seen some sites with Arabic slugs, but I've also seen completely Arabic sites that still use English slugs.
Is there a benefit one way or the other? A suggested method? Is doing it the way I am ok? Any suggestions welcome.
I suppose that by "slug" you mean a direct permanent URL to a page. If you don't, you can ignore the rest of this answer :)
Such URLs will work, but avoid them if you can. The fact that it's right-to-left is actually not the worst problem. The worst problem with any non-ASCII URL is that in a lot of contexts it will show up like this: https://ar.wikipedia.org/wiki/%D9%86%D9%87%D8%B1_%D9%81%D8%A7%D8%B1%D8%AF%D8%A7%D8%B1 (it's just a link to a random article in the Arabic Wikipedia). You will see a long trail of percent signs and numbers, even if the title is short, because each non-ASCII character turns into about six ASCII characters. This gets very ugly when you have to paste it into an email, for example.
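You can see that expansion directly: each Arabic letter is two UTF-8 bytes, and each byte becomes three ASCII characters (a quick sketch):
// "نهر" ("river", the first word of the Wikipedia title above)
// is 3 letters but percent-encodes to 18 ASCII characters.
console.log(encodeURIComponent("نهر"));
// -> "%D9%86%D9%87%D8%B1"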
I write a blog in Hebrew and I manually change the slug of every post to some ASCII name.
